Compare commits


91 Commits

Author SHA1 Message Date
Bentlybro
7ea8d2a35e fix(frontend): cover platform linking page 2026-04-30 05:14:26 +01:00
Bentlybro
0644ba59c7 feat(frontend): platform linking page (/link/{token})
Frontend for the platform linking flow introduced by #12615 (backend) and
#12618 (bot). When a user clicks a one-time link from the bot, this page
fetches the link's display info, gates on auth, and confirms the link via
the appropriate server-link or user-link mutation.

Architecture:
- page.tsx — render-only switch over PageStatus
- usePlatformLinkingPage.ts — single hook owning all data fetching, auth
  gating, mutation selection (SERVER vs USER), and status derivation
- helpers.ts — pure utilities (token regex validation, login redirect URL,
  display-name normalization, link-type discrimination)
- components/{LoadingView, NotAuthenticatedView, ReadyView, SuccessView,
  ErrorView} — render-only views, one per PageStatus

Behavior:
- Token format validated up-front via TOKEN_PATTERN; malformed tokens
  short-circuit to ErrorView without hitting the backend.
- Info query gated on Boolean(token) && Boolean(user) so the GET (which
  requires auth) doesn't fire a guaranteed 401 before sign-in.
- Response narrowed via an explicit status === 200 literal compare (the
  discriminated union of 200/401/422 doesn't narrow through okData; the
  literal compare yields the right type).
- Mutation picks server-link or user-link based on info.link_type, fires
  via mutate() (not mutateAsync) so React Query owns the failure surface
  via isError instead of an unhandled promise rejection.
- Switch-account flow signs out and redirects back to the link URL so
  the user lands on the linking page as the new identity.
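
A minimal sketch of the pure-helper gating described in the first two bullets
above; the regex in TOKEN_PATTERN and the helper names are illustrative
assumptions, not the shipped helpers.ts code.

```ts
// Hypothetical helpers; the regex is an assumed token format.
const TOKEN_PATTERN = /^[A-Za-z0-9_-]{16,128}$/;

export function isValidLinkToken(token: string | undefined): token is string {
  // Malformed tokens short-circuit to ErrorView without any backend call.
  return typeof token === "string" && TOKEN_PATTERN.test(token);
}

export function shouldFetchLinkInfo(token: string | undefined, user: unknown): boolean {
  // The info GET requires auth, so gate on both a well-formed token and a signed-in user.
  return isValidLinkToken(token) && Boolean(user);
}
```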

History squashed against current dev (the previous branch state had 82
ahead-commits, most of which were the #12615 backend work that has since
merged independently, plus 109 commits of dev drift that broke CI).
2026-04-27 16:47:23 +01:00
Abhimanyu Yadav
3c08b90500 feat(frontend): preferences v2 page (SECRT-2279) (#12925)
### Why / What / How

**Why** — Settings v2 needs a dedicated Preferences page covering
account info (email, password reset), time zone, and notification
preferences with a single, predictable save flow. The existing legacy
page at `/profile/(user)/settings` mixes concerns and uses `__legacy__`
UI primitives we are migrating away from. SECRT-2279.

**What** — A new preferences page at `/settings/preferences` built from
atomic / molecular design-system components. Three cards (Account, Time
zone, Notifications) share one inline Save / Discard bar at the bottom.
The page replaces the legacy settings page from a UX standpoint while
keeping the same backend mutations.

<img width="1511" height="899" alt="Screenshot 2026-04-27 at 6 43 09 PM"
src="https://github.com/user-attachments/assets/5762fc41-1654-4764-8fbf-d5dd262e031a"
/>

**How**
- **Page composition (`page.tsx`)** uses a single `usePreferencesPage`
hook that owns dirty/saved/form state. Renders `PreferencesHeader`,
`AccountCard`, `TimezoneCard`, `NotificationsCard`, and the inline
`SaveBar` below the cards.
- **Account card** — Email row shows the current address with a compact
pencil button that opens a width-constrained `Dialog` for editing
(Cancel / Update). Password row is a `NextLink` button that routes to
`/reset-password`.
- **Time zone card** — Single row inside a card: label + info-icon
tooltip on the left; small-size `Select` and the GMT offset chip on the
right. The "auto-detect" prompt shows up only when the saved tz differs
from the browser tz, rendered as a pill on the right side of the card.
- **Notifications card** — Tabs for Agents / Marketplace / Credits;
toggling a switch flips a flag in formState and enables Save.
- **Save flow** — `usePreferencesPage` keeps a `savedState` snapshot
(mirrors the old `react-hook-form` `defaultValues` capture-once
semantics) so the dirty check is fully decoupled from any backend GET
refetch. After a successful mutation, `savedState ← formState`, and the
timezone query cache gets an optimistic `setQueryData` write so the
value isn't snapped back by the (cached) GET endpoint.
- **Skeleton** — `PreferencesSkeleton` mirrors the real layout — header,
Account card with the two row shapes, Time zone single row,
Notifications tabs + toggle rows, and the Save / Discard buttons.
- **Sidebar** — Renames the entry "Settings" → "Preferences" with
`SlidersHorizontalIcon`. Profile and Creator Dashboard get clearer
affordances (`UserIcon`, `ChartLineUpIcon`).
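
A minimal sketch of that save flow, assuming a flat formState shape; the form
fields, query key, and function names here are illustrative, not the actual
`usePreferencesPage` internals.

```ts
import { QueryClient } from "@tanstack/react-query";

type PreferencesFormState = {
  email: string;
  timezone: string;
  notifications: Record<string, boolean>;
};

// Dirty check compares the live form against the captured snapshot, so it is
// independent of any backend GET refetch.
export const isDirty = (saved: PreferencesFormState, current: PreferencesFormState) =>
  JSON.stringify(saved) !== JSON.stringify(current);

// After a successful mutation the caller promotes formState to the new
// savedState, and the timezone query cache gets an optimistic write so the
// cached GET can't snap the value back.
export function afterSave(
  queryClient: QueryClient,
  timezoneQueryKey: readonly unknown[],
  saved: PreferencesFormState,
) {
  queryClient.setQueryData(timezoneQueryKey, { timezone: saved.timezone });
  return saved; // becomes the new savedState snapshot
}
```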

### Changes 🏗️

- New page: `src/app/(platform)/settings/preferences/page.tsx` and
`usePreferencesPage.ts`
- New components: `AccountCard`, `TimezoneCard`, `NotificationsCard`,
`PreferencesHeader`, `PreferencesSkeleton`, `SaveBar`
- Helpers: `helpers.ts` (timezones list, GMT-offset formatter, dirty
utilities, notification-group definitions)
- Sidebar: rename "Settings" → "Preferences", swap to
`SlidersHorizontalIcon` + cleaner Profile / Creator Dashboard icons
(`SettingsSidebar/helpers.ts`)
- Tests:
- `preferences/__tests__/main.test.tsx` — page render, edit-email dialog
open/cancel, time zone info trigger, Save/Discard disabled-on-clean,
notification toggle → save submission, discard revert
- Updated `SettingsSidebar.test.tsx` and `SettingsMobileNav.test.tsx`
for the "Preferences" label

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] Sidebar shows "Preferences" with the new icon; clicking routes to
`/settings/preferences`
- [ ] Account card: pencil button opens a 420px-wide dialog; Update
button stays disabled until the email differs and is valid; Cancel
closes the dialog
- [ ] Password row: "Reset password" button navigates to
`/reset-password`
- [ ] Time zone: changing the select enables Save; clicking Save
persists the value and the form keeps showing the saved value (does not
snap back), the inline GMT offset chip updates, info tooltip appears on
hover
- [ ] Auto-detect prompt appears only when saved tz ≠ browser tz;
clicking it sets the select to the browser tz
- [ ] Notifications: toggling any switch enables Save; saving flips the
flag in the request body; switching tabs preserves toggles; Discard
reverts unsaved toggles
- [ ] Save / Discard render below the cards on the right and stay
disabled until any field is dirty
  - [ ] Loading state shows the new skeleton that mirrors the layout
- [ ] `pnpm test:unit` passes (covers the page-level integration tests
above)

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
2026-04-27 15:13:51 +00:00
Abhimanyu Yadav
599f370206 feat(frontend): add settings v2 profile page (#12924)
### Why / What / How

**Why:** SECRT-2278 — settings v2 needs a Profile page so users can
manage how they appear on the marketplace (display name, handle, bio,
links, avatar) without hopping into the legacy settings UI.

**What:** Adds the `/settings/profile` page end-to-end:
- Form fields for display name, handle, bio, avatar, and up to 5 links —
wired to the `getV2GetUserProfile`, `postV2UpdateUserProfile`, and
`postV2UploadSubmissionMedia` endpoints.
- Bio editor gets a markdown toolbar (bold / italic / strikethrough /
link / bulleted list) with a live preview toggle that renders via
`react-markdown` + `remark-gfm`.
- Save/Discard bar with full validation (handle regex, bio length, dirty
tracking) and toast feedback for success and failure paths.
- Forward refs through the `Input` atom so consumers can target the
underlying `<textarea>` / `<input>` (needed for the toolbar's
selection/cursor manipulation).
- Comprehensive integration tests (Vitest + RTL + MSW) at the page level
plus pure-helper unit tests, in line with the project's
"integration-first" testing strategy. Coverage is reported via
`cobertura` for Codecov.

**How:**
- The toolbar applies markdown syntax by reading `selectionStart` /
`selectionEnd` from a forwarded textarea ref. To avoid the textarea
jumping to the top on click: buttons `preventDefault` on `mousedown` (so
the textarea keeps focus), and the handler captures `scrollTop` before
mutation and restores it (with `focus({ preventScroll: true })`) after
React commits the new value in the next animation frame.
- The preview pane styles markdown elements via Tailwind arbitrary child
selectors (`[&_ul]:list-disc` etc.) instead of pulling in
`@tailwindcss/typography`, since the plugin isn't installed and the
project's `prose` usage was a no-op.
- Profile data hydration tolerates nullish API fields by mapping through
`profileToFormState`, padding `links` to 3 slots so the UI always has
the initial layout.
- Tests use Orval-generated MSW handlers from `store.msw.ts`, mock
`useSupabase` to inject an authenticated user, and assert UI behavior
via Testing Library queries.
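
A sketch of that scroll-preserving wrap handler under the assumptions above
(toolbar buttons also `preventDefault` on mousedown so the textarea keeps
focus); the real hook routes the new value through React state, and the names
here are illustrative.

```ts
function wrapSelection(textarea: HTMLTextAreaElement, prefix: string, suffix: string) {
  const { selectionStart, selectionEnd, value, scrollTop } = textarea;
  const selected = value.slice(selectionStart, selectionEnd);
  textarea.value =
    value.slice(0, selectionStart) + prefix + selected + suffix + value.slice(selectionEnd);

  // After the value commits, restore focus without scrolling and put the scroll
  // position and selection back so the textarea doesn't jump to the top.
  requestAnimationFrame(() => {
    textarea.focus({ preventScroll: true });
    textarea.scrollTop = scrollTop;
    textarea.setSelectionRange(selectionStart + prefix.length, selectionEnd + prefix.length);
  });
}
```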

### Changes 🏗️

- New: `app/(platform)/settings/profile/__tests__/helpers.test.ts`,
`__tests__/main.test.tsx`
- Updated: `settings/profile/page.tsx`, `useProfilePage.ts`,
`helpers.ts`, plus `ProfileForm`, `ProfileHeader`, `ProfileSkeleton`,
`LinksSection`, `SaveBar`
- Updated: `settings/layout.tsx` (settings v2 chrome adjustments to host
the profile page)
- Atom change: `components/atoms/Input/Input.tsx` now forwards refs
(`HTMLInputElement | HTMLTextAreaElement`) — backward-compatible for
existing consumers

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Open `/settings/profile` and confirm the page hydrates the
existing display name, handle, bio, avatar, and links
- [x] Edit each field and verify validation messages (empty name,
invalid handle with spaces, bio over 280 chars)
- [x] Bio markdown toolbar: select text and click Bold / Italic / Strike
— selection wraps, cursor stays in place, textarea does not scroll to
top
- [x] Bio toolbar with no selection: each button inserts the markdown
placeholder template
- [x] Click Bulleted list twice on the same line — line is prefixed with
`- ` only once
- [x] Toggle Preview — bio renders bullets, bold, italic, strikethrough,
links correctly; toolbar buttons dim and become inert
  - [x] Toggle Edit — textarea returns with the same content
- [x] Add link → 4th and 5th slots appear; the 6th attempt is blocked by
the "Limit of 5 reached" button label
  - [x] Remove link 1 — the rest reorder correctly
- [x] Avatar upload — happy path replaces the avatar; failure path
surfaces a destructive toast
- [x] Save with valid data → success toast, query invalidates, save
button disables until next edit
  - [x] Save with a server 422 → destructive toast, no state corruption
  - [x] Discard reverts every field back to the loaded profile
- [x] `pnpm test:unit` passes locally; `coverage/cobertura-coverage.xml`
shows ≥ 80% line coverage for `src/app/(platform)/settings/profile/**`
2026-04-27 10:34:09 +00:00
Otto
8786c00f9c feat(blocks): add Claude Opus 4.7 model support (#12826)
Requested by @Bentlybro

Anthropic released [Claude Opus
4.7](https://www.anthropic.com/news/claude-opus-4-7) today. This PR adds
it to the platform's supported model list.

## Why

Users and developers need access to `claude-opus-4-7` via the platform's
LLM block and API. The model is available on Anthropic's API today.

## What

- Adds `CLAUDE_4_7_OPUS = "claude-opus-4-7"` to the `LlmModel` enum
- Adds corresponding `ModelMetadata` entry: 200k context, 128k output,
price tier 3 ($5/M input, $25/M output — same as Opus 4.6)

## How

Two lines added to `llm.py`, following the exact same pattern as all
other Anthropic model additions. No migrations, no frontend changes
needed — the frontend reads model metadata from the backend's JSON
schema endpoint automatically.

Closes SECRT-2248

---------

Co-authored-by: Bentlybro <Github@bentlybro.com>
2026-04-27 07:00:00 +00:00
SymbolStar
384cbd3ccd fix(frontend): redirect www to non-www with 308 to preserve request method (#9188) (#12920)
## Summary
Fixes #9188 — redirects `www.` to non-www to prevent cookie/auth domain
mismatch.

Uses **308** (Permanent Redirect) instead of 301, which preserves the
HTTP method and request body. This is important because the middleware
matcher runs on `/auth/authorize` and `/auth/integrations/*`, where
OAuth callbacks may use POST.

Split from #12895 per reviewer request.

## Changes
- Added www→non-www redirect in Next.js middleware using
`NextResponse.redirect(url, 308)`
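
A hedged sketch of that middleware redirect, assuming a standard Next.js
`middleware.ts`; the matcher configuration mentioned above isn't reproduced
here.

```ts
import { NextRequest, NextResponse } from "next/server";

export function middleware(request: NextRequest) {
  const url = request.nextUrl.clone();
  if (url.hostname.startsWith("www.")) {
    url.hostname = url.hostname.slice("www.".length);
    // 308 keeps the HTTP method and body intact, so a POSTed OAuth callback
    // survives the hop to the apex domain (301 lets clients downgrade to GET).
    return NextResponse.redirect(url, 308);
  }
  return NextResponse.next();
}
```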

---------

Co-authored-by: majdyz <zamil.majdy@agpt.co>
2026-04-27 06:35:56 +00:00
SymbolStar
8be9cf70af fix(frontend): filter null query params in buildUrlWithQuery (#11237) (#12921)
## Summary
Fixes #11237 — `buildUrlWithQuery` now filters out `null` values in
addition to `undefined`, preventing them from being serialized as
literal `"null"` strings in URL query parameters.

Split from #12895 per reviewer request.

## Changes
- Added `value !== null` check alongside existing `value !== undefined`
in `buildUrlWithQuery`
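
A minimal sketch of the resulting behavior; the real `buildUrlWithQuery`
signature in the repo may differ.

```ts
function buildUrlWithQuery(base: string, query: Record<string, unknown>): string {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(query)) {
    // Skip null as well as undefined so neither is serialized as a literal string.
    if (value !== undefined && value !== null) params.append(key, String(value));
  }
  const queryString = params.toString();
  return queryString ? `${base}?${queryString}` : base;
}
```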

---------

Co-authored-by: majdyz <zamil.majdy@agpt.co>
2026-04-27 06:31:34 +00:00
Abhimanyu Yadav
a723966e0b feat(platform): settings v2 integrations page + provider description SDK (#12911)
### Why / What / How

**Why:** The settings v2 integrations surface renders from a hardcoded
`MOCK_PROVIDERS` array — no real user credentials, no delete, no way to
connect a new service. Provider display metadata (descriptions +
supported auth types) was scattered across frontend maps with no backend
source of truth, leaving each new provider to be manually registered in
two places.

**What:** Full-featured settings v2 integrations page driven by live
backend data, plus a backend SDK extension so every provider carries a
description **and** declares its supported auth types. Settings UI uses
both to render the connect-a-service dialog: descriptions in the list,
auth types to pick the right tabs in the detail view.

**How:**
- **Credentials list** — single-fetch via `useGetV1ListCredentials`,
grouped client-side by provider, debounced (250 ms) Unicode-normalized
in-memory search (no roundtrip per keystroke), managed/system creds
filtered via the shared `filterSystemCredentials` helper from
`CredentialsInput`. Loading → skeletons that mirror the real accordion
shape, error → `ErrorCard` with retry, empty → `IntegrationsListEmpty`
with custom marquee illustration.
- **Delete flow** — `useDeleteIntegration` returns per-target `succeeded
/ failed / needsConfirmation` so the UI can name failed items and keep
them selected for one-click retry. Single + bulk both gated by
`DeleteConfirmDialog`. Per-row delete button disables + shows a spinner
via `isDeletingId` so double-clicks can't fire two requests. Success
toast names the credential ("Removed GitHub key").
- **Connect-a-service dialog** — backend-driven (`useGetV1ListProviders`
returns `ProviderMetadata[]` with description + supported_auth_types),
Emil-spec animations (150 ms ease-out step swap, 200 ms ease-out height
resize, 180 ms entry fade+slide+blur on tab swap, all respecting
`prefers-reduced-motion`). Detail view picks tab order via deterministic
`TAB_PRIORITY` (oauth → api_key → user_password → host_scoped) and
remembers last-selected tab per provider for the session.
- **OAuth tab** → `openOAuthPopup` + `getV1InitiateOauthFlow` →
`postV1ExchangeOauthCodeForTokens`
- **API key tab** → zod-validated form with
`autoComplete="new-password"` + `spellCheck=false` so browsers don't
autofill the wrong stored key → `postV1CreateCredentials`
- **Provider metadata SDK** — chainable
`ProviderBuilder.with_description(...)` +
`.with_supported_auth_types(...)` (the latter populated automatically by
`with_oauth` / `with_api_key` / `with_managed_api_key` /
`with_user_password`; explicit form reserved for legacy providers whose
auth lives outside the builder chain). `GET /integrations/providers`
upgraded from `List[str]` → `List[ProviderMetadata]` carrying both
fields.
- **Backward compat** — `BackendAPI.listProviders()` maps the new
`ProviderMetadata[]` shape down to `string[]` so the deprecated
`CredentialsProvider` (used by the builder/library credential pickers)
keeps working without ripple changes.
- **Routing** — page lives at `/settings/integrations` directly. No
feature flag gate (settings v2 layout is already on dev).
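
A sketch of the debounced, Unicode-normalized search from the credentials-list
bullet above; the hook name follows the description, the internals are
assumptions.

```ts
import { useEffect, useState } from "react";

export function useDebouncedValue<T>(value: T, delayMs = 250): T {
  const [debounced, setDebounced] = useState(value);
  useEffect(() => {
    const id = setTimeout(() => setDebounced(value), delayMs);
    return () => clearTimeout(id);
  }, [value, delayMs]);
  return debounced;
}

// NFKD-normalize and strip combining marks so an accented query ("açai")
// matches an unaccented provider name ("acai"), and vice versa.
const normalize = (text: string) =>
  text.normalize("NFKD").replace(/[\u0300-\u036f]/g, "").toLowerCase();

export const matchesQuery = (candidate: string, query: string) =>
  normalize(candidate).includes(normalize(query));
```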

### Changes 🏗️

**Backend**
- `backend/sdk/builder.py` — `with_description()` +
`with_supported_auth_types()` chain methods; the latter is
auto-populated by every existing auth-method chain call so explicit
declaration is only needed for legacy providers.
- `backend/sdk/provider.py` — `description` + `supported_auth_types`
fields on `Provider`.
- `backend/api/features/integrations/router.py` — `GET /providers` now
returns `List[ProviderMetadata]`; calls `load_all_blocks()` (cached
`@cached(ttl_seconds=3600)`) before reading `AutoRegistry`.
- `backend/api/features/integrations/models.py` — `ProviderMetadata`
with `name + description + supported_auth_types`;
`get_provider_description` + `get_supported_auth_types` helpers reading
from `AutoRegistry`.
- 13 existing `_config.py` files updated with `.with_description(...)`:
agent_mail, airtable, ayrshare, baas, bannerbear, dataforseo, exa,
firecrawl, linear, stagehand, wordpress, wolfram, generic_webhook.
- 20 new `_config.py` files (one per provider block dir): apollo,
compass, discord, elevenlabs, enrichlayer, fal, github, google, hubspot,
jina, mcp, notion, nvidia, replicate, slant3d, smartlead, telegram,
todoist, twitter, zerobounce. Each declares
`with_supported_auth_types(...)` because their auth handlers live in
`backend/integrations/oauth/` (legacy) or are block-level
`CredentialsMetaInput` declarations — outside the builder chain.
- 1 new `backend/blocks/_static_provider_configs.py` registering
description + auth types for ~24 providers that live in shared files
(openai, anthropic, groq, ollama, open_router, v0, aiml_api, llama_api,
reddit, medium, d_id, e2b, http, ideogram, openweathermap, pinecone,
revid, screenshotone, unreal_speech, webshare_proxy, google_maps, mem0,
smtp, database). Comment documents the migration path (each entry
retires when the provider graduates to its own `_config.py`).

**Frontend**
- `src/app/(platform)/settings/integrations/page.tsx` — replaces mock
page; composes header + list + connect dialog.
-
`src/app/(platform)/settings/integrations/components/IntegrationsList/`
— list + skeleton + selection (Record<string, true> instead of Set) +
delete orchestration hook.
-
`src/app/(platform)/settings/integrations/components/ConnectServiceDialog/`
— split per the 200-line house rule into `ConnectServiceDialog`,
`ListView`, `ProviderRow`, `useMeasuredHeight`. DetailView's nested
helpers extracted to siblings: `MethodPanel`, `UnsupportedNotice`,
`ProviderAvatar`. Tabs render in deterministic priority order;
last-selected tab persisted per provider in module-scope.
-
`src/app/(platform)/settings/integrations/components/DeleteConfirmDialog/`
— new confirm dialog gating single + bulk deletes (shows up to 3 names +
remaining count for bulk).
-
`src/app/(platform)/settings/integrations/components/IntegrationsListEmpty/components/IntegrationsMarquee.tsx`
— switched from `next/image unoptimized` to plain `<img loading="lazy"
decoding="async">` for decorative logos (no LCP candidate, avoids Next
Image runtime overhead).
-
`src/app/(platform)/settings/integrations/components/hooks/useDeleteIntegration.ts`
— bulk delete now returns per-target
`succeeded/failed/needsConfirmation`; failed items stay selected for
retry; per-id pending tracking via `isDeletingId`; toast names the
credential.
-
`src/app/(platform)/settings/integrations/components/hooks/useDebouncedValue.ts`
— small reusable debounce hook (250 ms, used by both list + dialog
search).
- `src/app/(platform)/settings/integrations/helpers.ts` —
`formatProviderName` guarded against non-string input; `filterProviders`
now Unicode-normalized (NFKD + strip combining marks) so accented
queries match.
- `src/providers/agent-credentials/helper.ts` — `toDisplayName` same
`typeof string` guard.
- `src/components/contextual/CredentialsInput/helpers.ts` — loosened
`filterSystemCredentials` / `getSystemCredentials` generic constraint to
accept `title?: string | null` so it consumes `CredentialsMetaResponse`
directly.
- `src/lib/autogpt-server-api/client.ts` — `listProviders()` maps the
new shape to `string[]` for backward compat.
- `src/app/api/openapi.json` — regenerated spec includes
`ProviderMetadata` with `supported_auth_types`.
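
A sketch of the deterministic tab ordering noted above; the TAB_PRIORITY order
comes from the description, the helper itself is illustrative.

```ts
const TAB_PRIORITY = ["oauth", "api_key", "user_password", "host_scoped"] as const;
type AuthType = (typeof TAB_PRIORITY)[number];

export function orderAuthTabs(supported: string[]): AuthType[] {
  // Render tabs in a fixed priority order rather than backend insertion order.
  return TAB_PRIORITY.filter((tab) => supported.includes(tab));
}
```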

### PR feedback addressed 🛠️

Round of fixes after the first review pass:
- Bulk delete: per-item names in failure toast, failed items kept
selected for retry.
- Confirmation dialog before any delete (single or bulk) —
`DeleteConfirmDialog`.
- Per-row delete button disabled + spinner while pending (no
double-click double-fire).
- Toast names the credential ("Removed GitHub key") instead of generic
copy.
- API key input: `autoComplete="new-password"` + `spellCheck=false`;
title field `autoComplete="off"`.
- Search debounced 250 ms on both list + dialog; Unicode-normalized so
"Açai" matches "acai".
- `toDisplayName` / `formatProviderName` guarded against non-string
input (`provider.split is not a function` was reproducible).
- Skeleton mirrors the real accordion shape — no layout shift on data
load.
- Selection bar sticky position fixed for <375 px (`top-2 sm:top-0`).
- Last-selected auth tab persisted per provider for the session.
- Tabs ordered deterministically (oauth → api_key → user_password →
host_scoped) instead of insertion order.
- `useMemo` removed from `useIntegrationsList` per project rule (no
measured perf need).
- Selection state migrated from `Set<string>` to `Record<string, true>`
(idiomatic React state shape).
- ConnectServiceDialog 288 LoC → ~130 (extracted `ListView`,
`ProviderRow`, `useMeasuredHeight`); DetailView helpers → siblings.
- `next/image unoptimized` → plain `<img>` for decorative logos in
marquee + provider rows + avatar.
- `with_supported_auth_types(...)` pruned in 11 `_config.py` files where
it was redundant with `with_oauth` / `with_api_key` /
`with_managed_api_key` / `with_user_password`. Kept in legacy ones
(github, discord, google, notion, ...) where the docstring says it's
required because auth handlers live outside the builder chain.
- Tab swap + dialog step animations re-tuned vs Emil Kowalski's
animation rules: ease-out default, under 300 ms,
transform+opacity+filter only, blur-bridge to soften swap, willChange to
dodge the 1px shift, reduced-motion fallbacks via `useReducedMotion`.
- Merged latest `dev` (api-keys SECRT-2273 took dev's version, no
api-keys diff in this PR; settings layout took dev's version, no
`SETTINGS_V2` feature flag in this PR; `scroll-area` took dev's
version).

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] Navigate to `/settings/integrations` — loading skeletons appear,
then user's real credentials grouped by provider.
- [ ] Type in the search box — filters client-side after 250 ms, no new
network requests (DevTools → Network stays quiet). Try "açai" with
diacritics — matches "acai" providers.
- [ ] Connect a service → dialog loads provider list with backend
descriptions; search matches name, slug, and description.
- [ ] Click a provider → dialog tweens height smoothly (no jump); header
shows provider avatar + name + description; tabs render in oauth →
api_key priority; last-used tab restored on reopen.
- [ ] Open `linear` (oauth + api_key) — switching tabs animates with a
quick fade+slide+blur entry, no flash.
- [ ] OAuth tab → "Continue with X" opens popup, completes consent,
popup closes, dialog closes, new credential appears with success toast.
- [ ] API key tab → paste a key (browser does NOT offer to autofill any
stored password), Save → toast names the credential, dialog closes.
- [ ] Delete (single) via trash icon → confirmation dialog → button
disables with spinner during the request → toast names the credential.
- [ ] Delete (bulk) via selection bar → confirmation lists up to 3 names
→ if some fail, failed ones stay selected for retry; toast lists which
failed.
  - [ ] Double-click a delete button rapidly — only one request fires.
- [ ] Managed credentials (e.g. "Use Credits for AI/ML API") do **not**
appear in the list.
- [ ] Test on a fresh account (no credentials) — `IntegrationsMarquee`
empty state renders.
- [ ] Throttle network to Slow 3G — skeleton (mirroring real shape)
visible, then list slides in.
  - [ ] Block `/api/integrations/credentials` → `ErrorCard` with retry.
- [ ] `curl /api/integrations/providers` returns `[{ name, description,
supported_auth_types }, ...]` with every provider carrying both fields.
- [ ] `prefers-reduced-motion: reduce` set → all motion collapses to
opacity-only fades.
  - [ ] On <375 px viewport — selection bar clears the mobile nav.
- [ ] `pnpm format && pnpm lint && pnpm types && pnpm test:unit` all
pass.

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**) — no flag changes in this PR.
2026-04-27 06:28:31 +00:00
Zamil Majdy
5b1d9763ed fix(backend/copilot): preserve interrupted SDK partial work on final-failure exit (#12918)
## Background

[SECRT-2275](https://linear.app/autogpt/issue/SECRT-2275). User report:
when a copilot ("autopilot") turn is interrupted by a usage-limit,
tool-call-limit, or other run interruption, the user's recent work
disappears. User described it as: "my initial message was lost 3 times
and it disappeared, then when I would say 'continue' it would do a
random old task."

Investigation surfaced two distinct failure modes. This PR addresses
both.

- **Mode 1** — rate-limit (or other pre-stream rejection) at turn start:
the user's text only ever lives in the optimistic `useChat` bubble; the
backend rejects before the message is persisted, so the bubble is a lie
and a refresh / retry would lose the text.
- **Mode 2** — long-running turn interrupted mid-stream: the entire
turn's progress (assistant text, tool calls, reasoning) vanishes on
interruption — what users describe as "the turn is gone."

## Mode 1 — frontend: restore unsent text on 429

Backend can't recover this on its own: `check_rate_limit` raises before
`append_and_save_message`, so by the time the 429 surfaces there is no
DB row to roll forward. See
`autogpt_platform/backend/backend/api/features/chat/routes.py:916-922`
(rate-limit check) and `routes.py:945` (later append-and-save).

Frontend fix in
`autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts`:
when `useChat`'s `onError` reports a usage-limit error, we

- drop the optimistic user bubble (DB has no record of it, so leaving it
would be a phantom),
- push `lastSubmittedMsgRef.current` back into the composer via the
existing `setInitialPrompt` slot — the same slot URL pre-fills use, so
`useChatInput`'s `consumeInitialPrompt` effect picks it up
automatically,
- clear `lastSubmittedMsgRef` so the dedup guard doesn't block re-send.

In-memory only; surviving a hard refresh while rate-limited is a
separate follow-up (would need localStorage persistence with TTL).
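
A sketch of that onError handling under the same assumptions; the names mirror
the description (`lastSubmittedMsgRef`, `setInitialPrompt`), and the wiring
around `useChat` is omitted.

```ts
type UsageLimitRecovery = {
  lastSubmittedMsgRef: { current: string | null };
  dropOptimisticUserBubble: () => void;
  setInitialPrompt: (text: string) => void;
};

export function recoverUnsentMessageOn429(recovery: UsageLimitRecovery) {
  const unsent = recovery.lastSubmittedMsgRef.current;
  if (!unsent) return;
  // The backend rejected before persisting, so the optimistic bubble is a phantom.
  recovery.dropOptimisticUserBubble();
  // Re-seed the composer through the same slot URL pre-fills use.
  recovery.setInitialPrompt(unsent);
  // Clear the ref so the dedup guard doesn't block the re-send.
  recovery.lastSubmittedMsgRef.current = null;
}
```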

Test:
`autogpt_platform/frontend/src/app/(platform)/copilot/__tests__/useCopilotStream.test.ts`
— verifies the composer is repopulated and the optimistic bubble is
dropped on a 429.

## Mode 2 — backend: preserve interrupted partial in DB

### Root cause

The SDK retry loop in `stream_chat_completion_sdk` always rolls back
`session.messages` to the pre-attempt watermark on any exception. That
rollback is correct **before a retry** so attempt #2 doesn't duplicate
attempt #1's content. But it runs **before the retry decision is made**,
so when retries are exhausted (or no retry is attempted) the partial
work is discarded too.

Three branches of the retry loop ended in a final-failure state with
side effects worse than just losing the partial:
- `_HandledStreamError` non-transient: rollback then add error marker —
partial gone
- `Exception` with `events_yielded > 0`: rollback then break — **no
error marker added either**, so on refresh the chat looks like nothing
happened even though the user just watched tokens stream live
- `Exception` non-context-non-transient + the while-`else:` exhaustion
path: same, no marker
- Outer except (cancellation, GeneratorExit cleanup): didn't restore
captured partial

### Fix

`autogpt_platform/backend/backend/copilot/sdk/service.py`:

1. **`_InterruptedAttempt` dataclass** — holds the rolled-back `partial:
list[ChatMessage]` + optional `handled_error: _HandledErrorInfo`. Three
methods drive the contract:
- `capture(session, transcript_builder, transcript_snap,
pre_attempt_msg_count)` — slices `session.messages`, restores the
transcript, strips trailing error markers to prevent duplicate markers
after restore.
- `clear()` — drops captured state on a successful retry so outer
cleanup paths don't replay pre-retry content.
- `finalize(session, state, display_msg, retryable=...) ->
list[StreamBaseResponse]` — re-attaches partial, synthesizes
`tool_result` rows for orphan `tool_use` blocks, appends the canonical
error marker, and returns the flushed events so the caller can yield
them to the client (no double-flush).
2. **`_flush_orphan_tool_uses_to_session(session, state) ->
list[StreamBaseResponse]`** — synthesizes `tool_result` rows for any
`tool_use` that never resolved before the error so the next turn's LLM
context stays API-valid (Anthropic rejects orphan tool_use). Uses the
public `adapter.flush_unresolved_tool_calls` and returns the events for
the caller to yield.
3. **`_classify_final_failure(...) -> _FinalFailure | None`** — picks
the display message + stream code + retryable flag for the final-failure
exit. One source of truth for the in-history error marker and the
client-facing `StreamError` SSE yield so they can't drift.
4. **Consolidated post-loop emit**: the former three scattered blocks
(partial restore + redundant re-flush + two separate `yield StreamError`
sites) collapsed to one block driven by `_classify_final_failure` →
`_FinalFailure` → `finalize()` → yield events + single `StreamError`.
5. **Adapter `flush_unresolved_tool_calls`** (renamed from
`_flush_unresolved_tool_calls` to drop the `# noqa: SLF001` suppressors
on cross-module callers).

Each retry-loop rollback site calls `interrupted.capture(...)`; the
success break calls `interrupted.clear()`; the post-loop failure block
calls `interrupted.finalize(...)` exactly once.

The baseline service already preserves partial work via its existing
finally block — no change needed there.

## Tests

Backend (`backend/copilot/sdk/interrupted_partial_test.py`, new, 18
tests):

- `TestInterruptedAttemptCapture` — slice semantics + stale-marker
stripping
- `TestInterruptedAttemptFinalize` — appends partial then marker,
handles empty partial, no-op on `None` session, flushes unresolved tools
between partial and marker, returns flushed events for caller to yield
- `TestFlushOrphanToolUses` — synthesizes `tool_result` rows, returns
events, no-op on None state / no unresolved
- `TestClassifyFinalFailure` — handled_error wins, attempts_exhausted,
transient_exhausted, stream_err fallback, returns None on success path
- `TestRetryRollbackContract` — end-to-end: capture + finalize yields
the exact content the user saw streaming live plus the error marker

1022 total SDK tests pass (baseline + new).

Frontend (`useCopilotStream.test.ts`): 1 new test — `restores the unsent
text and drops the optimistic user bubble on 429 usage-limit`.

## Out of scope

- Frontend rendering tweaks for the interrupted-turn marker (existing
error-marker rendering already works).
- Refresh-survival of the unsent text in Mode 1 (would require
localStorage persistence with TTL) — separate follow-up.
- Hard process-kill / OOM where Python `finally` doesn't run — needs a
different mechanism (pod-level checkpoint sweeper).

## Checklist

- [x] My code follows the style guidelines of this project
(black/isort/ruff via `poetry run format`)
- [x] I have performed a self-review of my own code
- [x] I have added relevant unit tests
- [x] I have run lint and tests locally (1022 SDK tests pass)

## Test plan

- [ ] Verify a long-running turn that hits transient-retry exhaustion
preserves partial assistant text + tool results in chat history after
refresh
- [ ] Verify the next user message after an interrupted turn carries
enough context that the model can continue the prior task instead of
inventing a new one
- [ ] Verify a successful retry (attempt #1 fails, attempt #2 succeeds)
shows ONLY attempt #2's content (no leaked partial from #1)
- [ ] Verify hitting daily usage limit at turn start re-populates the
composer with the unsent text and removes the optimistic user bubble

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-04-25 14:21:18 +07:00
Zamil Majdy
10ea46663f fix(backend/notifications): atomic upsert + drop eager include (#12919)
## Problem

`create_or_add_to_user_notification_batch` does:
1. `find_unique(..., include={"Notifications": True})` — loads ALL
notifications for the batch (thousands for heavy AGENT_RUN users),
causing Postgres `statement_timeout` in dev.
2. Find-then-update is non-atomic — concurrent invocations either hit
`@@unique([userId, type])` violations or drop notifications.

Real Sentry: `canceling statement due to statement timeout` on this
exact query, traced to `database-manager` pod.

## Fix

Use Prisma `upsert` (atomic) and skip the eager include. Only load
notifications if the caller actually needs them (audited — the sole
caller `NotificationManager._should_batch` ignores the returned DTO and
separately fetches the oldest message via
`get_user_notification_oldest_message_in_batch`).

## Tests

- Single + existing-batch upsert paths
- Concurrent race regression

## Out of scope

Unrelated to PR #12900 (Redis Cluster migration). Separate change.
2026-04-25 13:28:28 +07:00
Zamil Majdy
06188a86a6 refactor(platform/copilot): consolidate 4 model-routing LD flags into 1 JSON flag (#12917)
## What

Replaces 4 string-valued LaunchDarkly flags with a single JSON-valued
flag for copilot model routing:

- ~~`copilot-fast-standard-model`~~
- ~~`copilot-fast-advanced-model`~~
- ~~`copilot-thinking-standard-model`~~
- ~~`copilot-thinking-advanced-model`~~

**New:** `copilot-model-routing` (JSON), keyed `{mode: {tier: model}}`:
```json
{
  "fast":     { "standard": "anthropic/claude-sonnet-4-6", "advanced": "anthropic/claude-opus-4-6" },
  "thinking": { "standard": "moonshotai/kimi-k2.6",         "advanced": "anthropic/claude-opus-4-6" }
}
```

## Why

Same pattern as the sibling consolidation in #12915 (pricing /
cost-limits flags) and the merged #12910 (tier-multipliers):

- One flag per config domain — less LD UI clutter, easier audit trail.
- Atomic updates — rotating fast.standard + thinking.standard is a
single save.
- Fewer LD entities to name, version, target, explain.
- Mirrors the now-uniform copilot-* JSON-flag shape.

## How

- `backend/util/feature_flag.py`: drop the four `COPILOT_*_MODEL` enum
values, add `COPILOT_MODEL_ROUTING`.
- `backend/copilot/model_router.py`: rewrite `resolve_model` to fetch
the JSON flag once per call and walk `payload[mode][tier]`. Missing
mode, missing tier-within-mode, non-string cell value, non-dict payload,
or LD failure all fall back to the corresponding `ChatConfig` default
(same user-visible semantics as before). `_FLAG_BY_CELL` removed
entirely; `_config_default` / `ModelMode` / `ModelTier` unchanged.
- Per-user LD targeting preserved — cohorts can still receive different
routing.
- No caching added (preserves existing uncached behaviour).
- Docstring references in `copilot/config.py` + `copilot/sdk/service.py`
updated to point at the new nested key path; one docstring in
`service_test.py` likewise.

## Operator action required BEFORE merging

This PR removes 4 LD flags and introduces 1 replacement.

1. In LaunchDarkly, create `copilot-model-routing` (type: **JSON**,
server-side only). Default variation = union of the current four string
flags, shaped as:
   ```json
   {
     "fast": {
       "standard": "<current copilot-fast-standard-model>",
       "advanced": "<current copilot-fast-advanced-model>"
     },
     "thinking": {
       "standard": "<current copilot-thinking-standard-model>",
       "advanced": "<current copilot-thinking-advanced-model>"
     }
   }
   ```
Omit any cell that's currently unset (its `ChatConfig` default will be
used).

2. Merge this PR.

3. After deploy + smoke, delete the four legacy flags:
   - `copilot-fast-standard-model`
   - `copilot-fast-advanced-model`
   - `copilot-thinking-standard-model`
   - `copilot-thinking-advanced-model`

## Testing

- `backend/copilot/model_router_test.py` rewritten — 27 tests pass:
  - LD unset / `None` payload → fallback for every cell.
  - Full JSON → each cell maps to its value (parametrized).
- Partial JSON (missing mode, missing tier-within-mode, mode value not a
dict).
  - Non-dict payloads (str / list / int / bool) → fallback + warning.
- Non-string cell values (number, list, bool, dict) → fallback +
'non-string' warning.
- Empty-string cell → fallback + 'empty string' warning (not
'non-string').
  - LD raises → fallback + warning with `exc_info`.
  - `user_id=None` → skip LD entirely.
- Single-LD-call regression guard against re-introducing per-cell flag
fan-out.
- `backend/copilot/sdk/service_test.py`: 61 tests still pass (it mocks
`_resolve_thinking_model_for_user`, so the inner flag change is
transparent).
- `black --check` / `ruff check` / `isort --check` all clean.

## Sibling

- #12915 — same consolidation pattern for stripe-price / cost-limits
flags.

## Checklist

- [x] I have read the project's contributing guide.
- [x] I have clearly described what this PR changes and why.
- [x] My code follows the style guidelines of this project.
- [x] I have added tests that prove my fix is effective or that my
feature works.
- [ ] New and existing unit tests pass locally with my changes (CI will
confirm).
2026-04-25 08:15:36 +07:00
Zamil Majdy
2deac2073e fix(block_cost_config): audit + correct stale LLM/block rates + migrate generic ReplicateModelBlock to COST_USD (#12912)
## Why

PR #12909's pricing refresh was sourced from aggregators (pricepertoken,
blog mirrors) instead of provider pricing pages. Follow-up audit against
**official provider docs** caught **22 stale entries** — 9 LLM token
rates + 12 non-LLM block rates + 1 block that needed a code refactor to
bill dynamically. Also flagged by Sentry: Mistral models were sitting on
the wrong provider's rate table.

Cross-verified JS-rendered pages (docs.x.ai, DeepSeek, Kimi) via
agent-browser.

## Corrections applied

### LLM TOKEN_COST (9 entries)

| Model | Old | New | Reason |
|---|---|---|---|
| `GPT5` | 94/750 | **188/1500** | Was OpenAI Batch API rate; Standard is $1.25/$10 |
| `DEEPSEEK_CHAT` | 42/63 | **21/42** | Unified to deepseek-v4-flash at $0.14/$0.28 (Sept 2025) |
| `DEEPSEEK_R1_0528` | 82/329 | **21/42** | Same v4-flash routing |
| `MISTRAL_LARGE_3` | 300/900 | **300/900** (restored after brief 75/225 detour) | Routes via OpenRouter ($2/$6), not Mistral direct |
| `MISTRAL_NEMO` | 3/6 → 23/23 | **5/5** | Routes via OpenRouter ($0.035/$0.035); Mistral-direct $0.15 doesn't apply |
| `KIMI_K2_0905` | 82/330 | **90/375** | Matches K2 family $0.60/$2.50 |
| `KIMI_K2_5` | 90/450 | **66/300** | OpenRouter pass-through $0.44/$2 |
| `KIMI_K2_6` | 143/600 | **112/698** | OpenRouter pass-through $0.7448/$4.655 |
| `META_LLAMA_4_MAVERICK` | 30/90 | **75/116** | Groq $0.50/$0.77 (deprecated 2026-02-20) |

### Non-LLM BLOCK_COSTS — rate corrections (11 entries)

Under-billing fixes:
- `AIVideoGeneratorBlock` (FAL) SECOND 3 → **15 cr/s**
- `CreateTalkingAvatarVideoBlock` (D-ID) RUN 15 → **100 cr**
- Nano Banana Pro/2 across 3 blocks: RUN 14 → **21 cr**
- `UnrealTextToSpeechBlock` RUN 5 → **COST_USD 150 cr/$** (block now
emits `chars × $0.000016`)

Over-billing fixes:
- `IdeogramModelBlock` default 16 → **12**, V_3 18 → **14**
- `AIImageEditorBlock` FLUX_KONTEXT_MAX 20 → **12**
- `ValidateEmailsBlock` 250 → **150 cr/$**
- `SearchTheWebBlock` 100 → **150 cr/$**
- `GetLinkedinProfilePictureBlock` 3 → **1 cr**

### Non-LLM BLOCK_COSTS — block refactored for dynamic billing (1 entry)

- **`ReplicateModelBlock`** (the generic "run any Replicate model"
wrapper) migrated from flat RUN 10 cr → **COST_USD 150 cr/$**. Block now
uses `client.predictions.async_create + async_wait` instead of
`async_run(wait=False)` so it can read `prediction.metrics.predict_time`
and bill `predict_time × $0.0014/s` (Nvidia L40S mid-tier, where most
popular public models run).

Additionally (addressing CodeRabbit's critical review on this refactor):
`async_wait()` returns normally regardless of terminal status — it
doesn't raise on `failed`/`canceled` like the old `async_run` did. The
block now explicitly checks `prediction.status` after `async_wait()` and
raises `RuntimeError` on `failed` (with `prediction.error` as context)
or `canceled` **before** `merge_stats`, so failed runs are never billed
for partial compute time.

**Why this matters:** flat 10 cr was 10–500× under-billing long
video/LLM runs (users could wire in a $50/hr A100 Llama inference and
pay us $0.10). It was also 20× over-billing trivial SDXL runs. Now
scales with real compute time AND no longer bills failed predictions.

### Documentation-only

- **Grok legacy models** (grok-3, grok-4-0709, grok-4-fast,
grok-code-fast-1): dropped from docs.x.ai's public pricing page but
still callable via the API. Added inline comment noting this; rates kept
at their verified launch pricing.
- **Mistral routing**: added comment explaining why TOKEN_COST for
MISTRAL_* is the OpenRouter safety floor (not Mistral-direct) since
`ModelMetadata.provider = "open_router"` for all Mistral entries.

## How

- For each entry, opened the **official provider pricing page** directly
and computed `our_cr = round(1.5 × provider_usd × 100)`.
- For JS-rendered pages (docs.x.ai, api-docs.deepseek.com), used
agent-browser headless to render + extract rates from the DOM.
- Migrated 2 blocks (`UnrealTextToSpeechBlock`, `ReplicateModelBlock`)
from flat RUN to COST_USD — the Replicate migration touched the block's
SDK interaction.
- Updated 2 FAL-video unit tests that asserted the old `3 cr/s` rate.
- Updated 3 stale test assertions: 2 for Unreal TTS (still on
`characters` cost_type) + 1 for ZeroBounce (old 250 cr).

## Known remaining risk (explicitly out of scope)

- **`ReplicateFluxAdvancedModelBlock`** not migrated — bounded to Flux
models ($0.04–$0.08), flat 10 cr stays within 1.25–2.5× margin. Separate
PR if desired.
- **AgentMail** on free tier (1 RUN). When paid pricing publishes,
revisit.
- **Live Replicate API verification**: mitigated via 9 unit tests
covering the refactored path (`async_create` version-vs-model branching,
metrics-based billing emission, failed/canceled raises,
zero/missing-metrics no-emission, `async_wait` ordering), and SDK
signature confirmed via `inspect.signature` — but no real API call
executed. A smoke test on a cheap model before merge is still
recommended.

## Test plan

- [x] `poetry run pytest backend/data/block_cost_config_test.py
backend/executor/block_usage_cost_test.py
backend/blocks/claude_code_cost_test.py
backend/blocks/cost_leak_fixes_test.py
backend/blocks/block_cost_tracking_test.py
backend/copilot/tools/helpers_test.py
backend/blocks/replicate/replicate_block_cost_test.py -q` — all passing
(80+ tests).
- [x] Sources: openai.com/api/pricing, claude.com/pricing,
api-docs.deepseek.com, mistral.ai/pricing,
platform.kimi.ai/docs/pricing, docs.x.ai, groq.com/pricing,
replicate.com, fal.ai, d-id.com, ideogram.ai, zerobounce.net, jina.ai,
unrealspeech.com, enrichlayer.com.
- [ ] Live Replicate API call to verify `predictions.async_create +
async_wait + metrics.predict_time` path.
2026-04-25 06:10:16 +07:00
Zamil Majdy
24406dfcec refactor(platform): consolidate 6 LD flags into 2 JSON flags (#12915)
## What

Consolidates two groups of LaunchDarkly flags into single JSON-valued
flags, matching the pattern established by `copilot-tier-multipliers`
(merged in #12910):

**Stripe prices** — 4 string flags → 1 JSON flag:
- ~~`stripe-price-id-basic`~~ / ~~`-pro`~~ / ~~`-max`~~ /
~~`-business`~~
- **New:** `copilot-tier-stripe-prices` (JSON)
  ```json
  { "PRO": "price_xxx", "MAX": "price_yyy" }
  ```

**Cost limits** — 2 number flags → 1 JSON flag:
- ~~`copilot-daily-cost-limit-microdollars`~~ /
~~`copilot-weekly-cost-limit-microdollars`~~
- **New:** `copilot-cost-limits` (JSON)
  ```json
  { "daily": 625000, "weekly": 3125000 }
  ```

## Why

- One flag to manage per config domain (LD UI less cluttered, easier
audit trail).
- Atomic updates — e.g., rotating Pro + Max prices happens in a single
save.
- Fewer LD entities to name, version, target, and explain.
- Mirrors the just-merged `copilot-tier-multipliers` shape so the whole
pricing/limits config is uniform.

## How

- `get_subscription_price_id(tier)` now parses
`copilot-tier-stripe-prices` and looks up `tier.value` — returns `None`
when the flag is unset, non-dict, tier key missing, or value isn't a
non-empty string.
- `get_global_rate_limits` uses a new sibling
`_fetch_cost_limits_flag()` helper (60s cache, `cache_none=False`) that
extracts `daily` / `weekly` int keys independently and falls back to the
existing `ChatConfig` defaults when any key is missing / non-int /
negative. A broken `daily` doesn't wipe out `weekly` (or vice versa).
- Tests rewritten to mock the new JSON shapes + cover partial / invalid
/ missing-key fallbacks.

## ⚠️ Operator action required BEFORE merging

This PR **removes 6 LD flags** and introduces 2 replacements. To avoid a
pricing/rate-limit outage, do this in LaunchDarkly first:

1. Create `copilot-tier-stripe-prices` (type: **JSON**). Default
variation = union of the current `stripe-price-id-*` values:
   ```json
{ "PRO": "<current stripe-price-id-pro>", "MAX": "<current
stripe-price-id-max>" }
   ```
   Omit BASIC / BUSINESS if those flags are unset today.

2. Create `copilot-cost-limits` (type: **JSON**). Default variation =
the current two flags' values:
   ```json
{ "daily": <current daily microdollars>, "weekly": <current weekly
microdollars> }
   ```

3. Merge this PR.

4. After deploy + smoke test, delete the six legacy flags:
   - `stripe-price-id-{basic,pro,max,business}`
   - `copilot-daily-cost-limit-microdollars`
   - `copilot-weekly-cost-limit-microdollars`

## Testing

- Backend unit tests: `pytest backend/copilot/rate_limit_test.py
backend/data/credit_subscription_test.py
backend/api/features/subscription_routes_test.py` — rewritten to
exercise the JSON flag shapes + fallback paths; passes locally.
- `black --check` / `ruff check` / `isort --check` — all clean.

## Checklist

- [x] I have read the project's contributing guide.
- [x] I have clearly described what this PR changes and why.
- [x] My code follows the style guidelines of this project.
- [x] I have added tests that prove my fix is effective or that my
feature works.
- [ ] New and existing unit tests pass locally with my changes (CI will
confirm).
2026-04-25 06:08:09 +07:00
Zamil Majdy
000ddb007a dx: use $REPO_ROOT in pr-test skill instead of hardcoded absolute path (#12914)
## Summary
- `.claude/skills/pr-test/SKILL.md` referenced
`/Users/majdyz/Code/AutoGPT/.ign.testing.{lock,log}` in 5 places, which
breaks the skill for anyone else who clones the repo.
- Replaced with `$REPO_ROOT`, which is already defined in Step 0 as `git
-C "$WORKTREE_PATH" worktree list | head -1 | awk '{print $1}'`. That
resolves to the main/primary worktree from any sibling worktree,
preserving the original "always pin the lock to the root checkout so all
siblings see the same file" semantics.
- No behavior change for the existing user; repo becomes portable for
everyone else.

## Test plan
- [x] `grep -n "/Users/majdyz" .claude/skills/pr-test/SKILL.md` returns
only the two intentional mentions in the "never paste absolute paths
into PR comments" warning.
- [x] `$REPO_ROOT` is defined in Step 0 before any Step 3.0 usage.
2026-04-24 23:10:20 +07:00
Zamil Majdy
408b205515 feat(platform): LD-configurable rate-limit multipliers + relative UI display (#12910)
## Summary

- **Backend (`copilot/rate_limit`)** — ``TIER_MULTIPLIERS`` is now
float-typed and resolvable through a new LaunchDarkly flag
``copilot-tier-multipliers``. The integer defaults live on as
``_DEFAULT_TIER_MULTIPLIERS`` and are merged with whatever LD returns
(missing / invalid keys inherit defaults; LD failures fall back to
defaults without raising). ``get_global_rate_limits`` now honours the
flag per-user and casts ``int(base * multiplier)`` so downstream
microdollar math stays integer even when LD hands back a fractional
multiplier (e.g. 8.5×). Cached for 60 s via ``@cached(ttl_seconds=60,
maxsize=8, cache_none=False)`` to match the pattern in
``get_subscription_price_id``.
- **Backend (`api/features/v1`)** — ``SubscriptionStatusResponse`` gains
``tier_multipliers: dict[str, float]``, populated for the same set of
tiers that make it into ``tier_costs`` so hidden tiers never get a
rendered badge.
- **Frontend (`SubscriptionTierSection`)** — drops the hard-coded ``"5x"
/ "20x"`` strings from ``TIERS`` and introduces
``formatRelativeMultiplier(tierKey, tierMultipliers)``: the lowest
*visible* multiplier becomes the baseline (no badge), every other tier
renders ``"N.Nx rate limits"`` relative to it. Fractional LD values like
8.5× round to one decimal.
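
A hedged sketch of ``formatRelativeMultiplier`` as described above; the
null-for-baseline contract matches the description, the exact signature in
``helpers.ts`` may differ.

```ts
export function formatRelativeMultiplier(
  tierKey: string,
  tierMultipliers: Record<string, number>,
): string | null {
  const multiplier = tierMultipliers[tierKey];
  const visible = Object.values(tierMultipliers);
  if (multiplier === undefined || visible.length === 0) return null; // hidden tier: no badge
  const baseline = Math.min(...visible);
  const ratio = multiplier / baseline;
  if (ratio <= 1) return null; // the lowest visible tier is the baseline and gets no badge
  // Fractional LD values (e.g. 8.5x) round to one decimal.
  return `${Math.round(ratio * 10) / 10}x rate limits`;
}
```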

The admin rate-limit page (``/admin/rate-limits``) keeps the static
``TIER_MULTIPLIERS`` defaults — it's admin-facing, infrequently viewed,
and fine to lag the LD value until next deploy (noted in-code).

Related upstream: this PR stacks logically after #12903 (which added the
``MAX`` tier + LD-configurable prices) but does **not** require it —
each PR can merge in either order. No schema changes, no migration.

## Test plan

- [x] ``poetry run black backend/... --check`` + ``poetry run ruff check
backend/...`` pass
- [x] ``pnpm format`` pass (modified files unchanged)
- [x] New backend tests: ``TestGetTierMultipliers`` (defaults, LD
override, invalid JSON, unknown tier / non-positive values, LD failure)
— **5 / 5 pass**
- [x] New backend test:
``TestGetGlobalRateLimitsWithTiers::test_ld_override_applies_fractional_multiplier``
— **pass**
- [x] ``backend/copilot/rate_limit_test.py`` — non-DB subset **72 / 72
pass**; ``TestGetUserTier`` / ``TestSetUserTier`` require the full
test-server fixture (Redis + Prisma) and are not run in this worktree —
same behaviour on clean ``dev``
- [x] ``backend/api/features/subscription_routes_test.py`` — **40 / 40
pass** (includes new
``test_get_subscription_status_tier_multipliers_ld_override``)
- [x] Frontend vitest targeted suite — **51 / 51 pass**
- ``helpers.test.ts`` — new ``formatRelativeMultiplier`` cases
(lowest-tier null, integer ratio, fractional ratio, hidden-tier null,
fractional LD)
- ``SubscriptionTierSection.test.tsx`` — three new cases for relative
badges, rebasing when the lowest tier is hidden, fractional LD overrides
2026-04-24 22:05:55 +07:00
Zamil Majdy
f8c123a8c3 feat(blocks): dynamic COST_USD billing + close 8 cost-leak surfaces (#12909)
## Why

`ClaudeCodeBlock` was a flat `RUN, 100 cr/run` entry when real cost is
**$0.02–$1.50/run**. Plugging that leak surfaced the question "are other
blocks doing the same?" — an audit found **7 more cost-leak surfaces**.
This PR closes all of them atomically so the cost pipeline is uniform
post-#12894.

## What

### 1. ClaudeCodeBlock → COST_USD 150 cr/$ (the headline)

Claude Code CLI's `--output-format json` already returns
`total_cost_usd` on every call, rolling up Anthropic LLM + internal
tool-call spend. Block now emits it via `merge_stats`:
```python
total_cost_usd = output_data.get("total_cost_usd")
if total_cost_usd is not None:
    self.merge_stats(NodeExecutionStats(
        provider_cost=float(total_cost_usd),
        provider_cost_type="cost_usd",
    ))
```
Registered as `COST_USD, 150 cr/$` — matches the 1.5× margin baked into
every `TOKEN_COST` entry.

### 2. Exa websets — ~40 blocks instrumented

Registered as `COST_USD 100 cr/$` but **never emitted `provider_cost`**
→ ran wallet-free. Added `extract_exa_cost_usd` + `merge_exa_cost`
helpers in `exa/helpers.py` and threaded `merge_exa_cost(self,
response)` through every Exa SDK call across 14 files (59 call sites).
Future-proof: lights up as soon as `exa_py` surfaces `cost_dollars` on
webset response types.

### 3. AIConditionBlock — registered under LLM_COST

Full LLM block with token-count instrumentation already in place, but
**no `BLOCK_COSTS` entry at all** → wallet-free. One-line fix: added to
the LLM_COST group next to AIConversationBlock.

### 4. Pinecone × 3 — added BLOCK_COSTS

- `PineconeInitBlock` + `PineconeQueryBlock`: 1 cr/run RUN (platform
overhead; user pays Pinecone directly).
- `PineconeInsertBlock`: ITEMS scaling with `len(vectors)` emitted via
`merge_stats`.

### 5. Perplexity Sonar (all 3 tiers) → COST_USD 150 cr/$

Block already extracted OpenRouter's `x-total-cost` header into
`execution_stats.provider_cost`; just tagged it `cost_usd` and flipped
the registry. **Deep Research was under-billing up to 30×** ($0.20–$2.00
real vs flat 10 cr).

### 6. CodeGenerationBlock (Codex / GPT-5.1-Codex) → COST_USD 150 cr/$

Block computes USD from `response.usage.input_tokens / output_tokens`
using GPT-5.1-Codex rates ($1.25/M in + $10/M out) and emits `cost_usd`.
Was flat 5 cr for arbitrary-length generations.

### 7. VideoNarrationBlock (ElevenLabs) → COST_USD 150 cr/$

Block computes USD from `len(script) × $0.000167` (Starter tier per-char
price) and emits `cost_usd`. **Was under-billing ~25–30× on long
scripts** (5K-char narration: flat 5 cr vs ~$0.83 real = 125 cr).

### 8. Meeting BaaS FetchMeetingData → COST_USD 150 cr/$

Join block keeps its flat 30 cr commit. FetchMeetingData now extracts
`duration_seconds` from the response metadata, computes USD via
`duration × $0.000192/sec`, and emits `cost_usd`. Long meetings (hours)
no longer fit inside the 30 cr deposit.

## Why 150 cr/$

Matches the **1.5× margin already baked into `TOKEN_COST` for every
direct LLM block**:

| Model | Real | Our rate (per 1M) | Markup |
|---|---|---|---|
| Claude Sonnet 4 | $3/$15 | 450/2250 cr | 1.5× |
| GPT-5 | $2.50/$10 | 375/1500 cr | 1.5× |
| Gemini 2.5 Pro | $1.25/$5 | 187/750 cr | 1.5× |

Applying the same ratio to `total_cost_usd` ≡ `cost_amount=150` (1 cr ≈
$0.01 → 100 cr/$ pass-through × 1.5× = 150).

## Test plan

- [x] **Unit**: new `claude_code_cost_test.py` (9 tests) + existing
`exa/cost_tracking_test.py` (16 tests) + full cost pipeline. **119/119
pass**.
- [x] `poetry run ruff format` + `poetry run ruff check backend/` —
clean.
- [ ] Live E2E: real ClaudeCode / Perplexity Deep Research / Codex run
with balance delta verification (post-merge).

## Follow-ups (not in this PR)

- `exa_py` SDK update to surface `cost_dollars` on Webset response types
(upstream) — unlocks real billing for the 40 webset blocks.
- Replicate suite: migrate per-model RUN entries to COST_USD via
`prediction.metrics["predict_time"] × per-model $/sec`.
2026-04-24 22:05:42 +07:00
Abhimanyu Yadav
34374dfd55 feat(frontend): Settings v2 API keys page (SECRT-2273) (#12907)
### Why / What / How

**Why:** The Settings v2 API keys page was a UI-only stub with 100 mock
rows, a noop "Create Key" button, noop delete buttons, and no
empty/loading states. Users couldn't actually manage their keys from the
new Settings UI. Ships SECRT-2273.

**What:** Replaces the mock with a working page: paginated list
(15/page) with infinite scroll, create flow with one-time plaintext
reveal, single + batch revoke with confirmation dialogs, per-key details
dialog, skeleton loader, animated empty state, toast + mutation-loading
feedback, and responsive header.



https://github.com/user-attachments/assets/bc576de3-0369-4e73-b945-c66c142ebfe5

<img width="397" height="860" alt="Screenshot 2026-04-24 at 11 26 53 AM"
src="https://github.com/user-attachments/assets/ed8681ea-7d16-40cc-96f7-72d798857229"
/>


**How:**
- **Backend** adds a new `GET /api/api-keys/paginated` route returning
`{ items, total_count, page, page_size, has_more }`. The legacy `GET
/api/api-keys` is untouched so the existing profile page keeps working.
The list fn runs `find_many` + `count` in parallel and filters to
`ACTIVE` status by default so revoked keys stay hidden.
- **Frontend** fetches via TanStack Query. Right now the hook consumes
the legacy endpoint with client-side slicing (15/page) so the page works
against staging today; once the paginated route ships we swap to the
generated `useGetV1ListUserApiKeysPaginatedInfinite` hook that's already
in the regenerated client.
- All new UI lives in `src/app/(platform)/settings/api-keys/components/`
— no legacy components reused. Shared primitives (Dialog, Form, Toast,
Skeleton, InfiniteScroll, BaseTooltip) come from the atoms/molecules
design system.
- Empty state uses a vertical marquee of ghost key-cards (framer-motion,
translateY 0→-50% on a duplicated stack, linear easing, symmetric mask
fade). Respects `prefers-reduced-motion`.
- Settings layout ScrollArea switched to `h-full` on mobile and
`md:h-[calc(100vh-60px)]` on desktop to remove a double scrollbar that
appeared when the mobile nav took space above the fixed-height scroll
region.
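
A rough sketch of the backend list function described in the first bullet above — the ORM surface here is a stand-in (the real code goes through the Prisma client); only the ACTIVE-only default, the parallel `find_many` + `count`, and the response fields come from this PR:

```python
import asyncio
from typing import Any, Protocol

class APIKeyTable(Protocol):  # stand-in for the Prisma model client
    async def find_many(self, where: dict, skip: int, take: int) -> list[Any]: ...
    async def count(self, where: dict) -> int: ...

async def list_user_api_keys_paginated(
    db: APIKeyTable, user_id: str, page: int = 1, page_size: int = 15
) -> dict:
    where = {"userId": user_id, "status": "ACTIVE"}  # revoked keys hidden by default
    skip = (page - 1) * page_size
    items, total_count = await asyncio.gather(       # find_many + count in parallel
        db.find_many(where=where, skip=skip, take=page_size),
        db.count(where=where),
    )
    return {
        "items": items,
        "total_count": total_count,
        "page": page,
        "page_size": page_size,
        "has_more": skip + len(items) < total_count,
    }
```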

### Changes 🏗️

**Backend**
- `GET /api/api-keys/paginated` — new route, page + page_size query
params, `ListAPIKeysPaginatedResponse`.
- `list_user_api_keys_paginated` — new data fn, gathers find_many +
count, default ACTIVE-only filter.
- Existing `/api/api-keys` routes untouched.

**Frontend (settings/api-keys)**
- `page.tsx` + `components/APIKeyList/`, `APIKeyRow/`, `APIKeysHeader/`,
`APIKeySelectionBar/` — real-data wiring, drop mock array.
- `components/hooks/` — `useAPIKeysList`, `useCreateAPIKey`,
`useRevokeAPIKey`.
- `components/CreateAPIKeyDialog/` — zod-validated form + success view
with copy.
- `components/DeleteAPIKeyDialog/` — confirm with loading state; single
+ batch.
- `components/APIKeyInfoDialog/` — shows masked key, scopes,
description, created/last_used.
- `components/APIKeyListEmpty/` +
`APIKeyListEmpty/components/APIKeyMarquee.tsx` — animated empty state.
- `components/APIKeyListSkeleton/` — 6-row skeleton.

**Other**
- `settings/layout.tsx` — responsive ScrollArea height (fixes
double-scrollbar on mobile).
- `components/ui/scroll-area.tsx` — optional `showScrollToTop` FAB.
- `__tests__/placeholder-pages.test.tsx` — drop api-keys from
placeholder list.
- `AGENTS.md` — Phosphor `-Icon` suffix convention note.
- `api/openapi.json` — regenerated with new paginated endpoint.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
  - [ ] Page loads → skeleton → list with real keys
- [ ] Empty state renders with the vertical marquee (and stays static
with `prefers-reduced-motion`)
- [ ] Create key dialog: name + description + permissions validates;
success view shows plaintext once + copy works; closing resets state
- [ ] Revoke single key via row trash icon → confirm dialog → toast on
success → row disappears
  - [ ] Batch-revoke via selection bar → confirm dialog → all revoked
- [ ] Info icon next to each key opens the details dialog (scopes,
timestamps, masked key)
- [ ] Infinite scroll loads more rows when scrolling past page 1 (≥16
keys)
- [ ] Mobile (<640px): single scrollbar, Create Key button below title
at size=small
- [ ] Desktop (md+): same layout as before, scroll-to-top FAB appears
after scrolling

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
2026-04-24 14:08:18 +00:00
Abhimanyu Yadav
2cb52e5d19 feat(frontend): add Settings v2 page layout behind SETTINGS_V2 flag (SECRT-2272) (#12885)
### Why / What / How

**Why:** The Settings area is getting a redesign (per Figma
[Settings-Page](https://www.figma.com/design/YGck0Hb0GEgFzwbX47kSNs/Settings-Page?node-id=1-2)).
Ticket SECRT-2272 covers just the shell so content/forms for each
section can land in follow-up PRs without blocking on the nav
restructure. v1 at `/profile/settings` must stay intact for end users
during the rollout.

**What:** Adds a new parallel Settings hub at `/settings` (dedicated
sidebar + 7 placeholder sub-routes) behind a new `SETTINGS_V2`
LaunchDarkly flag. Default `false` so nothing changes for users until
the flag flips. Backend is untouched.


https://github.com/user-attachments/assets/dd680eaf-3d41-4a9a-87f3-d06d536a2503


**How:**
- New `Flag.SETTINGS_V2 = "settings-v2"` added to `use-get-flag.ts` with
`defaultFlags[Flag.SETTINGS_V2] = false`. Gate the whole route group at
`layout.tsx` via existing `FeatureFlagPage` HOC which redirects to
`/profile/settings` when the flag is off.
- `SettingsSidebar` replicates the Figma spec (237px, 7 items at 217×38,
`gap-[7px]`, rounded-[8px], active `bg-[#EFEFF0]` + text `#1F1F20` Geist
Medium, inactive text `#505057` Geist Regular, icon 16px Phosphor
light/regular at `#1F1F20`). Colors + typography use the canonical
tokens exported by Figma (zinc-50 `#F9F9FA`, zinc-200 `#DADADC` for the
right-border, etc.).
- `SettingsNavItem` is extracted as its own component and owns its
per-item entrance variant.
- Per-link loading indicator uses Next.js 15's `useLinkStatus()` hook —
spinner appears on the right of the clicked item and clears
automatically once the target page renders.
- `SettingsMobileNav` (< md breakpoint): sidebar hides; a pill trigger
with the current section's icon + label opens a Radix Popover listing
all 7 sections.
- Entrance animations via framer-motion, tuned to Emil Kowalski's
guidelines — `cubic-bezier(0, 0, 0.2, 1)` ease-out, all durations ≤
280ms, only `transform` and `opacity`, `useReducedMotion` disables
movement but keeps fade. Sidebar items stagger in (40ms offset). Main
content re-animates on every route change via `key={pathname}`.
- All 7 placeholder pages render the section title (Poppins Medium 22/28
via `variant="h4"`, `#1F1F20`) + "Coming soon" copy; they are
intentionally client components to avoid hook-order issues with the
client-side flag gate in the layout.

### Changes 🏗️

- `src/services/feature-flags/use-get-flag.ts`: register
`Flag.SETTINGS_V2` + default `false`
- `src/app/(platform)/settings/layout.tsx`: flag gate + responsive shell
+ route-keyed content animation
- `src/app/(platform)/settings/page.tsx`: client-side redirect to
`/settings/profile`
- `src/app/(platform)/settings/components/SettingsSidebar/`:
  - `SettingsSidebar.tsx` — aside with staggered entrance
- `SettingsNavItem.tsx` — per-item Link + icon + label + loader
(extracted)
- `useSettingsSidebar.ts` — hook mapping nav items with `isActive` from
`usePathname`
- `helpers.ts` — typed nav item config (label / href / Phosphor icon) ×
7
-
`src/app/(platform)/settings/components/SettingsMobileNav/SettingsMobileNav.tsx`:
mobile Popover trigger
- 7 placeholder pages: `profile`, `creator-dashboard`, `billing`,
`integrations`, `preferences`, `api-keys`, `oauth-apps`

**Follow-up PRs will migrate real content into each tab.** LaunchDarkly
flag key `settings-v2` must be created in the LD dashboard before
enabling for users.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] `NEXT_PUBLIC_FORCE_FLAG_SETTINGS_V2=true` → `/settings` redirects
to `/settings/profile`, sidebar renders 7 items with "Profile" active
- [x] Click each nav item → URL changes, active item highlights, content
pane re-animates, per-link spinner shows during navigation
- [x] Viewport < 768px → sidebar hides, mobile pill trigger opens
Popover with all 7 items; selecting one navigates and closes
- [x] Without the flag env override, `/settings` redirects to
`/profile/settings` (v1 unchanged)
  - [x] `pnpm types` clean; prettier clean on touched files
- [x] Manual a11y pass with `prefers-reduced-motion` enabled — fade
remains, translations disabled

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
*(no new env vars required; existing `NEXT_PUBLIC_FORCE_FLAG_*` pattern
covers local override)*
- [x] `docker-compose.yml` is updated or already compatible with my
changes *(no docker changes)*
- [x] I have included a list of my configuration changes in the PR
description *(LaunchDarkly dashboard must have `settings-v2` flag
created before enabling; no other config changes)*
2026-04-24 10:39:54 +00:00
Zamil Majdy
ab88d03b13 refactor(backend/integrations): clearer naming + docs for managed-cred sweep (#12908)
## Why

Review comments on #12883 (thanks @Pwuts) surfaced a few spots where the
managed-credential plumbing's names and docstrings didn't match what the
code actually does:

- `_read_or_create_profile_key` suggests "read from any source or create
new", but only migrates the legacy
`managed_credentials.ayrshare_profile_key` side-channel — it doesn't
read an existing managed credential. (That check lives in the outer
`_provision_under_lock`.)
- Docstrings refer to "the startup sweep" in several places — there's no
startup hook; the sweep runs on `/credentials` fetches.
- `is_available` / `auto_provision` relationship wasn't explicit;
readers couldn't tell whether `is_available` was a config check or a
liveness check, or which of the two gates the sweep checks first.

## What

Naming + docstring cleanup. **Zero behavior changes.**

- Rename `_read_or_create_profile_key` →
`_migrate_legacy_or_create_profile_key` with docstring explaining why it
doesn't re-check the managed cred.
- Replace "startup sweep" → "credentials sweep" everywhere.
- `ManagedCredentialProvider` class docstring now names the two gates:
1. `auto_provision` — does this provider participate in the sweep at
all?
  2. `is_available` — are the required env vars / secrets set?
- `is_available` docstring now spells out: what it checks (env vars),
what it does NOT check (upstream health), and that it's only consulted
when `auto_provision=True`.
- `ensure_managed_credentials` docstring defines "credentials sweep",
when it fires, how the per-user in-memory cache works.
- Module-level docstring drops the stale "non-blocking background task"
wording (#12883 made the sweep bounded-await).

## How

4 files, all backend:
- `backend/integrations/managed_credentials.py`
- `backend/integrations/managed_providers/ayrshare.py`
- `backend/integrations/managed_providers/ayrshare_test.py`
- `backend/api/features/integrations/router.py`

Tests: 13/13 Ayrshare tests pass against the rename.

## Checklist

- [x] Follows style guide
- [x] Existing tests still pass (no functional change)
- [x] No new tests needed — pure rename + docstring change
2026-04-24 16:22:09 +07:00
An Vy Le
3aa72b4245 feat(backend/copilot): inline picker-backed inputs via run_block + accept AgentInputBlock subclasses (#12880)
### Why / What / How

**Why:** Resolves #12875. CoPilot's agent-builder was hardcoding Google
Drive file IDs into consuming blocks' `input_default` instead of wiring
an `AgentGoogleDriveFileInputBlock`. A beta user hit this across **13
saved versions** of one agent. Root causes:

1. `validate_io_blocks` only accepted the literal base `AgentInputBlock`
/ `AgentOutputBlock` IDs, so even when CoPilot used a specialized
subclass like `AgentGoogleDriveFileInputBlock` as the only input, the
validator forced it to keep a throwaway base alongside — entrenching the
anti-pattern.
2. Running a Drive consumer directly via CoPilot's `run_block` silently
failed because the auto-credentials flow (picker attaches
`_credentials_id`) existed only in the graph executor, never in
CoPilot's direct-execution path.
3. Drive picker guidance lived in `agent_generation_guide.md` instead of
on the blocks themselves, so it duplicated and drifted from the code.
4. Observed in a live session: when asked to read a private sheet,
CoPilot refused with "share publicly or use the builder" instead of
calling `run_block` and letting the picker render — the prompt rule was
buried and the fallback path (omitted required picker field) returned a
generic schema preview.

**What:** Four coordinated platform + CoPilot improvements. No
block-specific validator rules, no Drive-specific code in UI or prompt.

**How:**

#### 1. `validate_io_blocks` subclass support

Accepts any block with `uiType == "Input"` / `"Output"` (populated from
`Block.block_type` at registration). `AgentGoogleDriveFileInputBlock`,
`AgentDropdownInputBlock`, `AgentTableInputBlock`, etc. stand alone.
Base-ID fallback preserved for call sites that pass a minimal blocks
list.
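
A minimal sketch of the acceptance rule (names other than `uiType`, `"Input"`/`"Output"`, and the base-block fallback are illustrative; the real logic lives in `_collect_io_block_ids` / `validate_io_blocks`):

```python
AGENT_INPUT_BLOCK_ID = "..."   # literal base AgentInputBlock ID (placeholder here)
AGENT_OUTPUT_BLOCK_ID = "..."  # literal base AgentOutputBlock ID (placeholder here)

def _is_input_block(block: dict) -> bool:
    # Any registered Input-type block qualifies; the base ID remains a fallback
    # for call sites that pass a minimal blocks list without uiType populated.
    return block.get("uiType") == "Input" or block.get("id") == AGENT_INPUT_BLOCK_ID

def _is_output_block(block: dict) -> bool:
    return block.get("uiType") == "Output" or block.get("id") == AGENT_OUTPUT_BLOCK_ID
```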

#### 2. Inline picker via `run_block`

- Extracted `_acquire_auto_credentials` from
`backend/executor/manager.py` into shared
`backend/executor/auto_credentials.py` (exports
`acquire_auto_credentials` + `MissingAutoCredentialsError`).
- Wired it into `backend/copilot/tools/helpers.py::execute_block`. When
`_credentials_id` is present, the block executes with creds injected
(chained flows work). When missing/null, `execute_block` returns the
existing `SetupRequirementsResponse` — frontend's `FormRenderer` renders
the picker inline via the existing
`GoogleDrivePickerField`/`GoogleDrivePickerInput`. On pick, the LLM
re-invokes `run_block` with the populated input — same continuation
pattern as OAuth-missing-credentials. No new response types, no new
continuation tool, no new frontend component.
- `run_block` now short-circuits to `SetupRequirementsResponse` when the
missing required fields include a picker-backed field, skipping the
schema-preview round trip the LLM would otherwise take.
- `get_inputs_from_schema` spreads the full property schema (`**schema`)
instead of whitelisting — any `format` / `json_schema_extra` / custom
widget config flows through to the generic custom-field dispatch on the
frontend. Future picker formats (date pickers, file pickers, etc.) work
without backend changes.
- Frontend `SetupRequirementsCard/helpers.ts` uses index-signature
passthrough for arbitrary schema keys — no widget-specific code in that
layer.

#### 3. `validate_only` parameter on `run_block`

`run_block(id, {})` is not always a safe probe — for blocks with zero
required inputs, it executes. New `validate_only: true` parameter
returns `BlockDetailsResponse` (schema + missing-input list) without
executing, rendering picker cards, or charging credits. Same response
shape as the existing schema preview — no new branch, just an extra
condition on the existing one. LLM uses this for pre-flight when it's
unsure whether a block has required inputs.
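
Sketched as the extra condition on the existing schema-preview branch (everything except `validate_only`, `run_block`, and the two response types is a stand-in):

```python
async def run_block(block_id: str, inputs: dict, validate_only: bool = False):
    block = get_block(block_id)                               # stand-in lookup
    missing = missing_required_inputs(block, inputs)          # stand-in helper
    if missing and any(is_picker_backed(field) for field in missing):
        return SetupRequirementsResponse(...)                 # picker renders inline (see #2 above)
    if validate_only or missing:
        # Schema + missing-input list; nothing executes, nothing is charged.
        return BlockDetailsResponse(schema=block.input_schema, missing_inputs=missing)
    return await execute_block(block, inputs)                 # normal execution path
```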

#### 4. Block-local picker guidance

Agent-generation picker guidance relocated from the guide onto the
blocks themselves — surfaced at `find_block` time, exactly when the LLM
decides to wire a picker-backed consumer:

- `GoogleDriveFileField` (shared factory for every Drive field on
Sheets/Docs/etc.) appends a standard hint to the caller's description
covering: feed from the specialized input block, never hardcode (even
one parsed from a URL), picker is the only credential source.
- `AgentGoogleDriveFileInputBlock`'s block description now covers when
it's required, the `allowed_views` mapping, wiring direction, and a
concrete link-shape example.
- `agent_generation_guide.md` loses the dedicated 71-line Drive section.
The IO-blocks section now tells the LLM specialized subclasses satisfy
the requirement and carry their own usage guidance in block/field
descriptions — read them when `find_block` surfaces a match.
- New "Picker-backed inputs via `run_block`" section in the CoPilot
prompt, written generically (picker fields detected via `format` /
`auto_credentials` schema hints, no provider names hardcoded) — covers:
don't ask the user for URLs/IDs, don't refuse private-resource asks,
chained picker objects pass through as-is.
- Sharpened `MissingAutoCredentialsError` message so when a bare ID
reaches execution, the error explicitly tells the LLM the picker renders
inline (not "ask the user for something").

### Changes 🏗️

- `backend/copilot/tools/agent_generator/validator.py` —
`_collect_io_block_ids` + subclass-aware `validate_io_blocks`.
- `backend/executor/auto_credentials.py` (new) — shared
`acquire_auto_credentials` + `MissingAutoCredentialsError`.
- `backend/executor/manager.py` — imports from the shared module, drops
the local copy.
- `backend/copilot/tools/helpers.py` — `execute_block` calls
`acquire_auto_credentials`, merges kwargs, releases locks in `finally`,
returns `SetupRequirementsResponse` on missing creds.
`get_inputs_from_schema` spreads the full property schema.
- `backend/copilot/tools/run_block.py` — picker-field short-circuit +
`validate_only` parameter.
- `backend/copilot/prompting.py` — "Picker-backed inputs via
`run_block`" + "Pre-flight with `validate_only`" sections.
- `backend/blocks/google/_drive.py` — `GoogleDriveFileField` appends the
agent-builder hint to every Drive consumer's description.
- `backend/blocks/io.py` — `AgentGoogleDriveFileInputBlock` description
expanded.
- `backend/copilot/sdk/agent_generation_guide.md` — Drive section
removed, IO-blocks subclass note expanded.
- `frontend/.../SetupRequirementsCard/helpers.ts` — index-signature
passthrough for arbitrary schema keys; schema fields propagate into the
generated RJSF schema.
- Tests: new `TestExecuteBlockAutoCredentials` (4 cases) +
`validate_only` + picker-short-circuit cases in `run_block_test.py`;
`manager_auto_credentials_test.py` moved to new import path; 6 new
frontend cases in `SetupRequirementsCard/__tests__/helpers.test.ts`
covering schema passthrough.
- Also: one-line hoist of `import secrets` in
`backend/integrations/managed_providers/ayrshare.py` — ruff E402
introduced by #12883 was blocking our lint post-merge.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Backend unit suites: validator_test (48), helpers_test (40),
run_block_test (19), manager_auto_credentials_test (15) — **all green**
- [x] Frontend `SetupRequirementsCard` helpers — **75/75 pass**
(including 6 new passthrough cases)
- [x] `poetry run format` (ruff + isort + black) clean on touched files
(pre-existing pyright errors in unrelated `graphiti_core` /
`StreamEvent` / etc. files not introduced by this PR)
- [x] Live CoPilot chat on dev-builder confirmed the setup card renders
`custom/google_drive_picker_field` for a Drive consumer block called via
`run_block`
- [x] Live agent-generation confirmed CoPilot creates a subclass-only
agent (`AgentGoogleDriveFileInputBlock` → `GoogleSheetsReadBlock` →
`AgentOutputBlock`) with no throwaway base `AgentInputBlock`

#### For configuration changes:
- [x] N/A — no config changes

---------

Co-authored-by: majdyz <zamil.majdy@agpt.co>
2026-04-24 13:05:11 +07:00
Zamil Majdy
cc1f692fec feat(platform): add MAX tier + LD-configurable pricing + hide unconfigured tiers (#12903)
## What

Introduces a new `MAX` tier slot between `PRO` and `BUSINESS`
(self-service $320/mo at 20× capacity), routes every self-service tier's
Stripe price ID through LaunchDarkly, and hides tiers from the UI when
their price isn't configured. `BUSINESS` stays in the enum at 60× as a
reserved/future self-service slot (hidden by default until its LD price
flag is set). ENTERPRISE stays admin-managed.

## Tier shape after this PR

| Enum | UI label | Multiplier | LD price flag | Surfaced in UI by default |
|---|---|---|---|---|
| `FREE` | Basic | 1× | `stripe-price-id-basic` | no (flag unset) |
| `PRO` | Pro | 5× | `stripe-price-id-pro` | yes (already live) |
| `MAX` **(new)** | Max | 20× | `stripe-price-id-max` | no (flag unset until $320 price ready) |
| `BUSINESS` | Business | 60× | `stripe-price-id-business` | no (reserved / future) |
| `ENTERPRISE` | — | 60× | — (admin-managed) | no (Contact-Us only) |

## Prisma

- Added `MAX` between `PRO` and `BUSINESS` in `SubscriptionTier`.
- Migration `add_subscription_tier_max/migration.sql` uses `ALTER TYPE
... ADD VALUE IF NOT EXISTS 'MAX' BEFORE 'BUSINESS'` (transactional
since PG 12). No data migration — no rows currently on BUSINESS via
self-service flows.

## Backend

- `get_subscription_price_id` flag map covers
`FREE`/`PRO`/`MAX`/`BUSINESS`. ENTERPRISE returns `None`.
- `GET /credits/subscription.tier_costs` only includes tiers whose LD
price ID is set. Current tier always present as a safety net.
- `POST /credits/subscription` routes by LD-resolved prices instead of
hard-coding `tier == FREE`:
- Target `FREE` + `stripe-price-id-basic` unset → legacy
cancel-at-period-end (unchanged behaviour).
- Target has LD price → modify in-place when user has an active sub,
else Checkout Session.
- Priced-FREE users with no sub fall through to Checkout (admin-granted
DB-flip shortcut gated on `current_tier != FREE`).
- `sync_subscription_from_stripe` + `get_pending_subscription_change`
cover FREE/PRO/MAX/BUSINESS in the price-to-tier map so every tier's
Stripe webhook reconciles cleanly.
- Pending-tier mapping collapsed into a single membership check.
- `TIER_MULTIPLIERS`: `FREE=1, PRO=5, MAX=20, BUSINESS=60,
ENTERPRISE=60`.
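
The multiplier table and the LD flag mapping are small enough to sketch directly (flag keys and multipliers are from this PR; the lookup function's exact signature and the LD call are assumptions):

```python
TIER_MULTIPLIERS = {"FREE": 1, "PRO": 5, "MAX": 20, "BUSINESS": 60, "ENTERPRISE": 60}

# LaunchDarkly flag key per self-service tier; ENTERPRISE is admin-managed (no flag).
TIER_PRICE_FLAGS = {
    "FREE": "stripe-price-id-basic",
    "PRO": "stripe-price-id-pro",
    "MAX": "stripe-price-id-max",
    "BUSINESS": "stripe-price-id-business",
}

def get_subscription_price_id(tier: str) -> str | None:
    flag_key = TIER_PRICE_FLAGS.get(tier)
    return get_flag_value(flag_key) if flag_key else None  # get_flag_value: stand-in LD lookup
```

Tiers whose flag resolves to nothing are dropped from `tier_costs` (the current tier is always kept as a safety net), which is what hides them in the frontend.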

## Frontend

- UI labels: FREE→"Basic", MAX→"Max", BUSINESS→"Business" (PRO
unchanged). `TIER_ORDER` now `[FREE, PRO, MAX, BUSINESS, ENTERPRISE]`.
- `SubscriptionTierSection` filters by `tier_costs` — any tier without a
backend-provided price is hidden (current tier always visible).
- `formatCost` surfaces "Free" only when `FREE` is actually `$0`;
non-zero `stripe-price-id-basic` renders `$X.XX/mo`.
- Admin rate-limit display lists all five tiers with multiplier badges.

## LaunchDarkly flag actions (operator)

- **New:** `stripe-price-id-basic` → FREE tier. Set to `""` or a `$0`
Stripe price.
- **New:** `stripe-price-id-max` → MAX tier. Point at the `$320` Stripe
price when you launch the Max tier.
- **Unchanged:** `stripe-price-id-pro` (PRO), `stripe-price-id-business`
(BUSINESS — leave unset until you're ready for the 60× Business tier).
- Base rate limits stay on `copilot-daily-cost-limit-microdollars` /
`copilot-weekly-cost-limit-microdollars` (Basic's limit; everything else
= × tier multiplier).

## Out of scope

- Subscription-required onboarding screen / middleware gating (separate
PR).
- "Pricing available soon" vs Stripe-failure disambiguation in the UI
(follow-up).

## Testing

- Backend: 213 tests across `subscription_routes_test.py`,
`credit_subscription_test.py`, `rate_limit_test.py`,
`admin/rate_limit_admin_routes_test.py` — all passing.
- Frontend: 91 tests across `credits/` + `admin/rate-limits/` — all
passing.
- Fresh-backend manual E2E on the pre-MAX commit confirmed tier-hiding
works (`tier_costs` returns only the current tier when LD flags are
unset).

## Checklist

- [x] I have read the project's contributing guide.
- [x] I have clearly described what this PR changes and why.
- [x] My code follows the style guidelines of this project.
- [x] I have added tests that prove my fix is effective or that my
feature works.
- [ ] New and existing unit tests pass locally with my changes (CI will
confirm).
2026-04-24 11:11:33 +07:00
Zamil Majdy
be61dc4304 fix(backend): use {schema_prefix} in raw SQL migrations instead of hardcoded 'platform.' (#12905)
### Why / What / How

**Why.** Backend CI was failing at startup with `relation
"platform.AgentNode" does not exist`. Prisma's `migrate deploy` uses the
`schema.prisma` datasource, which doesn't declare a schema, so when
`DATABASE_URL` has no `?schema=platform` query param (as in CI / raw
Supabase), Prisma creates tables in `public` — but the lifespan
migration `backend.data.graph.migrate_llm_models` hardcoded
`platform."AgentNode"` in its raw SQL and crashed the boot.

**What.** Switched `migrate_llm_models` to use the
`execute_raw_with_schema` helper and the `{schema_prefix}` placeholder —
the same pattern already used by the sibling
`fix_llm_provider_credentials` migration in the same file. The helper in
`backend/data/db.py` reads the schema from `DATABASE_URL` at runtime and
substitutes `"platform".` or an empty prefix, so the query works in both
dev (schema=platform) and CI / raw Supabase (public).

**How.**
- Template change: `UPDATE platform."AgentNode"` → `UPDATE
{{schema_prefix}}"AgentNode"` (f-string double-brace escape so
`{schema_prefix}` survives to `.format()` inside
`execute_raw_with_schema`).
- Replace `db.execute_raw(...)` with `execute_raw_with_schema(...)`;
drop the now-unused `prisma as db` import.
- Regression test: mocks `execute_raw_with_schema` and asserts every
emitted query contains `{schema_prefix}` and no longer contains
`platform."AgentNode"`.

### Audit

Audited the other three lifespan migrations in
`backend/api/rest_api.py::lifespan_context`:
- `backend.data.user.migrate_and_encrypt_user_integrations` — uses
Prisma ORM, no raw SQL. OK.
- `backend.data.graph.fix_llm_provider_credentials` — already uses
`query_raw_with_schema` + `{schema_prefix}`. OK.
- `backend.integrations.webhooks.utils.migrate_legacy_triggered_graphs`
— uses Prisma ORM, no raw SQL. OK.

Also grepped the whole backend for `platform."` in Python files —
`migrate_llm_models` was the only offender; the other hits were
unrelated string content (docstrings, error messages, test data).

### Changes

- `autogpt_platform/backend/backend/data/graph.py`: `migrate_llm_models`
now uses `execute_raw_with_schema` with the `{schema_prefix}`
placeholder; unused `prisma as db` import dropped.
- `autogpt_platform/backend/backend/data/graph_test.py`: added
`test_migrate_llm_models_uses_schema_prefix_placeholder` regression
test.

### Checklist

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Ran `migrate_llm_models` under mocked `execute_raw_with_schema` —
all 7 emitted UPDATE queries contain `{schema_prefix}` and none hardcode
`platform."AgentNode"`.
- [x] Verified the f-string double-brace escape by evaluating the
template and running `.format(schema_prefix=...)` — substitution is
correct for both `"platform".` and empty-prefix (public-schema) cases.
- [x] `poetry run pyright backend/data/graph.py` clean (pre-existing
pyright error on `backend/api/features/v1.py:834` on `origin/dev` is
unrelated).
- [x] Grepped the whole backend for other hardcoded `platform."..."`
raw-SQL occurrences — none found.

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
(N/A — no config changes)
- [x] `docker-compose.yml` is updated or already compatible with my
changes (N/A — no config changes)
2026-04-24 10:00:22 +07:00
Zamil Majdy
575f75edf4 refactor(platform): migrate Ayrshare to standard managed-credential flow (#12883)
## Why

Beta user report: AutoPilot told them to sign up for Ayrshare themselves
— which AutoGPT actually manages — because AutoPilot inferred the
requirement from the block description string rather than any structured
schema. Root cause: Ayrshare was the only block family whose
"credential" lived in a bespoke
`UserIntegrations.managed_credentials.ayrshare_profile_key` side channel
and whose blocks declared **no** `credentials` field. `find_block` /
`resolve_block_credentials` had nothing to show the LLM, so the LLM
guessed.

(An initial commit added a runtime `gh` CLI bootstrap for a separate "gh
isn't installed in the sandbox" report — that work was empirically
verified unnecessary and reverted; see the commit history for the bench
results.)

## What

**Ayrshare now goes through the standard managed-credential flow:**

- New `AyrshareManagedProvider` alongside the existing
`AgentMailManagedProvider`. Provisions the per-user profile as
`APIKeyCredentials(provider="ayrshare", is_managed=True)` via the shared
`add_managed_credential` path. Reuses any legacy
`managed_credentials.ayrshare_profile_key` value on first provision so
existing users keep their linked social accounts.
- `AyrshareManagedProvider.is_available()` returns `False` so the
`ensure_managed_credentials` startup sweep **never** auto-provisions
Ayrshare (profile quota is a real per-user subscription cost). New
public `ensure_managed_credential(user_id, store, provider)` helper lets
the `/api/integrations/ayrshare/sso_url` route provision on demand,
reusing the same distributed Redis lock + upsert path as AgentMail.
- New `ProviderBuilder.with_managed_api_key()` method registers
`api_key` as a supported auth type without the env-var-backed default
credential that `with_api_key()` creates — so the org-level Ayrshare
admin key cannot leak to blocks as a "profile key".
- `BaseAyrshareInput` gains a shared `credentials` field; all 13 social
blocks inherit it. Each `run()` now takes `credentials:
APIKeyCredentials`; the inline `get_profile_key` guard + "please link a
social account" error is gone. Standard `resolve_block_credentials`
pre-run check owns the "not connected" path, returning a normal
`SetupRequirementsResponse`.
- **Migration-ordering safety:** `post_provision` hook on
`ManagedCredentialProvider` clears the legacy `ayrshare_profile_key`
field **only after** `add_managed_credential` has durably stored the
managed credential. If persistence fails, the legacy key stays intact so
a retry can reuse it — covered by `TestMigrationOrderingSafety`.
- New public `IntegrationCredentialsStore.get_user_integrations()` —
reads no longer have to reach past the `_get_user_integrations` privacy
fence or abuse `edit_user_integrations` as a pseudo-read.
- `/api/integrations/ayrshare/sso_url` collapses from a 60-line
provision-then-sign dance to: pre-flight `settings_available()`,
`ensure_managed_credential`, fetch the credential, sign a JWT.
- `IntegrationCredentialsStore.set_ayrshare_profile_key` removed — the
managed credential is now the only write path.
- Legacy `UserIntegrations.ManagedCredentials.ayrshare_profile_key`
field is retained so the managed provider can migrate existing users on
first provision; removing the field is a follow-up once rollout has
propagated.
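
A hedged sketch of the collapsed `sso_url` route body described above (the helper names come from this description; signatures, the credential fetch, and the error handling are assumptions):

```python
async def ayrshare_sso_url(user_id: str) -> str:
    if not ayrshare_provider.settings_available():        # pre-flight: org-level admin key configured?
        raise RuntimeError("Ayrshare is not configured")  # the real route returns an HTTP error
    # Provision the per-user profile on demand (same lock + upsert path as AgentMail).
    await ensure_managed_credential(user_id, store, ayrshare_provider)
    credential = await store.get_creds_by_provider(user_id, provider="ayrshare")  # stand-in fetch
    return sign_sso_jwt(credential.api_key)               # stand-in JWT signing for the popup
```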

## How

After this PR, `find_block` returns Ayrshare blocks with a structured
`credentials_provider: ['ayrshare']`. AutoPilot sees the credential
requirement the same way it sees GitHub's or AgentMail's, calls
`run_block`, and gets a plain `SetupRequirementsResponse` when the
managed credential has not been provisioned yet. No more
description-string speculation; the whole Ayrshare flow is the normal
flow.

The Builder's `AyrshareConnectButton` (`BlockType.AYRSHARE`) still works
— it hits the same endpoint, now a thin wrapper over the managed
provider — so users still get the "Connect Social Accounts" popup for
OAuth'ing individual social networks.

## Test plan

- [x] `poetry run pytest backend/blocks/test/test_block.py -k "ayrshare
or PostTo"` — 26/26 pass.
- [x] `poetry run pytest
backend/integrations/managed_providers/ayrshare_test.py` — 10/10 pass.
- [x] `poetry run pytest
backend/api/features/integrations/router_test.py` — 21/21 pass.
- [x] `poetry run pyright` on all touched backend files — 0 errors.
- [x] Runtime sanity: `find_block` on `PostToXBlock` lists
`credentials_provider: ['ayrshare']` in the JSON schema.
- [ ] Manual QA in preview: connect social account via Builder's
"Connect Social Accounts" button → post to X via CoPilot end-to-end.
- [ ] Verify existing users with
`managed_credentials.ayrshare_profile_key` continue to work without
re-linking.
2026-04-24 09:37:38 +07:00
Zamil Majdy
0f6eea06c4 feat(platform/backend): dynamic BlockCostType (SECOND/ITEMS/COST_USD/TOKENS) + E2B/FAL migration (#12894)
## Why

PR #12893 shipped flat-floor credit charges so no provider sits
wallet-free. This PR is the next step: make dynamic pricing actually
dynamic. Blocks that scale with walltime, item count, provider-reported
USD, or token volume now get billed based on captured execution stats
instead of a fixed floor.

Before this PR `BlockCostType` only had `RUN` / `BYTE` / `SECOND`, and
`SECOND` was dead code — no caller ever passed `run_time > 0`, so every
per-second entry evaluated to 0. This PR wires the stats plumbing
through, adds the cost-type variants that cover the real billing models
our providers charge on, and migrates blocks across the codebase to use
them.

## What

### Machinery

- `BlockCostType` gains `ITEMS`, `COST_USD`, `TOKENS`. `BlockCost` gains
`cost_divisor: int = 1` so SECOND/ITEMS/TOKENS can express "1 credit per
N units" without fractional amounts.
- `block_usage_cost(..., stats: NodeExecutionStats | None = None)` —
pre-flight (no stats) dynamic types return 0 so the balance check isn't
blocked on unknown-future cost; post-flight (stats populated) they
consume captured execution stats.
- `TokenRate` model + `TOKEN_COST` table (~60 models: Claude family,
GPT-5 family, Gemini 2.5, Groq/Llama, Mistral, Cohere, DeepSeek, Grok,
Kimi, Perplexity Sonar). Rates are credits per 1M tokens with input /
output / cache-read / cache-creation split.
- `compute_token_credits(input_data, stats)` — reads
`stats.input_token_count / output_token_count / cache_read_token_count /
cache_creation_token_count`, multiplies by `TOKEN_COST[model]`, ceils to
integer credits. Falls back to flat `MODEL_COST[model]` for unmapped
models (no silent under-billing).
- `billing.charge_reconciled_usage(node_exec, stats)` — runs
post-flight, charges positive delta / refunds negative delta. RUN-only
blocks produce zero delta (no-op). Swallows `InsufficientBalanceError` +
unexpected errors so reconciliation never poisons the success path.
- Pre-flight balance guard — dynamic-cost blocks (0 pre-flight charge)
are blocked when the wallet is non-positive. Closes Sentry `r3132206798`
(HIGH).
- Reconciliation fires `handle_low_balance` on positive delta so users
still get alerted after post-flight reconciliation.
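
A simplified sketch of the token path (the `TokenRate` field names and how the model is read off `input_data` are assumptions; the stats attribute names and the flat fallback come from the description above):

```python
import math
from dataclasses import dataclass

@dataclass
class TokenRate:
    input: int           # credits per 1M input tokens
    output: int          # credits per 1M output tokens
    cache_read: int      # credits per 1M cache-read tokens
    cache_creation: int  # credits per 1M cache-creation tokens

TOKEN_COST: dict[str, TokenRate] = {}  # ~60 models in the real table
MODEL_COST: dict[str, int] = {}        # legacy flat per-run credits

def compute_token_credits(input_data, stats) -> int:
    model = input_data.model  # assumed attribute
    rate = TOKEN_COST.get(model)
    if rate is None:
        return MODEL_COST[model]  # flat fallback: never silently under-bill unmapped models
    credits = (
        stats.input_token_count * rate.input
        + stats.output_token_count * rate.output
        + stats.cache_read_token_count * rate.cache_read
        + stats.cache_creation_token_count * rate.cache_creation
    ) / 1_000_000
    return math.ceil(credits)
```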

### Block migrations — cost-type changes

| Provider / block family | Old | New | Cost type |
|---|---|---|---|
| All LLM blocks (Anthropic / OpenAI / Groq / Open Router / Llama API / v0 / AIML, via `LLM_COST` list) | RUN, flat per-model from `MODEL_COST` | `TOKEN_COST` per-token rate table (input / output / cache-read / cache-creation) | **TOKENS** |
| Jina `SearchTheWebBlock` | RUN, 1 cr | 100 cr / $ (≈ 1 cr per $0.01 call) | **COST_USD** |
| ZeroBounce `ValidateEmailsBlock` | RUN, 2 cr | 250 cr / $ (≈ 2 cr per $0.008 validation) | **COST_USD** |
| Apollo `SearchOrganizationsBlock` | RUN, 2 cr flat | 1 cr / 2 orgs (divisor=2) | **ITEMS** |
| Apollo `SearchPeopleBlock` (no enrich) | RUN, 10 cr flat | 1 cr / person | **ITEMS** |
| Apollo `SearchPeopleBlock` (enrich_info=true) | RUN, 20 cr flat | 2 cr / person | **ITEMS** |
| Firecrawl (all blocks — Crawl, MapWebsite, Search, Extract, Scrape, via `ProviderBuilder.with_base_cost`) | RUN, 1 cr | 1000 cr / $ (1 cr per Firecrawl credit ≈ $0.001) | **COST_USD** |
| DataForSEO (KeywordSuggestions, RelatedKeywords, via `with_base_cost`) | RUN, 1 cr | 1000 cr / $ | **COST_USD** |
| Exa (~45 blocks, via `with_base_cost`) | RUN, 1 cr | 100 cr / $ (Deep Research $0.20 → 20 cr) | **COST_USD** |
| E2B `ExecuteCodeBlock` / `InstantiateCodeSandboxBlock` / `ExecuteCodeStepBlock` | RUN, 2 cr flat | 1 cr / 10 s walltime (divisor=10) | **SECOND** |
| FAL `AIVideoGeneratorBlock` | RUN, 10 cr flat | 3 cr / walltime s | **SECOND** |

### Cost-leak fixes — interim values (flagged 🔴 CONSERVATIVE INTERIM in
Notion)

Separate from the type migrations above, these 3 providers had real API
costs but were under-billed (or wallet-free):

| Provider / block | Old | New | Cost type | Plan for proper fix |
|---|---|---|---|---|
| Stagehand (`StagehandObserve` / `Act` / `Extract`, via `with_base_cost`) | RUN, 1 cr | 1 cr / 3 walltime s (divisor=3) | **SECOND** | Have blocks emit `provider_cost` USD (session_seconds × $0.00028 + real LLM USD) → migrate to `COST_USD 100 cr/$`. |
| Meeting BaaS `BaasBotJoinMeetingBlock` (via `@cost` decorator override) | RUN, 5 cr | RUN, 30 cr | RUN | Surface meeting duration on `FetchMeetingData` response → migrate Join to `SECOND` or `COST_USD` post-flight. |
| AgentMail (~37 blocks, via `with_base_cost`) | **0 cr (unbilled)** | RUN, 1 cr | RUN | Revisit when AgentMail publishes paid-tier pricing (currently beta). |

### UI

- `NodeCost.tsx` dynamic labels: RUN → `N /run`, SECOND → `~N /sec` (or
`~N / Xs` with divisor), ITEMS → `~N /item` (or `/ X items`), COST_USD →
`~N · by USD`, TOKENS → `~N · by tokens` (tooltip explains cache
discount).
- Floor amounts prefixed with `~` for dynamic types so users see an
estimate, not a hard guarantee.

## How

The resolver split is the key design decision. Instead of charging the
"true" cost entirely post-flight (which would let a user burn credits
they don't have), pre-flight returns a safe estimate:
- RUN: full `cost_amount` (same as before — backwards compatible).
- SECOND/ITEMS/COST_USD: `0` when stats aren't populated yet.
- TOKENS: `MODEL_COST[model]` as a flat floor from the existing rate
table.

Post-flight, the executor calls `charge_reconciled_usage`, which
evaluates the same resolver with stats and charges the positive delta
(or refunds the negative delta). RUN blocks get a 0-delta no-op; dynamic
blocks get their actual charge. Failure modes are bounded: insufficient
balance is logged (not raised; reconciliation must never poison a
success), unexpected errors are swallowed and alerted via Discord.

TOKENS routes through a dedicated `compute_token_credits` helper so the
rate table (`TOKEN_COST`) can grow organically without touching resolver
logic. Models not yet in `TOKEN_COST` fall back to the flat `MODEL_COST`
tier.

Migration for providers with a real USD spend (Exa, Firecrawl,
DataForSEO, Jina Search, ZeroBounce) is a one-line `_config.py` change
via the extended `ProviderBuilder.with_base_cost`. Each block's `run()`
populates `provider_cost` from the response (Exa's `cost_dollars.total`,
Firecrawl's `credits_used`, etc.) via `merge_stats`, and the post-flight
resolver multiplies by `cost_amount` credits/$.
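
The reconciliation itself reduces to a delta between two evaluations of the same resolver; a sketch under the assumptions above (`charge_reconciled_usage`, `block_usage_cost`, `handle_low_balance`, and `InsufficientBalanceError` are named in this PR; the charge/refund calls and exact signatures are stand-ins):

```python
import logging

logger = logging.getLogger(__name__)

class InsufficientBalanceError(Exception): ...  # stubbed here for the sketch

async def charge_reconciled_usage(node_exec, stats) -> None:
    try:
        pre = block_usage_cost(node_exec, stats=None)    # what pre-flight charged
        post = block_usage_cost(node_exec, stats=stats)  # true cost from captured stats
        delta = post - pre
        if delta > 0:
            await charge_user(node_exec.user_id, delta)   # stand-in billing call
            await handle_low_balance(node_exec.user_id)   # low-balance alert still fires
        elif delta < 0:
            await refund_user(node_exec.user_id, -delta)  # stand-in refund call
        # delta == 0 (e.g. RUN-only blocks): no-op
    except InsufficientBalanceError:
        logger.warning("Post-flight charge exceeded balance; swallowed, not raised")
    except Exception:
        logger.exception("Cost reconciliation failed; the success path is unaffected")
```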

## Test plan

- [x] 92/92 cost-pipeline tests pass — `block_usage_cost_test.py`,
`billing_reconciliation_test.py`, `manager_cost_tracking_test.py`,
`block_cost_config_test.py`.
- [x] Deep E2E against live stack (real DB, `database_manager` RPC): 8/8
scenarios pass — RUN pre-flight, dry-run no-charge, TOKENS refund, ITEMS
scaling, ITEMS zero-items short-circuit, COST_USD exact + ceil
semantics, pre-flight balance guard. Report:
https://github.com/Significant-Gravitas/AutoGPT/pull/12894#issuecomment-4307672357
- [x] `poetry run ruff check` / `ruff format` / `pnpm format` / `pnpm
lint` / `pnpm types` — clean.
- [x] Manual UI: `NodeCost.tsx` renders `~N · by tokens` for
AITextGeneratorBlock, `~N · by USD` for Jina/Exa/Firecrawl.

## Follow-ups (not in this PR)

- Stagehand / Meeting BaaS / Ayrshare: expose provider-side unit cost
(session-seconds, meeting duration, platform analytics credits) to
migrate from interim flat/walltime to fully dynamic `COST_USD`.
- Replicate / Revid: walltime-based billing once response cost is piped
through.
- AgentMail: final rate once paid tier is published.
2026-04-24 08:45:39 +07:00
Zamil Majdy
43b38f6989 fix(backend/copilot): surface non-zero E2B exits as real results, not sandbox errors (#12904)
## Why

`gh auth status` looked flaky in the E2B sandbox. Not actually flaky: it
fails deterministically when the user has not connected GitHub (or the
token is missing/expired), and our wrapper disguises that legitimate
exit-1 as a sandbox infrastructure failure.

Root cause: E2B's `sandbox.commands.run()` raises `CommandExitException`
for **any** non-zero exit. We caught it as a generic `Exception` and
returned an `ErrorResponse` with message:

```
E2B execution failed: Command exited with code 1 and error:
{stderr}
```

When the model runs `gh auth status 2>&1`, stderr is redirected to
stdout — so `exc.stderr` is empty **and** `exc.stdout` (which carries
the real info, e.g. "You are not logged into any GitHub hosts") is
discarded. The model sees a generic infra failure, can't tell it's an
auth-check signal, and prompts the user with broken-looking errors
instead of calling `connect_integration(provider="github")`.

Compare: the local bubblewrap path already handles non-zero exits
correctly by returning a `BashExecResponse` with `exit_code` set. The
E2B path was asymmetric.

## What

- Import `CommandExitException` and catch it explicitly in
`_execute_on_e2b` before the generic handler.
- Return a `BashExecResponse` with the real `exit_code`, `stdout`, and
`stderr` from the exception (scrubbed of injected secret values, same as
the success path).
- Extract shared scrub/build logic into `_build_response` to avoid
duplicating it across the success and exit-exception branches.
- Keep `TimeoutException` and the catch-all `except Exception` for real
infra failures.
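
Roughly what the new branch looks like (a sketch: exception and response attribute names follow the description above; imports, secret scrubbing, and the `ErrorResponse` wording are elided or assumed):

```python
def _execute_on_e2b(sandbox, command: str, timeout: int):
    try:
        result = sandbox.commands.run(command, timeout=timeout)
        return _build_response(result.exit_code, result.stdout, result.stderr)
    except CommandExitException as exc:
        # Non-zero exit is a valid result, not an infra failure — mirror the bubblewrap path.
        return _build_response(exc.exit_code, exc.stdout, exc.stderr)
    except TimeoutException as exc:
        return ErrorResponse(message=f"E2B execution timed out: {exc}")
    except Exception as exc:
        return ErrorResponse(message=f"E2B execution failed: {exc}")
```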

## How

Result shape now matches bubblewrap: non-zero exit is a valid result,
not an error. The model sees:

```
message: "Command executed with status code 1"
exit_code: 1
stdout: "You are not logged into any GitHub hosts. ..."
stderr: ""
```

instead of the prior cryptic "E2B execution failed" message.

## Test plan

- [x] New unit test `test_nonzero_exit_returned_as_bash_exec_response`
in `bash_exec_test.py` — mocks `sandbox.commands.run` to raise
`CommandExitException`, asserts `BashExecResponse` with correct
`exit_code`, and verifies secret scrubbing on both `stdout` and
`stderr`.
- [x] `poetry run pytest backend/copilot/tools/bash_exec_test.py` — 5
passed.
- [x] `poetry run pyright` on changed files — 0 errors.
- [x] `poetry run ruff` — clean.
2026-04-24 07:49:57 +07:00
Nicholas Tindle
10e421cd3e fix(platform): resolve autopilot beta blockers (SECRT-2266/2267/2268/2269) (#12874)
### Why / What / How

**Why:** A beta user spent significant time trying to build and run
agents that read Google Sheets. Four separate failures compounded on
their session — all already open in Linear as SECRT-2266 through
SECRT-2269. Three in-flight PRs each addressed a piece but conflicted on
the same files (`backend/data/model.py`, `backend/blocks/_base.py`,
`autogpt_libs/.../types.py`), so landing them individually would have
been churn. One of the four reported issues (the credential-delete
crash) is also the top unresolved Sentry issue `AUTOGPT-SERVER-6HB` with
100+ events going back to 2025-10-20 — it was archived as "ignored" but
is a real regression. Bug #4 required new work; the others we got by
adopting the existing open PRs and addressing a pending review comment.

**What:** This PR consolidates the three in-flight PRs, adds the two
pieces of new work needed to fully close the beta blockers, and
addresses the pending review on one of the three PRs so it doesn't
require a second round.

- **Closes PR #12004** — Google Drive auto-credentials handling (merged
in)
- **Closes PR #12748** — Incremental OAuth for scope upgrades (merged
in)
- **Closes PR #12588** — superseded by the systemic None-guard here (see
"How" below)
- **Adds Bug 2 fix** — Google credential deletion no longer crashes on
`revoke_tokens`
- **Adds Bug 4 validator** — the agent builder can no longer save a
graph with a hardcoded Drive file ID

**How:**

1. **Adopt PR #12004 (Bug 1 — auto-credentials resolution).** Tags
Drive-file fields as `is_auto_credential` on `CredentialsFieldInfo`,
exposes `BlockSchema.get_auto_credentials_fields()` and
`Graph.regular_credentials_inputs` / `auto_credentials_inputs`, extracts
`_acquire_auto_credentials()` in the executor to resolve embedded
`_credentials_id` at run time, clears `_credentials_id` on agent fork so
cloned agents don't inherit the original author's credential, and fixes
the Firefox referrer policy on the Google Drive picker script load.

2. **Adopt PR #12748 (Bug 3 — credential accumulation).** OAuth callback
now merges scopes into an existing credential (explicit via
`credential_id` in OAuth state, or implicit via `provider + username`
match) instead of appending a new row on every reconnect. GitHub's
non-incremental OAuth path requests the union of existing + new scopes
at login so the upgrade path works there too.

3. **Replace PR #12588 with a systemic None-guard (addresses reviewer
feedback).** The original PR added a per-block `credentials:
GoogleCredentials | None = None` + early guard pattern that would need
to be repeated across 50+ blocks with `GoogleDriveFileField`. Per the
reviewer's ask, we moved the guard into `Block._execute()` once: after
the `setdefault` loop, if `kwargs[kwarg_name] is None` we raise
`BlockExecutionError` with a clean user-facing message. The per-block
change in `sheets.py` is dropped so `credentials: GoogleCredentials`
stays non-`Optional`. Dry-run path skips the guard (executor
intentionally runs blocks without resolved creds for schema validation).

4. **Fix Bug 2 — Google revoke_tokens (SECRT-2267,
AUTOGPT-SERVER-6HB).** `revoke_tokens()` was handing our Pydantic
`OAuth2Credentials` into google-auth's `AuthorizedSession`, which calls
`self.credentials.before_request(...)` on the object and crashes with
`AttributeError: 'OAuth2Credentials' object has no attribute
'before_request'`. Google's token revoke endpoint doesn't need any auth
header — just `token=<token>` in the form body per [Google's
docs](https://developers.google.com/identity/protocols/oauth2/web-server#tokenrevoke).
Switched to the platform's async `Requests` helper, matching how
`reddit.py` / `github.py` / `todoist.py` / other providers do
revocation. No google-auth objects involved.

5. **Fix Bug 4 — hardcoded Drive file IDs in agent graphs
(SECRT-2269).** Evidence from the beta user's session: CoPilot's
agent-builder produced 13 saved graph versions in one session where each
one stuffed either a bare string (`"1KAv…"`) or a partial object
(`{"id": "1KAv…"}`) into
`GoogleSheetsReadBlock.constantInput.spreadsheet`, never wiring an
`AgentGoogleDriveFileInputBlock` as the intended input. Bare-string
versions failed pydantic validation with `is not of type 'object'`;
object-with-only-`id` versions would have crashed at run time because
`_acquire_auto_credentials` has no `_credentials_id` to resolve. Added a
validator in `GraphModel._validate_graph_get_errors` that flags any
auto-credentials field whose `input_default.<field>` is a bare string OR
a dict missing `_credentials_id`, when there's no upstream link feeding
the field. Remediation text is format-aware: when
`field_schema["format"] == "google-drive-picker"` it names
`AgentGoogleDriveFileInputBlock` specifically; for any other future
auto-credentials format (OneDrive / Dropbox / etc.) the remediation is
generic, so we don't ship a stale Google-specific hint that doesn't
apply.

A companion handoff for the CoPilot agent-builder team is drafted at
`/tmp/agent-builder-ticket-drive-file-input.md` (to be filed in their
tracker). The validator here is a safety net so reviewers and the LLM
both get a clear error with the correct remediation; the agent-builder
itself still needs to learn the correct pattern so it stops trying to
hardcode Drive files in the first place.

### Changes 🏗️

**Backend**

- `backend/data/model.py` — merged `is_auto_credential` +
`input_field_name` (#12004) with `OAuthState.credential_id` (#12748);
kept HEAD's defensive `set()` copy on `discriminator_values`.
- `backend/blocks/_base.py` — `_execute()` runs the auto-credentials
setdefault loop + raises `BlockExecutionError` when a resolved value is
`None`.
- `backend/blocks/google/sheets_test.py` — 2 new tests (systemic
None-guard behaviour).
- `backend/blocks/google/_drive.py`, `_drive_test.py` — unchanged on
this branch (earlier bare-string validator was reverted after feedback;
see "Out of scope" below).
- `backend/data/graph.py` — auto-credentials anti-pattern validator in
`_validate_graph_get_errors`.
- `backend/data/graph_test.py` — 11 new tests for the validator.
- `backend/integrations/oauth/google.py` — `revoke_tokens` swapped to
`Requests().post`, removed `AuthorizedSession` misuse.
- `backend/integrations/oauth/google_test.py` — 3 new tests covering the
revoke happy path, no-access-token, and non-2xx-response.
- `backend/integrations/credentials_store.py` — from #12748.
- `backend/api/features/integrations/router.py` — incremental-OAuth
callback + scope upgrade helpers (from #12748).
- `backend/api/features/integrations/incremental_oauth_test.py` — 15
tests (from #12748).
- `backend/api/features/chat/tools/utils.py` → renamed to
`backend/copilot/tools/utils.py` during merge; now uses
`regular_credentials_inputs` for missing-creds + matching (from #12004).
- `backend/copilot/tools/utils_test.py` — moved from
`api/features/chat/tools/`, import paths updated.
- `backend/api/features/library/db.py` — library preset guard uses
`regular_credentials_inputs` (from #12004).
- `backend/data/graph.py` — `regular_credentials_inputs` /
`auto_credentials_inputs` properties + `_reassign_ids` clears
`_credentials_id` on fork (from #12004).
- `backend/executor/manager.py` — `_acquire_auto_credentials()`
extracted + validation (from #12004).
- `backend/executor/utils.py`, `utils_test.py`,
`manager_auto_credentials_test.py` — auto-credentials tests (from
#12004).

**Frontend**

- `frontend/src/components/contextual/GoogleDrivePicker/helpers.ts` —
Firefox referrer fix (from #12004).
-
`frontend/src/components/contextual/CredentialsInput/useCredentialsInput.ts`,
`src/hooks/useCredentials.ts`, `src/lib/autogpt-server-api/client.ts`,
`src/providers/agent-credentials/credentials-provider.tsx`,
`src/app/api/openapi.json` — incremental-OAuth scope upgrade UI (from
#12748).

**Shared libs**

- `autogpt_libs/supabase_integration_credentials_store/types.py` —
merged additions from both #12004 and #12748.

### Test plan 📋

- [x] `poetry run lint` — clean
- [x] `poetry run pytest backend/data/graph_test.py` — 55 passed
including 11 new validator tests
- [x] `poetry run pytest backend/integrations/oauth/google_test.py` — 3
new tests passing
- [x] `poetry run pytest backend/blocks/google/sheets_test.py` — 2 new
tests passing
- [x] `poetry run pytest backend/blocks/google/
backend/integrations/oauth/ backend/executor/ backend/data/graph_test.py
backend/api/features/integrations/ backend/copilot/tools/utils_test.py`
— 250 passed, 6 pre-existing failures that require the docker stack
(RabbitMQ/Redis/Postgres) and fail identically on `origin/dev`
- [x] `pnpm format` — clean
- [x] `pnpm lint` — 3 pre-existing `<img>` warnings on files I didn't
touch, no errors
- [x] `pnpm types` — pre-existing errors on `AgentActivityDropdown` that
also fail on `origin/dev` (unrelated to this PR; needs a separate fix on
dev)
- [x] Live repro on dev verified Bug 2 fires against current prod code —
two fresh Sentry events in `AUTOGPT-SERVER-6HB` at 2026-04-21T21:35:54Z
on `app:dev-behave:cloud` matching the exact `DELETE
/api/integrations/google/credentials/{cred_id}` path. Airtable OAuth2
delete as a control worked cleanly, confirming Google-specific.
- [x] Live repro on dev verified Bug 4 (CoPilot direct-run variant) —
`{"spreadsheet": {"id": "..."}}` → `Cannot use file 'None' (type: None)`
from `_validate_spreadsheet_file` mimeType check, as expected.

Reviewer post-merge verification:
- [ ] Delete a Google OAuth credential via the Integrations UI —
succeeds cleanly, no Sentry event fires
- [ ] Connect Google twice (same account, same scopes) — credential
count stays at 1 (dedup)
- [ ] Save an agent graph with
`GoogleSheetsReadBlock.constantInput.spreadsheet = "bare-id"` via API —
graph validator rejects with `AgentGoogleDriveFileInputBlock`
remediation
- [ ] Save an agent graph with `GoogleSheetsReadBlock` whose
`spreadsheet` is fed by an upstream
`AgentGoogleDriveFileInputBlock.result` — validator accepts, agent runs

### Out of scope (for follow-ups)

- **Bug 1 — "Failed to retrieve Google OAuth credentials"** in
`frontend/src/components/contextual/GoogleDrivePicker/useGoogleDrivePicker.ts:163`.
Zero hits for this string in the beta user's Langfuse traces and we
weren't able to reproduce it from a clean flow. Most likely a
stale-credential race condition (delete in another tab, picker queries a
stale React-Query cache). Tracked as a separate task; not blocking.
- **CoPilot first-attempt mimeType retry loop.** Observed on dev:
CoPilot's first call to `GoogleSheetsReadBlock` sends `{"spreadsheet":
{"id": "..."}}` without `mimeType`, hits `_validate_spreadsheet_file`,
retries with mimeType. Costs a round-trip. Two possible fixes (relax
`_validate_spreadsheet_file` to skip when mimeType is `None` and let
Google's API surface the real error; OR extend
`get_auto_credentials_fields` metadata so CoPilot's tool description
prompts it to always include mimeType). Deliberately deferred — fixing
only one of "API caller sends a bare string" or "CoPilot sends an
incomplete object" risked the same auth-ambiguity the bare-string commit
in this branch history hit.
- **CoPilot agent-builder prompt/guide update.** The validator here
produces the correct error message, but the agent-builder model still
needs to learn to use `AgentGoogleDriveFileInputBlock` upfront rather
than discover it through validator retries. Separate handoff ticket
filed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Touches OAuth credential issuance/upgrade paths and introduces a new
endpoint that returns raw access tokens (scope-gated), plus broad
changes to execution-time credential resolution/validation; mistakes
could impact auth/security or break integrations.
> 
> **Overview**
> Fixes several Google/Drive agent-builder blockers by **supporting
incremental OAuth scope upgrades** and by hardening how
credential-bearing file inputs (“auto-credentials”) are validated,
resolved, and cleared on graph fork.
> 
> On the integrations API, `/{provider}/login` now accepts
`credential_id` and persists it in `OAuthState` to upgrade an existing
OAuth2 credential on callback (explicit upgrade), with an implicit merge
path for same `provider+username`. The callback path now merges
scopes/metadata, preserves ID/title, preserves existing
`refresh_token`/`username` when missing from incremental responses,
blocks upgrades for managed/system credentials, and adds a **new
`/{provider}/credentials/{cred_id}/picker-token` endpoint** to return a
short-lived access token for provider-hosted pickers (currently
allowlisted to Google Drive scopes).
> 
> For auto-credentials, `CredentialsFieldInfo` gains
`is_auto_credential` + `input_field_name`, graphs now expose
`regular_credentials_inputs` vs `auto_credentials_inputs`, and multiple
callers switch from `aggregate_credentials_inputs()` to
`regular_credentials_inputs` so embedded picker credentials aren’t
treated as user-mapped inputs. Execution-time auto-credential
acquisition is extracted into `_acquire_auto_credentials()` with clearer
error handling and lock cleanup; block execution adds a systemic guard
to surface a clean `Missing credentials` error when auto-credentials are
absent.
> 
> Separately fixes Google credential deletion by rewriting
`GoogleOAuthHandler.revoke_tokens()` to use the platform `Requests`
helper (bounded retries) instead of `AuthorizedSession`, and expands
test coverage across these flows (incremental OAuth, picker-token,
auto-credential validation/acquisition, graph validator, and frontend
diagnostics test stubs).
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
cac36eae9f. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-23 17:16:30 +00:00
Zamil Majdy
80bfde1ca6 feat(blocks): charge Ayrshare per-post + align Bannerbear/Jina floors (#12893)
## Why

The cost-tracking audit on 2026-04-23 ([Platform System
Credentials](https://www.notion.so/auto-gpt/4d251f343fe146bcb91b6a037d1bfc3c))
surfaced three gaps where the user wallet was silently subsidising
third-party spend:

1. **Ayrshare (13 blocks)** — zero charge on every social post. No
`BLOCK_COSTS` entry, no SDK `.with_base_cost` registration. Platform
absorbs the entire ~$149/mo Business plan.
2. **Bannerbear** — flat 1 credit/call below the ~$0.025/image unit cost
on the Starter tier ($49/mo / 2K images).
3. **JinaChunkingBlock** — wallet-free; siblings (`JinaEmbeddingBlock`,
`SearchTheWebBlock`) are charged.

## What

- New `backend/blocks/ayrshare/_cost.py` with two-tier
`AYRSHARE_POST_COSTS` (5 credits when `is_video=True`, 2 credits
otherwise — first-match wins in `block_usage_cost`).
- All 13 `PostTo*Block` classes decorated with
`@cost(*AYRSHARE_POST_COSTS)`.
- `BannerbearTextOverlayBlock` floor: 1 → 3 credits in
`bannerbear/_config.py`.
- `JinaChunkingBlock` added to `BLOCK_COSTS` with a flat 1-credit floor.
- `cost(...)` decorator generic-ized via `TypeVar`, so pyright retains
`PostToXBlock.Input/Output` narrowing.

## How

Ayrshare uses a decorator-based registration (not a direct `BLOCK_COSTS`
entry) because each `post_to_*.py` block imports from `backend.sdk`, and
`backend.sdk.cost_integration` imports `BLOCK_COSTS` — listing the
blocks in `block_cost_config.py` would create a circular import. The
`@cost` decorator defined in `sdk/cost_integration.py` was already the
approved escape hatch for this exact shape.

cost_filter in `block_usage_cost` already supports boolean-field
matching (see Apollo's `enrich_info` tier), so `{"is_video": True}` and
`{"is_video": False}` select the right tier at execution time.
`is_video` defaults to `False` on `BaseAyrshareInput`, so posts that
omit the field still land on the 2-credit default.
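
To illustrate the shape, a minimal stand-in (the real `BlockCost` and `block_usage_cost` in `backend/data/` have their own signatures; this only shows the first-match-wins filter semantics):

```python
from dataclasses import dataclass, field

@dataclass
class BlockCost:
    cost_amount: int
    cost_filter: dict = field(default_factory=dict)

AYRSHARE_POST_COSTS = [
    BlockCost(cost_amount=5, cost_filter={"is_video": True}),   # video posts
    BlockCost(cost_amount=2, cost_filter={"is_video": False}),  # text / image posts
]

def block_usage_cost(input_data: dict, costs: list[BlockCost]) -> int:
    """Return the first cost whose filter matches the block input (first match wins)."""
    for entry in costs:
        if all(input_data.get(k) == v for k, v in entry.cost_filter.items()):
            return entry.cost_amount
    return 0

# With is_video defaulting to False on the input model, omitted fields still
# resolve to the 2-credit tier once defaults are applied.
assert block_usage_cost({"is_video": True}, AYRSHARE_POST_COSTS) == 5
assert block_usage_cost({"is_video": False}, AYRSHARE_POST_COSTS) == 2
```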

## Test plan

- [x] `poetry run pytest backend/data/block_cost_config_test.py` — new
6-test suite covers Ayrshare video/non-video/default tiers, the
Bannerbear floor, and the Jina chunking floor
- [x] `poetry run pytest backend/executor/manager_cost_tracking_test.py`
— no regressions (45 pre-existing tests still pass)
- [x] `poetry run ruff format` + `poetry run isort` + `poetry run ruff
check --fix`
- [x] `poetry run pyright` on touched files — 0 errors, 0 warnings
(pre-existing `LlmModel.KIMI_K2_*` errors are on dev and unrelated)
- [ ] Manual: run an Ayrshare post through the builder and confirm 2cr
(text/image) vs 5cr (video) charge
2026-04-23 20:39:35 +07:00
Zamil Majdy
81d6e91f37 feat(platform/copilot): message timestamps + accurate thought-for time (#12890)
## Why

The "Thought for 1m 46s" label under assistant replies has been
misleading
because the backend persists the whole-turn wall clock (from turn start
to
stream end) — which includes tool execution, browser sessions, graph
runs,
etc. Users also had no way to see when a message was actually sent /
received.

## What

- **Per-message timestamps** — `ChatMessage.created_at` (already on the
DB row)
is now serialised through the pydantic model and the
`SessionDetailResponse`,
then plumbed into the UI. Hovering the "Thought for X" label now shows
the
  absolute local date/time via a tooltip.
- **Accurate reasoning duration** — new
`ChatMessage.reasoningDurationMs`
  column. Backend accumulates time between `reasoning-start` and
`reasoning-end` SSE events inside `publish_chunk` (via the session meta
hash). `mark_session_completed` reads the total and persists it
alongside
the existing `durationMs`. Frontend prefers `reasoning_duration_ms` when
  present, falls back to `duration_ms` for legacy rows.

## How

- `schema.prisma` gains `reasoningDurationMs Int?`; migration
  `20260423120000_add_reasoning_duration_ms` adds the column.
- `publish_chunk` gains a side-effect that writes `reasoning_started_at`
/
`reasoning_ms_total` into the existing per-session Redis meta hash when
  reasoning events pass through. No extra IO path, no extra Redis key.
- `set_turn_duration` accepts an optional `reasoning_duration_ms` arg
and
  patches both the DB row and the cached session in place, mirroring the
  existing behaviour for `duration_ms`.
- Frontend: `convertChatSessionMessagesToUiMessages` now returns
`durations`, `reasoningDurations`, and `timestamps` maps. `TurnStatsBar`
picks the best available value and wraps the label in the design-system
  `BaseTooltip` so hover reveals the local timestamp.
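
A rough sketch of that bookkeeping, with a plain dict standing in for the per-session Redis meta hash (field names follow this PR; the real logic sits inside `publish_chunk` and `mark_session_completed`):

```python
import time

def track_reasoning(meta: dict, event_type: str, now_ms: int | None = None) -> None:
    """Accumulate time spent between reasoning-start and reasoning-end events."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    if event_type == "reasoning-start":
        meta["reasoning_started_at"] = now_ms
    elif event_type == "reasoning-end" and "reasoning_started_at" in meta:
        started = meta.pop("reasoning_started_at")
        meta["reasoning_ms_total"] = meta.get("reasoning_ms_total", 0) + (now_ms - started)

meta: dict = {}
track_reasoning(meta, "reasoning-start", now_ms=1_000)
track_reasoning(meta, "reasoning-end", now_ms=4_500)
track_reasoning(meta, "reasoning-start", now_ms=10_000)
track_reasoning(meta, "reasoning-end", now_ms=12_000)
# This total is what gets persisted as reasoningDurationMs at turn completion.
assert meta["reasoning_ms_total"] == 5_500
```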

## Test plan

- [x] `poetry run pytest
backend/copilot/db_test.py::test_set_turn_duration_*`
- [x] `poetry run pytest backend/copilot/stream_registry_test.py`
- [x] `pnpm format` / `pnpm lint` / `pnpm types` (copilot area)
- [x] `pnpm test:unit src/app/\(platform\)/copilot` — 705 tests pass (4
pre-existing `jszip` module resolution failures unrelated to this
change)
- [ ] Manual: open a session with a long tool run and confirm the new
"Thought for X" reflects only reasoning time (falls back for old rows)
      and the tooltip surfaces the local timestamp.
2026-04-23 18:55:34 +07:00
Zamil Majdy
39cdc0a5e0 fix(backend/copilot): tame Kimi compaction storm + tunable threshold + Langfuse cost backfill (#12889)
## Why

Investigation of two reported sessions
([85804387](https://dev-builder.agpt.co/copilot?sessionId=85804387-7708-4fdc-8ec9-64283cdd902d),
[19d69dec](https://dev-builder.agpt.co/copilot?sessionId=19d69dec-210f-4439-a94b-2d7d443b9909))
where Kimi K2.6 via OpenRouter was running ~30 min per turn with no
actions completed (Discord report from Toran). Langfuse traces showed:

- 31 generation calls per turn at p90 = 151s, max = 415s
- 2.57M uncached tokens, `cache_create=0`, ~4% cache_read — Moonshot's
OpenRouter endpoint silently drops Anthropic-style cache writes
- **3 SDK-internal compactions per turn** — each compaction is itself a
slow LLM round-trip
- Reconciled OpenRouter cost was being recorded to a DB row but never
surfaced on the Langfuse trace, leaving operators to grep pod logs

## What

Four commits, split by concern.

### 1. `fix(backend/copilot): skip CLAUDE_AUTOCOMPACT_PCT_OVERRIDE for
Moonshot/Kimi` (`5fd9c5aa`)

`env.py` was unconditionally setting
`CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=50` (introduced in #12747 to cap
cache-creation cost on Anthropic where context >200K = 54% of total
cost). On Kimi where `cache_create=0` silently, the cache-cost rationale
doesn't apply — but the 50% threshold still made the bundled CLI
auto-compact at ~100K tokens, triggering 3+ compactions per turn against
Kimi's larger effective window. Each compaction added a slow LLM
round-trip (one in our test ran 166s and burned the budget cap before
the user got any output).

Threads the resolved `sdk_model` (and `fallback_model`) into
`build_sdk_env` and skips the env var when the model matches
`is_moonshot_model(...)`. The CLI then uses its default ~93% threshold,
cutting compaction passes to 0–1.

### 2. `feat(backend/copilot): backfill OpenRouter reconciled cost to
Langfuse trace` (`f3de3624` + follow-ups `5ce3d038`, `d2c1a2cd`,
`d8e08525`, `d243bf6c9`)

`record_turn_cost_from_openrouter` runs as a fire-and-forget task after
the OTel span closes, so the Langfuse trace UI showed the SDK CLI's
rate-card estimate only — for non-Anthropic OpenRouter routes that
estimate is Sonnet pricing on Kimi tokens (~5x too high).

The backfill captures `langfuse.get_current_trace_id()` and threads it
into the reconcile task, which emits an `openrouter-cost-reconcile`
child event with the authoritative cost + token usage. **Bug caught
during /pr-test:** `propagate_attributes` only annotates an existing
OTel span, it doesn't create one — by the time the `finally` block runs,
SDK-emitted spans have ended and `get_current_trace_id()` returns None.
Fixed in `d8e08525` by wrapping the turn in
`langfuse.start_as_current_span(name="copilot-sdk-turn")`. Also tags
fallback-path events with `cost_source` so operators can distinguish
reconciled vs estimated turns.

### 3. `feat(backend/copilot): expose CLAUDE_AUTOCOMPACT_PCT_OVERRIDE as
a config knob` (`72416f73`)

The previously-hardcoded `50` is now
`claude_agent_autocompact_pct_override` (default 50, env
`CHAT_CLAUDE_AGENT_AUTOCOMPACT_PCT_OVERRIDE`). Setting to 0 omits the
env var entirely so the CLI uses its native ~93% threshold — useful when
the post-compact floor (system prompt + tool defs ≈ 65–110K) sits close
to an aggressive trigger and operators see back-to-back compaction
cascades. Moonshot routes still skip the env var unconditionally
regardless of config.

### 4. `fix(backend/copilot): align SDK retry compaction target with CLI
autocompact threshold` (`730ad256`)

`_reduce_context` was calling `compact_transcript` without an explicit
`target_tokens`, so it fell back to `get_compression_target(model) =
context_window - 60K`. For Sonnet 200K that's 140K — well above the
CLI's PCT=50 trigger of 90K — and for Kimi 256K it's 196K, above the
CLI's default 167K trigger. Result: a successful retry compaction landed
at 140K/196K and the CLI immediately re-compacted on the next call →
**two compactions per recovered turn**.

New `_compaction_target_tokens(model)` mirrors the CLI's `i6_()` formula
(`min(window * pct/100, window - 13K)`) with a 20K safety buffer so the
post-compact context sits comfortably below the CLI's trigger.
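
A rough sketch of that relationship, using the constants quoted above (the repo's exact numbers and helper names may differ):

```python
CLI_HEADROOM_TOKENS = 13_000     # the "window - 13K" term of the CLI's i6_() formula
SAFETY_BUFFER_TOKENS = 20_000    # keeps the retry target clear of the CLI trigger

def cli_autocompact_trigger(context_window: int, pct: int) -> int:
    return min(context_window * pct // 100, context_window - CLI_HEADROOM_TOKENS)

def compaction_target_tokens(context_window: int, pct: int) -> int:
    # Aim the retry compaction below whatever the CLI will trigger on, instead
    # of the old get_compression_target fallback of context_window - 60K.
    return cli_autocompact_trigger(context_window, pct) - SAFETY_BUFFER_TOKENS

old_target = 200_000 - 60_000                       # 140K, above the PCT=50 trigger
new_target = compaction_target_tokens(200_000, 50)  # comfortably below it
assert new_target < cli_autocompact_trigger(200_000, 50) < old_target
```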

## How — empirical validation against the actual long Kimi transcript

Replayed the 199-message transcript from session 85804387 through the
bundled CLI in two configurations:

| | Post-fix (no override) | Pre-fix (`PCT_OVERRIDE=50`) |
|---|---|---|
| `autocompact: tokens=` | 126,312 | 126,341 |
| `threshold=` | **167,000** | **90,000** |
| Decision | 126K < 167K → **skip** | 126K > 90K → **COMPACTION FIRES** |
| Duration | 21s | **166s** (8x slower) |
| Cost | $0.34 | **$0.82** (2.4x more) |
| Output | PONG (success) | empty (hit $0.50 budget cap, exit 1) |

The pre-fix configuration burned $0.82 of compaction work over 166s and
never produced a user response — exactly the failure mode reported.

**Why cascade happens at 50%, not at 93%:** post-compaction context is
`summary (~5–10K) + system_prompt + tool_definitions + skills + active
TodoWrite + memory ≈ 65–110K floor`. With trigger at 90K, post-compact
floor sits AT or above the trigger → next assistant message tips over →
immediate re-compaction → cascade until the CLI's rapid-refill breaker
trips at 3 attempts. With trigger at 167K, the same floor sits
comfortably below trigger → no cascade.

## Considered but not done

- **Force `cache_control` markers to reach Moonshot**: bundled CLI sends
them by default; Moonshot silently drops them per their own docs (uses
`X-Msh-Context-Cache` headers, not body markers). Real fix needs
bypassing OpenRouter — out of scope.
- **Slim the system prompt + tool definitions** to lower the
post-compact floor: real win but separate refactor with tool-use
accuracy A/B.
- **LD-driven auto-fallback to Sonnet on Kimi degradation**:
`claude_agent_fallback_model` already wires `--fallback-model` for
overload (529); auto-flipping on slowness needs latency aggregation
infra that doesn't exist yet.

## Test plan

- [x] `poetry run pytest backend/copilot/sdk/env_test.py
backend/copilot/sdk/openrouter_cost_test.py
backend/copilot/sdk/service_helpers_test.py` — 111 passed (37 env + 23
cost + 51 helpers, including 6 new env tests, 3 backfill tests, 6 new
compaction-target tests)
- [x] `poetry run pytest backend/copilot/sdk/` — 970+ passed
- [x] `poetry run pyright .` — 0 errors
- [x] `poetry run format` — clean
- [x] /pr-test --fix end-to-end against dev — 5/5 scenarios PASS,
including Anthropic route ($0.0174 cost +0.0% delta) and Moonshot route
($0.028 vs $0.018 → +58.2% delta validates reconcile rationale)
- [x] Transcript replay validation: pre-fix vs post-fix on real
126K-token transcript → 8x slower / 2.4x more expensive / fails entirely
on pre-fix; clean PONG on post-fix
2026-04-23 18:46:35 +07:00
Zamil Majdy
4242da79f0 fix(backend/copilot): raise baseline tool-round limit to 100 + graceful finish hint (#12892)
## Why

On prod, longer copilot runs (complex feature implementations, multi-bug
fix chains) error out with `Exceeded 30 tool-call rounds without a final
response`, lose mid-stream assistant output, and the UI appears to
re-dispatch an older prompt. Reported by @itsababseh in #breakage for
session `661ba0cc-a905-4c66-bf11-61eb5423d775`.

Langfuse trace of that session shows 52 turns / 344 LLM calls; **two
turns hit exactly 30 rounds** (Turn 38: implementing kill-cam/headshot
juice pass; Turn 42: fixing multi-bug list). Both were legitimate,
non-looping work that simply needed more rounds to complete. Round 30
fired `bash_exec`, the loop cut off cold, no summary was ever produced,
and the stream surfaced `baseline_tool_round_limit`. Frontend
subsequently re-dispatched the same user message several times (turns
39–41 × 3, turns 43–47 × 5 with identical prompt), which is what the
user perceives as "falling back into acting on an older command."

Root cause: [`_MAX_TOOL_ROUNDS =
30`](https://github.com/Significant-Gravitas/AutoGPT/blob/cf6d7034f/autogpt_platform/backend/backend/copilot/baseline/service.py#L125)
has been unchanged since the baseline path was introduced (#12276).
Modern agent turns with Claude Code / Kimi / Sonnet routinely need more.

## What

- Raise `_MAX_TOOL_ROUNDS` from 30 → 100.
- Pass `last_iteration_message` to `tool_call_loop` so the final round
receives a "stop calling tools, wrap up" system hint. The model now
produces a graceful summary on the last round instead of being cut off
mid-tool.

## How

Two-line change in
[`backend/copilot/baseline/service.py`](https://github.com/Significant-Gravitas/AutoGPT/blob/fix/copilot-baseline-tool-round-limit/autogpt_platform/backend/backend/copilot/baseline/service.py):
- Bump the module-level constant.
- Define `_LAST_ITERATION_HINT` and wire it via the existing
`last_iteration_message` kwarg on
[`tool_call_loop`](https://github.com/Significant-Gravitas/AutoGPT/blob/cf6d7034f/autogpt_platform/backend/backend/util/tool_call_loop.py#L188).
The shared loop already handles appending it only on the final iteration
(see `tool_call_loop_test.py::test_last_iteration_message_appended`).
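
For readers unfamiliar with the kwarg, the pattern is roughly this (illustrative only; the real loop in `backend/util/tool_call_loop.py` also executes the tool calls it receives):

```python
_MAX_TOOL_ROUNDS = 100
_LAST_ITERATION_HINT = (
    "This is the final tool round. Stop calling tools and summarise what was "
    "done and what remains."
)

async def tool_call_loop(messages, call_model, *, max_rounds=_MAX_TOOL_ROUNDS,
                         last_iteration_message=None):
    reply = {}
    for round_no in range(1, max_rounds + 1):
        if last_iteration_message and round_no == max_rounds:
            # Final round: nudge the model to wrap up instead of cutting it off cold.
            messages = messages + [{"role": "system", "content": last_iteration_message}]
        reply = await call_model(messages)
        if not reply.get("tool_calls"):
            return reply               # model produced a final, graceful answer
        messages = messages + [reply]  # (tool execution omitted in this sketch)
    return reply
```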

Frontend retry cascade on `baseline_tool_round_limit` is a separate UX
issue — logging it as a follow-up.

## Checklist

- [x] My code follows the project's style guidelines
- [x] I have performed a self-review
- [x] Existing `tool_call_loop_test.py` covers `last_iteration_message`
behavior (10/10 passing)
- [x] No new migrations
- [x] No breaking changes (constant/kwarg only)
2026-04-23 18:38:52 +07:00
Zamil Majdy
cf6d7034fa fix(backend/copilot): sync safety net for Redis-induced zombie sessions (#12886)
## Why

A 25-min-old copilot turn ended up a zombie in Redis (`status=running`
for 60+ min, queued user messages never drained) after a rolling deploy
of `autogpt-copilot-executor`. Root cause:

1. Cluster churn during the rollout broke a Redis call mid-turn.
2. `_execute_async`'s `finally` tried to publish the failure via
`mark_session_completed` on the same (now-broken) event loop +
thread-local Redis client.
3. That Redis call *also* failed; the exception was caught and logged
but never reached Redis — so the session meta stayed `running`.
4. `on_run_done` then completed the future normally, `active_tasks`
drained, the pod exited.
5. The zombie persisted until the 65-min stale-session watchdog reaped
it. While it was live, queued-message pushes succeeded (HTTP only checks
`status=running`), so the UI showed "Queued" bubbles that never drained.

## What

The fix is **one small addition** in the per-turn lifecycle:

### `sync_fail_close_session` — last line of defense in
`processor.execute`'s `finally`

Invoked from `CoPilotProcessor.execute()`'s `finally` on every turn
exit. Submits the CAS coroutine to the processor's long-lived
`self.execution_loop` via `asyncio.run_coroutine_threadsafe` — the same
pattern `ExecutionProcessor.on_graph_execution` uses at
[executor/manager.py:881-892](autogpt_platform/backend/backend/executor/manager.py#L881-L892)
to bridge sync→async through `node_execution_loop`.

- Calls `mark_session_completed(session_id,
error_message=SHUTDOWN_ERROR_MESSAGE)`, which is a CAS on `status ==
"running"`. If the async path already wrote a terminal state the CAS
no-ops; otherwise we mark `failed` and the UI transitions cleanly.
- Bounded by inner `asyncio.wait_for(timeout=10s)` and outer
`future.result(timeout=12s)` so a genuinely unreachable Redis can't hang
the safety net.
- Reuses the long-lived execution loop (no per-turn TCP connect, no
`@thread_cached` thrashing).

The outer `future.result()` in `_execute()` is bounded by
`_CANCEL_GRACE_SECONDS` (5s) so a wedged event loop can't trap the flow
before the safety net fires.
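
In outline, with stubbed helpers (names per this PR, signatures illustrative):

```python
import asyncio
import logging

logger = logging.getLogger("copilot.executor")
SHUTDOWN_ERROR_MESSAGE = "Copilot executor shut down while this turn was in flight."

async def mark_session_completed(session_id: str, error_message: str | None = None) -> None:
    """Stand-in for the real CAS helper: only flips status while it is still 'running'."""

def sync_fail_close_session(session_id: str, execution_loop: asyncio.AbstractEventLoop) -> None:
    async def _fail_close() -> None:
        await asyncio.wait_for(
            mark_session_completed(session_id, error_message=SHUTDOWN_ERROR_MESSAGE),
            timeout=10,  # inner bound: an unreachable Redis can't hang the coroutine
        )

    try:
        # Bridge sync -> async on the processor's long-lived loop (no per-turn connect).
        future = asyncio.run_coroutine_threadsafe(_fail_close(), execution_loop)
        future.result(timeout=12)  # outer bound so a wedged loop can't trap shutdown
    except Exception as exc:
        # Best effort: the 65-min stale-session watchdog remains the backstop.
        logger.warning("Fail-close for session %s did not complete: %s", session_id, exc)
```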

### `cleanup()` stays aligned with agent-executor

Mirrors the pattern from `backend.executor.manager.cleanup` — a single
method that:

1. Flags + tells the broker to stop consuming.
2. Passively waits for `active_tasks` to drain (up to
`GRACEFUL_SHUTDOWN_TIMEOUT_SECONDS`).
3. Worker / executor / lock teardown.

No pre-emptive cancellation of healthy turns, no fail-close step for
stuck turns. Same proven shape agent-executor uses.

### Timeout alignment

Raised both `COPILOT_CONSUMER_TIMEOUT_SECONDS` and
`GRACEFUL_SHUTDOWN_TIMEOUT_SECONDS` to 6h so a rolling deploy can let
the longest legitimate turn finish via its own lifecycle path. Matched
in infra at `terminationGracePeriodSeconds: 21600`
(Significant-Gravitas/AutoGPT_cloud_infrastructure#311).

### RabbitMQ policy — deploy prep

The `x-consumer-timeout` queue argument is changing from 1h → 6h. Tested
empirically on dev's RabbitMQ 4.1.4: `queue_declare` is tolerant of
`x-consumer-timeout` mismatches, so no queue delete is needed. To make
the new timeout **immediately effective for running consumers** (so pods
mid-shutdown don't have their consumer cancelled at the old 1h limit),
apply a policy before deploying:

```bash
rabbitmqctl set_policy copilot-consumer-timeout \
  "^copilot_execution_queue$" \
  '{"consumer-timeout": 21600000}' \
  --apply-to queues
```

Already applied on dev. Apply on prod before the PR's prod deploy.

### Incidental rename

- `_clear_pending_messages_unsafe` → `clear_pending_messages_unsafe`
(keeps the `_unsafe` warning suffix; importable without the
leading-underscore private marker).

## How

Before: transient Redis failure → async finally silently fails → zombie
session → queued messages never drain.
After: transient Redis failure → `execute()`'s sync finally runs
`mark_session_completed` on the processor's long-lived loop → session
correctly marked failed → UI sees terminal state immediately.

SIGTERM path unchanged from the "let in-flight work finish" design: old
pod stops taking new work, existing turns complete naturally.

## Test plan

- [x] `TestSyncFailCloseSession` unit tests — invokes
`mark_session_completed` with the shutdown error, swallows Redis
failures, bounded timeout fires when Redis hangs.
- [x] `TestExecuteSafetyNet` — verifies the `finally` always fires,
including SIGTERM-interrupted and zombie-Redis scenarios.
- [x] Existing `TestExecuteAsyncAclose` + pending_messages tests still
pass (18 passed).
- [x] `pyright` on touched files: 0 errors.
- [x] Manual E2E on native dev stack: sent a `sleep 300 && echo hewwo`
task, SIGTERMed mid-turn at +40s, observed:
   - `[CoPilotExecutor] [cleanup N] Starting graceful shutdown...`
   - Drain-wait ran for ~4.5 min ("1 tasks still active, waiting...")
- Turn finished with `result=Done! The command finished after 5 minutes
and printed: hewwo`
   - `Cleaned up completed session` → `Graceful shutdown completed`
   - No zombie.
- [x] `poetry run format` applied.
- [x] RabbitMQ policy verified on dev. Apply on prod before prod deploy.
- [ ] Verified behavior on next production rolling deploy.
2026-04-23 06:49:06 +07:00
Zamil Majdy
c56c1e5dd6 fix(backend/copilot): disable ask_question tool pending UX rework (#12887)
### Why / What / How

**Why:** The in-conversation Question GUI is unreliable in production —
users submitting answers can get their messages dropped and the agent
gets stuck on the auto-generated "please proceed" step with no way to
make progress. Discord report:
https://discord.com/channels/1126875755960336515/1496474512966029472/1496537943287005365
(see attached video). Pause/queue semantics still need a rework; until
then, the right call is to stop the model from reaching for this tool.

**What:** Removes `ask_question` from the copilot tool registry so the
model never sees or calls it. Historical sessions that already contain
`ask_question` tool calls still render (frontend renderers + response
model untouched), so this is non-destructive to existing chats.
Re-enabling once UX is reworked is a small revert.

**How:**
- Drop the `AskQuestionTool` import + registry entry from
`backend/copilot/tools/__init__.py`.
- Drop `"ask_question"` from the `ToolName` literal in
`backend/copilot/permissions.py` — required because a runtime
consistency check asserts the literal matches `TOOL_REGISTRY.keys()`.
- Delete the "Clarifying — Before or During Building" section from
`backend/copilot/sdk/agent_generation_guide.md` so the SDK-mode system
prompt no longer instructs the model to call `ask_question`.
- Drop the three `prompting_test.py` tests that asserted the guide
mentions that section.
- Keep `ask_question.py`, its unit test, `ClarificationNeededResponse`,
and the frontend `AskQuestion`/`ClarificationQuestionsCard` components
untouched so old sessions still render and re-enabling is a small
revert.

### Changes 🏗️

- `backend/copilot/tools/__init__.py` — remove `AskQuestionTool` import
and `"ask_question"` entry in `TOOL_REGISTRY`.
- `backend/copilot/permissions.py` — remove `"ask_question"` from the
`ToolName` literal.
- `backend/copilot/sdk/agent_generation_guide.md` — remove the
"Clarifying — Before or During Building" section.
- `backend/copilot/prompting_test.py` — remove
`TestAgentGenerationGuideContainsClarifySection` and the now-unused
`Path` import.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [x] `poetry run pytest backend/copilot/tools/
backend/copilot/permissions_test.py backend/copilot/prompting_test.py` —
805+78 tests pass, consistency check between `ToolName` literal and
`TOOL_REGISTRY` still holds.
- [ ] Smoke-test in dev: start a copilot session and confirm the model
no longer lists/calls `ask_question` (its OpenAI tool schema is gone
from `get_available_tools()` and from the SDK `allowed_tools`).
- [ ] Load a historical session that contains an `ask_question` tool
call in its transcript — confirm the frontend still renders the question
card (no regression on legacy sessions).
2026-04-22 23:34:04 +07:00
Bentlybro
6fcbe95645 Merge branch 'master' into dev 2026-04-22 15:36:37 +01:00
Zamil Majdy
9703da3dfd refactor(backend/copilot): Moonshot module + cache_control widening + partial-messages default-on + title cost (#12882)
## Why

Several loose ends from the Kimi SDK-default merge (#12878), plus
follow-ups surfaced during review + E2E testing:

1. **Kimi-specific pricing lived inline in `sdk/service.py`** alongside
unrelated SDK plumbing — any future non-Anthropic vendor would have
piled onto the same file.
2. **Moonshot's Anthropic-compat endpoint honours `cache_control: {type:
ephemeral}`**, but the baseline cache-marking gate
(`_is_anthropic_model`) was narrow enough to exclude it → Moonshot fell
back to automatic prefix caching, which drifts readily between turns.
3. **Kimi reasoning rendered AFTER the answer text** on dev because the
summary-walk hoist only reorders within one `AssistantMessage.content`
list, and Moonshot splits each turn into multiple sequential
AssistantMessages (text-only, then thinking-only).
4. **Title generation's LLM call bypassed cost tracking** — admin
dashboard under-reported total provider spend by the aggregate of those
per-session calls.
5. **Cost override** was using the requested primary model, not the
actually-executed model — when the SDK fallback activates the override
mis-routes pricing.

## What

### Moonshot module
New `backend/copilot/moonshot.py`:
- `is_moonshot_model(model)` — prefix check against `moonshotai/`
- `rate_card_usd(model)` — published Moonshot rates, default `(0.60,
2.80)` per MTok with per-slug override slot
- `override_cost_usd(...)` — moved from `sdk/service.py`, replaces CLI's
Sonnet-rate estimate with real rate card
- `moonshot_supports_cache_control(model)` — narrow gate for cache
markers

Rate card is **not canonical** — authoritative cost comes from the
OpenRouter `/generation` reconcile; this module only improves the
in-turn estimate and the reconcile's lookup-fail fallback. Signal
authority: reconcile >> rate card >> CLI.
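
Roughly, as a sketch (default rate card from this PR; exact signatures in `backend/copilot/moonshot.py` may differ):

```python
_DEFAULT_RATE_USD_PER_MTOK = (0.60, 2.80)              # (input, output)
_RATE_OVERRIDES: dict[str, tuple[float, float]] = {}   # per-slug override slot

def is_moonshot_model(model: str | None) -> bool:
    return bool(model) and model.startswith("moonshotai/")

def rate_card_usd(model: str) -> tuple[float, float]:
    return _RATE_OVERRIDES.get(model, _DEFAULT_RATE_USD_PER_MTOK)

def override_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Replace the CLI's Sonnet-rate estimate with the Moonshot rate card."""
    in_rate, out_rate = rate_card_usd(model)
    return (prompt_tokens * in_rate + completion_tokens * out_rate) / 1_000_000

assert is_moonshot_model("moonshotai/kimi-k2.6")
assert not is_moonshot_model("anthropic/claude-sonnet-4-6")
```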

### Baseline cache-control widened to Moonshot
- New `_supports_prompt_cache_markers` = `_is_anthropic_model OR
is_moonshot_model`
- Both call sites (system-message cache dict, last-tool cache marker)
switched to the wider gate
- OpenAI / Grok / Gemini still return `false` — those endpoints 400 on
the unknown field

**Measured impact in /pr-test:** baseline Kimi continuation turns jumped
to ~98% cache hit (334 uncached + 12.8K cache_read on a 13.1K prompt).

### SDK partial-messages default-on (fixes the reasoning-order bug)
- `CHAT_SDK_INCLUDE_PARTIAL_MESSAGES` flipped from `default=False` →
`default=True`
- Kimi stream now emits `reasoning-start → reasoning-delta* →
reasoning-end → text-start → text-delta*` in the correct order —
verified in /pr-test
- Kill-switch: set `CHAT_SDK_INCLUDE_PARTIAL_MESSAGES=false` to fall
back to summary-only emission

### SDK cost override scoped to Moonshot
- Call site now explicitly gates `if _is_moonshot_model(active_model)` —
Anthropic turns trust CLI's number directly
- Added `_RetryState.observed_model` populated from
`AssistantMessage.model`, preferred over `state.options.model` so
fallback-model turns bill correctly (addresses CodeRabbit review)

### Title cost capture
- `_generate_session_title` now returns `(title, ChatCompletion)` so the
caller controls cost persistence
- `_update_title_async` runs title-persist and cost-record as
independent best-effort steps
- `_title_usage_from_response` helper reads `prompt_tokens /
completion_tokens / cost_usd` (OR's `usage.cost` off `model_extra`)
- Provider label derived from `ChatConfig.base_url` (`open_router` /
`openai`)
- No exception suppressors — `isinstance(cost_raw, (int, float))` check
replaces the inner `float()` try/except
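
Sketched against the OpenAI client types (helper name per this PR; the exact return shape is illustrative):

```python
from openai.types.chat import ChatCompletion

def _title_usage_from_response(response: ChatCompletion) -> dict | None:
    usage = response.usage
    if usage is None:
        return None
    cost_raw = (usage.model_extra or {}).get("cost")  # OpenRouter's usage.cost
    return {
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        # isinstance check instead of a blanket float() try/except.
        "cost_usd": float(cost_raw) if isinstance(cost_raw, (int, float)) else None,
    }
```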

### Misc
- Kimi tool-name whitespace strip in the response adapter — Kimi
occasionally emits tool names with leading spaces the CLI dispatcher
can't resolve
- TODO marker on the rate-card for post-prod-soak removal

## How

- Detection is **prefix-based** (`moonshotai/`) — future Kimi SKUs
transparently inherit rate card + cache-control gate
- Baseline cache-marking was already structured; only the gate changes
- Partial-messages default-on relies on the adapter's diff-based
reconcile (shipped in #12878) which has soaked stable
- Title cost path mirrors `tools/web_search.py`'s pattern for reading
OR's `usage.cost`

## Test plan

- [x] `pytest backend/copilot/moonshot_test.py` — 21 tests
- [x] `pytest backend/copilot/baseline/service_unit_test.py` — updated
for widened gate
- [x] `pytest backend/copilot/sdk/*_test.py
backend/copilot/service_test.py` — no regressions
- [x] Full E2E on local native stack — 10/10 scenarios pass (see
test-report comment)
- [x] Measured: baseline Kimi ~98% cache hit on continuation, SDK Kimi
~62% (capped by Moonshot's prefix ceiling)

## Deferred

SDK-path Moonshot cache hit rate stays at ~62% on long prompts.
`native_tokens_cached=18432` regardless of turn/session suggests a
Moonshot-side cap on cached prefix size. Not fixable by our code —
requires proxy rewriting requests or upstream Moonshot change.
2026-04-22 20:42:47 +07:00
Zamil Majdy
ebb0d3b95b feat(backend/copilot): LaunchDarkly per-user model routing (#12881)
## Summary

Per-user model routing for the copilot via LaunchDarkly. Replaces the
pure-env-var pick on every `(mode, tier)` cell of the model matrix with
an LD-first resolver that falls back to the `ChatConfig` default. Lets
us roll out non-default routes (e.g. Kimi K2.6 on baseline standard) to
a user cohort without shipping a deploy.

| | standard | advanced |
|----------|-----------------------------------|-----------------------------------|
| fast | `copilot-fast-standard-model` | `copilot-fast-advanced-model` |
| thinking | `copilot-thinking-standard-model` | `copilot-thinking-advanced-model` |

All four flags are **string-valued** — the value IS the model identifier
(e.g. `"anthropic/claude-sonnet-4-6"` or `"moonshotai/kimi-k2.6"`).

## What ships

- **New module `backend/copilot/model_router.py`** with a single
`resolve_model(mode, tier, user_id, *, config)` coroutine. That's the
one place both paths consult.
- **4 new `Flag` enum values** in `backend/util/feature_flag.py`
(reusing the existing `get_feature_flag_value` helper which already
supports arbitrary return types).
- **`baseline/service.py::_resolve_baseline_model`** → async, takes
`user_id`.
- **`sdk/service.py::_resolve_sdk_model_for_request`** → takes
`user_id`, consults LD for both standard and advanced thinking cells.
- **Default flip**: `fast_standard_model` default goes back to
`anthropic/claude-sonnet-4-6`. Non-Anthropic routes now ship via LD
targeting — safer rollback, per-user cohort control, no redeploy
required to flip.
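
A minimal sketch of the resolver's shape (the real `resolve_model` takes the `ChatConfig` and reuses the repo's `get_feature_flag_value` helper; the stub and the `default` parameter here are simplifications):

```python
_FLAG_BY_CELL = {
    ("fast", "standard"): "copilot-fast-standard-model",
    ("fast", "advanced"): "copilot-fast-advanced-model",
    ("thinking", "standard"): "copilot-thinking-standard-model",
    ("thinking", "advanced"): "copilot-thinking-advanced-model",
}

async def get_feature_flag_value(flag_key: str, user_id: str):
    """Stand-in for the repo helper; returns the LD flag value or None."""
    return None

async def resolve_model(mode: str, tier: str, user_id: str | None, *, default: str) -> str:
    if not user_id:
        return default                        # no user context: config default
    try:
        value = await get_feature_flag_value(_FLAG_BY_CELL[(mode, tier)], user_id)
    except Exception:
        return default                        # LD failure never fails the request
    if isinstance(value, str) and value.strip():
        return value.strip()                  # LD string wins, whitespace stripped
    return default                            # non-string or empty: config default
```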

## Behavior preserved

- `config.claude_agent_model` explicit override still wins
unconditionally (existing escape hatch for ops).
- `use_claude_code_subscription=true` on the standard thinking tier
still returns `None` so the CLI picks the model tied to the user's
Claude Code subscription.
- All legacy env var aliases (`CHAT_MODEL`, `CHAT_ADVANCED_MODEL`,
`CHAT_FAST_MODEL`) still bind to their cells.
- LD client exceptions / misconfigured (non-string) flag values fall
back silently to config default with a single warning log — never fails
the request.

## Files

| File | Change |
|---|---|
| `backend/copilot/model_router.py` | new — `resolve_model` + `_config_default` + `_FLAG_BY_CELL` map |
| `backend/copilot/model_router_test.py` | new — 11 cases |
| `backend/util/feature_flag.py` | add 4 string-valued `Flag` entries |
| `backend/copilot/config.py` | flip `fast_standard_model` default to Sonnet |
| `backend/copilot/baseline/service.py` | `_resolve_baseline_model` → async + LD resolver |
| `backend/copilot/sdk/service.py` | `_resolve_sdk_model_for_request` → LD resolver + user_id |
| `backend/copilot/baseline/transcript_integration_test.py` | update tests for new signature + default |

## Test plan

- [x] `poetry run pytest backend/copilot/model_router_test.py
backend/copilot/baseline/transcript_integration_test.py
backend/copilot/sdk/service_test.py backend/copilot/config_test.py` —
**112 passing**
- [x] 11 resolver cases: missing user → fallback, LD string wins,
whitespace stripped, non-string value → fallback, empty string →
fallback, LD exception → fallback + warn, each of 4 cells routes to its
distinct flag
- [x] Legacy env aliases still bind to their new fields
- [ ] Manual dev-env smoke: flip `copilot-fast-standard-model` LD
targeting to `moonshotai/kimi-k2.6` for one user and confirm baseline
uses Kimi while other users stay on Sonnet
- [ ] Confirm SDK path still honors subscription mode (LD not consulted
when `use_claude_code_subscription=true` + standard tier)

## Rollout

1. Merge this PR → default stays Sonnet / Opus across the matrix, no
behavior change.
2. Create the 4 LD flags as string-typed in the LaunchDarkly console
(defaults matching config, so no drift if targeting empty).
3. Add per-user / per-cohort targeting in LD for the routes we want to
roll out (Kimi on baseline standard for a percentage, etc.).
2026-04-22 20:08:37 +07:00
Zamil Majdy
b98bcf31c8 feat(backend/copilot): SDK fast tier defaults to Kimi K2.6 via OpenRouter + vendor-aware cost + cross-model fix (#12878)
## Summary

Make Kimi K2.6 the default for the SDK (extended-thinking) copilot path,
mirroring the baseline default landed in #12871. The SDK already routes
through OpenRouter (see
[`build_sdk_env`](autogpt_platform/backend/backend/copilot/sdk/env.py) —
`ANTHROPIC_BASE_URL` is set to OpenRouter's Anthropic-compatible
`/v1/messages` endpoint), but the model resolver was unconditionally
stripping the vendor prefix, which prevented routing to anything except
Anthropic models. This PR unblocks Kimi (and any other non-Anthropic
OpenRouter vendor) on the SDK fast tier and flips the default to match
the baseline path.

## Why

After #12871 the baseline (`fast_*`) path runs Kimi K2.6 by default —
~5x cheaper than Sonnet at SWE-Bench parity — but the SDK (`thinking_*`)
path was still pinned to Sonnet because:

1. **Model name normalization stripped the vendor prefix.**
`_normalize_model_name("moonshotai/kimi-k2.6")` returned `"kimi-k2.6"`,
which OpenRouter cannot route — the unprefixed form only resolves for
Anthropic models. The docstring on `thinking_standard_model` claimed
"the Claude Agent SDK CLI only speaks to Anthropic endpoints", but the
env builder shows the CLI happily talks to OpenRouter's `/messages`
endpoint, which routes to any vendor in the catalog.
2. **The default was `anthropic/claude-sonnet-4-6`.** Same model on a
more expensive route.
3. **Cost label was hardcoded to `provider="anthropic"`** on the SDK
path's `persist_and_record_usage` call, making cost-analytics rows
misleading once Kimi runs.

## What

1. **`_normalize_model_name`**
([sdk/service.py](autogpt_platform/backend/backend/copilot/sdk/service.py))
— when `config.openrouter_active` is True, the canonical `vendor/model`
slug is preserved unchanged so OpenRouter can route to the correct
provider. Direct-Anthropic mode keeps the existing strip-prefix +
dot-to-hyphen conversion (Anthropic API requires both) and now **raises
`ValueError`** when paired with a non-Anthropic vendor slug — silent
strip would have sent `kimi-k2.6` to the Anthropic API and produced an
opaque `model_not_found`. (A sketch of this normalization rule follows this list.)
2. **`thinking_standard_model`**
([config.py](autogpt_platform/backend/backend/copilot/config.py)) —
default flipped from `anthropic/claude-sonnet-4-6` to
`moonshotai/kimi-k2.6`. Field description rewritten; rollback to Sonnet
is one env var
(`CHAT_THINKING_STANDARD_MODEL=anthropic/claude-sonnet-4.6`).
3. **`@model_validator(mode="after")`** on `ChatConfig`
([config.py:_validate_sdk_model_vendor_compatibility](autogpt_platform/backend/backend/copilot/config.py))
— fail at config load when `use_openrouter=False` is paired with a
non-Anthropic SDK slug. The runtime guard in `_normalize_model_name` is
kept as defence-in-depth, but the validator turns a per-request 500 into
a boot-time error message the operator sees once, before any traffic
lands. Covers `thinking_standard_model`, `thinking_advanced_model`, and
`claude_agent_fallback_model`. Subscription mode is exempt (resolver
returns `None` and never normalizes). The credential-missing case
(`use_openrouter=True` + no `api_key`) is intentionally NOT a boot-time
error so CI builds and OpenAPI-schema export jobs that construct
`ChatConfig()` without secrets keep working — the runtime guard still
catches it on the first SDK turn.
4. **Cost provider attribution**
([sdk/service.py:stream_chat_completion_sdk](autogpt_platform/backend/backend/copilot/sdk/service.py))
— `persist_and_record_usage` now passes `provider="open_router" if
config.openrouter_active else "anthropic"` instead of hardcoded
`"anthropic"`. The dollar value still comes from
`ResultMessage.total_cost_usd`; this just fixes the analytics label.
5. **Baseline rollback example** ([config.py:fast_standard_model
description](autogpt_platform/backend/backend/copilot/config.py)) — same
dot-vs-hyphen footgun fix (CodeRabbit catch).
6. **Tests** — `TestNormalizeModelName` (sdk/) monkeypatches a
deterministic config per case (the helper-test variants were passing
accidentally based on ambient env). New
`TestSdkModelVendorCompatibility` class in `config_test.py` covers all
five validator shapes (default-Kimi + direct-Anthropic raises, anthropic
override succeeds, openrouter mode succeeds, subscription mode skips
check, advanced+fallback tier also validated, empty fallback skipped).
`_ENV_VARS_TO_CLEAR` extended to all model/SDK/subscription env aliases
so a leftover dev `.env` value can't mask validator behaviour. New
`_make_direct_safe_config` helper for direct-Anthropic tests.
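
A rough sketch of the normalization rule from item 1 (illustrative; the real `_normalize_model_name` reads `config.openrouter_active` in `sdk/service.py`):

```python
def normalize_model_name(model: str, openrouter_active: bool) -> str:
    if openrouter_active:
        return model                           # keep the canonical vendor/model slug
    vendor, sep, name = model.partition("/")
    if sep and vendor != "anthropic":
        # Silently stripping the prefix would send e.g. "kimi-k2.6" to the
        # Anthropic API and surface an opaque model_not_found.
        raise ValueError(f"Direct-Anthropic mode cannot route {model!r}")
    bare = name if sep else model
    return bare.replace(".", "-")              # the Anthropic API wants dots as hyphens

kept = normalize_model_name("moonshotai/kimi-k2.6", openrouter_active=True)
assert kept == "moonshotai/kimi-k2.6"
stripped = normalize_model_name("anthropic/claude-sonnet-4.6", openrouter_active=False)
assert stripped == "claude-sonnet-4-6"
```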

## Test plan

- [x] `poetry run pytest backend/copilot/config_test.py
backend/copilot/sdk/service_test.py
backend/copilot/sdk/service_helpers_test.py
backend/copilot/sdk/env_test.py
backend/copilot/sdk/p0_guardrails_test.py` — 238 pass
- [x] `poetry run pytest backend/copilot/` — 2560 pass + 5 pre-existing
integration failures (need real API keys / browser env, unrelated)
- [x] CI green on `feat/copilot-sdk-kimi-default` (35 pass / 0 fail / 1
neutral)
- [x] Manual: SDK extended_thinking turn against Kimi K2.6 via
OpenRouter on the native dev stack — request lands with
`model=moonshotai/kimi-k2.6`, response streams back, multi-turn
`--resume` recalls facts across turns. Backend log: `[SDK] Per-request
model override: standard (moonshotai/kimi-k2.6)`.
- [x] Manual: rollback path —
`CHAT_THINKING_STANDARD_MODEL=anthropic/claude-sonnet-4.6` resumes
Sonnet routing.

## Known follow-ups (not in this PR)

These surfaced during manual testing and will need separate PRs:

- **SDK CLI cost is wrong for non-Anthropic models.**
`ResultMessage.total_cost_usd` comes from a static Anthropic pricing
table baked into the CLI binary; for Kimi K2.6 it falls back to Sonnet
rates, **over-billing ~5x** ($0.089 vs the real ~$0.018 for ~30K prompt
+ ~80 completion). The `provider` label is now correct but the dollar
value isn't. Needs either a per-model rate card override on our side or
a CLI patch upstream.
- **Mid-session model switch (Kimi → Opus) breaks.** Kimi's
`ThinkingBlock`s have no Anthropic `signature` field; when the user
toggles standard → advanced after a Kimi turn, Opus rejects the replayed
transcript with `Invalid signature in thinking block`. Needs transcript
scrubbing on model switch (similar to existing
`TestStripStaleThinkingBlocks` pattern).
- **Reasoning UI ordering on Kimi.** Moonshot/OpenRouter places
`reasoning` AFTER text in the response; the SDK's
`AssistantMessage.content` reflects that order, and `response_adapter`
emits SSE events in the same order — so reasoning lands BELOW the answer
in the UI instead of above. Needs `ThinkingBlock` hoisting in
`response_adapter.py`.
2026-04-22 18:35:01 +07:00
Zamil Majdy
4f11867d92 feat(backend/copilot): TodoWrite for baseline copilot (#12879)
## Summary

Add `TodoWrite` to baseline copilot so the "task checklist" UI works on
non-Claude models (Kimi, GPT, Grok, etc.) the same way it works on the
SDK path. Baseline previously had no `TodoWrite` tool at all — only SDK
mode did via the Claude Code CLI's built-in — so models on baseline just
couldn't reach for a planning checklist.

This closes the last clear feature gap blocking baseline from being the
primary copilot path without giving up model flexibility.

## What ships

- **New MCP tool `TodoWrite`** in `TOOL_REGISTRY`, schema matching the
one the frontend's `GenericTool.helpers.ts` (`getToolCategory → "todo"`)
already renders as the **Steps** accordion. The tool is a stateless echo
— the canonical list lives in the model's latest tool-call args and
replays from transcript on subsequent turns.
- **Prompt guidance** in `SHARED_TOOL_NOTES` teaching the model when to
use it (3+ step tasks; always send the full list; exactly one
`in_progress` at a time).
- **Sharpened `run_sub_session` guidance** in the same prompt section —
framed explicitly as the context-isolation primitive for baseline.
Clearer for the model, no dual-primitive confusion.
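
Roughly, the validation amounts to the following (a sketch; the real checks live in `backend/copilot/tools/todo_write.py`, and the `pending` status value here is an assumption):

```python
VALID_STATUSES = {"pending", "in_progress", "completed"}

def validate_todos(todos) -> list[dict]:
    if not isinstance(todos, list):
        raise ValueError("todos must be a list")
    for item in todos:
        if not isinstance(item, dict) or not item.get("content") or not item.get("activeForm"):
            raise ValueError("each todo needs content and activeForm")
        if item.get("status") not in VALID_STATUSES:
            raise ValueError(f"invalid status: {item.get('status')!r}")
    if sum(1 for item in todos if item["status"] == "in_progress") > 1:
        raise ValueError("at most one todo may be in_progress at a time")
    return todos
```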

## How the SDK path stays untouched

- SDK mode keeps using the CLI-native `TodoWrite` built-in.
- `BASELINE_ONLY_MCP_TOOLS = {"TodoWrite"}` in `sdk/tool_adapter.py`
filters the baseline MCP wrapper out of SDK's `allowed_tools` — no name
shadowing.
- `SDK_BUILTIN_TOOL_NAMES` is now an explicit allowlist (not
auto-derived from capitalization) so the classification stays coherent
when a capitalized tool is platform-owned.

## Files

| File | Change |
|---|---|
| `backend/copilot/tools/todo_write.py` | new — `TodoWriteTool` |
| `backend/copilot/tools/__init__.py` | register in `TOOL_REGISTRY` |
| `backend/copilot/tools/models.py` | add `TodoItem` +
`TodoWriteResponse` + `ResponseType.TODO_WRITE` |
| `backend/copilot/permissions.py` | explicit `SDK_BUILTIN_TOOL_NAMES`;
`apply_tool_permissions` maps baseline-only tools to CLI name for SDK |
| `backend/copilot/sdk/tool_adapter.py` | `BASELINE_ONLY_MCP_TOOLS`
filter |
| `backend/copilot/prompting.py` | `TodoWrite` + sharpened
`run_sub_session` guidance |
| `backend/api/features/chat/routes.py` | add `TodoWriteResponse` to
`ToolResponseUnion` |
| `backend/copilot/tools/todo_write_test.py` | new — schema + execute
tests |
| `frontend/src/app/api/openapi.json` | regenerated |
| `tools/tool_schema_test.py` | budget bumped `32_800 → 34_000` (actual
33_865, +1_065 headroom) |

## Test plan

- [x] `poetry run pytest backend/copilot/
backend/api/features/chat/routes_test.py` — **1010 passing**
- [x] Tool schema char budget regression gate passes
- [x] `_assert_tool_names_consistent` passes
- [x] **E2E on local native stack (Kimi K2.6 via OpenRouter,
`CHAT_USE_CLAUDE_AGENT_SDK=false`)**: baseline called `TodoWrite` on a
3-step prompt, SSE stream carried the exact `{content, activeForm,
status}` shape the UI expects, "Steps" dialog renders `Task list — 0/3
completed` with all three items (see test-report comment below).
- [x] Negative cases covered: two `in_progress` → rejected, missing
`activeForm` → rejected, non-list `todos` → rejected.
2026-04-22 17:28:15 +07:00
Zamil Majdy
33a608ec78 feat(platform/copilot): live baseline streaming + render flag + Sonar web_search + simulator cost tracking + reconnect fixes (#12873)
### Why / What / How

**Why.** Four problems on the baseline copilot path that compound:

- Extended-thinking turns froze the UI for minutes because Kimi K2.6 events were buffered in `state.pending_events: list` until the full `tool_call_loop` iteration finished (reasoning arrived in one lump at the end).
- The SSE stream replayed 1000 events on every reconnect, and the frontend opened multiple SSE streams in quick succession on tab-focus thrash (reconnect storm → UI flickers, tab freezes).
- The `web_search` tool hit Anthropic's server-side beta directly via a dispatch-model round-trip that fed entire page contents back through the model for a second inference pass (observed $0.072 on a 74K-token call).
- The simulator dry-run path ran on Gemini Flash without any cost tracking at all, so every dry-run was free on the platform's microdollar ledger.

**What.** Grouped deltas, all targeting reliability, cost, and UX of the
copilot live-answer pipeline:

- **Live per-token baseline streaming.** `state.pending_events` is now
an `asyncio.Queue` drained concurrently by the outer async generator.
The tool-call loop runs as a background task; reasoning / text / tool
events reach the SSE wire during the upstream OpenRouter stream, not
after it. `None` is the close sentinel; inner-task exceptions are
re-raised via `await loop_task` once the sentinel arrives. An
`emitted_events: list` mirror preserves post-hoc test inspection.
Coalescing widened 32/40 → 64/50 ms to halve the React re-render rate on
extended-thinking turns while staying under the ~100 ms perceptual
threshold. (A sketch of the queue-drain pattern follows this list.)
- **Reasoning render flag** — `ChatConfig.render_reasoning_in_ui: bool =
True` wired through both `BaselineReasoningEmitter` and
`SDKResponseAdapter`. When False the wire `StreamReasoning*` events are
suppressed while the persisted `ChatMessage(role='reasoning')` rows
always survive (decoupled from the render flag so audit/replay is
unaffected); the service-layer yield filter does the gating. Tokens are
still billed upstream; operator kill-switch for UI-level flicker
investigations.
- **Reconnect storm mitigations** — `ChatConfig.stream_replay_count: int
= 200` (was hard-coded 1000) caps `stream_registry.subscribe_to_session`
XREAD size. Frontend `useCopilotStream::handleReconnect` adds a 1500 ms
debounce via `lastReconnectResumeAtRef`, so tab-focus thrash doesn't fan
out into 5–6 parallel replays in the same second.
- **web_search rewritten to Perplexity Sonar via OpenRouter** — single
unified credential, real `usage.cost` flows through
`persist_and_record_usage(provider='open_router')`. Two tiers via a
`deep` param: `perplexity/sonar` (~$0.005/call quick) and
`perplexity/sonar-deep-research` (~$0.50–$1.30/call multi-step
research). Replaces the Anthropic-native + server-tool dispatches; drops
the hardcoded pricing constants entirely.
- **Synthesised answer surfaced end-to-end** — Sonar already writes a
web-grounded answer on the same call we pay for; the new
`WebSearchResponse.answer` field passes it through and the accordion UI
renders it above citations so the agent doesn't re-fetch URLs that are
usually bot-protected anyway.
- **Deep-tier cost warning + UI affordances** — `deep` param description
is explicit that it's ~100× pricier; UI labels read "Researching /
Researched / N research sources" when `deep=true` so users know what's
running.
- **Simulator cost tracking + cheaper default** —
`google/gemini-2.5-flash` → `google/gemini-2.5-flash-lite` (3× cheaper
tokens) and every dry-run now hits
`persist_and_record_usage(provider='open_router')` with real
`usage.cost`. Previously each sim was free against the user's
microdollar budget.
- **Typed access everywhere** — cost extractors now use
`openai.types.CompletionUsage.model_extra["cost"]` and
`openai.types.chat.ChatCompletion` / `Annotation` /
`AnnotationURLCitation` with no `getattr` / duck typing. Mirrors the
baseline service's `_extract_usage_cost` pattern; keep in sync.
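
A minimal sketch of the queue-drain pattern from the first bullet (illustrative names; the real implementation is `_BaselineStreamState` plus the outer generator in `baseline/service.py`):

```python
import asyncio

async def stream_turn(run_tool_call_loop):
    events: asyncio.Queue = asyncio.Queue()

    async def _loop() -> None:
        try:
            await run_tool_call_loop(emit=events.put_nowait)
        finally:
            events.put_nowait(None)            # close sentinel, even on failure

    loop_task = asyncio.create_task(_loop())
    while (event := await events.get()) is not None:
        yield event                            # reaches the SSE wire mid-stream
    await loop_task                            # re-raise any inner-task exception
```

Because the sentinel is emitted in `finally`, the consumer drains everything that reached the queue before any inner failure is re-raised.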

**How.** Key file touches:

1. `copilot/config.py` — `render_reasoning_in_ui`,
`stream_replay_count`, `simulation_model` default.
2. `copilot/baseline/service.py` — `_BaselineStreamState.pending_events:
asyncio.Queue`, `_emit` / `_emit_all` helpers, outer generator runs
`tool_call_loop` as a background task + yields from queue concurrently.
3. `copilot/baseline/reasoning.py` —
`BaselineReasoningEmitter(render_in_ui=...)`, coalescing bumped to 64
chars / 50 ms.
4. `copilot/sdk/service.py` — `state.adapter.render_reasoning_in_ui`
threaded through every adapter construction.
5. `copilot/sdk/response_adapter.py` — `render_reasoning_in_ui` wiring +
service-layer yield filter gating for wire suppression while persistence
stays intact.
6. `copilot/stream_registry.py` — `count=config.stream_replay_count`.
7. `frontend/.../useCopilotStream.ts::handleReconnect` — 1500 ms
debounce.
8. `copilot/tools/web_search.py` + `models.py` — Sonar quick/deep paths,
`WebSearchResponse.answer` + typed extractors.
9. `frontend/.../GenericTool/*` — `answer` render + deep-aware labels /
accordion titles.
10. `executor/simulator.py` + `executor/manager.py` +
`copilot/config.py` — cost tracking + model swap + `user_id` threading.

### Changes

- `copilot/config.py` — new `render_reasoning_in_ui`,
`stream_replay_count`; `simulation_model` default flipped to Flash-Lite.
- `copilot/baseline/service.py` — `pending_events: asyncio.Queue`
refactor; outer gen runs loop as task, yields from queue live.
- `copilot/baseline/reasoning.py` —
`BaselineReasoningEmitter(render_in_ui=...)` + 64/50 coalesce.
- `copilot/sdk/service.py` + `response_adapter.py` —
`render_reasoning_in_ui` wire suppression (persistence preserved).
- `copilot/stream_registry.py` — replay cap from config.
- `copilot/tools/web_search.py` + `models.py` — Sonar quick/deep +
`answer` field + typed extractors.
- `copilot/tools/helpers.py` — tool description tightens `deep=true`
cost warning.
- `frontend/.../useCopilotStream.ts` — reconnect debounce.
- `frontend/.../GenericTool/GenericTool.tsx` + `helpers.ts` + tests —
render `answer`, deep-aware verbs / titles.
- `executor/simulator.py` + `simulator_test.py` + `executor/manager.py`
— cost tracking + model swap + user_id plumbing.

### Follow-up (deferred to a separate PR)

SDK per-token streaming via `include_partial_messages=True` was
attempted (commits `599e83543` + `530fa8f95`) and reverted here. The
two-signal model (StreamEvent partial deltas + AssistantMessage summary)
needs proper per-block diff tracking — when the partial stream delivers
a subset of the final block content, emit only
`summary.text[len(already_emitted):]` from the summary rather than
gating on a binary flag. Binary gating truncated replies in the field
when the partial stream delivered less than the summary (observed: "The
analysis template you" cut off mid-sentence because partial had streamed
that much and the rest only lived in the summary). SDK reasoning still
renders end-of-phase (as today); this PR's baseline per-token streaming
is unaffected.

### Checklist

For code changes:
- [x] Changes listed above
- [x] Test plan below
- [x] Tested according to the test plan:
- [x] `poetry run pytest backend/copilot/baseline/ backend/copilot/sdk/
backend/copilot/tools/web_search_test.py
backend/executor/simulator_test.py` — all pass (155 baseline + 927 SDK +
web_search + simulator)
- [x] `pnpm types && pnpm vitest run
src/app/(platform)/copilot/tools/GenericTool/` — pass
- [x] Manual: baseline live-streaming — Kimi K2.6 reasoning arrives
token-by-token, coalesced (no end-of-stream burst).
- [x] Manual: quick web_search via copilot UI — ~$0.005/call, answer +
citations rendered, cost logged as `provider=open_router`.
- [x] Manual: deep web_search — dispatched only on explicit research
phrasing; `sonar-deep-research` billed, UI labels say "Researched" / "N
research sources".
- [x] Manual: simulator dry-run — Gemini Flash-Lite, `[simulator] Turn
usage` log entry, PlatformCostLog row visible.
- [x] Manual: reconnect debounce — tab-focus thrash no longer produces
parallel XREADs in backend log.
- [ ] Manual: `CHAT_RENDER_REASONING_IN_UI=false` smoke-check —
reasoning collapse absent, no persisted reasoning row on reload.

For configuration changes:
- [x] `.env.default` — new config knobs fall back to pydantic defaults;
existing `CHAT_MODEL`/`CHAT_FAST_MODEL`/`CHAT_ADVANCED_MODEL` legacy
envs still honored upstream (unchanged by this PR).

### Companion PR

PR #12876 closes the `run_block`-via-copilot cost-leak gap (registers
`PerplexityBlock` / `FactCheckerBlock` in `BLOCK_COSTS`; documents the
credit/microdollar wallet boundary). Separate because the credit-wallet
side is orthogonal to the copilot microdollar / rate-limit surface this
PR ships.
2026-04-22 13:52:18 +07:00
Zamil Majdy
e3f6d36759 feat(backend/blocks): register 13 paid blocks + document credit/microdollar wallet boundary (#12876)
### Why / What / How

**Why.** Audit of `BLOCK_COSTS` against `credentials_store.py` system
credentials revealed **13 paid blocks** running for free from the credit
wallet's perspective — `BLOCK_COSTS.get(type(block))` returned `None`,
`cost = 0`, no `spend_credits` deduction. Users without their own API
key consumed system credentials with zero credit drain. Separately, the
credit wallet (user-facing prepaid balance) and the copilot microdollar
counter (operator-side meter that gates `daily_cost_limit_microdollars`)
were never documented as separate systems, so future readers kept
tripping on the "why isn't this block charging my limit?" question.

**What.** Three deltas, all credit-wallet-side:

- **Register the 13 paid blocks in `BLOCK_COSTS`** with reasonable
per-call credit prices (1 credit = $0.01). Pricing researched against
the providers' published rates with ~2-3x markup.
- **Document the credit/microdollar boundary** in
`copilot/rate_limit.py`: credits = user-facing prepaid wallet with
marketplace-creator charging; microdollars = operator-side meter that
only ticks on copilot LLM turns (baseline / SDK / web_search /
simulator). Block execution bills credits, not microdollars — explicit
contract.
- **Populate `provider_cost`** on PerplexityBlock so PlatformCostLog
rows carry the real OpenRouter `x-total-cost` value via the existing
`executor/cost_tracking.log_system_credential_cost` path (separate flow
from credit deduction).

### Block costs registered

| Provider | Block | Credits | Raw cost / markup |
|---|---|---|---|
| Perplexity (OpenRouter) | PerplexityBlock — Sonar | 1 | $0.001-0.005 / call |
| | PerplexityBlock — Sonar Pro | 5 | $0.025 / call |
| | PerplexityBlock — Sonar Deep Research | 10 | up to $0.05 / call |
| Jina | FactCheckerBlock | 1 | $0.005 / call |
| Mem0 | AddMemoryBlock | 1 | $0.0004 / call (1c floor) |
| | SearchMemoryBlock | 1 | $0.004 / call |
| | GetAllMemoriesBlock | 1 | $0.004 / call |
| | GetLatestMemoryBlock | 1 | $0.004 / call |
| ScreenshotOne | ScreenshotWebPageBlock | 2 | $0.0085 / call (2.4x) |
| Nvidia | NvidiaDeepfakeDetectBlock | 2 | est $0.005 (no public SKU) |
| Smartlead | CreateCampaignBlock | 2 | $0.0065 send-equivalent (3x) |
| | AddLeadToCampaignBlock | 1 | $0.0065 (1.5x) |
| | SaveCampaignSequencesBlock | 1 | config-only |
| ZeroBounce | ValidateEmailsBlock | 2 | $0.008 / email (2.5x) |
| E2B + Anthropic | ClaudeCodeBlock | **100** | $0.50-$2 / typical session (E2B sandbox + in-sandbox Claude) |

**Not in scope** — already covered via the SDK
`ProviderBuilder.with_base_cost()` pattern in their respective
`_config.py`: Exa, Linear, Airtable, Bannerbear, Wolfram, Firecrawl,
Wordpress, Baas, Stagehand, Dataforseo.

### How

1. `backend/data/block_cost_config.py` — 13 new `BlockCost` entries (3
Perplexity models + Fact Checker + 11 from this round).
2. `backend/copilot/rate_limit.py` — boundary docstring.
3. `backend/blocks/perplexity.py` — populate
`NodeExecutionStats.provider_cost` so PlatformCostLog rows carry the
real OpenRouter `x-total-cost` value.
4. Tests — `TestUnregisteredBlockRunsFree` regression +
`TestNewlyRegisteredBlockCosts` pinning every new entry by `cost_amount`
so a future refactor can't quietly drop one.

The companion Notion "Platform System Credentials" database has been
updated with a new `Platform Credit Cost` column populated across all 30
provider rows.

### Scope trim

An earlier revision piped block execution cost into the **copilot
microdollar counter** via `_record_block_microdollar_cost` in
`copilot/tools/helpers.py::execute_block`. That was reverted in
`16ae0f7b5` — the microdollar counter stays scoped to copilot LLM turns
only, credit wallet handles block execution. The pipe-through crossed a
boundary we explicitly want to keep.

### Changes

- `backend/data/block_cost_config.py` — 13 × `BlockCost` entries across
7 providers.
- `backend/blocks/perplexity.py` — populate `provider_cost` on the
execution stats (feeds PlatformCostLog).
- `backend/copilot/rate_limit.py` — boundary docstring only (no
behaviour change).
- `backend/copilot/tools/helpers_test.py` —
`TestUnregisteredBlockRunsFree` + `TestNewlyRegisteredBlockCosts` (8 new
regression tests).
- `backend/blocks/block_cost_tracking_test.py` — provider-cost
extraction pins.

### Checklist

For code changes:
- [x] Changes listed above
- [x] Test plan below
- [x] Tested according to the test plan:
- [x] `poetry run pytest backend/copilot/tools/helpers_test.py
backend/copilot/tools/run_block_test.py
backend/copilot/tools/continue_run_block_test.py
backend/blocks/block_cost_tracking_test.py
backend/blocks/test/test_perplexity.py` — passes
- [x] `poetry run pytest backend/executor/manager_cost_tracking_test.py
backend/copilot/rate_limit_test.py
backend/copilot/token_tracking_test.py` — passes (confirms docstring
edits didn't regress the LLM-turn microdollar path)
  - [x] Pyright clean on all touched files
- [ ] Manual: run PerplexityBlock via copilot `run_block` — credits
deduct, PlatformCostLog row visible with `provider_cost`, no
microdollar-counter tick.
- [ ] Manual: run an unregistered block via copilot — no error, no
credit drain, no silent billing.
- [ ] Manual: run ClaudeCodeBlock via builder — 100 credits deducted
from wallet.

### Companion PR

PR #12873 ships the copilot microdollar / rate-limit work (web_search
cost, simulator cost, reasoning / reconnect fixes). This PR is
credit-wallet only.
2026-04-22 12:03:02 +07:00
Nicholas Tindle
c1b9ed1f5e fix(backend/copilot): allow multiple compactions per turn (#12834)
### Why / What / How

**Why:** The old `CompactionTracker` set a `_done` flag after the first
completion and short-circuited every subsequent compaction in the same
turn. That blocked the SDK-internal compaction from running after a
pre-query compaction had already fired, so prompt-too-long errors
couldn't actually recover — retries saw the flag, bailed, and we re-hit
the context limit.

**What:** Drop the `_done` flag, track attempts and completions as
separate lists, and expose counters + an observability metadata builder
so callers can record compaction activity per turn.

**How:**
- Remove `_done` and `_compact_start` short-circuits.
- Track `_attempted_sources` / `_completed_sources` /
`_completed_count`.
- Expose `attempt_count`, `completed_count`, and
`get_observability_metadata()` / `get_log_summary()` for downstream
instrumentation (no caller change required in this PR).

### Changes 🏗️

- `backend/copilot/sdk/compaction.py` — rewritten `CompactionTracker`
internals; adds properties + observability helpers.
- `backend/copilot/sdk/compaction_test.py` — tests for multi-compaction
flow + new counters.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] `poetry run pytest backend/copilot/sdk/compaction_test.py -xvs`
passes
- [ ] Local chat that hits prompt-too-long now recovers via SDK
compaction instead of failing the turn

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes core streaming compaction state transitions and persistence
timing, which could affect UI event sequencing or compaction completion
behavior under concurrency; coverage is improved with new
multi-compaction tests.
> 
> **Overview**
> Fixes `CompactionTracker` so compaction is no longer single-shot per
turn: removes the `_done`/event-gate behavior, queues multiple
`on_compact()` hook firings via a pending transcript-path deque, and
allows subsequent SDK-internal compactions after a pre-query compaction
within the same query.
> 
> Adds lightweight instrumentation by tracking attempt/completion
sources and counts, plus `get_observability_metadata()` and
`get_log_summary()` (including source summaries like `sdk_internal:2`).
Updates/expands tests to cover multi-compaction flows, transcript-path
handling, and the new counters/metadata.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
9bf8cdd367. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: majdyz <zamil.majdy@agpt.co>
2026-04-22 02:02:03 +00:00
Zamil Majdy
45bc167184 feat(backend/copilot): Kimi K2.6 fast default + 4-config matrix + coalesced reasoning + web_search tool (#12871)
### Why / What / How

**Why.** Three unrelated but interlocking problems on the baseline
(OpenRouter) copilot path, all blocking us from making Kimi K2.6 the
default fast model:

1. **Cost / capability gap on the default.** Kimi K2.6 prices at $0.60 /
$2.80 per MTok — ~5x cheaper input and ~5.4x cheaper output than Sonnet
4.6 — while tying Opus on SWE-Bench Verified (80.2% vs 80.8%) and
beating it on SWE-Bench Pro (58.6% vs 53.4%). OpenRouter exposes the
same `reasoning` / `include_reasoning` extension on Moonshot endpoints
that #12870 plumbed for Anthropic, so the reasoning collapse lights up
end-to-end without per-provider code.
2. **Kimi reasoning deltas freeze the UI.** K2.6 emits ~4,700
reasoning-delta SSE events per turn vs ~28 on Sonnet — the AI SDK v6
Reasoning UIMessagePart can't keep up and the tab locks. Needs a
coalescing buffer upstream.
3. **Kimi loops on `require_guide_read`.** The guide-guard checks
`session.messages` for a prior `agent_building_guide` call, but tool
calls aren't flushed to `session.messages` until the end of the turn —
mid-turn the check keeps returning False and Kimi calls the guide-load
tool repeatedly in the same turn. Needs an in-flight tracker that lives
on `ChatSession`.
4. **No `web_search` tool on either path.** Kimi doesn't have a native
web-search equivalent and the SDK path's native `WebSearch` (the Claude
Code CLI's built-in) doesn't carry cost accounting. We need one
implementation that both paths share and that reports cost through the
same tracker as every other tool call.

**What.** Five grouped deltas on the baseline service, tool layer, and
config:

- **Kimi K2.6 default.** `fast_standard_model` defaults to
`moonshotai/kimi-k2.6`. Full 2×2 model matrix below. Rollback is one env
var.
- **4-config model matrix.** `fast_standard_model` /
`fast_advanced_model` / `thinking_standard_model` /
`thinking_advanced_model`. Each cell independent so baseline can run a
cheap provider at the standard tier without leaking into the SDK path
(which is Anthropic-only by CLI contract). Legacy env vars
(`CHAT_MODEL`, `CHAT_FAST_MODEL`, `CHAT_ADVANCED_MODEL`) stay aliased
via `validation_alias` so live deployments keep resolving to the same
effective cell.
- **Reasoning delta coalescing.** `BaselineReasoningEmitter` buffers
deltas and flushes on a char-count OR time-interval threshold (32 chars
/ 40 ms). ~4,700 → ~150 SSE events per turn on Kimi; no perceptible
change on Sonnet (which was already well under the threshold).
- **In-flight tool-call tracker.** `ChatSession._inflight_tool_calls`
PrivateAttr is populated when a tool-call block is emitted and cleared
at turn end. `session.has_tool_been_called_this_turn(name)` now returns
True mid-turn, not just after the tool-result lands in
`session.messages` — which is what `require_guide_read` needs to cut the
loop.
- **New `web_search` copilot tool.** Wraps Anthropic's server-side
`web_search_20250305` beta via `AsyncAnthropic` (direct — OpenRouter
can't proxy server-side tool execution). Dispatches through
`claude-haiku-4-5` with `max_uses=1`. Cost estimated from published
rates ($0.010 per search + Haiku tokens) since the Anthropic Messages
API doesn't report cost on the response; reported to
`persist_and_record_usage(provider='anthropic')` on both paths. SDK
native `WebSearch` moved from `_SDK_BUILTIN_ALWAYS` into
`SDK_DISALLOWED_TOOLS` so both paths now dispatch through
`mcp__copilot__web_search`.

**How.**

1. `copilot/config.py` — 2×2 model fields with `AliasChoices` preserving
legacy env var names. `populate_by_name = True` so
`ChatConfig(fast_standard_model=...)` works in tests.
2. `copilot/baseline/service.py::_resolve_baseline_model` — resolves the
active baseline cell from `mode` + `tier`, no longer delegates to the
SDK resolver.
3. `copilot/baseline/reasoning.py` — `BaselineReasoningEmitter` gains
`_pending_delta` / `_last_flush_monotonic` and flushes on
`len(_pending_delta) >= _COALESCE_MIN_CHARS` OR `monotonic() -
_last_flush_monotonic >= _COALESCE_MAX_INTERVAL_MS / 1000` (sketched after this list).
`_is_reasoning_route` rewritten as an anchored prefix match covering
`anthropic/`, `anthropic.`, `moonshotai/`, and `openrouter/kimi-` —
split from the narrower `_is_anthropic_model` gate that still governs
`cache_control` markers (which Kimi doesn't support).
4. `copilot/model.py::ChatSession` — `_inflight_tool_calls: set[str] =
PrivateAttr(default_factory=set)` plus `announce_inflight_tool_call` /
`clear_inflight_tool_calls` / `has_tool_been_called_this_turn`.
5. `copilot/tools/helpers.py::require_guide_read` — check
`session.has_tool_been_called_this_turn(_AGENT_GUIDE_TOOL_NAME)` before
falling back to scanning `session.messages`.
6. `copilot/tools/web_search.py` — new `WebSearchTool` +
`_extract_results` + `_estimate_cost_usd`. `is_available` gated on
`Settings().secrets.anthropic_api_key` so the deployment can roll back
just by unsetting the key.
7. `copilot/tools/__init__.py` — registers `web_search` in
`TOOL_REGISTRY` so it becomes `mcp__copilot__web_search` in the SDK
path.
8. `copilot/sdk/tool_adapter.py` — `WebSearch` moves to
`SDK_DISALLOWED_TOOLS`.
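
The flush rule from step 3, as a minimal sketch; constant names mirror the PR text but the class shape is illustrative, not the real `BaselineReasoningEmitter`.

```python
import time

_COALESCE_MIN_CHARS = 32
_COALESCE_MAX_INTERVAL_MS = 40


class ReasoningCoalescerSketch:
    """Buffers reasoning deltas and flushes them in larger chunks."""

    def __init__(self, emit) -> None:
        self._emit = emit  # callable that sends one reasoning-delta SSE event
        self._pending_delta = ""
        self._last_flush_monotonic = time.monotonic()

    def add_delta(self, text: str) -> None:
        self._pending_delta += text
        elapsed = time.monotonic() - self._last_flush_monotonic
        if (
            len(self._pending_delta) >= _COALESCE_MIN_CHARS
            or elapsed >= _COALESCE_MAX_INTERVAL_MS / 1000
        ):
            self.flush()

    def flush(self) -> None:
        # One coalesced event instead of one event per raw delta.
        if self._pending_delta:
            self._emit(self._pending_delta)
            self._pending_delta = ""
        self._last_flush_monotonic = time.monotonic()
```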

### Changes

- `copilot/config.py` — 2×2 model matrix with legacy env alias
preservation; `populate_by_name=True`.
- `copilot/baseline/service.py::_resolve_baseline_model` — resolves
against the new matrix.
- `copilot/baseline/reasoning.py` — `BaselineReasoningEmitter`
coalescing buffer; `_is_reasoning_route` rewritten as anchored prefix
match (covers `anthropic/`, `anthropic.`, `moonshotai/`,
`openrouter/kimi-`).
- `copilot/model.py::ChatSession` — `_inflight_tool_calls` PrivateAttr +
helpers (sketched below).
- `copilot/baseline/service.py::_baseline_tool_executor` — calls
`announce_inflight_tool_call` after emitting `StreamToolInputAvailable`;
`clear_inflight_tool_calls` in the outer `finally` before persist.
- `copilot/tools/helpers.py::require_guide_read` — reads the new tracker
first.
- `copilot/tools/web_search.py` (new) — Anthropic `web_search_20250305`
wrapper + cost estimator.
- `copilot/tools/web_search_test.py` (new) — extractor / cost / dispatch
/ registry tests (12 total).
- `copilot/tools/models.py` — `WebSearchResponse` + `WebSearchResult` +
`ResponseType.WEB_SEARCH`.
- `copilot/tools/__init__.py` — registers `web_search`.
- `copilot/sdk/tool_adapter.py` — moves native `WebSearch` to
`SDK_DISALLOWED_TOOLS`.
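
A rough sketch of the in-flight tracker behaviour (the real field is a Pydantic `PrivateAttr` on `ChatSession`; the message-shape check here is illustrative).

```python
class ChatSessionSketch:
    """Illustrates mid-turn tool-call tracking, not the real ChatSession model."""

    def __init__(self) -> None:
        self._inflight_tool_calls: set[str] = set()
        self.messages: list[dict] = []  # tool results land here only at end of turn

    def announce_inflight_tool_call(self, tool_name: str) -> None:
        self._inflight_tool_calls.add(tool_name)

    def clear_inflight_tool_calls(self) -> None:
        self._inflight_tool_calls.clear()

    def has_tool_been_called_this_turn(self, tool_name: str) -> bool:
        # True mid-turn, before the tool result is flushed to session.messages,
        # which is what lets require_guide_read break the repeat-call loop.
        if tool_name in self._inflight_tool_calls:
            return True
        return any(m.get("tool_name") == tool_name for m in self.messages)
```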

### Checklist

For code changes:
- [x] Changes listed above
- [x] Test plan below
- [ ] Tested according to the test plan:
  - [x] `poetry run pytest backend/copilot/baseline/` — all pass
- [x] `poetry run pytest backend/copilot/sdk/` — all pass (SDK resolver
untouched)
- [x] `poetry run pytest backend/copilot/tools/web_search_test.py` — 12
pass
- [ ] Manual: send a multi-step prompt on fast mode with default config;
confirm backend routes to `moonshotai/kimi-k2.6`, SSE stream carries
`reasoning-start/delta/end` (coalesced), Reasoning collapse renders +
survives hard reload.
- [ ] Manual: 43-tool payload reliability on Kimi — watch for malformed
tool-call JSON or wrong-tool selection.
- [ ] Manual: `CHAT_FAST_STANDARD_MODEL=anthropic/claude-sonnet-4-6`
restarts confirm Sonnet routing (rollback path works).
- [ ] Manual: SDK path (`CHAT_USE_CLAUDE_AGENT_SDK=true`) still selects
the SDK service and uses `thinking_standard_model` = Sonnet (no Kimi
leaked into extended thinking).
- [ ] Manual: prompt that forces `web_search` — confirm results render,
`persist_and_record_usage(provider='anthropic')` runs, cost lands in the
per-user ledger.
- [ ] Manual: ask Kimi a question that would require
`agent_building_guide` — confirm the guide loads exactly once per turn
(no loop).

For configuration changes:
- [x] `.env.default` — all four model fields fall back to the pydantic
defaults; legacy `CHAT_MODEL` / `CHAT_FAST_MODEL` /
`CHAT_ADVANCED_MODEL` remain honored via `AliasChoices`.
2026-04-22 08:47:08 +07:00
Nicholas Tindle
e4f291e54b feat(frontend): add AutoGPT logo to share page and zip download for outputs (#11741)
### Why / What / How

**Why:** The share page was unbranded (no logo/navigation) and images
from workspace files couldn't render because the proxy didn't handle
public share URLs. Zip downloads also had several gaps — no size limits,
no workspace file support, silent failures on data URLs, and single
files got wrapped in unnecessary zips.

**What:** Adds AutoGPT branding to the share page, secure public access
to workspace files via a SharedExecutionFile allowlist, and a hardened
zip download module.

**How:** Backend scans execution outputs for `workspace://` URIs on
share-enable and persists an allowlist in a new `SharedExecutionFile`
table. A new unauthenticated endpoint serves files validated against
this allowlist. Frontend proxy routing is extended (with UUID
validation) to handle the 7-segment public share download path as a
binary response. Download logic is consolidated into a shared module
with size limits, parallel fetches, filename sanitization, and
single-file direct download.

### Changes 🏗️

**Share page branding:**
- AutoGPT logo header centered at top, linking to `/`
- Dark/light mode variants with correct `priority` on visible variant
only

**Secure public workspace file access (backend):**
- New `SharedExecutionFile` Prisma model with `@@unique([shareToken,
fileId])` constraint
- `_extract_workspace_file_ids()` scans outputs for `workspace://` URIs
(handles nested dicts/lists; sketched below)
- `create_shared_execution_files()` / `delete_shared_execution_files()`
manage allowlist lifecycle
- Re-share cleans up stale records before creating new ones (prevents
old token access)
- `GET /public/shared/{token}/files/{id}/download` — validates against
allowlist, uniform 404 for all failures
- `Content-Disposition: inline` for share page rendering
- Hand-written Prisma migration
(`20260417000000_add_shared_execution_file`)
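
A hedged sketch of the output scan named above; the real helper may differ in naming and in how `workspace://` URIs encode the file ID (a UUID is assumed here).

```python
import re

_WORKSPACE_URI = re.compile(r"workspace://([0-9a-fA-F-]{36})")


def extract_workspace_file_ids(value) -> set[str]:
    """Recursively collect file IDs referenced by workspace:// URIs."""
    found: set[str] = set()
    if isinstance(value, str):
        found.update(_WORKSPACE_URI.findall(value))
    elif isinstance(value, dict):
        for item in value.values():
            found |= extract_workspace_file_ids(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            found |= extract_workspace_file_ids(item)
    return found
```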

**Frontend proxy fix:**
- `isWorkspaceDownloadRequest` extended to match public share path
(7-segment)
- UUID format validation on dynamic path segments (file IDs, share
tokens)
- 30+ adversarial security tests: path traversal, SQL injection, SSRF
payloads, unicode homoglyphs, null bytes, prototype pollution, etc.

**Download module (`download-outputs.ts`):**
- Consolidated from two divergent copies into single shared module
- `fetchFileAsBlob` with content-length pre-check before buffering
- `sanitizeFilename` strips path traversal, leading dots, falls back to
"file"
- `getUniqueFilename` deduplicates with counter suffix
- `fetchInParallel` with configurable concurrency (5)
- 50 MB per-file limit, 200 MB aggregate limit
- Data URL try-catch, relative URL support (`/api/proxy/...`)
- Single-file downloads skip zip, go directly to browser download
- Dynamic JSZip import for bundle optimization
- 26 unit tests

**Share page file rendering:**
- `WorkspaceFileRenderer` builds public share URLs when `shareToken` is
in metadata
- `RunOutputs` propagates `shareToken` to renderer metadata

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Share page renders with centered AutoGPT logo
  - [x] Logo links to `/` and shows correct dark/light variant
  - [x] Workspace images render inline on share page
  - [x] Download all produces zip with workspace images included
  - [x] Single-file download skips zip, downloads directly
- [x] Re-sharing generates new token and cleans up old allowlist records
  - [x] Public file download returns 404 for files not in allowlist
  - [x] All frontend tests pass (122 tests across 3 suites)
  - [x] Backend formatter + pyright pass
  - [x] Frontend format + lint + types pass

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

> Note: New Prisma migration required. No env/docker changes needed.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Adds a new unauthenticated file download path gated by a database
allowlist plus a new Prisma model/migration; mistakes here could expose
workspace files or break sharing. Frontend download behavior also
changes significantly (zipping/fetching), which could impact
large-output performance and edge cases.
> 
> **Overview**
> Enables **public rendering and downloading of workspace files on
shared execution pages** by introducing a `SharedExecutionFile`
allowlist tied to the share token and populating it when sharing is
enabled (and clearing it on disable/re-share).
> 
> Adds `GET /public/shared/{share_token}/files/{file_id}/download` (no
auth) that validates the requested file against the allowlist and
returns a uniform 404 on failure; workspace download responses now
support `inline` `Content-Disposition` via the exported
`create_file_download_response` helper.
> 
> Frontend updates the share page to pass `shareToken` into output
renderers so `WorkspaceFileRenderer` can build public-share download
URLs; the proxy matcher is extended/strictly UUID-validated for both
workspace and public-share download paths with extensive adversarial
tests. Output downloading is consolidated into `download-outputs.ts`
using dynamic `jszip` import, filename sanitization/deduping,
concurrency + size limits, and a single-file non-zip fast path.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
e2f5bd9b5a. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <ntindle@users.noreply.github.com>
Co-authored-by: Otto <otto@agpt.co>
2026-04-21 16:26:37 +00:00
Bently
6efbc59fd8 feat(backend): platform server linking API for multi-platform CoPilot (#12615)
## Why
AutoPilot (CoPilot) needs to reach users across chat platforms — Discord
first, Telegram / Slack / Teams / WhatsApp next. To make usage and
billing coherent, every conversation resolves to one AutoGPT account.
There are two independent linking flows:

- **SERVER links**: the first person to claim a server (Discord guild,
Telegram group, …) becomes its owner. Anyone in the server can chat with
the bot; all usage bills to the owner.
- **USER links**: an individual links their 1:1 DMs with the bot to
their own AutoGPT account. Independent from server links — a server
owner still has to link their DMs separately.

## What
Backend for platform linking, split cleanly by trust boundary:

- **Bot-facing operations** run over cluster-internal RPC via a new
`PlatformLinkingManager(AppService)`. No shared bearer token; trust is
the cluster network itself.
- **User-facing operations** stay on REST under JWT auth (the same
pattern as every other feature).

### REST endpoints (JWT auth)

- `GET /api/platform-linking/tokens/{token}/info` — non-sensitive
display info for the link page
- `POST /api/platform-linking/tokens/{token}/confirm` — confirm a SERVER
link
- `POST /api/platform-linking/user-tokens/{token}/confirm` — confirm a
USER link
- `GET /api/platform-linking/links` / `DELETE /links/{id}` — manage
server links
- `GET /api/platform-linking/user-links` / `DELETE /user-links/{id}` —
manage DM links

### `PlatformLinkingManager` `@expose` methods (internal RPC)

- `resolve_server_link(platform, platform_server_id) -> ResolveResponse`
- `resolve_user_link(platform, platform_user_id) -> ResolveResponse`
- `create_server_link_token(req) -> LinkTokenResponse`
- `create_user_link_token(req) -> LinkTokenResponse`
- `get_link_token_status(token) -> LinkTokenStatusResponse`
- `start_chat_turn(req) -> ChatTurnHandle` — resolves the owner,
persists the user message, creates the stream-registry session, enqueues
the turn; returns `(session_id, turn_id, user_id, subscribe_from="0-0")`
so the caller subscribes directly to the per-turn Redis stream.

### New DB models
- `PlatformLink` — `(platform, platformServerId)` → owner's AutoGPT
`userId`
- `PlatformUserLink` — `(platform, platformUserId)` → AutoGPT `userId`
(for DMs)
- `PlatformLinkToken` — one-time token with `linkType` discriminator
(SERVER | USER) and 30-min TTL

## How

- **New `backend/platform_linking/` package**: `models.py` (Pydantic
types), `links.py` (link CRUD helpers — pure business logic), `chat.py`
(`start_chat_turn` orchestration), `manager.py`
(`PlatformLinkingManager(AppService)` + `PlatformLinkingManagerClient`).
Pattern matches `backend/notifications/` + `backend/data/db_manager.py`.
- **Exception translation at the edge**. Helpers raise domain exceptions
(`NotFoundError`, `LinkAlreadyExistsError`, `LinkTokenExpiredError`,
`LinkFlowMismatchError`, `NotAuthorizedError` — all `ValueError`
subclasses in `backend.util.exceptions` so they auto-register with the
RPC exception-mapping). REST routes translate to HTTP codes via a 7-line
`_translate()` helper.
- **Independent scopes, no DM fallback**. `find_server_link()` and
`find_user_link()` each query their own table. A user who owns a linked
server does not leak that identity into their DMs.
- **Race-safe token consumption**. Confirm paths do atomic `update_many`
with `usedAt = None` + `expiresAt > now` in the WHERE clause;
`create_*_token` invalidates pending tokens before issuing a new one (see the sketch after this list).
- **Bug fix**: `start_chat_turn` persists the user message via
`append_and_save_message` before enqueueing the executor turn — mirrors
`backend/api/features/chat/routes.py`. The previous `chat_proxy.py`
skipped this and ran the executor with no user message in history.
- **Streaming**. Copilot streaming lives on Redis Streams (persistent,
replayable). The bot subscribes directly with `subscribe_from="0-0"`, so
late subscribers replay the full stream; no HTTP SSE proxy needed.
- **No PII in logs**: logs reference `session_id`, `turn_id`,
`server_id`, and AutoGPT `user_id` (last 8 chars), but never raw
platform user IDs.
- **New pod**. `PlatformLinkingManager` runs as its own `AppProcess` on
port `8009`; client via `get_platform_linking_manager_client()`. The
infra chart lands in
[cloud-infrastructure#310](https://github.com/Significant-Gravitas/AutoGPT_cloud_infrastructure/pull/310).
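
A sketch of the race-safe confirm described above, using a single conditional `update_many`; the client/model/field names (`db.platform_link_token`, `usedByUserId`) are assumptions, not the actual Prisma calls.

```python
from datetime import datetime, timezone


async def consume_link_token(db, token: str, user_id: str) -> bool:
    """Atomically claim a one-time link token; False means already used or expired."""
    now = datetime.now(timezone.utc)
    updated = await db.platform_link_token.update_many(
        where={
            "token": token,
            "usedAt": None,            # not yet consumed
            "expiresAt": {"gt": now},  # still inside the 30-min TTL
        },
        data={"usedAt": now, "usedByUserId": user_id},
    )
    # With two concurrent confirms, exactly one matches the WHERE clause;
    # the loser sees updated == 0 and surfaces LinkTokenExpiredError upstream.
    return updated > 0
```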

## Tests
- **Models** (`models_test.py`) — Platform / LinkType enums, request
validation (CreateLinkToken / ResolveServer / BotChat), response
schemas.
- **Helpers** (`links_test.py`) — resolve, token create (both flows, 409
on already-linked), token status (pending / linked / expired /
superseded-with-no-link), token info (404 / 410), confirm (404 / wrong
flow / already used / expired / same-user / other-user), delete authz.
- **AppService wiring** (`manager_test.py`) — `@expose` methods delegate
to helpers; client surface covers bot-facing ops and excludes
user-facing ones.
- **Adversarial** (`manager_test.py`, `routes_test.py`):
- `asyncio.gather` double-confirm with same user and with two different
users — exactly one winner, other gets clean `LinkTokenExpiredError`, no
double `PlatformLink.create`.
  - Server- and user-link confirm races.
- `TokenPath` regex guard: rejects `%24`, URL-encoded path traversal,
>64 chars; accepts `secrets.token_urlsafe` shape.
- DELETE `link_id` with SQL-injection-style and path-traversal inputs
returns 404 via `NotFoundError`.

## Stack
- #12618 — bot service (rebased onto this so it can consume
`PlatformLinkingManagerClient`)
- #12624 — `/link/{token}` frontend page
-
[cloud-infrastructure#310](https://github.com/Significant-Gravitas/AutoGPT_cloud_infrastructure/pull/310)
— Helm chart for `copilot-bot` + new `platform-linking-manager`

Merge order: this → #12618 → #12624, infra whenever.

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
2026-04-21 16:01:03 +00:00
Nicholas Tindle
6924cf90a5 fix(frontend/copilot): artifact panel fixes (SECRT-2254/2223/2220/2255/2224/2256/2221) (#12856)
### Why / What / How


https://github.com/user-attachments/assets/ca26e0b0-d35d-4a5b-b95f-2421b9907742


**Why** — The Artifact & Side Task List project
(https://linear.app/autogpt/project/artifact-and-side-task-list-ef863c93da3c)
accumulated seven related bugs in the copilot artifact panel. The user
kept seeing panels stuck open, previews broken, clicks not registering —
each ticket was small but they all lived in the same small surface area,
so one review pass is easier than five.

Closes SECRT-2254, SECRT-2223, SECRT-2220, SECRT-2255, SECRT-2224,
SECRT-2256, SECRT-2221.

**What** — Five independent fixes, each in its own commit, shipped
together:

1. **Fragment-link interceptor + render error boundary** (SECRT-2255
crash when clicking `<a href="#x">` in HTML artifacts). Sandboxed srcdoc
iframes resolve fragment links against the parent's URL, so clicking
`#activation` in a Plotly TOC tried to navigate the copilot page into
the iframe. Inject a click-capture script into every artifact iframe;
also wrap the renderer in `ArtifactErrorBoundary` so any future render
throw surfaces with a copyable error instead of a blank panel.
2. **Close panel on copilot page unmount** (SECRT-2254 / 2223 / 2220 —
panel stays open, reopens on unrelated navigation, opens by default on
session switch). The Zustand store outlived page unmounts, so `isOpen:
true` survived `/profile` → `/home` → back. One `useEffect` cleanup in
`useAutoOpenArtifacts` calls `resetArtifactPanel()` on unmount.
3. **Sync loading flip on Try Again** (SECRT-2224 "try again doesn't do
anything"). Retry was correct but the loading-state flip was deferred to
an effect, so a retry that re-failed was visually indistinguishable from
a no-op. `retry()` now sets `isLoading: true` / `error: null`
synchronously with the click so the skeleton flashes every time.
4. **Pointer capture on resize drag** (SECRT-2256 "can't drag right when
expanded far left, click doesn't stop it"). The sandboxed iframe was
eating `pointermove`/`pointerup` events when the cursor drifted over it,
freezing the drag and never delivering the release. `setPointerCapture`
on the handle routes all subsequent pointer events through it regardless
of what's under the cursor.
5. **Stop size-gating natively-rendered artifacts + cache-bust retry**
(SECRT-2221 "broken hi-res PNG preview"). The blanket >10 MB size gate
pushed large images / videos / PDFs into `download-only`, so clicking a
hi-res PNG offered a download instead of a preview. Split the gate so it
only applies to content we actually render in JS (text/html/code/etc).
Image and video retries also append a cache-bust query so the browser
can't silently reuse a negative-cached failure.

**How** — Five commits, one concern each, preserved in the order they
were written. Every fix lands with a regression test that fails on the
unfixed code and passes after.

### Changes 🏗️

- `iframe-sandbox-csp.ts` + usage sites —
`FRAGMENT_LINK_INTERCEPTOR_SCRIPT` injected into all three srcdoc iframe
templates (HTML artifact, inline HTMLRenderer, React artifact).
- `ArtifactErrorBoundary.tsx` (new) — class error boundary local to the
artifact panel with a copyable error fallback.
- `useAutoOpenArtifacts.ts` — unmount cleanup calls
`resetArtifactPanel()`.
- `useArtifactContent.ts` — `retry()` flips loading state synchronously.
- `ArtifactDragHandle.tsx` — `setPointerCapture` /
`releasePointerCapture`; `touch-action: none`.
- `helpers.ts` — split classifier; `NATIVELY_RENDERED` exempts
image/video/pdf from the size gate.
- `ArtifactContent.tsx` — image/video carry a retry nonce that appends
`?_retry=N` on Try Again.
- Test files — new
`ArtifactErrorBoundary`/`ArtifactDragHandle`/`HTMLRenderer` tests, plus
regression cases added to `ArtifactContent.test.tsx`, `helpers.test.ts`,
`iframe-sandbox-csp.test.ts`, `reactArtifactPreview.test.ts`,
`useAutoOpenArtifacts.test.ts`.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] `pnpm vitest run src/app/\(platform\)/copilot
src/components/contextual/OutputRenderers
src/lib/__tests__/iframe-sandbox-csp.test.ts` — 247/247 pass
  - [x] `pnpm format && pnpm types` clean
- [x] Manual: open the Plotly-style TOC HTML artifact (SECRT-2255
repro), click each anchor — iframe scrolls internally, browser URL bar
stays put
- [x] Manual: open panel → navigate to /profile → navigate back → panel
closed (SECRT-2254)
- [x] Manual: panel open in session A → click different session → panel
closed (SECRT-2223)
- [ ] Manual: simulate a failed artifact fetch → click Try Again →
skeleton flashes before result (SECRT-2224)
- [x] Manual: expand panel to near-full width → drag back right,
crossing over the iframe → drag keeps working and release ends it
(SECRT-2256)
- [x] Manual: upload a ~25 MB PNG → clicking it previews in an `<img>`,
not a download button (SECRT-2221)

Replaces #12836, #12837, #12838, #12839, #12840 — same fixes, bundled
for review.


<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Touches artifact rendering and iframe `srcDoc` generation (including
injected scripts) plus panel state/drag interactions; regressions could
break previews or resizing, but changes are scoped to the copilot
artifact UI with broad test coverage.
> 
> **Overview**
> Improves Copilot’s artifact panel resilience and UX by **resetting
panel state on page unmount/session changes**, making content retries
immediately show the loading skeleton, and fixing resize drags via
pointer capture so iframes can’t “steal” pointer events.
> 
> Hardens artifact rendering by adding a local `ArtifactErrorBoundary`
that reports to Sentry and shows a copyable error fallback instead of a
blank/crashed panel.
> 
> Fixes iframe-based previews by injecting a
`FRAGMENT_LINK_INTERCEPTOR_SCRIPT` into HTML and React artifact `srcDoc`
so `#anchor` clicks scroll within the iframe rather than navigating the
parent URL, and adjusts artifact classification/retry behavior so large
images/videos/PDFs remain previewable and image/video retries cache-bust
failed URLs.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
bde37a13fd. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 15:53:01 +00:00
Nicholas Tindle
07e5a6a9e4 [Snyk] Security upgrade next from 15.4.10 to 15.4.11 (#12715)
![snyk-top-banner](https://res.cloudinary.com/snyk/image/upload/r-d/scm-platform/snyk-pull-requests/pr-banner-default.svg)

### Snyk has created this PR to fix 1 vulnerabilities in the yarn
dependencies of this project.

#### Snyk changed the following file(s):

- `autogpt_platform/frontend/package.json`


#### Note for
[zero-installs](https://yarnpkg.com/features/zero-installs) users

If you are using the Yarn feature
[zero-installs](https://yarnpkg.com/features/zero-installs) that was
introduced in Yarn V2, note that this PR does not update the
`.yarn/cache/` directory meaning this code cannot be pulled and
immediately developed on as one would expect for a zero-install project
- you will need to run `yarn` to update the contents of the
`./yarn/cache` directory.
If you are not using zero-install you can ignore this as your flow
should likely be unchanged.



<details>
<summary>⚠️ <b>Warning</b></summary>

```
Failed to update the yarn.lock, please update manually before merging.
```

</details>



#### Vulnerabilities that will be fixed with an upgrade:

| Severity | Issue |
|:---:|:---|
| ![high severity](https://res.cloudinary.com/snyk/image/upload/w_20,h_20/v1561977819/icon/h.png 'high severity') | Allocation of Resources Without Limits or Throttling<br/>[SNYK-JS-NEXT-15921797](https://snyk.io/vuln/SNYK-JS-NEXT-15921797) |




---

> [!IMPORTANT]
>
> - Check the changes in this PR to ensure they won't cause issues with
your project.
> - Max score is 1000. Note that the real score may have changed since
the PR was raised.
> - This PR was automatically created by Snyk using the credentials of a
real user.

---

**Note:** _You are seeing this because you or someone else with access
to this repository has authorized Snyk to open fix PRs._

For more information:
🧐 [View latest project
report](https://app.snyk.io/org/significant-gravitas/project/3d924968-0cf3-4767-9609-501fa4962856?utm_source&#x3D;github&amp;utm_medium&#x3D;referral&amp;page&#x3D;fix-pr)
📜 [Customise PR
templates](https://docs.snyk.io/scan-using-snyk/pull-requests/snyk-fix-pull-or-merge-requests/customize-pr-templates?utm_source=github&utm_content=fix-pr-template)
🛠 [Adjust project
settings](https://app.snyk.io/org/significant-gravitas/project/3d924968-0cf3-4767-9609-501fa4962856?utm_source&#x3D;github&amp;utm_medium&#x3D;referral&amp;page&#x3D;fix-pr/settings)
📚 [Read about Snyk's upgrade
logic](https://docs.snyk.io/scan-with-snyk/snyk-open-source/manage-vulnerabilities/upgrade-package-versions-to-fix-vulnerabilities?utm_source=github&utm_content=fix-pr-template)

---

**Learn how to fix vulnerabilities with free interactive lessons:**

🦉 [Allocation of Resources Without Limits or
Throttling](https://learn.snyk.io/lesson/no-rate-limiting/?loc&#x3D;fix-pr)


<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Patch-level upgrade of a core runtime/build dependency (Next.js) can
affect app rendering/build behavior despite being scoped to
dependency/lockfile changes.
> 
> **Overview**
> Upgrades the frontend framework dependency `next` from `15.4.10` to
`15.4.11` in `package.json`.
> 
> Updates `pnpm-lock.yaml` to reflect the new Next.js version (including
`@next/env`) and re-resolves dependent packages that pin `next` in their
peer/optional dependency graphs (e.g., `@sentry/nextjs`,
`@vercel/analytics`, Storybook Next integration).
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
dc19e1f178. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: snyk-bot <snyk-bot@snyk.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 15:44:47 +00:00
Zamil Majdy
a098f01bd2 feat(builder): AI chat panel for the flow builder (#12699)
### Why

The flow builder had no AI assistance. Users had to switch to a separate
Copilot session to ask about or modify the agent they were looking at,
and that session had no context on the graph — so the LLM guessed, or
the user had to describe the graph by hand.

### What

An AI chat panel anchored to the `/build` page. Opens with a chat-circle
button (bottom-right), binds to the currently-opened agent, and offers
**only** two tools: `edit_agent` and `run_agent`. Per-agent session is
persisted server-side, so a refresh resumes the same conversation. Gated
behind `Flag.BUILDER_CHAT_PANEL` (default off;
`NEXT_PUBLIC_FORCE_FLAG_BUILDER_CHAT_PANEL=true` to enable locally).

### How

**Frontend — new**:
- `(platform)/build/components/BuilderChatPanel/` — panel shell +
`useBuilderChatPanel.ts` coordinator. Renders the shared Copilot
`ChatMessagesContainer` + `ChatInput` (thought rendering, pulse chips,
fast-mode toggle — all reused, no parallel chat stack). Auto-creates a
blank agent when opened with no `flowID`. Listens for `edit_agent` /
`run_agent` tool outputs and wires them to the builder in-place: edit →
`flowVersion` URL param + canvas refetch; run → `flowExecutionID` URL
param → builder's existing execution-follow UI opens.

**Frontend — touched (minimal)**:
- `copilot/components/CopilotChatActionsProvider` — new `chatSurface:
"copilot" | "builder"` flag so cards can suppress "Open in library" /
"Open in builder" / "View Execution" buttons when the chat is the
builder panel (you're already there).
- `copilot/tools/RunAgent/components/ExecutionStartedCard` — title is
now status-aware (`QUEUED → "Execution started"`, `COMPLETED →
"Execution completed"`, `FAILED → "Execution failed"`, etc.).
- `build/components/FlowEditor/Flow/Flow.tsx` — mount the panel behind
the feature flag.

**Backend — new**:
- `copilot/builder_context.py` — the builder-session logic module. Holds
the tool whitelist (`edit_agent`, `run_agent`), the permissions
resolver, the session-long system-prompt suffix (graph id/name + full
agent-building guide — cacheable across turns), and the per-turn
`<builder_context>` prefix (live version + compact nodes/links
snapshot).
- `copilot/builder_context_test.py` — covers both builders, ownership
forwarding, and cap behavior.

**Backend — touched**:
- `api/features/chat/routes.py` — `CreateSessionRequest` gains
`builder_graph_id`. When set, the endpoint routes through
`get_or_create_builder_session` (keyed on `user_id`+`graph_id`, with a
graph-ownership check). No new route; the former `/sessions/builder` is
folded into `POST /sessions`.
- `copilot/model.py` — `ChatSessionMetadata.builder_graph_id`;
`get_or_create_builder_session` helper.
- `data/graph.py` — `GraphSettings.builder_chat_session_id` (new typed
field; stores the builder-chat session pointer per library agent).
- `api/features/library/db.py` —
`update_library_agent_version_and_settings` preserves
`builder_chat_session_id` across graph-version bumps.
- `copilot/tools/edit_agent.py`, `run_agent.py` — builder-bound guard:
default missing `agent_id` to the bound graph, reject any other id.
`run_agent` additionally inlines `node_executions` into dry-run
responses so the LLM can inspect per-node status in the same turn
instead of a follow-up `view_agent_output`. `wait_for_result` docs now
explain the two dispatch modes.
- `copilot/tools/helpers.py::require_guide_read` — bypassed for
builder-bound sessions (the guide is already in the system-prompt
suffix).
- `copilot/tools/agent_generator/pipeline.py` + `tools/models.py` —
`AgentSavedResponse.graph_version` so the frontend can flip
`flowVersion` to the newly-saved version.
- `copilot/baseline/service.py` + `sdk/service.py` — inject the builder
context suffix into the system prompt and the per-turn prefix into the
current user message.
- `blocks/_base.py` — `validate_data(..., exclude_fields=)` so dry-run
can bypass credential required-checks for blocks that need creds in
normal mode (OrchestratorBlock). `blocks/perplexity.py` override
signature matches.
- `executor/simulator.py` — OrchestratorBlock dry-run iteration cap `1 →
min(original, 10)` so multi-role patterns (Advocate/Critic) actually
close the loop; `manager.py` synthesizes placeholder creds in dry-run so
the block's schema validation passes.

### Session lookup

The builder-chat session pointer lives on
`LibraryAgent.settings.builder_chat_session_id` (typed via
`GraphSettings`). `get_or_create_builder_session` reads/writes it
through `library_db().get_library_agent_by_graph_id` +
`update_library_agent(settings=...)` — no raw SQL or JSON-path filter.
Ownership is enforced by the library-agent query's `userId` filter. The
per-session builder binding still lives on
`ChatSession.metadata.builder_graph_id` (used by
`edit_agent`/`run_agent` guards and the system-prompt injection).
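
A simplified sketch of that get-or-create flow. The dataclasses stand in for the real `GraphSettings` / library-agent records, and session creation is reduced to minting an ID.

```python
from dataclasses import dataclass, field
from typing import Optional
import uuid


@dataclass
class GraphSettingsSketch:
    builder_chat_session_id: Optional[str] = None


@dataclass
class LibraryAgentSketch:
    settings: GraphSettingsSketch = field(default_factory=GraphSettingsSketch)


def get_or_create_builder_session(agent: LibraryAgentSketch, graph_id: str) -> str:
    """Reuse the session pointer stored on the library agent, or mint a new one."""
    if agent.settings.builder_chat_session_id:
        return agent.settings.builder_chat_session_id
    # In the real flow this creates a ChatSession with
    # metadata.builder_graph_id = graph_id and persists the pointer via
    # update_library_agent(settings=...); here we only mint an ID.
    session_id = str(uuid.uuid4())
    agent.settings.builder_chat_session_id = session_id
    return session_id
```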

### Scope footnotes

- Feature flag defaults **false**. Rollout gate lives in LaunchDarkly.
- No schema migration required: `builder_chat_session_id` slots into the
existing `LibraryAgent.settings` JSON column via the typed
`GraphSettings` model.
- Commits that address review / CI cycles are interleaved with feature
commits — see the commit log for the per-change rationale.

### Test plan

- [x] `pnpm test:unit` + backend `poetry run test` for new and touched
modules
- [x] Agent-browser pass: panel toggle / auto-create / real-time edit
re-render / real-time exec URL subscribe / queue-while-streaming /
cross-graph reset / hard-refresh session persist
- [x] Codecov patch ≥ 80% on diff

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 22:47:23 +07:00
Nicholas Tindle
59273fe6a0 fix(frontend): forward sentry-trace and baggage across API proxy (#12835)
### Why / What / How

**Why:** Every request that went through Next's rewrite proxy broke
distributed tracing. The browser Sentry SDK emitted `sentry-trace` and
`baggage`, but `createRequestHeaders` only forwarded impersonation + API
key, so the backend started a disconnected transaction. The frontend →
backend lineage never appeared in Sentry. Same gap on
direct-from-browser requests: the custom mutator never attached the
trace headers itself, so even non-proxied paths lost the link.

**What:**
- **Server side:** forward `sentry-trace` and `baggage` from
`originalRequest.headers` alongside the existing impersonation/API key
forwarding.
- **Client side:** the custom mutator pulls trace data via
`Sentry.getTraceData()` and attaches it to outgoing headers when running
on the client.

**How:** Inline additions — no new observability module, no new
dependencies beyond `@sentry/nextjs` which the frontend already uses for
Sentry init.

### Changes 🏗️

- `src/lib/autogpt-server-api/helpers.ts` — forward `sentry-trace` +
`baggage` in `createRequestHeaders`.
- `src/app/api/mutators/custom-mutator.ts` — import `@sentry/nextjs`,
attach `Sentry.getTraceData()` on client-side requests.
- `src/app/api/mutators/__tests__/custom-mutator.test.ts` — three new
tests: trace-data present, trace-data empty, server-side no-op.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [x] `pnpm vitest run
src/app/api/mutators/__tests__/custom-mutator.test.ts` passes (6/6
locally)
  - [x] `pnpm format && pnpm lint` clean
- [x] `pnpm types` clean for touched files (pre-existing unrelated type
errors on dev are untouched)
- [ ] In a local session with Sentry enabled, a `/copilot` chat turn
produces a distributed trace that spans frontend transaction → backend
transaction (single trace ID in Sentry)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Low risk: header-only changes to request construction for
observability, with added tests; primary risk is unintended header
propagation affecting upstream/proxy behavior.
> 
> **Overview**
> Restores **Sentry distributed tracing continuity** for
frontend→backend calls by propagating `sentry-trace`/`baggage` headers.
> 
> On the client, `customMutator` now reads `Sentry.getTraceData()` and
attaches string trace headers to outgoing requests (guarded for
server-side and older Sentry builds). On the server/proxy path,
`createRequestHeaders` now forwards `sentry-trace` and `baggage` from
the incoming `originalRequest` alongside existing impersonation/API-key
forwarding, with new unit tests covering these cases.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
0f6946b776. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 15:29:19 +00:00
Nicholas Tindle
38c2844b83 feat(admin): Add system diagnostics and execution management dashboard (#11235)
### Changes 🏗️
This PR adds a comprehensive admin diagnostics dashboard for monitoring
system health and managing running executions.


https://github.com/user-attachments/assets/f7afa3ed-63d8-4b5c-85e4-8756d9e3879e


#### Backend Changes:
- **New data layer** (backend/data/diagnostics.py): Created a dedicated
diagnostics module following the established data layer pattern
- get_execution_diagnostics() - Retrieves execution metrics (running,
queued, completed counts)
  - get_agent_diagnostics() - Fetches agent-related metrics
- get_running_executions_details() - Lists all running executions with
detailed info
- stop_execution() and stop_executions_bulk() - Admin controls for
stopping executions

- **Admin API endpoints**
(backend/server/v2/admin/diagnostics_admin_routes.py):
  - GET /admin/diagnostics/executions - Execution status metrics
  - GET /admin/diagnostics/agents - Agent utilization metrics
- GET /admin/diagnostics/executions/running - Paginated list of running
executions
  - POST /admin/diagnostics/executions/stop - Stop single execution
- POST /admin/diagnostics/executions/stop-bulk - Stop multiple
executions
  - All endpoints secured with admin-only access

#### Frontend Changes:
- **Diagnostics Dashboard**
(frontend/src/app/(platform)/admin/diagnostics/page.tsx):
- Real-time system metrics display (running, queued, completed
executions)
  - RabbitMQ queue depth monitoring
  - Agent utilization statistics
  - Auto-refresh every 30 seconds

- **Execution Management Table**
(frontend/src/app/(platform)/admin/diagnostics/components/ExecutionsTable.tsx):
- Displays running executions with: ID, Agent Name, Version, User
Email/ID, Status, Start Time
  - Multi-select functionality with checkboxes
  - Individual stop buttons for each execution
  - "Stop Selected" and "Stop All" bulk actions
  - Confirmation dialogs for safety
  - Pagination for handling large datasets
  - Toast notifications for user feedback

#### Security:
- All admin endpoints properly secured with requires_admin_user
decorator
- Frontend routes protected with role-based access controls
- Admin navigation link only visible to admin users

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  
  - [x] Verified admin-only access to diagnostics page
  - [x] Tested execution metrics display and auto-refresh
  - [x] Confirmed RabbitMQ queue depth monitoring works
  - [x] Tested stopping individual executions
  - [x] Tested bulk stop operations with multi-select
  - [x] Verified pagination works for large datasets
  - [x] Confirmed toast notifications appear for all actions

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
(no changes needed)
- [x] `docker-compose.yml` is updated or already compatible with my
changes (no changes needed)
- [x] I have included a list of my configuration changes in the PR
description (no config changes required)



<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Adds new admin-only endpoints that can stop, requeue, and bulk-mark
executions as `FAILED`, plus schedule deletion, which can directly
impact production workload and data integrity if misused or buggy.
> 
> **Overview**
> Introduces a **System Diagnostics** admin feature spanning backend +
frontend to monitor execution/schedule health and perform remediation
actions.
> 
> On the backend, adds a new `backend/data/diagnostics.py` data layer
and `diagnostics_admin_routes.py` with admin-secured endpoints to fetch
execution/agent/schedule metrics (including RabbitMQ queue depths and
invalid-state detection), list problem executions/schedules, and perform
bulk operations like `stop`, `requeue`, and `cleanup` (marking
orphaned/stuck items as `FAILED` or deleting orphaned schedules). It
also extends `get_graph_executions`/`get_graph_executions_count` with
`execution_ids` filtering, pagination, started/updated time filters, and
configurable ordering to support efficient bulk/admin queries.
> 
> On the frontend, adds an admin diagnostics page with summary cards and
tables for executions and schedules (tabs for
orphaned/failed/long-running/stuck-queued/invalid, plus confirmation
dialogs for destructive actions), wires it into admin navigation, and
adds comprehensive unit tests for both the new API routes and UI
behavior.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
15b9ed26f9. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <ntindle@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-04-21 15:28:44 +00:00
Zamil Majdy
24850e2a3e feat(backend/autopilot): stream extended_thinking on baseline via OpenRouter (#12870)
### Why / What / How

**Why:** Fast-mode autopilot never renders a Reasoning block. The
frontend already has `ReasoningCollapse` wired up and the wire protocol
already carries `StreamReasoning*` events (landed for SDK mode in
#12853), but the baseline (OpenRouter OpenAI-compat) path never asks
Anthropic for extended thinking and never parses reasoning deltas off
the stream. Result: users on fast/standard get a good answer with no
visible chain-of-thought, while SDK users see the full Reasoning
collapse.

**What:** Plumb reasoning end-to-end through the baseline path by opting
into OpenRouter's non-OpenAI `reasoning` extension, parsing the
reasoning delta fields off each chunk, and emitting the same
`StreamReasoningStart/Delta/End` events the SDK adapter already uses.

**How:**
- **New config:** `baseline_reasoning_max_tokens` (default 8192; 0
disables). Sent as `extra_body={"reasoning": {"max_tokens": N}}` only on
Anthropic routes — other providers drop the field, and
`is_anthropic_model()` already gates this.
- **Delta extraction:** `_extract_reasoning_delta()` handles all three
OpenRouter/provider variants in priority order — legacy
`delta.reasoning` (string), DeepSeek-style `delta.reasoning_content`,
and the structured `delta.reasoning_details` list (text/summary entries;
encrypted or unknown entries are skipped). See the sketch after this list.
- **Event emission:** Reasoning uses the same state-machine rules the
SDK adapter uses — a text delta or tool_use delta arriving mid-stream
closes the open reasoning block first, so the AI SDK v5 transport keeps
reasoning / text / tool-use as distinct UI parts. On stream end, any
still-open reasoning block gets a matching `reasoning-end` so a
reasoning-only turn still finalises the frontend collapse.
- **Scope:** Live streaming only. Reasoning is not persisted to
`ChatMessage` rows or the transcript builder in this PR (SDK path does
so via `content_blocks=[{type: 'thinking', ...}]`, but that round-trip
requires Anthropic signature plumbing baseline doesn't have today).
Reload will still not show reasoning on baseline sessions — can follow
up if we decide it's worth the signature handling.
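
A sketch of the extractor's priority order. The field names follow the PR description; the real code operates on typed stream-chunk objects rather than plain dicts, and the entry filtering here is simplified.

```python
def extract_reasoning_delta(delta: dict) -> str:
    """Pull reasoning text off one streamed chunk delta, trying each variant in order."""
    # 1. Legacy OpenRouter field: a plain string.
    if isinstance(delta.get("reasoning"), str):
        return delta["reasoning"]
    # 2. DeepSeek-style alias.
    if isinstance(delta.get("reasoning_content"), str):
        return delta["reasoning_content"]
    # 3. Structured list: keep text/summary entries; encrypted or unknown
    #    entry types carry no usable text and are skipped.
    parts: list[str] = []
    for entry in delta.get("reasoning_details") or []:
        if not isinstance(entry, dict):
            continue
        text = entry.get("text") or entry.get("summary")
        if text:
            parts.append(text)
    return "".join(parts)
```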

### Changes

- `backend/copilot/config.py` — new `baseline_reasoning_max_tokens`
field.
- `backend/copilot/baseline/service.py` — new
`_extract_reasoning_delta()` helper; reasoning block state on
`_BaselineStreamState`; `reasoning` gated into `extra_body`; chunk loop
emits `StreamReasoning*` events with text/tool_use transition rules;
stream-end closes any open reasoning block.
- `backend/copilot/baseline/service_unit_test.py` — 11 new tests
covering extractor variants (legacy string, deepseek alias, structured
list with text/summary aliases, encrypted-skip, empty), paired event
ordering (reasoning-end before text-start), reasoning-only streams, and
that the `reasoning` request param is correctly gated by model route
(Anthropic vs non-Anthropic) and by the config flag.

### Checklist

For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [x] `poetry run pytest backend/copilot/baseline/service_unit_test.py
backend/copilot/baseline/transcript_integration_test.py` — 103 passed
- [ ] Manual: with `CHAT_USE_CLAUDE_AGENT_SDK=false` and
`CHAT_MODEL=anthropic/claude-sonnet-4-6`, send a multi-step prompt on
fast mode and confirm a Reasoning collapse appears alongside the final
text
- [ ] Manual: flip `CHAT_BASELINE_REASONING_MAX_TOKENS=0` and confirm
baseline responses revert to text-only (no reasoning param, no reasoning
UI)
- [ ] Manual: with a non-Anthropic baseline model (`openai/gpt-4o`),
confirm the request does NOT include `reasoning` and nothing regresses

For configuration changes:
- [x] `.env.default` is compatible — new setting falls back to the
pydantic default
2026-04-21 21:05:00 +07:00
Zamil Majdy
e17e9f13c4 fix(backend/copilot): reduce SDK + baseline prompt cache waste (#12866)
## Summary

Four cost-reduction changes for the copilot feature. Consolidated into
one PR at user request; each commit is self-contained and bisectable.

### 1. SDK: full cross-user cache on every turn (CLI 2.1.116 bump)
Previous behavior: CLI 2.1.97 crashed when `excludeDynamicSections=True`
was combined with `--resume`, so the code fell back to a raw
`system_prompt` string on resume, losing Claude Code's default prompt
and all cache markers. Every Turn 2+ of an SDK session wrote ~33K tokens
to cache instead of reading.

Fix: install `@anthropic-ai/claude-code@2.1.116` in the backend Docker
image and point the SDK at it via
`CHAT_CLAUDE_AGENT_CLI_PATH=/usr/bin/claude`. CLI 2.1.98+ fixes the
crash, so we can use the preset with `exclude_dynamic_sections=True` on
every turn — Turn 1, 2, 3+ all share the same static prefix and hit the
**cross-user** prompt cache.

**Local dev requirement:** if `CHAT_CLAUDE_AGENT_CLI_PATH` is unset, the
bundled 2.1.97 fallback will crash on `--resume`. Install the CLI
globally (`npm install -g @anthropic-ai/claude-code@2.1.116`) or set the
env var.

### 2. Baseline: add `cache_control` markers (commit `756b3ecd9` +
follow-ups)
Baseline path had zero `cache_control` across `backend/copilot/**`.
Every turn was full uncached input (~18.6K tokens, ~$0.058). Two
ephemeral markers — on the system message (content-blocks form) and the
last tool schema — plus `anthropic-beta: prompt-caching-2024-07-31` via
`extra_headers` as defense-in-depth. Helpers split into `_mark_tools_*`
(precomputed once per session) and `_mark_system_*` (per-round, O(1)).
Repeat hellos: ~$0.058 → ~$0.006.
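
Roughly what the two markers look like on the wire. This is a sketch of the request shape, not the actual `_mark_tools_*` / `_mark_system_*` helpers.

```python
def mark_system_for_caching(system_text: str) -> dict:
    # Content-blocks form so the ephemeral cache marker can attach to the block.
    return {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": system_text,
                "cache_control": {"type": "ephemeral"},
            }
        ],
    }


def mark_tools_for_caching(tools: list[dict]) -> list[dict]:
    # One marker on the last tool schema caches the whole stable tools prefix.
    if not tools:
        return tools
    marked_last = {**tools[-1], "cache_control": {"type": "ephemeral"}}
    return [*tools[:-1], marked_last]


EXTRA_HEADERS = {"anthropic-beta": "prompt-caching-2024-07-31"}  # defense-in-depth
```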

### 3. Drop `get_baseline_supplement()` (commit `6e6c4d791`)
`_generate_tool_documentation()` emitted ~4.3K tokens of `(tool_name,
description)` pairs that exactly duplicated the tools array already in
the same request. Deleted. `SHARED_TOOL_NOTES` (cross-tool workflow
rules) is preserved. Baseline "hello" input: ~18.7K → ~14.4K tokens.

### 4. Langfuse "CoPilot Prompt" v26 (published under `review` label)
Separate, out-of-repo change. v25 had three duplicate "Example Response"
blocks + a 10-step "Internal Reasoning Process" section. v26 collapses
to one example + bullet-form reasoning. Char count 20,481 → 7,075 (at roughly
4 chars/token, ~5,100 → ~1,770 tokens).

- v26 is published with label `review` (NOT `production`); v25 remains
active.
- Promote via `mcp__langfuse__updatePromptLabels(name="CoPilot Prompt",
version=26, newLabels=["production"])` after smoke-test.
- Rollback: relabel v25 `production`.

## Test plan
- [x] Unit tests for `_build_system_prompt_value` (fresh vs resumed
turns emit identical preset dict)
- [x] SDK compat tests pass including
`test_bundled_cli_version_is_known_good_against_openrouter`
- [x] `cli_openrouter_compat_test.py` passes against CLI 2.1.116
(locally verified with
`CHAT_CLAUDE_AGENT_CLI_PATH=/opt/homebrew/bin/claude`)
- [x] 8 new `_mark_*` unit tests + identity regression test for
`_fresh_*` helpers
- [x] `SHARED_TOOL_NOTES` public-constant test passes; 5 old tool-docs
tests removed
- [ ] **Manual cost verification (commit 1):** send two consecutive SDK
turns; Turn 2 and Turn 3 should both show `cacheReadTokens` ≈ 33K (full
cross-user cache hits).
- [ ] **Manual cost verification (commit 2):** send two "hello" turns on
baseline <5 min apart; Turn 2 reports `cacheReadTokens` ≈ 18K and cost ≈
$0.006.
- [ ] **Regression sweep for commit 3:** one turn per tool family —
`search_agents`, `run_agent`,
`add_memory`/`forget_memory`/`search_memory`, `search_docs`,
`read_workspace_file` — to verify no tool-selection regression from
dropping the prose tool docs.
- [ ] **Langfuse v26 smoke test:** 5-10 varied turns after relabelling
to `production`; compare responses vs v25 for regression on persona,
concision, capability-gap handling, credential security flows.

## Deployment notes
- Production Docker image now installs CLI 2.1.116 (~20 MB added).
- `CHAT_CLAUDE_AGENT_CLI_PATH=/usr/bin/claude` set in the Dockerfile;
runtime can override via env.
- First deploy after this merge needs a fresh image rebuild to pick up
the new CLI.
2026-04-21 16:34:10 +07:00
Zamil Majdy
f238c153a5 fix(backend/copilot): release session cluster lock on completion (#12867)
## Summary

Fixes a bug where a chat session gets silently stuck after the user
presses Stop mid-turn.

**Root cause:** the cancel endpoint marks the session `failed` after
polling 5s, but the cluster lock held by the still-running task is only
released by `on_run_done` when the task actually finishes. If the task
hangs past the 5s poll (slow LLM call, agent-browser step, etc.), the
lock lingers for up to 5 min — `stream_chat_post`'s `is_turn_in_flight`
check sees the flipped meta (`failed`) and enqueues a new turn, but the
run handler sees the stale lock and drops the user's message at
`manager.py:379` (`reject+requeue=False`). The new SSE stream hangs
until its 60s idle timeout.

### Fix

Two cooperating changes:

1. **`mark_session_completed` force-releases the cluster lock** in the
same transaction that flips status to `completed`/`failed`.
Unconditional delete — by the time we're declaring the session dead, we
don't care who the current lock holder is; the lock has to go so the
next enqueued turn can acquire it. This is what closes the stuck-session
window.
2. **`ClusterLock.release()` is now owner-checked** (Lua CAS — `GET ==
token ? DEL : noop` atomically). Force-release means another pod may
legitimately own the key by the time the original task's `on_run_done`
eventually fires. Without the CAS, that late `release()` would wipe the
successor's lock. With it, the late `release()` is a safe no-op when the
owner has changed.
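
A minimal sketch of the owner-checked release (assuming a `redis.asyncio`
client and that the lock key stores the holder's token; the real
`ClusterLock` API may differ):

```python
from redis.asyncio import Redis

# Compare-and-delete: only remove the key if we still own it.
_RELEASE_IF_OWNER = """
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
else
    return 0
end
"""


async def release_if_owner(redis: Redis, key: str, token: str) -> bool:
    """True if our own lock was deleted; False if it expired or another holder owns it now."""
    deleted = await redis.eval(_RELEASE_IF_OWNER, 1, key, token)
    return bool(deleted)
```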

Together: prompt release on completion (via force-delete) + safe cleanup
when on_run_done catches up (via CAS). That re-syncs the API-level
`is_turn_in_flight` check with the actual lock state, so the contention
window disappears.

No changes to the worker-level contention handler: `stream_chat_post`
already queues incoming messages into the pending buffer when a turn is
in flight (via `queue_pending_for_http`). With these fixes, the worker
never sees contention in the common case; if it does (true multi-pod
race), the pre-existing `reject+requeue=False` behaviour still applies —
we'll revisit that path with its own PR if it becomes a production
symptom.

### Verification

- Reproduced the original stuck-session symptom locally (Stop mid-turn →
send new message → backend logs `Session … already running on pod …`,
user message silently lost, SSE stream idle 60s then closes).
- After the fix: cancel → new message → turn starts normally (lock
released by `mark_session_completed`).
- `poetry run pyright` — 0 errors on edited files.
- `pytest backend/copilot/stream_registry_test.py
backend/executor/cluster_lock_test.py` — 33 passed (includes the
successor-not-wiped test).

## Changes

- `autogpt_platform/backend/backend/copilot/executor/utils.py` — extract
`get_session_lock_key(session_id)` helper so the lock-key format has a
single source of truth.
- `autogpt_platform/backend/backend/copilot/executor/manager.py` — use
the helper where the cluster lock is created.
- `autogpt_platform/backend/backend/copilot/stream_registry.py` —
`mark_session_completed` deletes the lock key after the atomic status
swap (force-release).
- `autogpt_platform/backend/backend/executor/cluster_lock.py` —
`ClusterLock.release()` (sync + async) uses a Lua CAS to only delete
when `GET == token`, protecting against wiping a successor after a
force-release.

## Test plan

- [ ] Send a message in /copilot that triggers a long turn (e.g.
`run_agent`), press Stop before it finishes, then send another message.
Expect: new turn starts promptly (no 5-min wait for lock TTL).
- [ ] Happy path regression — send a normal message, verify turn
completes and the session lock key is deleted after completion.
- [ ] Successor protection — unit test
`test_release_does_not_wipe_successor_lock` covers: A acquires, external
DEL, B acquires, A.release() is a no-op, B's lock intact.
2026-04-21 16:27:01 +07:00
Zamil Majdy
01f1289aac feat(copilot): real OpenRouter cost + cost-based rate limits (percent-only public API) (#12864)
## Why

After d7653acd0 removed cost estimation, most baseline turns log with
`tracking_type="tokens"` and no authoritative USD figure (see: dashboard
flipped from `cost_usd` to `tokens` after 4/14/2026). Rate-limit
counters were also token-weighted with hand-rolled cache discounts
(cache_read @ 10%, cache_create @ 25%) and a 5× Opus multiplier — a
proxy for cost that drifts from real OpenRouter billing.

This PR wires real generation cost from OpenRouter into both the
cost-tracking log and the rate limiter, and hides raw spend figures from
the user-facing API so clients can't reverse-engineer per-turn cost or
platform margins.

## What

1. **Real cost from OpenRouter** — baseline passes `extra_body={"usage":
{"include": True}}` and reads `chunk.usage.cost` from the final
streaming chunk. `x-total-cost` header path removed. Missing cost logs
an error and skips the counter update (vs the old estimator that
silently under-counted).
2. **Cost-based rate limiting** — `record_token_usage(...)` →
`record_cost_usage(cost_microdollars)`. The weighted-token math, cache
discount factors, and `_OPUS_COST_MULTIPLIER` are gone; real USD already
reflects model + cache pricing.
3. **Redis key migration** — `copilot:usage:*` → `copilot:cost:*` so
stale token counters can't be misinterpreted as microdollars.
4. **LD flags + config** — renamed to
`copilot-daily-cost-limit-microdollars` /
`copilot-weekly-cost-limit-microdollars` (unit in the LD key so values
can't accidentally be set in dollars or cents).
5. **Public `/usage` hides raw $$** — new `CoPilotUsagePublic` /
`UsageWindowPublic` schemas expose only `percent_used` (0-100) +
`resets_at` + `tier` + `reset_cost`. Admin endpoint keeps raw
microdollars for debugging.
6. **Admin API contract** — `UserRateLimitResponse` fields renamed
`daily/weekly_token_limit` → `daily/weekly_cost_limit_microdollars`,
`daily/weekly_tokens_used` → `daily/weekly_cost_used_microdollars`.
Admin UI displays `$X.XX`.

## How

- `baseline/service.py` — pass `extra_body`, extract cost from
`chunk.usage.cost`, drop the `x-total-cost` header fallback entirely (see the
sketch after this list).
- `rate_limit.py` — rewritten around `record_cost_usage`,
`check_rate_limit(daily_cost_limit, weekly_cost_limit)`, new Redis key
prefix. Adds `CoPilotUsagePublic.from_status()` projector for the public
API.
- `token_tracking.py` — converts `cost_usd` → microdollars via
`usd_to_microdollars` and calls `record_cost_usage` only when cost is
present.
- `sdk/service.py` — deletes `_OPUS_COST_MULTIPLIER` and simplifies
`_resolve_model_and_multiplier` to `_resolve_sdk_model_for_request`.
- Chat routes: `/usage` and `/usage/reset` return `CoPilotUsagePublic`.
Internal server-side limit checks still use the raw microdollar
`CoPilotUsageStatus`.
- Admin routes: unchanged response shape (renamed fields only).
- Frontend: `UsagePanelContent`, `UsageLimits`, `CopilotPage`,
`BriefingTabContent`, `credits/page.tsx` consume the new public schema
and render "N% used" + progress bar. Admin `RateLimitDisplay` /
`UsageBar` keep `$X.XX`. Helper `formatMicrodollarsAsUsd` retained for
admin use.
- Tests + snapshots rewritten; new assertions explicitly check that raw
`used`/`limit` keys are absent from the public payload.
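
A rough sketch of the extraction + conversion path (illustrative names;
assumes the OpenAI-compatible async streaming client that OpenRouter speaks,
where only the final chunk carries a `usage` object and OpenRouter puts its
`cost` field on it):

```python
from openai import AsyncOpenAI

MICRODOLLARS_PER_USD = 1_000_000


def usd_to_microdollars(cost_usd: float) -> int:
    return int(round(cost_usd * MICRODOLLARS_PER_USD))


async def stream_with_cost(client: AsyncOpenAI, **request_kwargs) -> int | None:
    """Stream a completion and return the turn's real cost in microdollars (None if missing)."""
    stream = await client.chat.completions.create(
        stream=True,
        extra_body={"usage": {"include": True}},  # ask OpenRouter to report real generation cost
        **request_kwargs,                          # model, messages, etc.
    )
    cost_usd: float | None = None
    async for chunk in stream:
        # ... forward content deltas to the caller here ...
        usage = getattr(chunk, "usage", None)
        cost = getattr(usage, "cost", None) if usage is not None else None
        if cost is not None:
            cost_usd = float(cost)  # only the final chunk carries usage
    if cost_usd is None:
        return None  # caller logs an error and skips the counter update
    return usd_to_microdollars(cost_usd)
```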

## Deploy notes

1. **Before rolling this out, create the new LD flags:**
`copilot-daily-cost-limit-microdollars` (default `500000`) and
`copilot-weekly-cost-limit-microdollars` (default `2500000`). Old
`copilot-*-token-limit` flags can stay in LD for rollback.
2. **One-time Redis cleanup (optional):** token-based counters under
`copilot:usage:*` are orphaned and will TTL out within 7 days. Safe to
ignore or delete manually.

## Test plan

- [x] `poetry run test` — all impacted backend tests pass (182/182 in
targeted scope)
- [x] `pnpm test:unit` — all 1628 integration tests pass
- [x] `poetry run format` / `pnpm format` / `pnpm types` clean
- [x] Manual sanity against dev env — Baseline turn logged $0.1221 for
40K/139 tokens on Sonnet 4 (matches expected pricing)
- [ ] `/pr-test --fix` end-to-end against local native stack
2026-04-21 14:34:43 +07:00
Zamil Majdy
343222ace1 feat(platform): defer paid-to-paid subscription downgrades + cancel-pending flow (#12865)
### Why / What / How

**Why:** Only downgrades to FREE were scheduled at period end; paid→paid
downgrades (e.g. BUSINESS→PRO) applied immediately via Stripe proration.
The asymmetry meant users lost their higher tier mid-cycle in exchange
for a Stripe credit voucher only redeemable on a future subscription — a
confusing pattern that produces negative-value paths for users actually
cancelling. There was also no way to cancel a pending downgrade or
paid→FREE cancellation once scheduled.

**What:** Standardize on "upgrade = immediate, downgrade = next cycle"
and let users cancel a pending change by clicking their current tier.
Harden the new code against conflicting subscription state, concurrent
tab races, flaky Stripe calls, and hot-path latency regressions.

**How:**

Subscription state machine:
- **Upgrade** (PRO→BUSINESS) — `stripe.Subscription.modify` with
immediate proration (unchanged). If a downgrade schedule is already
attached, release it first so the upgrade wins.
- **Paid→paid downgrade** (BUSINESS→PRO) — creates a
`stripe.SubscriptionSchedule` with two phases (current tier until
`current_period_end`, target tier after). No mid-cycle tier demotion.
Defensive pre-clear: existing schedule → release;
`cancel_at_period_end=True` → set to False. See the sketch after this list.
- **Paid→FREE** — unchanged: `cancel_at_period_end=True`.
- **Same-tier update** — reuses the existing `POST
/credits/subscription` route. When `target_tier == current_tier`,
backend calls `release_pending_subscription_schedule` (idempotent) and
returns status. No dedicated cancel-pending endpoint — "Keep my current
tier" IS the cancel operation.
- `release_pending_subscription_schedule` is idempotent on
terminal-state schedules and clears both `schedule` and
`cancel_at_period_end` atomically per call.
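
A rough sketch of the scheduled-downgrade shape (price IDs, helper name, and
error handling are illustrative; assumes the `*_async` variants in recent
stripe-python, which the PR says it uses):

```python
import stripe


async def schedule_downgrade_at_period_end(
    subscription_id: str, current_price_id: str, target_price_id: str
) -> stripe.SubscriptionSchedule:
    sub = await stripe.Subscription.retrieve_async(subscription_id)

    # Defensive pre-clear: an attached schedule or a pending cancellation would conflict.
    if sub.schedule:
        await stripe.SubscriptionSchedule.release_async(sub.schedule)
    if sub.cancel_at_period_end:
        await stripe.Subscription.modify_async(subscription_id, cancel_at_period_end=False)

    # Wrap the live subscription in a schedule, then pin two phases:
    # current tier until the period ends, target tier afterwards. No mid-cycle demotion.
    schedule = await stripe.SubscriptionSchedule.create_async(from_subscription=subscription_id)
    return await stripe.SubscriptionSchedule.modify_async(
        schedule.id,
        end_behavior="release",
        phases=[
            {
                "items": [{"price": current_price_id, "quantity": 1}],
                "start_date": sub.current_period_start,
                "end_date": sub.current_period_end,
            },
            {
                "items": [{"price": target_price_id, "quantity": 1}],
                "start_date": sub.current_period_end,
            },
        ],
    )
```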

API surface:
- New fields on `SubscriptionStatusResponse`: `pending_tier` +
`pending_tier_effective_at` (pulled from the schedule's next-phase
`start_date` so dashboard-authored schedules report the correct
timestamp).
- `POST /credits/subscription` now returns `SubscriptionStatusResponse`
(previously `SubscriptionCheckoutResponse`); the response still carries
`url` for checkout flows and adds the status fields inline.
- `get_pending_subscription_change` is cached with a 30s TTL — avoids
hammering Stripe on every home-page load.
- Webhook dispatches
`subscription_schedule.{released,completed,updated}` through the main
`sync_subscription_from_stripe` flow so both event sources converge to
the same DB state.

Implementation notes:
- New Stripe calls use native async (`stripe.Subscription.list_async`
etc.) and typed attribute access — no `run_in_threadpool` wrapping in
the new helpers.
- Shared `_get_active_subscription` helper collapses the "list
active/trialing subs, take first" pattern used by 4 callers.

Frontend:
- `PendingChangeBanner` sub-component above the tier grid with formatted
effective date + "Keep [CurrentTier]" button. `aria-live="polite"` for
screen readers; locale pinned to `en-US` to avoid SSR/CSR hydration
mismatch.
- "Keep [CurrentTier]" also available as a button on the current tier
card.
- Other tier buttons disabled while a change is pending — user must
resolve pending first to prevent stacked schedules.
- `cancelPendingChange` reuses `useUpdateSubscriptionTier` with `tier:
current_tier`; awaits `refetch()` on both success and error paths so the
UI reconciles even if the server succeeded but the client didn't receive
the response.

### Changes

**Backend (`credit.py`, `v1.py`)**
- Tier-ordering helpers (`is_tier_upgrade`/`is_tier_downgrade`).
- `modify_stripe_subscription_for_tier` routes downgrades through
`_schedule_downgrade_at_period_end`; upgrade path releases any pending
schedule first.
- `_schedule_downgrade_at_period_end` defensively releases pre-existing
schedules and clears `cancel_at_period_end` before creating the new
schedule.
- `release_pending_subscription_schedule` idempotent on terminal-state
schedules; logs partial-failure outcomes.
- `_next_phase_tier_and_start` returns both tier and phase-start
timestamp; warns on unknown prices.
- `get_pending_subscription_change` cached (30s TTL), narrow exception
handling.
- `sync_subscription_schedule_from_stripe` delegates to
`sync_subscription_from_stripe` for convergence with the main webhook
path.
- Shared `_get_active_subscription` +
`_release_schedule_ignoring_terminal` helpers.
- `POST /credits/subscription` absorbs the same-tier "cancel pending
change" branch.

**Frontend (`SubscriptionTierSection/*`)**
- `PendingChangeBanner` new sub-component (a11y, locale-pinned date,
paid→FREE vs paid→paid copy split, non-null effective-date assertion, no
`dark:` utilities).
- "Keep [CurrentTier]" button on current tier card.
- `useSubscriptionTierSection` — `cancelPendingChange` reuses the
update-tier mutation.
- Copy: downgrade dialog + status hint updated.
- `helpers.ts` extracted from the main component.

**Tests**
- Backend: +24 tests (95/95 passing): upgrade-releases-pending-schedule,
schedule-releases-existing-schedule, cancel-at-period-end collision,
terminal-state release idempotency, unknown-price logging, status
response population, same-tier-POST-with-pending, webhook delegation.
- Frontend: +5 integration tests (21/21 passing): banner render/hide,
Keep-button click from banner + current card, paid→paid dialog copy.

### Checklist

- [x] Backend unit tests: 95 pass
- [x] Frontend integration tests: 21 pass
- [x] `poetry run format` / `poetry run lint` clean
- [x] `pnpm format` / `pnpm lint` / `pnpm types` clean
- [ ] Manual E2E on live Stripe (dev env) — pending deploy: BUSINESS→PRO
creates schedule, DB tier unchanged until period end
- [ ] Manual E2E: "Keep BUSINESS" in banner releases schedule
- [ ] Manual E2E: cancel pending paid→FREE flips `cancel_at_period_end`
back to false
- [ ] Manual E2E: BUSINESS→PRO (scheduled) then attempt BUSINESS→FREE
clears the PRO schedule, sets cancel_at_period_end
- [ ] Manual E2E: BUSINESS→PRO (scheduled) then upgrade back to BUSINESS
releases the schedule
2026-04-21 14:01:09 +07:00
Zamil Majdy
a8226af725 fix(copilot): dedupe tool row, lift bash_exec timeout, Stop+resend recovery (#12862)
Closes #12861 · [OPEN-3096](https://linear.app/autogpt/issue/OPEN-3096)

## Why

Four related copilot UX / stability issues surfaced on dev once action
tools started rendering inline in the chat (see #12813):

### 1. Duplicate bash_exec row

`GenericTool` rendered two rows saying the same thing for every
completed tool call — a muted subtitle line ("Command exited with code
1" / "Ran: sleep 20") **and** a `ToolAccordion` with the command echoed
in its description. Previously hidden inside the "Show reasoning" /
"Show steps" collapse, now visibly duplicated.

### 2. `bash_exec` capped at 120s via advisory text

The tool schema said `"Max seconds (default 30, max 120)"`; the model
obeyed, so long-running scripts got clipped at 120s with a vague `Timed
out after 120s` even though the E2B sandbox has no such limit. Confirmed
via Langfuse traces — the model picks `120` for long scripts because
that's what the schema told it the max was. E2B path never had a
server-side clamp.

Originally added in #12103 (default 30) and tightened to "max 120"
advisory in #12398 (token-reduction pass).

### 3. 30s default was too aggressive

`pip install`, small data-processing scripts, etc. routinely cross 30s
and got killed before the model thought to retry with a bigger timeout.

### 4. Stop + edit + resend → "The assistant encountered an error"
([OPEN-3096](https://linear.app/autogpt/issue/OPEN-3096))

Two independent bugs both land on the same banner — fixing only one
leaves the other visible on the next action.

**4a. Stream lock never released on Stop** *(the error in the ticket
screenshot)*. The executor's `async for chunk in
stream_and_publish(...)` broke out on `cancel.is_set()` without calling
`aclose()` on the wrapper. `async for` does NOT auto-close iterators on
`break`, so `stream_chat_completion_sdk` stayed suspended at its current
`await` — still holding the per-session Redis lock (TTL 120s) until GC
eventually closed it. The next `POST /stream` hit `lock.try_acquire()`
at
[sdk/service.py](autogpt_platform/backend/backend/copilot/sdk/service.py)
and yielded `StreamError("Another stream is already active for this
session. Please wait or stop it.")`. The `except GeneratorExit →
lock.release()` handler written exactly for this case never fired
because nothing sent GeneratorExit.

**4b. Orphan `tool_use` after stop-mid-tool.** Even with the lock
released, the stop path persists the session ending on an assistant row
whose `tool_calls` have no matching `role="tool"` row. On the next turn,
`_session_messages_to_transcript` hands Claude CLI `--resume` a JSONL
with a `tool_use` and no paired `tool_result`, and the SDK raises a
vague error — same banner. The ticket's "Open questions" explicitly
flags this.

## What

**Frontend — `GenericTool.tsx`** split responsibilities between the two
rows so they don't duplicate:
- **Subtitle row** (always visible, muted): *what ran* — `Ran: sleep
120`. Never the exit code.
- **Accordion description**: *how it ended* — `completed` / `status code
127 · bash: missing-bin: command not found` / `Timed out after 120s` /
(fallback to command preview for legacy rows missing `exit_code` /
`timed_out`). Pulled from the first non-empty line of `stdout` /
`stderr` when available.
- **Expanded accordion**: full command + stdout + stderr code blocks
(unchanged).

**Backend — `bash_exec.py`**:
- Drop the "max 120" advisory from the schema description.
- Bump default `timeout: 30 → 120`.
- Clean up the result message — `"Command executed with status code 0"`
(no "on E2B", no parens).

**Backend — `executor/processor.py` + `stream_registry.py` (OPEN-3096
#4a)**: wrap the consumer `async for` in `try/finally: await
stream.aclose()`. Close now propagates through `stream_and_publish` into
`stream_chat_completion_sdk`, whose existing `except GeneratorExit →
lock.release()` releases the Redis lock immediately on cancel. Stream
types tightened to `AsyncGenerator[StreamBaseResponse, None]` so the
defensive `getattr(stream, "aclose", None)` goes away.

**Backend — `session_cleanup.py` (OPEN-3096 #4b)**: new
`prune_orphan_tool_calls()` helper walks the trailing session tail and
drops any trailing assistant row whose `tool_calls` have unresolved ids
(plus everything after it) and any trailing `STOPPED_BY_USER_MARKER`
system-stop row. Single backward pass — tolerates the marker being
present or absent. Called from the existing turn-start cleanup in both
`sdk/service.py` and `baseline/service.py`; takes an optional
`log_prefix` so both paths emit the same INFO log when something was
popped. In-memory only — the DB save path is append-only via
`start_sequence`.

## Test plan

- [x] `pnpm exec vitest run src/app/(platform)/copilot/tools/GenericTool
src/app/(platform)/copilot/components/ChatMessagesContainer` — 105 pass
(6 new for GenericTool subtitle/description variants + legacy-fallback
case).
- [x] `pnpm format` / `pnpm lint` / `pnpm types` — clean.
- [x] `poetry run pytest
backend/copilot/sdk/session_persistence_test.py` — 17 pass (6 + 3 new
covering the orphan-tool-call prune and its optional-log-prefix branch).
- [x] `poetry run pytest backend/copilot/stream_registry_test.py
backend/copilot/executor/processor_test.py` — 19 pass (2 for aclose
propagation on the `stream_and_publish` wrapper, 2 for `_execute_async`
aclose propagation on both exit paths, 1 for publish_chunk RedisError
warning ladder).
- [x] `poetry run ruff check` / `poetry run pyright` on touched files —
clean.
- [x] Manual: fire a `bash_exec` — one labelled row, accordion
description reads sensibly (`completed` / `status code 1 · …` / `Timed
out after 120s`).
- [x] Manual: script that needs >120s — no longer clipped.
- [x] Manual: Stop mid-tool + edit + resend — Autopilot resumes without
"Another stream is already active" and without the vague SDK error.

## Scope note

Does not touch `splitReasoningAndResponse` — re-collapsing action tools
back into "Show steps" is #12813's responsibility.
2026-04-21 10:18:52 +07:00
Ubbe
f06b5293de fix(frontend/library): compute monthly spend for AgentBriefingPanel (#12854)
### Why / What / How

<img width="900" alt="Screenshot 2026-04-20 at 19 52 22"
src="https://github.com/user-attachments/assets/c30d5f18-2842-4a8a-ac3d-5bfee18fcd56"
/>

**Why:** The "Spent this month" tile in the Agent Briefing Panel on the
Library page always showed `$0`, even for users with real execution
usage. The tile is meant to give a quick sense of monthly spend across
all agents.

**What:** Compute `monthlySpend` from actual execution data and format
it as currency.

**How:**
- `useLibraryFleetSummary` now sums `stats.cost` (cents) across every
execution whose `started_at` falls within the current calendar month.
Previously `monthlySpend` was hardcoded to `0`.
- `FleetSummary.monthlySpend` is documented as being in cents
(consistent with backend + `formatCents`).
- `StatsGrid` now uses `formatCents` from the copilot usage helpers to
render the tile (e.g. `$12.34` instead of the broken `$0`).

### Changes 🏗️

-
`autogpt_platform/frontend/src/app/(platform)/library/hooks/useLibraryFleetSummary.ts`:
aggregate `stats.cost` across executions started in the current calendar
month; add `toTimestamp` and `startOfCurrentMonth` helpers.
-
`autogpt_platform/frontend/src/app/(platform)/library/components/AgentBriefingPanel/StatsGrid.tsx`:
format the "Spent this month" tile via shared `formatCents` helper.
- `autogpt_platform/frontend/src/app/(platform)/library/types.ts`:
document that `FleetSummary.monthlySpend` is in cents.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] Load `/library` with the `AGENT_BRIEFING` flag enabled and at
least one completed execution in the current month — the "Spent this
month" tile shows the correct cumulative cost.
  - [ ] With no executions this month, the tile shows `$0.00`.
- [ ] Type-check (`pnpm types`), lint (`pnpm lint`), and integration
tests (`pnpm test:unit`) pass locally.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 20:28:47 +07:00
Zamil Majdy
70b591d74f fix(copilot): persist reasoning, split steps/reasoning UX, fix mid-turn promote stream stall (#12853)
## Why

Four related issues that surfaced when queued follow-ups hit an
extended_thinking turn:

1. **Mid-turn promote stalled the SSE stream.** `pollBackendAndPromote`
used `setMessages((prev) => [...prev, bubble])` — Vercel AI SDK's
`useChat` streams SSE deltas into `messages[-1]`, so once a user bubble
ended up there, every subsequent chunk silently landed on the wrong
message. Chat sat frozen until a page refresh, even though the backend's
stream completed cleanly.
2. **Thinking-only final turn looked identical to a frozen UI.** When
Claude's last LLM call after a tool_result produced only a
`ThinkingBlock` (no `TextBlock`, no `ToolUseBlock`), the response
adapter silently dropped it and the UI hung on "Thought for Xs" with no
response text.
3. **Reasoning was invisible.** `ThinkingBlock` was dropped live and
never persisted in a way the frontend could render — sessions on reload
/ shared links showed no thinking, a confusing UX gap ("display for
nothing").
4. **Cross-pod Redis replay dropped reasoning events.** The
`stream_registry._reconstruct_chunk` type map had no entries for
`reasoning-*` types, so any client that subscribed mid-stream (share,
reload, cross-pod) silently dropped them with `Unknown chunk type:
reasoning-delta`.

## What

### Mid-turn promote — splice before the trailing assistant

In `useCopilotPendingChips.ts::pollBackendAndPromote`:

```ts
setMessages((prev) => {
  const bubble = makePromotedUserBubble(drained, "midturn", crypto.randomUUID());
  const lastIdx = prev.length - 1;
  if (lastIdx >= 0 && prev[lastIdx].role === "assistant") {
    return [...prev.slice(0, lastIdx), bubble, prev[lastIdx]];
  }
  return [...prev, bubble];
});
```

Streaming assistant stays at `messages[-1]`, AI SDK deltas keep routing
correctly. `useHydrateOnStreamEnd` snaps the bubble to the DB-canonical
position when the stream ends.

### Reasoning — end-to-end visibility (live + persisted)

- **Wire protocol**: new `StreamReasoningStart` / `StreamReasoningDelta`
/ `StreamReasoningEnd` events matching AI SDK v5's `reasoning-*` wire
names, so `useChat` accumulates them into a `type: 'reasoning'`
UIMessage part natively.
- **Response adapter**: every `ThinkingBlock` now emits reasoning
events; text/tool_use transitions close the open reasoning block so AI
SDK doesn't merge distinct parts.
- **Stream registry**: added `reasoning-*` types to
`_reconstruct_chunk`'s type_to_class map so Redis replay no longer drops
them on cross-pod / reload / share.
- **Persistence** (new): each `StreamReasoningStart` opens a
`ChatMessage(role="reasoning")` row in `session.messages`; deltas
accumulate into its content; `StreamReasoningEnd` closes it. No schema
migration — `ChatMessage.role` is already `String`.
`extract_context_messages` filters `role="reasoning"` out of LLM context
(the `--resume` CLI session already carries thinking separately) so the
model never re-ingests prior reasoning.
- **Frontend conversion**: `convertChatSessionMessagesToUiMessages` maps
`role="reasoning"` DB rows into `{type: "reasoning", text}` parts on the
surrounding assistant bubble, so reload / shared-link sessions render
reasoning identically to live stream.

### Steps / Reasoning UX — modal + accordion split

- **`StepsCollapse`** (new): a Dialog-backed "Show steps" modal wraps
the pre-final-answer group (tool timeline + per-block reasoning). Modal
keeps the steps visually grouped and out of the reading flow.
- **`ReasoningCollapse`** (rewritten): inline accordion with "Show
reasoning" / "Hide reasoning" toggle — no longer a modal, so it expands
*inside* the Steps modal without stacking two dialogs. Reasoning text
appears indented with a left border.
- **`splitReasoningAndResponse`**: reasoning parts now stay in the
reasoning group (instead of being pinned out), so they show up inside
the Steps modal alongside the tool-use timeline.

### Thinking-only final turn — synthesize a closing line
(belt-and-suspenders)

- **Prompt rule** (`_USER_FOLLOW_UP_NOTE`): "Every turn MUST end with at
least one short user-facing text sentence."
- **Adapter fallback**: tracks `_text_since_last_tool_result`; at
`ResultMessage success` with tools run + zero text since, opens a fresh
step (`UserMessage` already closed the previous one) and injects `"(Done
— no further commentary.)"` before `StreamFinish`. Only fires for the
pathological case — pure-text turns untouched.

## Test plan

- [x] `pnpm vitest run` on copilot files — all 638 prior tests pass;
**17 new tests** added covering:
- `convertChatSessionToUiMessages`: reasoning row alone / merged with
assistant text / multi-row / empty skip / duration capture
- `ReasoningCollapse`: initial collapsed, toggle, `rotate-90`,
`aria-expanded`
  - `StepsCollapse`: trigger + dialog open renders children
- `MessagePartRenderer`: reasoning → `<pre>` inside collapse,
whitespace/missing text → null
  - `splitReasoningAndResponse`: reasoning-stays-in-reasoning regression
- [x] `poetry run pytest backend/copilot/sdk/response_adapter_test.py` —
36 pass (7 new: 4 reasoning streaming, 3 thinking-only fallback)
- [x] Manual: reasoning streams live and persists across reload on a
fresh session
- [x] Manual: previously-created sessions (pre-persistence) don't have
`role="reasoning"` rows — behaves as a clean no-op (no reasoning shown,
no error), new sessions render reasoning inside Steps modal

## Notes

- No DB migration — `ChatMessage.role` is already an open `String`;
`role="reasoning"` is simply filtered out of LLM context builds but
rendered by the frontend.
- Addresses /pr-review blockers: (a) stream_registry missing reasoning
types in Redis round-trip, (b) fallback text emitted outside a step, (c)
dead `case "thinking"` in renderer (now uses the live `reasoning` type
uniformly).
2026-04-19 10:37:04 +07:00
Zamil Majdy
b1c043c2d8 feat(copilot): queue follow-up messages on busy sessions (UI + run_sub_session + AutoPilot block) (#12737)
## Why

Users and tools can target a copilot session that already has a turn
running. Before this PR there was no uniform behaviour for that case —
the UI manually routed to a separate queue endpoint, `run_sub_session`
and the AutoPilot block raced the cluster lock, and in-turn follow-ups
only reached the model at turn-end via auto-continue. Outcome: dropped
messages, duplicate tool rows, missed mid-turn intent, latent
correctness bugs in block execution.

## What

A single "message arrived → turn already running?" primitive, shared by
every caller:

1. **POST `/stream`** (UI chat): self-defensive. Session idle → SSE as
today; session busy → `202 application/json` with `{buffer_length,
max_buffer_length, turn_in_flight}`. The deprecated `POST
/messages/pending` endpoint is removed (`GET /messages/pending` peek
stays).
2. **`run_copilot_turn_via_queue`** (shared primitive from #12841, used
by `run_sub_session` + `AutoPilotBlock`): gains the same busy-check.
Busy session → push to pending buffer, return `("queued",
SessionResult(queued=True, pending_buffer_length=N))` without creating a
stream registry session or enqueueing a RabbitMQ job. All callers
inherit queueing.
3. **Mid-turn delivery**: drained follow-ups are attached to every
tool_result's `additionalContext` via the SDK's `PostToolUse` hook —
covers both MCP and built-in tools (WebSearch/Read/Agent/etc.), not just
`run_block`. Claude reads the queued text on the next LLM round of the
same turn.
4. **UI observability**: chips promote to a proper user bubble at the
correct chronological position (after the tool_result row that consumed
them). Auto-continue handles end-of-turn drainage; mid-turn backend poll
handles the tool-boundary drainage path.

## How

**Data plane**
- `backend/copilot/pending_messages.py` — Redis list per session
(LPOP-count for atomic drain), TTL, fire-and-forget pub/sub notify. MAX
10 per session (see the sketch after this list).
- `backend/copilot/pending_message_helpers.py` — `is_turn_in_flight`,
`queue_user_message`, `drain_and_format_for_injection`,
`persist_pending_as_user_rows` (shared persist+rollback used by both
baseline and SDK paths).
- `backend/data/redis_helpers.py` — centralised `incr_with_ttl`,
`capped_rpush`, `hash_compare_and_set`; every Lua script and pipeline
atomicity lives in one place.
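
A rough sketch of the per-session buffer (key name, cap handling, and TTL are
illustrative; the real module does the check-and-push atomically via a Lua
script and also publishes the fire-and-forget notify):

```python
from redis.asyncio import Redis

MAX_PENDING_PER_SESSION = 10
PENDING_TTL_SECONDS = 60 * 60  # illustrative TTL


def _pending_key(session_id: str) -> str:
    return f"copilot:pending:{session_id}"  # illustrative key name


async def queue_user_message(redis: Redis, session_id: str, text: str) -> int | None:
    """Append a follow-up; returns the new buffer length, or None if the buffer is full."""
    key = _pending_key(session_id)
    if await redis.llen(key) >= MAX_PENDING_PER_SESSION:
        return None  # the real capped push does this check-and-push atomically
    length = await redis.rpush(key, text)
    await redis.expire(key, PENDING_TTL_SECONDS)
    return length


async def drain_pending(redis: Redis, session_id: str) -> list[str]:
    """Atomically pop everything queued so far (LPOP with a count)."""
    items = await redis.lpop(_pending_key(session_id), MAX_PENDING_PER_SESSION)
    return list(items or [])
```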

**Injection sites**
- `backend/copilot/sdk/security_hooks.py::post_tool_use_hook` — drains +
returns `additionalContext`. Single hook covers built-in + MCP tools.
- `backend/copilot/sdk/service.py` — `StreamToolOutputAvailable`
dispatch persists the drained follow-up as a real user row right after
the tool_result (UI bubble at the right index).
`state.midturn_user_rows` keeps the CLI upload watermark honest.
- `backend/copilot/baseline/service.py` — same drain at round
boundaries, uses the shared `persist_pending_as_user_rows` helper so
baseline + SDK code paths don't diverge.

**Dispatch**
- `backend/copilot/sdk/session_waiter.py::run_copilot_turn_via_queue` —
`is_turn_in_flight` short-circuit; `SessionResult` gains `queued` +
`pending_buffer_length`; `SessionOutcome` gains `"queued"`.
- `backend/api/features/chat/routes.py::stream_chat_post` — busy-check
returns 202 with `QueuePendingMessageResponse`; `POST /messages/pending`
deleted.
- `backend/copilot/tools/run_sub_session.py` / `models.py` —
`SubSessionStatusResponse.status` gains `"queued"`;
`response_from_outcome` renders a clear queued-state message with the
pending-buffer depth and a link to watch live.
- `backend/blocks/autopilot.py::execute_copilot` — surfaces queued state
as descriptive response text + empty `tool_calls`/history when
`result.queued`.

**Frontend**
- `src/app/(platform)/copilot/useCopilotPendingChips.ts` — hook owning
the chip lifecycle: backend peek on session load, auto-continue
promotion when a second assistant id appears, mid-turn poll that
promotes when the backend count drops.
- `src/app/(platform)/copilot/useHydrateOnStreamEnd.ts` —
force-hydrate-waits-for-fresh-reference dance extracted.
- `src/app/(platform)/copilot/helpers/stripReplayPrefix.ts` — pure
function with drop / strip / streaming-catch-up cases + helper
decomposition.
- `src/app/(platform)/copilot/helpers/makePromotedBubble.ts` — one-line
helper for the promoted bubble shape.
- `src/app/(platform)/copilot/helpers/queueFollowUpMessage.ts` — thin
`fetch` wrapper for the 202 path (AI SDK's `useChat` fetcher only
handles SSE, so we can't reuse `sendMessage` for the queued response).

## Test plan

Backend unit + integration (`poetry run pytest backend/copilot
backend/api/features/chat`):
- [x] 107 tests pass — pending buffer, drain helpers, routes,
session_waiter queue branch, run_sub_session outcome rendering,
autopilot block
- [x] New `session_waiter_test.py` proves the queue branch
short-circuits `stream_registry.create_session` + `enqueue_copilot_turn`
- [x] Mid-turn persist has a rollback-and-re-queue path tested for when
`session.messages` persist silently fails to back-fill sequences

Frontend unit (`pnpm vitest run`):
- [x] 630 tests pass incl. 22 new for extracted helpers + hooks
- [x] Frontend coverage on touched copilot files: 91%+ (patch 87.37%)

Manual (once merged):
- [ ] Queue two chips while a tool is running; Claude acknowledges both
on the next round, UI shows bubbles in typing order after the tool
output
- [ ] Hand AutoPilot block an existing session_id that has a live turn;
block returns queued status, in-flight turn drains the message on its
next round
- [ ] `run_sub_session` against a busy sub — status=`queued`,
`sub_autopilot_session_link` lets user watch live

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 00:48:59 +07:00
Zamil Majdy
fcaebd1bb7 refactor(backend/copilot): unified queue-backed copilot turns + async sub-AutoPilot + guide-read gate (#12841)
### Why / What / How

**Why:** the 10-min stream-level idle timeout was killing legitimate
long-running tool calls — notably sub-AutoPilot runs via
`run_block(AutoPilotBlock)`, which routinely take 15–45 min. The symptom
users saw was `"A tool call appears to be stuck"` even though AutoPilot
was actively working. A second long-standing rough edge was shipped
alongside: agents often skipped `get_agent_building_guide` when
generating agent JSON, producing schemas that failed validation and
burned turns on auto-fix loops.

**What:** three threaded pieces.

1. **Async sub-AutoPilot via `run_sub_session`.** New copilot tool that
delegates a task to a fresh (or resumed) sub-AutoPilot, and its
companion `get_sub_session_result` for polling/cancelling. The agent
starts with `run_sub_session(prompt, wait_for_result≤300s)` and, if the
sub isn't done inside the cap, receives a handle + polls via
`get_sub_session_result(wait_if_running≤300s)`. No single MCP call ever
blocks the stream for more than 5 min, so the 10-min stream-idle timer
stays simple and effective (derived as `MAX_TOOL_WAIT_SECONDS * 2`).

2. **Queue-backed copilot turn dispatch** — one code path for all three
callers.
- `run_sub_session` enqueues a `CoPilotExecutionEntry` on the existing
`copilot_execution` exchange instead of spawning an in-process
`asyncio.Task`.
- `AutoPilotBlock.execute_copilot` (graph block) now uses the **same
queue** instead of `collect_copilot_response` inline.
   - The HTTP SSE endpoint was already queue-backed.
- All three share a single primitive: `run_copilot_turn_via_queue` →
`create_session` → `enqueue_copilot_turn` → `wait_for_session_result`.
The event-aggregation logic (`EventAccumulator`/`process_event`) is a
shared module used by both the direct-stream path and the cross-process
waiter.
- Benefits: **deploy/crash resilience** (RabbitMQ redelivery survives
worker restarts), **natural load balancing** across copilot_executor
workers, **sessions as first-class resources** (UI users can
`/copilot?sessionId=<inner>` into any sub or AutoPilot block's session),
and every future stream-level feature (pending-messages drain #12737,
compaction policies, etc.) applies uniformly instead of bypassing
graph-block sessions.

3. **Guide-read gate on agent-generation tools.** `create_agent` /
`edit_agent` / `validate_agent_graph` / `fix_agent_graph` refuse until
the session has called `get_agent_building_guide`. The pre-existing soft
hint was routinely ignored; the gate makes the dependency enforceable.
All four tool descriptions advertise the requirement in one tightened
sentence ("Requires get_agent_building_guide first (refuses
otherwise).") that stays under the 32000-char schema budget.

**How:**

#### Queue-backed sub-AutoPilot + AutoPilotBlock

- `sdk/session_waiter.py` — new module. `SessionResult` dataclass
mirrors `CopilotResult`. `wait_for_session_result` subscribes to
`stream_registry`, drains events via shared `process_event`, returns
`(outcome, result)`. `wait_for_session_completion` is the cheaper
outcome-only variant. `run_copilot_turn_via_queue` is the canonical
three-step dispatch. Every exit path unsubscribes the listener.
- `sdk/stream_accumulator.py` — new module. `EventAccumulator`,
`ToolCallEntry`, `process_event` extracted from `collect.py`. Both the
direct-stream and cross-process paths now use the same fold logic.
- `tools/run_sub_session.py` / `tools/get_sub_session_result.py` —
rewritten around the shared primitive. `sub_session_id` is now the sub's
`ChatSession` id directly (no separate registry handle). Ownership
re-verified on every call via `get_chat_session`. Cancel via
`enqueue_cancel_task` on the existing `copilot_cancel` fan-out exchange.
- `blocks/autopilot.py` — `execute_copilot` replaced its inline
`collect_copilot_response` with `run_copilot_turn_via_queue`.
`SessionResult` carries response text, tool calls, and token usage back
from the worker so no DB round-trip is needed. The block's public I/O
contract (inputs, outputs, `ToolCallEntry` shape) is unchanged.
- `CoPilotExecutionEntry` gains a `permissions: CopilotPermissions |
None` field forwarded to the worker's `stream_fn` so the sub's
capability filter survives the queue hop. The processor passes it
through to `stream_chat_completion_sdk` /
`stream_chat_completion_baseline`.
- **Deleted**: `sdk/sub_session_registry.py` (module-level dict,
done-callback, abandoned-task cap, `notify_shutdown_and_cancel_all`,
`_reset_for_test`), plus the shutdown-notifier hook in
`copilot_executor.processor.cleanup` — redundant under queue-backed
execution.

#### Run_block single-tool cap (3)

- `tools/helpers.execute_block` caps block execution at
`MAX_TOOL_WAIT_SECONDS = 5 min` via `asyncio.wait_for` around the
generator consumption.
- On timeout: logs `copilot_tool_timeout tool=run_block block=…
block_id=… input_keys=… user=… session=… cap_s=…` (grep-friendly) and
returns an `ErrorResponse` that redirects the LLM to `run_agent` /
`run_sub_session`.
- Billing protection: `_charge_block_credits` is called in a `finally`
guarded by `asyncio.shield` and marked `charge_handled` **before** the
await so cancel-mid-charge doesn't double-bill and
cancel-mid-generator-before-charge still settles via the finally.

#### Guide-read gate

- `helpers.require_guide_read(session, tool_name)` scans
`session.messages` for any prior assistant tool call named
`get_agent_building_guide` (handles both OpenAI and flat shapes).
Applied at the top of `_execute` in `create_agent`, `edit_agent`,
`validate_agent_graph`, `fix_agent_graph`. Tool descriptions advertise
the requirement.
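
A minimal sketch of the gate (the shape of `session.messages` and the error
type are assumptions, not the PR's code):

```python
GUIDE_TOOL = "get_agent_building_guide"


def has_read_guide(messages: list[dict]) -> bool:
    """True if any prior assistant message already called get_agent_building_guide."""
    for msg in messages:
        if msg.get("role") != "assistant":
            continue
        for call in msg.get("tool_calls") or []:
            # Handle both the OpenAI nested shape and a flat {"name": ...} shape.
            name = (call.get("function") or {}).get("name") or call.get("name")
            if name == GUIDE_TOOL:
                return True
    return False


def require_guide_read(messages: list[dict], tool_name: str) -> None:
    """Refuse agent-generation tools until the guide has been read this session."""
    if not has_read_guide(messages):
        raise ValueError(
            f"{tool_name} requires get_agent_building_guide first; read the guide, then retry."
        )
```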

#### Shared timing constants

- `MAX_TOOL_WAIT_SECONDS = 5 * 60` + `STREAM_IDLE_TIMEOUT_SECONDS = 2 *
MAX_TOOL_WAIT_SECONDS` in `constants.py`. Every long-running tool
(`run_agent`, `view_agent_output`, `run_sub_session`,
`get_sub_session_result`, `run_block`) imports from one place; no more
hardcoded 300 / `10*60` literals drifting apart. Stream-idle invariant
("no single tool blocks close to the idle timeout") holds by
construction.

### Frontend

- Friendlier tool-card labels: `run_sub_session` → "Sub-AutoPilot",
`get_sub_session_result` → "Sub-AutoPilot result", `run_block` →
"Action" (matches the builder UI's own naming), `run_agent` → "Agent".
Fixes the double-verb "Running Run …" phrasing.
- `SubSessionStatusResponse.sub_autopilot_session_link` surfaces
`/copilot?sessionId=<inner>` so users can click into any sub's session
from the tool-call card — same pattern as `run_agent`'s
`library_agent_link`.

### Changes 🏗️

- **New modules**: `sdk/session_waiter.py`, `sdk/stream_accumulator.py`,
`tools/run_sub_session.py`, `tools/get_sub_session_result.py`,
`tools/sub_session_test.py`, `tools/agent_guide_gate_test.py`.
- **New response types**: `SubSessionStatusResponse`,
`SubSessionProgressSnapshot`, `SessionResult`.
- **New gate helper**: `require_guide_read` in `tools/helpers.py`.
- **Queue protocol**: `permissions` field on `CoPilotExecutionEntry`,
threaded through `processor.py` → `stream_fn`.
- **Hidden**: `AUTOPILOT_BLOCK_ID` in `COPILOT_EXCLUDED_BLOCK_IDS`
(run_block can't execute AutoPilotBlock; agents use `run_sub_session`
instead).
- **Deleted**: `sdk/sub_session_registry.py`, processor
shutdown-notifier hook.
- **Regenerated**: `openapi.json` for the new response types; block-docs
for the updated `ToolName` Literal.
- **Tool descriptions**: tightened the guide-gate hint across the four
agent-builder tools to stay under the 32000-char schema budget.
- **40+ tests** across sub_session, execute_block cap + billing races,
stream_accumulator, agent_guide_gate, frontend helpers.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Unit suite green on the full copilot tree; `poetry run format` +
`pyright` clean
- [x] Schema character budget test passes (tool descriptions trimmed to
stay under 32000)
- [x] Native UI E2E (`poetry run app` + `pnpm dev`):
`run_sub_session(wait_for_result=60)` returns `status="completed"` +
`sub_autopilot_session_link` inline;
`run_sub_session(wait_for_result=1)` returns `status="running"` +
handle, `get_sub_session_result(wait_if_running=60)` observes `running →
completed` transition
- [x] AutoPilotBlock (graph) goes through `copilot_executor` queue
end-to-end (verified via logs: ExecutionManager's AutoPilotBlock node
spawned session `f6de335b-…`, a different `CoPilotExecutor` worker
acquired its cluster lock and ran the SDK stream)
- [x] Guide gate: `create_agent` without a prior
`get_agent_building_guide` returns the refusal; agent reads the guide
and retries successfully
2026-04-18 23:11:41 +07:00
Toran Bruce Richards
1c0c7a6b44 fix(copilot): add gh auth status check to Tool Discovery Priority section (#12832)
## Problem

The CoPilot system prompt contains a `gh auth status` instruction in the
E2B-specific `GitHub CLI` section, but models pattern-match to
`connect_integration` from the **Tool Discovery Priority** section —
which is where the actual decision to call an external service is made.

Because the GitHub auth check lives in a separate, later section, it's
not salient at the point of decision-making. This causes the model to
call `connect_integration(provider='github')` even when `gh` is already
authenticated via `GH_TOKEN`, unnecessarily prompting the user.

## Fix

Add a 3-line callout directly inside the **Tool Discovery Priority**
section:

```
> 🔑 **GitHub exception:** Before calling `connect_integration` for GitHub,
> always run `gh auth status` first. If it shows `Logged in`, proceed
> directly with `gh`/`git` — no integration connection needed.
```

This places the rule at the exact location where the model decides which
tool path to take, preventing the miss.

## Why this works

- **Placement over repetition**: The existing instruction isn't wrong —
it's just in the wrong spot relative to where the decision is made
- **Negative framing**: Explicitly says "before calling
`connect_integration`" which directly intercepts the incorrect reflex
- **Minimal change**: 4 lines added, zero removed

Co-authored-by: Toran Bruce Richards <22963551+Torantulino@users.noreply.github.com>
2026-04-17 15:22:10 +00:00
Joe Munene
3a01874911 fix(frontend/builder): preserve agent name in AgentExecutor node title after reload (#12805)
## Summary

Fixes #11041

When an `AgentExecutorBlock` is placed in the builder, it initially
displays the agent's name (e.g., "Researcher v2"). After saving and
reloading the page, the title reverts to the generic "Agent Executor."

## Root Cause

The backend correctly persists `agent_name` and `graph_version` in
`hardcodedValues` (via `input_default` in `AgentExecutorBlock`).
However, `NodeHeader.tsx` always resolves the display title from
`data.title` (the generic block name), ignoring the persisted agent
name.

## Fix

Modified the title resolution chain in `NodeHeader.tsx` to check
`data.hardcodedValues.agent_name` between the user's custom name and the
generic block title:

1. `data.metadata.customized_name` (user's manual rename) — highest
priority
2. `agent_name` + ` v{graph_version}` from `hardcodedValues` — **new**
3. `data.title` (generic block name) — fallback

This is a frontend-only change. No backend modifications needed.

## Files Changed

-
`autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/CustomNode/components/NodeHeader.tsx`
(+11, -1)

## Test Plan

- [x] Place an AgentExecutorBlock, select an agent — title shows agent
name
- [x] Save graph, reload page — title still shows agent name (was "Agent
Executor" before)
- [x] Double-click to rename — custom name takes priority over agent
name
- [x] Clear custom name — falls back to agent name
- [x] Non-AgentExecutor blocks — unaffected, show generic title as
before

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-04-17 15:20:32 +00:00
Zamil Majdy
6d770d9917 fix(platform/copilot): revert forward pagination, add visibility guarantee for blank chat (#12831)
## Why / What / How

**Why:** PR #12796 changed completed copilot sessions to load messages
from sequence 0 forward (ascending), which broke the standard chat UX —
users now land at the beginning of the conversation instead of the most
recent messages. Reported in Discord.

**What:** Reverts the forward pagination approach and replaces it with a
visibility guarantee that ensures every page contains at least one
user/assistant message.

**How:**
- **Backend**: Removed after_sequence, from_start, forward_paginated,
newest_sequence — always use backward (newest-first) pagination. Added
_expand_for_visibility() helper: after fetching, if the entire page is
tool messages (invisible in UI), expand backward up to 200 messages
until a visible user/assistant message is found (see the sketch after
this list).
- **Frontend**: Removed all forwardPaginated/newestSequence plumbing
from hooks and components. Removed bottom LoadMoreSentinel. Simplified
message merge to always prepend paged messages.
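
A rough sketch of the visibility guarantee (row shape, page size, and helper
names are illustrative; the real helper works on ChatMessage rows):

```python
from typing import Callable

VISIBLE_ROLES = {"user", "assistant"}
MAX_EXPANSION = 200


def expand_for_visibility(
    page: list[dict],
    fetch_older: Callable[[int, int], list[dict]],  # (before_sequence, limit) -> older rows, ascending
) -> list[dict]:
    """Guarantee each page contains at least one user/assistant row.

    If the fetched page is all tool messages (invisible in the UI), keep
    expanding backward, up to MAX_EXPANSION rows total, until a visible row
    is included.
    """
    expanded = list(page)
    while (
        expanded
        and not any(row["role"] in VISIBLE_ROLES for row in expanded)
        and len(expanded) < MAX_EXPANSION
    ):
        older = fetch_older(expanded[0]["sequence"], min(50, MAX_EXPANSION - len(expanded)))
        if not older:
            break
        expanded = older + expanded
    return expanded
```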

### Changes
- routes.py: Reverted to simple backward pagination, removed TOCTOU
re-fetch logic
- db.py: Removed forward mode, extracted _expand_tool_boundary() and
added _expand_for_visibility()
- SessionDetailResponse: Removed newest_sequence and forward_paginated
fields
- openapi.json: Removed after_sequence param and forward pagination
response fields
- Frontend hooks/components: Removed forward pagination props and logic
(-1000 lines)
- Updated all tests (backend: 63 pass, frontend: 1517 pass)

### Checklist
- [x] I have clearly listed my changes in the PR description
- [x] Backend unit tests: 63 pass
- [x] Frontend unit tests: 1517 pass
- [x] Frontend lint + types: clean
- [x] Backend format + pyright: clean
2026-04-17 19:23:28 +07:00
slepybear
334ec18c31 docs: convert in-code comments to MkDocs admonitions in block-sdk-gui… (#12819)
### Why / What / How

This PR converts inline Python comments in code examples within
`block-sdk-guide.md` into MkDocs `!!! note` admonitions. This makes code
examples cleaner and more copy-paste friendly while preserving all
explanatory content.

Converts inline comments in code blocks to admonitions following the
pattern established in PR #12396 (new_blocks.md) and PR #12313.

- Wrapped code examples with `!!! note` admonitions
- Removed inline comments from code blocks for clean copy-paste
- Added explanatory admonitions after each code block

### Changes 🏗️

- Provider configuration examples (API key and OAuth)
- Block class Input/Output schema annotations
- Block initialization parameters
- Test configuration
- OAuth and webhook handler implementations
- Authentication types and file handling patterns

### Checklist 📋

#### For documentation changes:
- [x] Follows the admonition pattern from PR #12396
- [x] No code changes, documentation only
- [x] Admonition syntax verified correct

#### For configuration changes:
- [ ] `.env.default` is updated or already compatible with my changes
- [ ] `docker-compose.yml` is updated or already compatible with my
changes

---

**Related Issues**: Closes #8946

Co-authored-by: slepybear <slepybear@users.noreply.github.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-04-17 07:47:52 +00:00
slepybear
ea5cfdfa2e fix(frontend): remove debug console.log statements (#12823)
## Why
Debug console.log statements were left in production code, which can
leak
sensitive information and pollute browser developer consoles.

## What
Removed console.log from 4 non-legacy frontend components:
- useNavbar.ts: isLoggedIn debug log
- WalletRefill.tsx: autoRefillForm debug log  
- EditAgentForm.tsx: category field debug log
- TimezoneForm.tsx: currentTimezone debug log

## How
Simply deleted the console.log lines as they served no purpose 
other than debugging during development.

## Checklist
- [x] Code follows project conventions
- [x] Only frontend changes (4 files, 6 lines removed)
- [x] No functionality changes

Co-authored-by: slepybear <slepybear@users.noreply.github.com>
2026-04-17 07:31:51 +00:00
Ubbe
d13a85bef7 feat(frontend): surface scheduled agents in library & copilot briefings (#12818)
## Why

Scheduled agents weren't well-surfaced in the Library and Copilot
briefings:

- The Library fleet summary didn't count agents that are scheduled
purely via the scheduler (only those with a `recommended_schedule_cron`
set at the agent level).
- Sitrep items didn't distinguish scheduled or listening (trigger-based)
agents, so they often fell back to a generic "idle" state.
- Scheduled chips showed a generic message with no indication of when
the next run would happen.
- The Copilot Agent Briefing surfaced every scheduled agent regardless
of how far out the next run was — an agent scheduled a month away would
take a slot from something actually happening soon.
- Long sitrep messages overflowed the row.

## What

- Add `is_scheduled` to `LibraryAgent` (sourced from the scheduler) so
the frontend can reliably detect schedule-only agents.
- Count scheduled agents in `useLibraryFleetSummary`.
- Include scheduled and listening agents in sitrep items, with a
priority ordering (error → running → stale → success → listening →
scheduled → idle).
- Show a relative next-run time on scheduled sitrep chips (e.g.
"Scheduled to run in 2h" / "in 3d").
- Filter the Copilot Agent Briefing to scheduled agents whose next run
is within the next 3 days.
- Truncate long sitrep messages to 1 line with `OverflowText` and show
the full text in a tooltip on hover.

## How

- Scheduler → `LibraryAgent` mapping populates `is_scheduled` /
`next_scheduled_run`.
- `useSitrepItems` gains an optional `scheduledWithinMs` parameter.
Copilot's `usePulseChips` passes `3 * 24 * 60 * 60 * 1000`; the Library
briefing omits it to keep its existing (unbounded) behavior.
- Scheduled config-based sitrep items are skipped when
`next_scheduled_run` is missing or outside the window.
- `SitrepItem` wraps the message in `OverflowText` so a single-line
ellipsis + hover tooltip replaces raw overflow.

## Test plan

- [ ] `/library` — scheduled and listening agents appear in the sitrep
with accurate copy; fleet summary counts scheduled agents correctly;
long messages truncate with a tooltip on hover.
- [ ] `/copilot` — on an empty session with the `AGENT_BRIEFING` flag
on, the briefing only shows scheduled agents whose next run is within 3
days; agents scheduled further out no longer appear as "scheduled"
chips.
- [ ] Scheduled chip text reads "Scheduled to run in {Nm|Nh|Nd}"
matching `next_scheduled_run`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 14:36:15 +07:00
Zamil Majdy
60b85640e7 fix(backend/copilot): replace dedup lock with idempotent append_and_save_message (#12814)
## Why

The Redis dedup lock (`chat:msg_dedup:{session}:{content_hash}`, 30s
TTL) was solving the wrong problem:

- Its purpose: block infra/nginx retries from calling
`append_and_save_message` twice after a client disconnect, writing a
duplicate user message to the DB.
- The approach: deliberately hold the lock for 30s on `GeneratorExit`.
- Why unnecessary: the executor's cluster lock already prevents
duplicate *execution*. The only real gap was duplicate *DB writes* in
the ~1s before the executor picks up the turn.

## What

- **Deleted** `message_dedup.py` and `message_dedup_test.py` (~150 lines
removed).
- **Removed** all dedup lock code from `routes.py` (~40 lines removed).
- **`append_and_save_message`** is now idempotent and self-contained:
- Uses redis-py's built-in `Lock(timeout=10, blocking_timeout=2)` —
Lua-script atomic acquire/release, no manual poll/sleep loop.
- Lock context manager yields `bool` (`True` = acquired, `False` =
degraded). When degraded (Redis down or 2s timeout), reads from DB
directly instead of cache to avoid stale-state duplicates.
- Idempotency check: if `session.messages[-1]` already matches the
incoming role+content, returns `None` instead of the session.
- Lock released explicitly as soon as the write completes; `try/except`
in `finally` so a cleanup error after a successful write never surfaces
a false 500.
- On cache-write failure, the stale cache entry is invalidated so future
reads fall back to the authoritative DB.
- **`routes.py`** uses the `None` signal: `is_duplicate_message = (await
append_and_save_message(...)) is None`
- Skips `create_session` and `enqueue_copilot_turn` for duplicates —
client re-attaches to the existing turn's Redis stream.
- `track_user_message` and `turn_id` generation only happen when
`is_duplicate_message` is false.
- **`subscribe_to_session`** retry window increased from 1×50ms to
3×100ms — covers the window where a duplicate request subscribes before
the original's `create_session` hset completes.
- **Cleaned up** `routes_test.py`: removed 5 dedup-specific tests and
the `mock_redis` setup from `_mock_stream_internals`; added
duplicate-skips-enqueue test.

## How

The idempotency guard distinguishes legit same-text messages from
retries via the **assistant turn between them**: if the user said "yes",
got a response, and says "yes" again, `session.messages[-1]` is the
assistant reply, so the role check fails and the second message goes
through. A retry (no response yet) sees the user message as the last
entry and is blocked.

```python
if (
    session.messages
    and session.messages[-1].role == message.role
    and session.messages[-1].content == message.content
):
    return None  # duplicate — caller skips enqueue
```

The Redis lock ensures this check always sees authoritative state even
in multi-replica deployments. When the lock is unavailable (Redis down
or contention), reading from DB directly (bypassing potentially stale
cache) provides the same safety guarantee at the cost of a DB
round-trip.
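
A rough sketch of that lock shape (key name and wiring are illustrative; the
real helper also invalidates the cache on write failure):

```python
import contextlib

from redis.asyncio import Redis


@contextlib.asynccontextmanager
async def session_write_lock(redis: Redis, session_id: str):
    """Yields True when the lock is held, False when degraded (Redis down or 2s timeout).

    In degraded mode the caller reads the session from the DB instead of the cache,
    so the last-message idempotency check never runs against stale state.
    """
    lock = redis.lock(
        f"chat:session_write:{session_id}",  # illustrative key name
        timeout=10,           # auto-expires so a crashed writer can't wedge the session
        blocking_timeout=2,   # give up after 2s and degrade instead of stalling the request
    )
    acquired = False
    try:
        acquired = await lock.acquire()
    except Exception:
        acquired = False      # Redis unavailable -> degraded mode
    try:
        yield acquired
    finally:
        if acquired:
            try:
                await lock.release()  # release as soon as the write is done
            except Exception:
                pass                  # never turn a cleanup error into a false 500
```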

## Checklist

- [x] PR targets `dev`
- [x] Conventional commit title with scope
- [x] Tests added/updated (duplicate detection, lock degradation, DB
error, cache invalidation paths)
- [x] `poetry run format` and `poetry run pyright` pass clean
- [x] No new linter suppressors
2026-04-16 22:12:30 +07:00
Zamil Majdy
87e4d42750 fix(backend/copilot): fix initial load missing messages + forward pagination for completed sessions (#12796)
### Why / What / How

**Why:** Completed copilot sessions with many messages showed a
completely empty chat view. A user reported a 158-message session that
appeared blank on reload.

**What:** Two bugs fixed:
1. **Backend** — initial page load always returned the newest 50
messages in DESC order. For sessions heavy in tool calls, the user's
original messages (seq 0–5) were never included; all 50 slots were
consumed by mid-session tool outputs.
2. **Frontend** — convertChatSessionToUiMessages silently dropped user
messages with null/empty content.

**How:** For completed sessions (no active stream), the backend now
loads from sequence 0 in ASC order. Active/streaming sessions keep
newest-first for streaming context. A new after_sequence forward cursor
enables infinite-scroll for subsequent pages (sentinel moves to bottom).
The frontend wires forward_paginated + newest_sequence end-to-end.
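
As a rough sketch, the three load modes map onto a query spec like this (pure illustration; the real `db.py` uses Prisma and different field names):

```python
def build_message_query(
    session_id: str,
    limit: int = 50,
    from_start: bool = False,
    after_sequence: int | None = None,
) -> dict:
    """Illustrative only: maps the three load modes onto a query spec."""
    where: dict = {"sessionId": session_id}
    if after_sequence is not None:
        # Forward cursor: page strictly after the last sequence we have.
        where["sequence"] = {"gt": after_sequence}
        order = "asc"
    elif from_start:
        # Completed-session initial load: start at sequence 0, ascending,
        # so the user's first messages are always in the initial window.
        order = "asc"
    else:
        # Active/streaming session: newest 50 first.
        order = "desc"
    return {"where": where, "order_by": {"sequence": order}, "take": limit}
```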

### Changes 🏗️

- db.py: added from_start (ASC) and after_sequence (forward cursor)
modes; added newest_sequence to PaginatedMessages
- routes.py: detect completed vs active on initial load; pass
from_start=True for completed; expose newest_sequence +
forward_paginated; accept after_sequence param
- convertChatSessionToUiMessages.ts: never drop user messages with empty
content
- useLoadMoreMessages.ts: forward pagination via after_sequence; append
pages to end
- ChatMessagesContainer.tsx: LoadMoreSentinel at bottom for
forward-paginated sessions
- Wire newestSequence + forwardPaginated end-to-end through
useChatSession/useCopilotPage/ChatContainer
- openapi.json: add after_sequence + newest_sequence/forward_paginated;
regenerate types
- db_test.py: 9 new unit tests for from_start and after_sequence modes

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Open a completed session with many messages — first user message
visible on initial load
- [x] Scroll to bottom of completed session — load more appends next
page
- [x] Open active/streaming session — newest messages shown first,
streaming unaffected
  - [x] Backend unit tests: all 28 pass
  - [x] Frontend lint/format: clean, no new type errors

---------

Co-authored-by: chernistry <73943355+chernistry@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2026-04-16 14:16:54 +00:00
Ubbe
0339d95d12 fix(frontend): small UI fixes, sort menu bg, name update auth, stats grid overflow, pulse chips (#12815)
## Summary
- **LibrarySortMenu / AgentFilterMenu**: Force `!bg-transparent` and
neutralise legacy `SelectTrigger` styles (`m-0.5`, `ring-offset-white`,
`shadow-sm`) that caused a white background around the trigger
- **EditNameDialog**: Replace client-side `supabase.auth.updateUser()`
with server-side `PUT /api/auth/user` route — fixes "Auth session
missing!" error caused by `httpOnly` cookies being inaccessible to
browser JS
- **StatsGrid**: Swap label `Text` for `OverflowText` so tile labels
truncate with `…` and show a tooltip instead of wrapping when the grid
is squeezed
- **PulseChips**: Set fixed `15rem` chip width with `shrink-0`,
horizontal scroll, and styled thin scrollbar
- **Tests**: Updated `EditNameDialog` tests to use MSW instead of
mocking Supabase client; added 7 new `PulseChips` integration tests

## Test plan
- [x] `pnpm test:unit` — all 1495 tests pass (91 files)
- [x] `pnpm format && pnpm lint` — clean
- [x] `pnpm types` — no new errors (pre-existing only)
- [ ] QA `/library?sort=updatedAt` — sort menu trigger has no white bg
- [ ] QA `/library` — StatsGrid labels truncate with tooltip on narrow
viewports
- [ ] QA `/copilot` — PulseChips scroll horizontally at fixed width
- [ ] QA `/copilot` — Edit name dialog saves successfully (no "Auth
session missing!")

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 20:11:21 +07:00
Toran Bruce Richards
f410929560 feat(platform): Add xAI Grok 4.20 models from OpenRouter (#12620)
Requested by @Torantulino

Adds the 2 xAI Grok 4.20 models available on OpenRouter that are missing
from the platform.

## Why

`x-ai/grok-4.20` and `x-ai/grok-4.20-multi-agent` are xAI's current
flagship models (released March 2026) and are available via OpenRouter,
but weren't accessible from the platform's LLM blocks.

## Changes

**`autogpt_platform/backend/backend/blocks/llm.py`**
- Added `GROK_4_20` and `GROK_4_20_MULTI_AGENT` enum members
- Added corresponding `MODEL_METADATA` entries (open_router provider, 2M
context window, price tier 3)

**`autogpt_platform/backend/backend/data/block_cost_config.py`**
- Added `MODEL_COST` entries at 5 credits each (flagship tier, $2/M in)

**`docs/integrations/block-integrations/llm.md`**
- Added new model IDs to all LLM block tables

| Model | Pricing | Context |
|-------|---------|---------|
| `x-ai/grok-4.20` | $2/M in, $6/M out | 2M |
| `x-ai/grok-4.20-multi-agent` | $2/M in, $6/M out | 2M |

Both models use the standard OpenRouter chat completions API — no
special handling needed.
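
Roughly what the new entries amount to (the `ModelMetadata` fields shown here are assumed placeholders, not the exact class in `llm.py`):

```python
from dataclasses import dataclass
from enum import Enum


@dataclass
class ModelMetadata:  # placeholder shape; the real class lives in llm.py
    provider: str
    context_window: int
    cost_tier: int


class LlmModel(str, Enum):
    GROK_4_20 = "x-ai/grok-4.20"
    GROK_4_20_MULTI_AGENT = "x-ai/grok-4.20-multi-agent"


MODEL_METADATA = {
    LlmModel.GROK_4_20: ModelMetadata("open_router", 2_000_000, 3),
    LlmModel.GROK_4_20_MULTI_AGENT: ModelMetadata("open_router", 2_000_000, 3),
}
MODEL_COST = {model: 5 for model in LlmModel}  # 5 credits per call (flagship tier)
```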

Resolves: SECRT-2196

---------

Co-authored-by: Torantulino <22963551+Torantulino@users.noreply.github.com>
Co-authored-by: Toran Bruce Richards <Torantulino@users.noreply.github.com>
Co-authored-by: Otto (AGPT) <otto@agpt.co>
2026-04-16 12:14:56 +00:00
Zamil Majdy
2bbec09e1a feat(platform): subscription tier billing via Stripe Checkout (#12727)
## Why

Introducing paid subscription tiers (PRO, BUSINESS) so we can charge for
AutoPilot capacity beyond the free tier. Without a billing integration,
all users share the same rate limits regardless of their willingness to
pay for additional capacity.

## What

End-to-end subscription billing system using Stripe Checkout Sessions:

**Backend:**
- `SubscriptionTier` enum (`FREE`, `PRO`, `BUSINESS`, `ENTERPRISE`) on
the `User` model
- `POST /credits/subscription` — creates a Stripe Checkout Session for
paid upgrades; for FREE tier or when `ENABLE_PLATFORM_PAYMENT` is off,
sets tier directly
- `GET /credits/subscription` — returns current tier, monthly cost
(cents), and all tier costs
- `POST /credits/stripe_webhook` — handles
`customer.subscription.created/updated/deleted`,
`checkout.session.completed`, `charge.dispute.*`, `refund.created`
- `sync_subscription_from_stripe()` — keeps `User.subscriptionTier` in
sync from webhook events; guards against out-of-order delivery
(cancelled event after new sub created), ENTERPRISE overwrite, and
duplicate webhook replay
- Open-redirect protection on `success_url`/`cancel_url` via
`_validate_checkout_redirect_url()`
- `_cancel_customer_subscriptions()` — cancels both active and trialing
subs; propagates errors so callers can avoid updating DB tier on Stripe
failure
- `_cleanup_stale_subscriptions()` — best-effort cancellation of old
subs when a new one becomes active (paid-to-paid upgrade), to prevent
double-billing
- `get_stripe_customer_id()` with idempotency key to prevent duplicate
Stripe customers on concurrent requests
- `cache_none=False` sentinel fix in `@cached` decorator so Stripe price
lookups retry on transient error instead of poisoning the cache with
`None`
- Stripe Price IDs read from LaunchDarkly (`stripe-price-id-pro`,
`stripe-price-id-business`). If not configured, upgrade returns 422.

**Frontend:**
- `SubscriptionTierSection` component on billing page: tier cards
(FREE/PRO/BUSINESS), upgrade/downgrade buttons, per-tier cost display,
Stripe redirect on upgrade
- Confirmation dialog for downgrades
- ENTERPRISE users see a read-only admin-managed banner
- Success toast on return from Stripe Checkout (`?subscription=success`)
- Uses generated `useGetSubscriptionStatus` /
`useUpdateSubscriptionTier` hooks

## How

- Paid upgrades use Stripe Checkout Sessions (not server-side
subscription creation) — Stripe handles PCI-compliant card collection
and the subscription lifecycle (see the sketch after this list)
- Tier is synced back via webhook on
`customer.subscription.created/updated/deleted`
- Downgrade to FREE cancels via Stripe API immediately; a
`stripe.StripeError` during cancellation returns 502 with a generic
message (no Stripe detail leakage)
- LaunchDarkly flags: `stripe-price-id-pro` (string),
`stripe-price-id-business` (string), `enable-platform-payment` (bool)
- `ENABLE_PLATFORM_PAYMENT=false` bypasses Stripe for beta/internal
access (sets tier directly)
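
The Checkout Session step referenced above would look roughly like this (a sketch only; the function name, tier checks, and error mapping to 422/502 are assumptions):

```python
import stripe


def create_upgrade_checkout(user, tier: str, price_id: str,
                            success_url: str, cancel_url: str) -> str:
    """Hypothetical sketch; the real route handler also validates the
    redirect URLs against an allow-list and maps Stripe errors to 422/502."""
    session = stripe.checkout.Session.create(
        mode="subscription",
        customer=user.stripe_customer_id,
        line_items=[{"price": price_id, "quantity": 1}],
        success_url=success_url,
        cancel_url=cancel_url,
        client_reference_id=user.id,
        metadata={"tier": tier},
    )
    return session.url  # the frontend redirects the browser here
```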

## Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] `ENABLE_PLATFORM_PAYMENT=false` → tier change updates directly, no
Stripe redirect
- [x] `ENABLE_PLATFORM_PAYMENT=true` with price IDs configured → paid
upgrade redirects to Stripe Checkout
- [x] Stripe webhook `customer.subscription.created` →
`User.subscriptionTier` updated
  - [x] Unrecognised price ID in webhook → logs warning, tier unchanged
  - [x] ENTERPRISE user webhook event → tier not overwritten
  - [x] Empty `STRIPE_WEBHOOK_SECRET` → 503 (prevents HMAC bypass)
  - [x] Open-redirect attack on `success_url`/`cancel_url` → 422

#### For configuration changes:
- [x] No `.env` or `docker-compose.yml` changes
- [x] LaunchDarkly flags to create: `stripe-price-id-pro` (string),
`stripe-price-id-business` (string), `enable-platform-payment` (bool)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: majdyz <majdy.zamil@gmail.com>
2026-04-16 17:52:06 +07:00
Ubbe
31b88a6e56 feat(frontend): add Agent Briefing Panel (#12764)
## Summary

<img width="800" height="772" alt="Screenshot_2026-04-13_at_18 29 19"
src="https://github.com/user-attachments/assets/3da6eaf2-1485-4c08-9651-18f2f4220eba"
/>
<img width="800" height="285" alt="Screenshot_2026-04-13_at_18 29 24"
src="https://github.com/user-attachments/assets/6a5f981a-1e1d-4d22-a33d-9e1b0e7555a7"
/>
<img width="800" height="288" alt="Screenshot_2026-04-13_at_18 29 27"
src="https://github.com/user-attachments/assets/f97b4611-7c23-4fc9-a12d-edf6314a77ef"
/>
<img width="800" height="433" alt="Screenshot_2026-04-13_at_18 29 31"
src="https://github.com/user-attachments/assets/e6d7241d-84f3-4936-b8cd-e0b12df392bb"
/>
<img width="700" height="554" alt="Screenshot_2026-04-13_at_18 29 40"
src="https://github.com/user-attachments/assets/92c08f21-f950-45cd-8c1d-529905a6e85f"
/>


Implements the Agent Intelligence Layer — real-time agent awareness
across the Library and Copilot pages.

### Core Features
- **Agent Briefing Panel** — stats grid with fleet-wide counts (running,
recently completed, needs attention, scheduled, idle, monthly spend) and
tab-driven content below
- **Enhanced Library Cards** — StatusBadge, run counts, contextual
action buttons (See tasks, Start, Chat) with consistent icon-left
styling
- **Situation Report Items** — prioritized sitrep with error-first
ranking, "See task" deep-links for completed runs, and "Ask AutoPilot"
bridge
- **Home Pulse Chips** — agent status chips on Copilot empty state with
hover-reveal actions (slide-up animation + backdrop blur on desktop,
always visible on touch)
- **Edit Display Name** — pencil icon on Copilot greeting to update
Supabase user metadata inline

### Backend
- **Execution count API** — batch `COUNT(*)` query on
`AgentGraphExecution` grouped by `agentGraphId` for the current user,
avoiding loading full execution rows. Wired into `list_library_agents`
and `list_favorite_library_agents` via `execution_count_override` on
`LibraryAgent.from_db()`

### UI Polish
- Subtler gradient on AgentBriefingPanel (reduced opacity on background
+ animated border)
- Consistent button styles across all action buttons (icon-left, same
sizing)
- Removed duplicate "Open in builder" menu item (kept "Edit agent")
- "Recently completed" tab replaces "Listening" in briefing panel,
showing agents with completed runs in last 72h

## Changes

### Backend
- `backend/api/features/library/db.py` — added
`_fetch_execution_counts()` batch COUNT query, wired into list endpoints
- `backend/api/features/library/model.py` — added
`execution_count_override` param to `LibraryAgent.from_db()`

### Frontend — New files
- `EditNameDialog/EditNameDialog.tsx` — modal to update display name via
Supabase auth
- `PulseChips/PulseChips.module.css` — hover-reveal animation + glass
panel styles

### Frontend — Modified files
- `EmptySession.tsx` — added EditNameDialog and PulseChips
- `PulseChips.tsx` — redesigned with See/Ask buttons, hover overlay on
desktop
- `usePulseChips.ts` — added agentID for deep-linking
- `AgentBriefingPanel.tsx` — subtler gradient, adjusted padding
- `AgentBriefingPanel.module.css` — reduced conic gradient opacity
- `BriefingTabContent.tsx` — added "completed" tab routing
- `StatsGrid.tsx` — replaced Listening with Recently completed,
reordered tabs
- `SitrepItem.tsx` — consistent button styles, "See task" link for
completed items, updated copilot prompt
- `ContextualActionButton.tsx` — icon-left, smaller icon, renamed Run to
Start
- `LibraryAgentCard.tsx` — icon-left on all buttons, EyeIcon for See
tasks
- `AgentCardMenu.tsx` — removed duplicate "Open in builder"
- `useAgentStatus.ts` — added completed count to FleetSummary
- `useLibraryFleetSummary.ts` — added recent completion tracking
- `types.ts` — added `completed` to FleetSummary and AgentStatusFilter

## Test plan
- [ ] Library page renders Agent Briefing Panel with stats grid
- [ ] "Recently completed" tab shows agents with completed runs in last
72h
- [ ] Agent cards show real execution counts (not 0)
- [ ] Action buttons have consistent styling with icon on the left
- [ ] "See task" on completed items deep-links to agent page with
execution selected
- [ ] "Ask AutoPilot" generates last-run-specific prompt for completed
items
- [ ] Copilot empty state shows PulseChips with hover-reveal actions on
desktop
- [ ] PulseChips show See/Ask buttons always on touch screens
- [ ] Pencil icon on greeting opens edit name dialog
- [ ] Name update persists via Supabase and refreshes greeting
- [ ] `pnpm format && pnpm lint && pnpm types` pass
- [ ] `poetry run format` passes for backend changes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: John Ababseh <jababseh7@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Bentlybro <Github@bentlybro.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
Co-authored-by: majdyz <zamil.majdy@agpt.co>
2026-04-16 17:32:17 +07:00
Zamil Majdy
d357956d98 refactor(backend/copilot): make session-file helper fns public to fix Pyright warnings (#12812)
## Why
After PR #12804 was squashed into dev, two module-level helper functions
in `backend/copilot/sdk/service.py` remained private (`_`-prefixed)
while being directly imported by name in `sdk/transcript_test.py`.
Pyright reports `reportAttributeAccessIssue` when tests (even those
excluded from CI lint) import private symbols from outside their
defining module.

## What
Rename two helpers to remove the underscore prefix:
- `_process_cli_restore` → `process_cli_restore`
- `_read_cli_session_from_disk` → `read_cli_session_from_disk`

Update call sites in `service.py` and imports/calls/docstrings in
`sdk/transcript_test.py`.

## How
Pure rename — no logic change. Both functions were already module-level
helpers with no reason to be private; the underscore was convention
carried over during the refactor but they are directly unit-tested and
should be public.

All 66 `sdk/transcript_test.py` tests pass after the rename.

## Checklist
- [x] Tests pass (`poetry run pytest
backend/copilot/sdk/transcript_test.py`)
- [x] No `_`-prefixed symbols imported across module boundaries
- [x] No linter suppressors added
2026-04-16 17:00:02 +07:00
Zamil Majdy
697ffa81f0 fix(backend/copilot): update transcript_test to use strip_for_upload after upload_cli_session removal 2026-04-16 16:17:02 +07:00
Zamil Majdy
2b4727e8b2 chore: merge master into dev, resolve baseline/transcript conflicts
Conflicts in baseline/service.py, baseline/transcript_integration_test.py,
and transcript.py arose because dev-only commit 0cd0a76305
(baseline upload fix) overlapped with the same fix in PR #12804 which
landed in master. Took master's version for all three files — it is the
complete, reviewed implementation.
2026-04-16 15:38:46 +07:00
Zamil Majdy
0d4b31e8a1 refactor(backend/copilot): unified transcript context — extract_context_messages, mode-gated --resume, compaction-aware gap-fill (#12804)
### Why / What / How

**Why:** The copilot had two separate GCS paths (`cli-sessions/` and
`chat-transcripts/`), redundant function names
(`upload_cli_session`/`restore_cli_session`), and no shared context
strategy between modes. When switching from baseline→SDK or
SDK→baseline, the receiving mode discarded the stored transcript and
fell back to full DB reconstruction — loading all raw messages instead
of the compacted form — causing inflated context, wasted tokens, and
loss of CLI compaction summaries.

**What:**
- Single GCS path (`cli-sessions/`) for both modes — `chat-transcripts/`
removed
- Unified public API: `upload_transcript` / `download_transcript` /
`TranscriptDownload`
- `TranscriptMode = Literal["sdk", "baseline"]` persisted in
`.meta.json` — SDK skips `--resume` when `mode != "sdk"`
(baseline-written JSONL has stripped fields / synthetic IDs)
- `extract_context_messages(download, session_messages)` — shared
context primitive used by **both SDK and baseline**: reads compacted
transcript content + fills only the DB gap (messages after watermark),
so CLI compaction summaries are preserved across mode switches
- Watermark fix: `_jsonl_covered = transcript_msg_count + 2` when a real
transcript is present, preventing false gap detection after `--resume`
- Baseline gap-fill: `_append_gap_to_builder` converts `ChatMessage` →
JSONL entries; no more silently discarded stale transcripts

**How:**

```
SDK turn (mode="sdk" transcript available):
  ──► --resume  [full CLI session restored natively]
  ──► inject gap prefix if DB has messages after watermark

SDK turn (mode="baseline" transcript available):
  ──► cannot --resume (synthetic CLI IDs)
  ──► extract_context_messages(download, session_messages):
        returns transcript JSONL (compacted, isCompactSummary preserved) + gap
        excludes session_messages[-1] (current turn — caller injects it separately)
  ──► format as <conversation_history> + "Now, the user says: {current}"

Baseline turn (any transcript):
  ──► _load_prior_transcript → TranscriptDownload
  ──► extract_context_messages(download, session_messages) + session_messages[-1]
        replaces full session.messages DB read
  ──► LLM messages: [compacted history + gap] + [current user turn]

Transcript unavailable — both SDK (use_resume=False) and baseline:
  ──► extract_context_messages(None, session_messages) returns session_messages[:-1]
        (all prior DB messages except the current user turn at [-1])
  ──► graceful fallback — no crash, no empty context
  ──► covers: first turn, GCS error, corrupt JSONL, missing .meta.json
  ──► next successful response uploads a fresh transcript
```

`extract_context_messages` is the shared primitive — both modes call the
same function, which handles:
- `download=None` (first turn, GCS unavailable) → falls back to
`session_messages[:-1]`
- Empty/corrupt content → falls back to `session_messages[:-1]`
- `bytes` content (raw GCS) or `str` content (pre-decoded baseline path)
- `isCompactSummary=True` entries → preserved so CLI compaction survives
mode switches
- Missing/corrupt `.meta.json` → `message_count` defaults to `0`, `mode`
defaults to `"sdk"`

**Why `[:-1]` and not all messages?** `session_messages[-1]` is always
the current user turn being handled right now. Both callers inject it
separately — SDK wraps it as `"Now, the user says: ..."`, baseline
appends it as the final message in the LLM array. Returning it inside
`extract_context_messages` would double-inject it.
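
A condensed sketch of that shared fallback behaviour (the JSONL parsing helper and the `message_count` attribute on the download object are simplifications, not the exact implementation):

```python
import json


def parse_transcript_jsonl(content: str) -> list[dict]:
    """Hypothetical stand-in: parse JSONL, bail out entirely on corruption."""
    entries = []
    for line in content.splitlines():
        try:
            entries.append(json.loads(line))
        except json.JSONDecodeError:
            return []  # corrupt transcript: caller falls back to DB messages
    return entries


def extract_context_messages(download, session_messages) -> list:
    """Sketch of the shared fallback logic; the real version also preserves
    isCompactSummary entries and strips non-message records."""
    prior = list(session_messages[:-1])  # [-1] is the current user turn
    if download is None or not download.content:
        return prior  # first turn, GCS error, or empty transcript
    content = download.content
    if isinstance(content, bytes):
        content = content.decode("utf-8", errors="replace")
    transcript_msgs = parse_transcript_jsonl(content)
    if not transcript_msgs:
        return prior  # corrupt JSONL: graceful DB fallback
    gap = prior[download.message_count:]  # DB messages past the watermark
    return transcript_msgs + gap
```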

### Changes 🏗️

- **`transcript.py`**: `CliSessionRestore` → `TranscriptDownload` +
`mode` field; `upload_cli_session` → `upload_transcript`;
`restore_cli_session` → `download_transcript`; add `TranscriptMode`,
`detect_gap`, `extract_context_messages`; import `ChatMessage` via
relative path to match `service.py` style
- **`sdk/service.py`**: mode-check before `--resume`; `_RestoreResult`
carries `baseline_download` + `context_messages` + `transcript_content`;
`_build_query_message` accepts `prior_messages` override;
`_restore_cli_session_for_turn` populates `context_messages` via
`extract_context_messages` and sets `transcript_content` to prevent
duplicate DB reconstruction; watermark fix (`_jsonl_covered =
transcript_msg_count + 2`)
- **`baseline/service.py`**: `_load_prior_transcript` returns `(bool,
TranscriptDownload | None)`; LLM context replaced with
`extract_context_messages(download, messages)`; `_append_gap_to_builder`
+ `detect_gap` call; `upload_transcript(mode="baseline")`
- **`sdk/transcript.py`**: updated re-exports, old aliases removed
- **`scripts/download_transcripts.py`**: updated for `bytes | str`
content type
- **Test files**: 179 tests total; `transcript_test.py`,
`baseline/transcript_integration_test.py`,
`sdk/service_helpers_test.py`, `sdk/test_transcript_watermark.py`,
`test/copilot/test_transcript_watermark.py` all updated/added

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] 179 unit tests pass — `transcript_test`,
`baseline/transcript_integration_test`, `sdk/service_helpers_test`,
`sdk/test_transcript_watermark`
  - [x] pyright 0 errors on all changed files
- [x] SDK `--resume` path still works when `mode="sdk"` transcript is
present
- [x] SDK fallback uses `extract_context_messages` (compacted baseline
content + gap) when `mode="baseline"` transcript is stored — no more
full DB reconstruction
- [x] Baseline uses `extract_context_messages` per turn instead of full
`session.messages` DB read
  - [x] `isCompactSummary=True` entries preserved across mode switches
- [x] Watermark (`_jsonl_covered`) fix prevents false gap detection
after `--resume`
- [x] Baseline gap detection no longer silently discards stale
transcripts
- [x] `TranscriptDownload.content` accepts `bytes | str` — backward
compatible
- [x] Transcript unavailable (GCS error, first turn, corrupt file)
gracefully falls back to `session_messages[:-1]` without crash — applies
to both SDK and baseline paths

---------

Co-authored-by: chernistry <73943355+chernistry@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2026-04-16 15:35:18 +07:00
Zamil Majdy
0cd0a76305 fix(backend/copilot): baseline always uploads when GCS has no transcript
_load_prior_transcript was returning False for missing/invalid transcripts,
which caused should_upload_transcript to suppress the upload. The original
intent was to protect against overwriting a *newer* GCS version — but a
missing or corrupt file is not 'newer'. Only stale (watermark ahead) and
download errors (unknown GCS state) should suppress upload.

Also renames transcript_covers_prefix → transcript_upload_safe throughout
to accurately describe what the flag means.
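
In sketch form, the intended decision rule is roughly (illustrative, not the literal code):

```python
def transcript_upload_safe(download_ok: bool, found: bool, stale: bool) -> bool:
    # Only two cases should suppress the upload:
    #   - download failed (unknown GCS state, don't risk clobbering newer data)
    #   - stored transcript is ahead of our watermark (our local view is stale)
    if not download_ok:
        return False
    if found and stale:
        return False
    # A missing or invalid transcript is not "newer": upload a fresh one.
    return True
```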
2026-04-16 14:58:42 +07:00
Toran Bruce Richards
d01a51be0e Add check for GitHub account connection status (#12807)
Added instruction to check GitHub authentication status before prompting
user. This prevents repeated, unnecessary asking of the user to add
their GitHub credentials when they're already added, which is currently
a prevalent bug.

### Changes 🏗️
- Added one line to
`autogpt_platform/backend/backend/copilot/prompting.py` instructing
AutoPilot to run `gh auth status` before prompting the user to connect
their GitHub account.

Co-authored-by: Toran Bruce Richards <22963551+Torantulino@users.noreply.github.com>
2026-04-16 12:09:00 +07:00
chernistry
bd2efed080 fix(frontend): allow zooming out more in the builder (#12690)
Reduced minZoom on the builder canvas from 0.1 to 0.05 to allow zooming
out further when working with large agent graphs.

Fixes #9325

Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2026-04-15 21:25:07 +00:00
Zamil Majdy
5fccd8a762 Merge branch 'master' of github.com:Significant-Gravitas/AutoGPT into dev 2026-04-16 01:23:07 +07:00
Zamil Majdy
2740b2be3a fix(backend/copilot): disable fallback model to fix prod CLI rejection (#12802)
### Why / What / How

**Why:** `fffbe0aad8` changed both `ChatConfig.model` and
`ChatConfig.claude_agent_fallback_model` to `claude-sonnet-4-6`. The
Claude Code CLI rejects this with `Error: Fallback model cannot be the
same as the main model`, causing every standard-mode copilot turn to
fail with exit code 1 — the session "completes" in ~30s but produces no
response and drops the transcript.

**What:** Set `claude_agent_fallback_model` default to `""`.
`_resolve_fallback_model()` already returns `None` on empty string,
which means the `--fallback-model` flag is simply not passed to the CLI.
On 529 overload errors the turn will surface normally instead of
silently retrying with a fallback.

**How:** One-line config change + test update.

### Changes 🏗️

- `ChatConfig.claude_agent_fallback_model` default:
`"claude-sonnet-4-6"` → `""`
- Update `test_fallback_model_default` to assert the empty default

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] `poetry run pytest backend/copilot/sdk/p0_guardrails_test.py`

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
2026-04-16 01:22:20 +07:00
Zamil Majdy
d27d22159d Merge branch 'master' of github.com:Significant-Gravitas/AutoGPT into dev 2026-04-16 00:05:32 +07:00
Nicholas Tindle
fffbe0aad8 fix(backend): default copilot sonnet to 4.6 (#12799)
### Why / What / How

Why: Copilot/Autopilot standard requests were still defaulting to Claude
Sonnet 4, while the expected default for this path is Sonnet 4.6.

What: This PR updates the backend Copilot defaults so the
standard/default path and fast path use Sonnet 4.6, and aligns the SDK
fallback model and related test expectations.

How: It changes `ChatConfig.model`, `ChatConfig.fast_model`, and
`ChatConfig.claude_agent_fallback_model` to Sonnet 4.6 values, then
updates backend tests that assert the default Sonnet model strings.

### Changes 🏗️

- Switch `ChatConfig.model` from `anthropic/claude-sonnet-4` to
`anthropic/claude-sonnet-4-6`
- Switch `ChatConfig.fast_model` from `anthropic/claude-sonnet-4` to
`anthropic/claude-sonnet-4-6`
- Switch `ChatConfig.claude_agent_fallback_model` from
`claude-sonnet-4-20250514` to `claude-sonnet-4-6`
- Update backend Copilot tests that assert the default Sonnet model
strings
- Configuration changes:
  - No new environment variables or docker-compose changes are required
- Existing `.env.default` and compose files remain compatible because
this only changes backend default model values in code

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] `poetry run format`
- [x] `poetry run pytest
backend/copilot/baseline/transcript_integration_test.py`
  - [x] `poetry run pytest backend/copilot/sdk/service_helpers_test.py`
  - [x] `poetry run pytest backend/copilot/sdk/service_test.py`
  - [x] `poetry run pytest backend/copilot/sdk/p0_guardrails_test.py`

<details>
  <summary>Example test plan</summary>
  
  - [ ] Create from scratch and execute an agent with at least 3 blocks
- [ ] Import an agent from file upload, and confirm it executes
correctly
  - [ ] Upload agent to marketplace
- [ ] Import an agent from marketplace and confirm it executes correctly
  - [ ] Edit an agent from monitor, and confirm it executes correctly
</details>

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

<details>
  <summary>Examples of configuration changes</summary>

  - Changing ports
  - Adding new services that need to communicate with each other
  - Secrets or environment variable changes
  - New or infrastructure changes such as databases
</details>

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes default/fallback LLM model identifiers for Copilot requests,
which can affect runtime behavior, cost, and availability
characteristics across both baseline and SDK paths. Risk is mitigated by
being a small, config-only change with updated tests.
> 
> **Overview**
> Updates Copilot backend defaults so both the standard (`model`) and
fast (`fast_model`) paths use `anthropic/claude-sonnet-4-6`, and aligns
the Claude Agent SDK fallback model to `claude-sonnet-4-6`.
> 
> Adjusts related test expectations in baseline transcript integration
and SDK helper tests to match the new Sonnet 4.6 model strings.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
563361ac11. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
2026-04-15 16:53:30 +00:00
Zamil Majdy
df205b5444 fix(backend/copilot): strip CLI session file to prevent auto-compaction context loss
The Claude Code CLI auto-compacts its native session JSONL when the context
approaches the model's token limit (~200K for Sonnet).  After compaction the
detailed conversation history is replaced by a ~27K-token summary, causing
the silent context loss users see as memory failures in long sessions.

Root cause identified from production logs for session 93ecf7c9:
- T6 CLI session: 233KB / ~207K tokens (near Sonnet limit)
- T7 CLI compacted session -> ~167KB / ~47K tokens (PreCompact hook missed)
- T12 second compaction -> ~176KB / ~27K tokens (just system prompt + summary)
- T14-T21: cache_read=26714 constantly -- only system prompt visible to Claude

The same stripping we already apply to our transcript (stale thinking blocks,
progress/metadata entries) now also runs on the CLI native session file.  At
~2x the size of the stripped transcript, unstripped sessions routinely hit the
compaction threshold within 6-10 turns of a heavy Opus/thinking session.
After stripping:
- same-pod turns reuse the stripped local file (no compaction trigger)
- cross-pod turns restore the stripped GCS file (same benefit)
2026-04-15 23:19:12 +07:00
majdyz
4efa1c4310 fix(copilot): set session_id on mode-switch T1 to enable --resume on subsequent turns
When a user switches from baseline (fast) mode to SDK (extended_thinking)
mode mid-session, the first SDK turn has has_history=True (prior baseline
messages in DB) but no CLI session file in storage.

The old code gated session_id on `not has_history`, so mode-switch T1
never received a session_id — the CLI generated a random ID that wasn't
uploaded under the expected key.  Every subsequent SDK turn would fail to
restore the CLI session and run without --resume, injecting the full
compressed history on each turn, causing model confusion.

Fix: set session_id whenever not using --resume (the `else` branch),
covering T1 fresh, mode-switch T1, and T2+ fallback turns.  The retry
path is updated to use `"session_id" in sdk_options_kwargs` as the
discriminator (instead of `not has_history`) so mode-switch T1 retries
also keep the session_id while T2+ retries (where T1 restored a session
file via restore_cli_session) still remove it to avoid "Session ID
already in use".
2026-04-15 23:19:11 +07:00
Nicholas Tindle
ab3221a251 feat(backend): MemoryEnvelope metadata model, scoped retrieval, and memory hardening (#12765)
### Why / What / How

**Why:** CoPilot's Graphiti memory system needed structured metadata to
distinguish memory types (rules, procedures, facts, preferences),
support scoped retrieval, enable targeted deletion, and track memory
costs under the AutoPilot billing account separately from the platform.

**What:** Adds the MemoryEnvelope metadata model, structured
rule/procedure memory types, a derived-finding lane for
assistant-distilled knowledge, two-step forget tools, scope-aware
retrieval filtering, AutoPilot-dedicated API key routing, and several
reliability fixes (streaming socket leaks, event-loop-scoped caches,
ingestion hardening).

**How:** MemoryEnvelope wraps every stored episode with typed metadata
(source_kind, memory_kind, scope, status, confidence) serialized as
JSON. Retrieval filters by scope at the context layer. The forget flow
uses a search-then-confirm two-step pattern. Ingestion queues and client
caches are scoped per event loop via WeakKeyDictionary to prevent
cross-loop RuntimeErrors in multi-worker deployments. API key resolution
falls back to AutoPilot-dedicated keys (CHAT_API_KEY,
CHAT_OPENAI_API_KEY) before platform-wide keys.
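
A minimal sketch of such an envelope (field names taken from this description; the actual models in `memory_model.py` are richer, and Pydantic v2 is assumed):

```python
from typing import Literal

from pydantic import BaseModel


class MemoryEnvelope(BaseModel):
    content: str
    source_kind: Literal["user_asserted", "assistant_derived", "tool_observed"]
    memory_kind: Literal[
        "fact", "preference", "rule", "finding", "plan", "event", "procedure"
    ]
    scope: str = "real:global"  # or "project:<name>", "book:<title>", "session:<id>"
    status: Literal["active", "tentative", "superseded", "contradicted"] = "active"
    confidence: float = 1.0


# Stored as JSON so retrieval can filter on scope/kind without re-parsing prose:
episode_body = MemoryEnvelope(
    content="User prefers metric units",
    source_kind="user_asserted",
    memory_kind="preference",
).model_dump_json()
```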

### Changes 🏗️

**New: MemoryEnvelope metadata model** (`memory_model.py`)
- Typed memory categories: fact, preference, rule, finding, plan, event,
procedure
- Source tracking: user_asserted, assistant_derived, tool_observed
- Scope namespacing: `real:global`, `project:<name>`, `book:<title>`,
`session:<id>`
- Status lifecycle: active, tentative, superseded, contradicted
- Structured `RuleMemory` and `ProcedureMemory` models for complex
instructions

**New: Targeted forget tools** (`graphiti_forget.py`)
- `memory_forget_search`: returns candidate facts with UUIDs for user
confirmation
- `memory_forget_confirm`: deletes specific edges by UUID after
confirmation

**New: Architecture test** (`architecture_test.py`)
- Validates no new `@cached(...)` usage around event-loop-bound async
clients
- Allowlists pre-existing violations for future cleanup

**Enhanced: memory_store tool** (`graphiti_store.py`)
- Accepts MemoryEnvelope metadata fields (source_kind, scope,
memory_kind, rule, procedure)
- Wraps content in MemoryEnvelope before ingestion

**Enhanced: memory_search tool** (`graphiti_search.py`)
- Scope-aware retrieval with hard filtering on group_id

**Enhanced: Ingestion pipeline** (`ingest.py`)
- Derived-finding lane: distills substantive assistant responses into
tentative findings
- Event-loop-scoped queues and workers via WeakKeyDictionary (fixes
multi-worker RuntimeError)
- Improved error handling and dropped-episode reporting

**Enhanced: Client cache** (`client.py`)
- Per-loop client cache and lock via WeakKeyDictionary (fixes "Future
attached to a different loop")

**Enhanced: Warm context** (`context.py`)
- Filters out non-global-scope episodes from warm context

**Fix: Streaming socket leak** (`baseline/service.py`)
- try/finally around async stream iteration to release httpx connections
on early exit

**Config: AutoPilot key routing** (`config.py`, `.env.default`)
- LLM key fallback: GRAPHITI_LLM_API_KEY → CHAT_API_KEY →
OPEN_ROUTER_API_KEY
- Embedder key fallback: GRAPHITI_EMBEDDER_API_KEY → CHAT_OPENAI_API_KEY
→ OPENAI_API_KEY
- Backwards-compatible: existing behavior unchanged until new keys are
provisioned

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] `poetry run pytest backend/copilot/graphiti/config_test.py` — 16
tests pass (key fallback priority)
- [x] `poetry run pytest backend/copilot/tools/graphiti_store_test.py` —
store envelope tests pass
- [x] `poetry run pytest backend/copilot/graphiti/ingest_test.py` —
ingestion tests pass
- [x] `poetry run pytest backend/util/architecture_test.py` — structural
validation passes
  - [x] Verify memory store/retrieve/forget cycle via copilot chat
- [x] Run AgentProbe multi-session memory benchmark (31 scenarios x3
repeats)
- [x] Confirm no CLOSE_WAIT socket accumulation under sustained
streaming load
- [x] Verify multi-worker deployment doesn't produce loop-binding errors

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- Configuration changes:
- New optional env var `CHAT_OPENAI_API_KEY` — AutoPilot-dedicated
OpenAI key for Graphiti embeddings (falls back to `OPENAI_API_KEY` if
not set)
- `CHAT_API_KEY` now used as first fallback for Graphiti LLM calls (was
`OPEN_ROUTER_API_KEY`)
- Infra action needed: add `CHAT_OPENAI_API_KEY` sealed secret in
`autogpt-shared-config` values (dev + prod)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Touches Graphiti memory ingestion/retrieval and introduces hard-delete
capabilities plus event-loop–scoped caching/queues; failures could
affect memory correctness or delete the wrong edges. Also changes
streaming resource cleanup and key routing, which could surface as
connection or billing/cost attribution issues if misconfigured.
> 
> **Overview**
> **Graphiti memory is upgraded from plain text episodes to a structured
JSON `MemoryEnvelope`.** `memory_store` now wraps content with typed
metadata (source, kind, scope, status) and optional structured
`rule`/`procedure` payloads, and ingestion supports JSON episodes.
> 
> **Memory retrieval and lifecycle controls are expanded.**
`memory_search` adds optional scope hard-filtering to prevent
cross-scope leakage, warm-context formatting drops non-global scoped
episodes (and avoids empty wrappers), and new two-step tools
(`memory_forget_search` → `memory_forget_confirm`) enable targeted soft-
or hard-deletion of specific graph edges by UUID.
> 
> **Reliability and multi-worker safety improvements.** Graphiti client
caching and ingestion worker registries are now per-event-loop (avoiding
cross-loop `Future` errors), streaming chat completions explicitly close
async streams to prevent `CLOSE_WAIT` socket leaks, warm-context is
injected into the first user message to keep the system prompt
cacheable, and a new `architecture_test.py` blocks future process-wide
caching of event-loop–bound async clients. Config updates route Graphiti
LLM/embedder keys to AutoPilot-specific env vars first, and OpenAPI
schema exports include the new memory response types.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
5fb4bd0a43. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-04-15 09:40:43 -05:00
Zamil Majdy
b2f7faabc7 fix(backend/copilot): pre-create assistant msg before first yield to prevent last_role=tool (#12797)
## Changes

**Root cause:** When a copilot session ends with a tool result as the
last saved message (`last_role=tool`), the next assistant response is
never persisted. This happens when:

1. An intermediate flush saves the session with `last_role=tool` (after
a tool call completes)
2. The Claude Agent SDK generates a text response for the next turn
3. The client disconnects (`GeneratorExit`) at the `yield
StreamStartStep` — the very first yield of the new turn
4. `_dispatch_response(StreamTextDelta)` is never called, so the
assistant message is never appended to `ctx.session.messages`
5. The session `finally` block persists the session still with
`last_role=tool`

**Fix:** In `_run_stream_attempt`, after `convert_message()` returns the
full list of adapter responses but *before* entering the yield loop,
pre-create the assistant message placeholder in `ctx.session.messages`
when:
- `acc.has_tool_results` is True (there are pending tool results)
- `acc.has_appended_assistant` is True (at least one prior message
exists)
- A `StreamTextDelta` is present in the batch (confirms this is a text
response turn)

This ensures that even if `GeneratorExit` fires at the first `yield`,
the placeholder assistant message is already in the session and will be
persisted by the `finally` block.
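
Condensed into a standalone predicate, the guard is roughly this (the stand-in types are placeholders for the real adapter classes):

```python
from dataclasses import dataclass


@dataclass
class StreamTextDelta:  # stand-in for the real adapter response type
    text: str = ""


@dataclass
class Accumulator:  # stand-in for `acc`
    has_tool_results: bool = False
    has_appended_assistant: bool = False


def should_precreate_assistant(acc: Accumulator, responses: list) -> bool:
    """Pre-create the assistant placeholder only when a text response follows
    tool results, so a GeneratorExit at the first yield cannot leave the
    session stuck at last_role=tool."""
    return (
        acc.has_tool_results
        and acc.has_appended_assistant
        and any(isinstance(r, StreamTextDelta) for r in responses)
    )
```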

**Tests:** Added `session_persistence_test.py` with 7 unit tests
covering the pre-create condition logic and delta accumulation behavior.

**Confirmed:** Langfuse trace `e57ebd26` for session
`465bf5cf-7219-4313-a1f6-5194d2a44ff8` showed the final assistant
response was logged at 13:06:49 but never reached DB — session had 51
messages with `last_role=tool`.

## Checklist

- [x] My code follows the code style of this project
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have made corresponding changes to the documentation (N/A)
- [x] My changes generate no new warnings (Pyright warnings are
pre-existing)
- [x] I have added tests that prove my fix is effective
- [x] New and existing unit tests pass locally with my changes

---------

Co-authored-by: Zamil Majdy <zamilmajdy@gmail.com>
2026-04-15 21:09:44 +07:00
Zamil Majdy
c9fa6bcd62 fix(backend/copilot): make system prompt fully static for cross-user prompt caching (#12790)
### Why / What / How

**Why:** Anthropic prompt caching keys on exact system prompt content.
Two sources of per-session dynamic data were leaking into the system
prompt, making it unique per session/user — causing a full 28K-token
cache write (~$0.10 on Sonnet) on *every* first message for *every*
session instead of once globally per model.

**What:**
1. `get_sdk_supplement` was embedding the session-specific working
directory (`/tmp/copilot-<uuid>`) in the system prompt text. Every
session has a different UUID, making every session's system prompt
unique, blocking cross-session cache hits.
2. Graphiti `warm_ctx` (user-personalised memory facts fetched on the
first turn) was appended directly to the system prompt, making it unique
per user per query.

**How:**
- `get_sdk_supplement` now uses the constant placeholder
`/tmp/copilot-<session-id>` in the supplement text and memoizes the
result. The actual `cwd` is still passed to `ClaudeAgentOptions.cwd` so
the CLI subprocess uses the correct session directory.
- `warm_ctx` is now injected into the first user message as a trusted
`<memory_context>` block (prepended before `inject_user_context` runs),
following the same pattern already used for business understanding. It
is persisted to DB and replayed correctly on `--resume`.
- `sanitize_user_supplied_context` now also strips user-supplied
`<memory_context>` tags, preventing context-spoofing via the new tag.

After this change the system prompt is byte-for-byte identical across
all users and sessions for a given model.
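
As a sketch, the first-turn injection amounts to the following (assumed helper; in the PR this happens inline in `sdk/service.py` just before `inject_user_context`):

```python
def prepend_memory_context(current_message: str, warm_ctx: str | None) -> str:
    """Memory goes into the first user message, not the system prompt, so the
    system prompt stays byte-identical and cacheable across users/sessions."""
    if not warm_ctx:
        return current_message
    return f"<memory_context>\n{warm_ctx}\n</memory_context>\n\n{current_message}"
```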

### Changes 🏗️

- `backend/copilot/prompting.py`: `get_sdk_supplement` ignores `cwd` and
uses a constant working-directory placeholder; result is memoized in
`_LOCAL_STORAGE_SUPPLEMENT`.
- `backend/copilot/sdk/service.py`: `warm_ctx` is saved to a local
variable instead of appended to `system_prompt`; on the first turn it is
prepended to `current_message` as a `<memory_context>` block before
`inject_user_context` is called.
- `backend/copilot/service.py`: `sanitize_user_supplied_context`
extended to strip `<memory_context>` blocks alongside `<user_context>`.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] `poetry run pytest backend/copilot/prompting_test.py
backend/copilot/prompt_cache_test.py` — all passed

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

---------

Co-authored-by: Zamil Majdy <zamilmajdy@gmail.com>
2026-04-15 20:40:24 +07:00
Krzysztof Czerwinski
c955b3901c fix(frontend/copilot): load older chat messages reliably and preserve scrollback across turns (#12792)
### Why / What / How

Fixes two SECRT-2226 bugs in copilot chat pagination.

**Bug 1 — can't load older messages when the newest page fits on
screen.** The `IntersectionObserver` in `LoadMoreSentinel` bailed when
`scrollHeight <= clientHeight`, which happens routinely once reasoning +
tool groups collapse. With no scrollbar and no button, users were stuck.
Fix: remove the guard, cap auto-fill at 3 non-scrollable rounds (keeps
the original anti-loop intent), and add a manual "Load older messages"
button as the always-available escape hatch.

**Bug 2 — older loaded pages vanish after a new turn, then reloading
them produces duplicates.** After each stream `useCopilotStream`
invalidates the session query; the refetch returns a shifted
`oldest_sequence`, which `useLoadMoreMessages` used as a signal to wipe
`olderRawMessages` and reset the local cursor. Scroll-back history was
lost on every turn, and the next load fetched a page that overlapped
with AI SDK's retained `currentMessages` — the "loops" users reported.
Fix: once any older page is loaded, preserve `olderRawMessages` and the
local cursor across same-session refetches. Only reset on session
change. The gap between the new initial window and older pages is
covered by AI SDK's retained state.

### Changes 🏗️

- `ChatMessagesContainer.tsx`: drop the scrollability guard; add
`MAX_AUTO_FILL_ROUNDS = 3` counter; add "Load older messages" button
(`ghost`/`small`); distinguish observer-triggered vs. button-triggered
loads so the button bypasses the cap; export `LoadMoreSentinel` for
testing.
- `useLoadMoreMessages.ts`: remove the wipe-and-reset branch on
`initialOldestSequence` change; preserve local state mid-session; still
mirror parent's cursor while no older page is loaded.
- New integration test `__tests__/LoadMoreSentinel.test.tsx`.

No backend changes.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Short/collapsed newest page: "Load older messages" button loads
older pages, preserves scroll
- [x] Full-viewport newest page: scroll-to-top auto-pagination still
works (no regression)
- [x] `has_more_messages=false` hides the button; `isLoadingMore=true`
shows spinner instead
- [x] Bug 2 reproduced locally with temporary `limit=5`: before fix
older page vanished and next load duplicated AI SDK messages; after fix
older page stays and next load fetches cleanly further back
- [x] `pnpm format`, `pnpm lint`, `pnpm types`, `pnpm test:unit` all
pass (1208/1208)

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**) — N/A

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 13:14:59 +00:00
Zamil Majdy
56864aea87 fix(copilot/frontend): align ModelToggleButton styling + add execution ID filter to platform cost page (#12793)
## Why

Two fixes bundled together:

1. **ModelToggleButton styling**: after merging the ModelToggleButton
feature, the "Standard" state was invisible — no background, no label —
while "Advanced" had a colored pill. This was inconsistent with
`ModeToggleButton` where both states (Fast / Thinking) always show a
colored background + label.

2. **Execution ID filter on platform cost admin page**: admins needed to
look up cost rows for a specific agent run but had no way to filter by
`graph_exec_id`. All other identifiers (user, model, provider, block,
tracking type) were already filterable.

## What

- **ModelToggleButton**: inactive (Standard) state now uses
`bg-neutral-100 text-neutral-700 hover:bg-neutral-200` (same palette as
ModeToggleButton inactive), always shows the "Standard" label.
- **Platform cost admin page**: added `graph_exec_id` query filter
across the full stack — backend service functions, FastAPI route
handlers, generated TypeScript params types, `usePlatformCostContent`
hook, and the filter UI in `PlatformCostContent`.

## How

### ModelToggleButton

Changed the inactive-state class from hover-only transparent to
always-visible neutral background, and added the "Standard" text label
(was empty before — only the CPU icon showed).

### Execution ID filter

Added `graph_exec_id: str | None = None` parameter to:
- `_build_prisma_where` — applies `where["graphExecId"] = graph_exec_id`
- `get_platform_cost_dashboard`, `get_platform_cost_logs`,
`get_platform_cost_logs_for_export`
- All three FastAPI route handlers (`/dashboard`, `/logs`,
`/logs/export`)
- Generated TypeScript params types
- `usePlatformCostContent`: new `executionIDInput` /
`setExecutionIDInput` state, wired into `filterParams`, `handleFilter`,
and `handleClear`
- `PlatformCostContent`: new Execution ID input field in the filter bar

## Changes

- [x] I have explained why I made the changes, not just what I changed
- [x] There are no unrelated changes in this PR
- [x] I have run the relevant linters and tests before submitting

---------

Co-authored-by: Zamil Majdy <zamilmajdy@gmail.com>
2026-04-15 20:20:55 +07:00
Zamil Majdy
d23ca824ad fix(copilot): set session_id on mode-switch T1 to enable --resume on subsequent SDK turns (#12795)
## Why

When a user switches from **baseline** (fast) mode to **SDK**
(extended_thinking) mode mid-session, every subsequent SDK turn started
fresh with no memory of prior conversation.

Root cause: two complementary bugs on mode-switch T1 (first SDK turn
after baseline turns):
1. `session_id` was gated on `not has_history`. On mode-switch T1,
`has_history=True` (prior baseline turns in DB) so no `session_id` was
set. The CLI generated a random ID and could not upload the session file
under a predictable path → `--resume` failed on every following SDK
turn.
2. Even if `session_id` were set, the upload guard `(not has_history or
state.use_resume)` would block the session file upload on mode-switch T1
(`has_history=True`, `use_resume=False`), so the next turn still cannot
`--resume`.

Together these caused every SDK turn to re-inject the full compressed
history, causing model confusion (proactive tool calls, forgetting
context) observed in session `8237a27b-45d0-4688-af20-c185379e926f`.

## What

- **`service.py`**: Change `elif not has_history:` → `else:` for the
`session_id` assignment — set it whenever `--resume` is not active.
Covers T1 fresh, mode-switch T1 (`has_history=True` but no CLI session
exists), and T2+ fallback turns where restore failed.
- **`service.py` retry path**: Replace `not has_history` with
`"session_id" in sdk_options_kwargs` as the discriminator, so
mode-switch T1 retries also keep `session_id` while T2+ retries (where
`restore_cli_session` put a file on disk) correctly remove it to avoid
"Session ID already in use".
- **`service.py` upload guard**: Remove `and not skip_transcript_upload`
and `and (not has_history or state.use_resume)` from the
`upload_cli_session` guard. The CLI session file is independent of the
JSONL transcript; and upload must run on mode-switch T1 so the next turn
can `--resume`. `upload_cli_session` silently skips when the file is
absent, so unconditional upload is always safe.

## How

| Scenario | Before | After |
|---|---|---|
| T1 fresh (`has_history=False`) | `session_id` set ✓ | `session_id` set ✓ |
| Mode-switch T1 (`has_history=True`, no CLI session) | not set — **bug** | `session_id` set ✓ |
| T2+ with `--resume` | `resume` set ✓ | `resume` set ✓ |
| T2+ retry after `--resume` failed | `session_id` removed ✓ | `session_id` removed ✓ |
| Mode-switch T1 retry | `session_id` removed | `session_id` kept ✓ |
| Upload on mode-switch T1 | blocked by guard — **bug** | uploaded ✓ |
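
Distilled into a standalone function, the post-fix selection is roughly (names assumed from the description above):

```python
def select_session_kwargs(use_resume: bool, cli_session_id: str,
                          session_id: str) -> dict:
    """Sketch of the post-fix selection: resume when a CLI session file was
    restored; otherwise always pin session_id (T1 fresh, mode-switch T1, and
    T2+ fallback turns) so the CLI session lands under a predictable,
    uploadable ID."""
    if use_resume:
        return {"resume": cli_session_id}
    return {"session_id": session_id}
```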

7 new unit tests in `TestSdkSessionIdSelection` document all session_id
cases.
6 new tests in `mode_switch_context_test.py` cover transcript bridging
for both fast→SDK and SDK→fast switches.

## Checklist

- [x] I have read the contributing guidelines
- [x] My changes are covered by tests
- [x] `poetry run format` passes

---------

Co-authored-by: Zamil Majdy <zamilmajdy@gmail.com>
2026-04-15 19:03:18 +07:00
Zamil Majdy
227c60abd3 fix(backend/copilot): idempotency guard + frontend dedup fix for duplicate messages (#12788)
## Why

After merging #12782 to dev, a k8s rolling deployment triggered
infrastructure-level POST retries — nginx detected the old pod's
connection reset mid-stream and resent the same POST to a new pod. Both
pods independently saved the user message and ran the executor,
producing duplicate entries in the DB (seq 159, 161, 163) and a
duplicate response in the chat. The model saw the same question 3× in
its context window and spent its response commenting on that instead of
answering.

Two compounding issues:
1. **No backend idempotency**: `append_and_save_message` saves
unconditionally — k8s/nginx retries silently produce duplicate turns.
2. **Frontend dedup cleared after success**:
`lastSubmittedMsgRef.current = null` after every completed turn wipes
the dedup guard, so any rapid re-submit of the same text (from a stalled
UI or user double-click) slips through.

## What

**Backend** — Redis idempotency gate in `stream_chat_post`:
- Before saving the user message, compute `sha256(session_id +
message)[:16]` and `SET NX ex=30` in Redis (sketched below)
- If key already exists → duplicate: return empty SSE (`StreamFinish +
[DONE]`) immediately, skip save + executor enqueue
- User messages only (`is_user_message=True`); system/assistant messages
bypass the check

**Frontend** — Keep `lastSubmittedMsgRef` populated after success:
- Remove `lastSubmittedMsgRef.current = null` on stream complete
- `getSendSuppressionReason` already has a two-condition check: `ref ===
text AND lastUserMsg === text` — so legitimate re-asks (after a
different question was answered) still work; only rapid re-sends of the
exact same text while it's still the last user message are blocked

## How

- 30 s Redis TTL covers infrastructure retry windows (k8s SIGTERM →
connection reset → ingress retry typically < 5 s)
- Empty SSE response is well-formed (StreamFinish + [DONE]) — frontend
AI SDK marks the turn complete without rendering a ghost message
- Frontend ref kept live means: submit "foo" → success → submit "foo"
again instantly → suppressed. Submit "foo" → success → submit "bar" →
proceeds (different text updates the ref).
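
In sketch form, the Redis gate is roughly this (key naming and the helper signature are assumptions):

```python
import hashlib

from redis.asyncio import Redis


async def is_duplicate_post(redis: Redis, session_id: str, message: str) -> bool:
    """First POST wins the SET NX and proceeds; an infrastructure retry within
    30s sees the existing key and gets an empty SSE instead of a new turn."""
    digest = hashlib.sha256(f"{session_id}{message}".encode()).hexdigest()[:16]
    key = f"copilot:dedup:{digest}"  # key naming is an assumption
    first = await redis.set(key, "1", nx=True, ex=30)
    return not first  # falsy when the key already existed, i.e. duplicate
```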

## Tests

- 3 new backend route tests: duplicate blocked, first POST proceeds,
non-user messages bypass
- 5 new frontend `getSendSuppressionReason` unit tests: fresh ref,
reconnecting, duplicate suppressed, different-turn re-ask allowed,
different text allowed

## Checklist

- [x] I have read the [AutoGPT Contributing
Guide](https://github.com/Significant-Gravitas/AutoGPT/blob/master/CONTRIBUTING.md)
- [x] I have performed a self-review of my code
- [x] I have added tests that prove the fix is effective
- [x] I have run `poetry run format` and `pnpm format` + `pnpm lint`
2026-04-15 18:54:59 +07:00
Ubbe
0284614df0 fix(copilot): abort SSE stream and disconnect backend listeners on session switch (#12766)
## Summary

Fixes stream disconnection bugs where the UI shows "running" with no
output when users switch between copilot chat sessions. The root cause
is that the old SSE fetch is not aborted and backend XREAD listeners
keep running until timeout when switching sessions.

### Changes

**Frontend (`useCopilotStream.ts`, `helpers.ts`)**
- Call `sdkStop()` on session switch to abort the in-flight SSE fetch
from the old session's transport
- Fire-and-forget `DELETE` to new backend disconnect endpoint so
server-side listeners release immediately
- Store `resumeStream` and `sdkStop` in refs to fix stale closure bugs
in:
- Wake re-sync visibility handler (could call stale `resumeStream` after
tab sleep)
  - Reconnect timer callback (could target wrong session's transport)
- Resume effect (captured stale `resumeStream` during rapid session
switches)

**Backend (`stream_registry.py`, `routes.py`)**
- Add `disconnect_all_listeners(session_id)` to stream registry —
iterates active listener tasks, cancels any matching the session
- Add `DELETE /sessions/{session_id}/stream` endpoint — auth-protected,
calls `disconnect_all_listeners`, returns 204

### Why

Reported by multiple team members: when using Autopilot for anything
serious, the frontend loses the SSE connection — particularly when
switching between conversations. The backend completes fine (refreshing
shows full output), but the UI gets stuck showing "running". This is the
worst UX bug we have right now because real users will never know to
refresh.

### How to test

1. Start a long-running autopilot task (e.g., "build a snake game")
2. While it's streaming, switch to a different chat session
3. Switch back — the UI should correctly show the completed output or
resume the stream
4. Verify no "stuck running" state

## Test plan

- [ ] Manual: switch sessions during active stream — no stuck "running"
state
- [ ] Manual: background tab for >30s during stream, return — wake
re-sync works
- [ ] Manual: trigger reconnect (kill network briefly) — reconnects to
correct session
- [ ] Verify: `pnpm lint`, `pnpm types`, `poetry run lint` all pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: majdyz <zamil.majdy@agpt.co>
2026-04-15 09:50:19 +00:00
708 changed files with 88011 additions and 11275 deletions

View File

@@ -25,6 +25,8 @@ Understand the **Why / What / How** before addressing comments — you need cont
gh pr view {N} --json body --jq '.body'
```
> If GraphQL is rate-limited, `gh pr view` fails. See [GitHub rate limits](#github-rate-limits) for REST fallbacks.
## Fetch comments (all sources)
### 1. Inline review threads — GraphQL (primary source of actionable items)
@@ -109,12 +111,16 @@ Only after this loop completes (all pages fetched, count confirmed) should you b
**Filter to unresolved threads only** — skip any thread where `isResolved: true`. `comments(last: 1)` returns the most recent comment in the thread — act on that; it reflects the reviewer's final ask. Use the thread `id` (Relay global ID) to track threads across polls.
> If GraphQL is rate-limited, see [GitHub rate limits](#github-rate-limits) for the REST fallback (flat comment list — no thread grouping or `isResolved`).
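As a sketch (response file name assumed), filtering the review-thread GraphQL response down to unresolved threads and each thread's latest comment could look like:

```bash
# /tmp/threads.json = raw output of the reviewThreads GraphQL query, one page
jq '[.data.repository.pullRequest.reviewThreads.nodes[]
     | select(.isResolved | not)
     | {id, lastComment: .comments.nodes[-1]}]' /tmp/threads.json
```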
### 2. Top-level reviews — REST (MUST paginate)
```bash
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate
```
> **Already REST — unaffected by GraphQL rate limits or outages. Continue polling reviews normally even when GraphQL is exhausted.**
**CRITICAL — always `--paginate`.** Reviews default to 30 per page. PRs can have 80–170+ reviews (mostly empty resolution events). Without pagination you miss reviews past position 30 — including `autogpt-reviewer`'s structured review which is typically posted after several CI runs and sits well beyond the first page.

Two things to extract:
@@ -133,6 +139,8 @@ Two things to extract:
gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments --paginate
```
> **Already REST — unaffected by GraphQL rate limits.**
Mostly contains: bot summaries (`coderabbitai[bot]`), CI/conflict detection (`github-actions[bot]`), and author status updates. Scan for non-empty messages from non-bot human reviewers that aren't the PR author — those are the ones that need a response.
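A sketch of that scan, reusing the same bot/author filter as the baseline snapshot (`AUTHOR` is the PR author's login from the REST pulls endpoint):

```bash
AUTHOR=$(gh api repos/Significant-Gravitas/AutoGPT/pulls/{N} --jq '.user.login')
gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments --paginate \
  | jq --arg author "$AUTHOR" \
    '[.[] | select(.user.type != "Bot" and .user.login != $author and ((.body // "") != ""))
      | {id, user: .user.login, body: .body[:300]}]'
```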
## For each unaddressed comment
@@ -327,18 +335,65 @@ git push
5. Restart the polling loop from the top — new commits reset CI status.
## GitHub abuse rate limits
## GitHub rate limits
Two distinct rate limits exist — they have different causes and recovery times:
Three distinct rate limits exist — they have different causes, error shapes, and recovery times:
| Error | HTTP code | Cause | Recovery |
|---|---|---|---|
| `{"code":"abuse"}` | 403 | Secondary rate limit — too many write operations (comments, mutations) in a short window | Wait **2–3 minutes**. 60s is often not enough. |
| `{"message":"API rate limit exceeded"}` | 429 | Primary rate limit — too many API calls per hour | Wait until `X-RateLimit-Reset` header timestamp |
| `{"message":"API rate limit exceeded"}` | 429 | Primary REST rate limit — 5000 calls/hr per user | Wait until `X-RateLimit-Reset` header timestamp |
| `GraphQL: API rate limit already exceeded for user ID ...` | 403 on stderr, `gh` exits 1 | **GraphQL-specific** per-user limit — distinct from REST's 5000/hr and from the abuse secondary limit. Trips faster than REST because point costs per query. | Wait until the GraphQL window resets (typically ~1 hour from the first call in the window). REST still works — use fallbacks below. |
**Prevention:** Add `sleep 3` between individual thread reply API calls. When posting >20 replies, increase to `sleep 5`.
**Recovery from secondary rate limit (403):**
### Detection
The `gh` CLI surfaces the GraphQL limit on stderr with the exact string `GraphQL: API rate limit already exceeded for user ID <id>` and exits 1 — any `gh api graphql ...` **or** `gh pr view ...` call fails. Check current quota and reset time via the REST endpoint that reports GraphQL quota (this call is REST and still works whether GraphQL is rate-limited OR fully down):
```bash
gh api rate_limit --jq '.resources.graphql' # { "limit": 5000, "used": 5000, "remaining": 0, "reset": 1729...}
# Human-readable reset:
gh api rate_limit --jq '.resources.graphql.reset' | xargs -I{} date -r {}
```
Retry when `remaining > 0`. If you need to proceed sooner, sleep 25 min and probe again — the limit is per user, not per machine, so other concurrent agents under the same token also consume it.
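A minimal sketch of that wait (the probe itself is REST, so it keeps working while GraphQL is exhausted):

```bash
# Sleep until the reported reset time, then confirm remaining > 0 before resuming GraphQL calls
RESET=$(gh api rate_limit --jq '.resources.graphql.reset')
NOW=$(date +%s)
sleep $(( RESET > NOW ? RESET - NOW + 5 : 0 ))
gh api rate_limit --jq '.resources.graphql.remaining'   # expect > 0 now
```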
### What keeps working
When GraphQL is unavailable (rate-limited or outage):
- **Keeps working (REST):** top-level reviews fetch, conversation comments fetch, all inline-comment replies, CI status (`gh pr checks`), and the `gh api rate_limit` probe.
- **Degraded:** inline thread list — fall back to flat `/pulls/{N}/comments` REST, which drops thread grouping, `isResolved`, and Relay thread IDs. You still get comment bodies and the `databaseId` as `id`, enough to read and reply.
- **Blocked:** `gh pr view`, the `resolveReviewThread` mutation, and any new `gh api graphql` queries — wait for the quota to reset.
### Fall back to REST
**PR metadata reads** — `gh pr view` uses GraphQL under the hood; use the REST pulls endpoint instead, which returns the full PR object:
```bash
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N} --jq '.body' # == --json body
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N} --jq '.base.ref' # == --json baseRefName
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N} --jq '.mergeable' # == --json mergeable
```
Note: REST `mergeable` returns `true|false|null`; GraphQL returns `MERGEABLE|CONFLICTING|UNKNOWN`. The `null` case maps to `UNKNOWN` — treat it the same (still computing; poll again).
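If you want the two vocabularies unified, a small normalization sketch:

```bash
# Map REST's true|false|null onto the GraphQL MERGEABLE|CONFLICTING|UNKNOWN vocabulary
case "$(gh api repos/Significant-Gravitas/AutoGPT/pulls/{N} --jq '.mergeable')" in
  true)  echo "MERGEABLE" ;;
  false) echo "CONFLICTING" ;;
  *)     echo "UNKNOWN" ;;   # null → still computing; poll again
esac
```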
**Inline comments (flat list)** — no thread grouping or `isResolved`, but enough to read and reply:
```bash
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments --paginate \
| jq '[.[] | {id, path, line, user: .user.login, body: .body[:200], in_reply_to_id}]'
```
Use this degraded mode to make progress on the fix → reply loop, then return to GraphQL for `resolveReviewThread` once the rate limit resets.
**Replies** — already REST-native (`/pulls/{N}/comments/{ID}/replies`); no change needed, use the same command as the main flow.
**`resolveReviewThread`** — **no REST equivalent**; GitHub does not expose a REST endpoint for thread resolution. Queue the thread IDs needing resolution, wait for the GraphQL limit to reset, then run the resolve mutations in a batch (with `sleep 3` between calls, per the secondary-limit guidance).
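One way to queue-and-retry (the queue file is an assumption, not part of the skill):

```bash
# While GraphQL is down: append thread IDs that still need resolving
echo "$THREAD_ID" >> /tmp/pending-resolves.txt

# After the quota resets: replay them with the same pacing as normal resolve calls
while read -r tid; do
  gh api graphql -f query="mutation { resolveReviewThread(input: {threadId: \"$tid\"}) { thread { isResolved } } }"
  sleep 3
done < /tmp/pending-resolves.txt
```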
### Recovery from secondary rate limit (403 abuse)
1. Stop all API writes immediately
2. Wait **2 minutes minimum** (not 60s — secondary limits are stricter)
3. Resume with `sleep 3` between each call
@@ -397,6 +452,8 @@ gh api graphql -f query='mutation { resolveReviewThread(input: {threadId: "THREA
**Never call this mutation before committing the fix.** The orchestrator will verify actual unresolved counts via GraphQL after you output `ORCHESTRATOR:DONE` — false resolutions will be caught and you will be re-briefed.
> `resolveReviewThread` is GraphQL-only — no REST equivalent. If GraphQL is rate-limited, see [GitHub rate limits](#github-rate-limits) for the queue-and-retry flow.
### Verify actual count before outputting ORCHESTRATOR:DONE
Before claiming "0 unresolved threads", always query GitHub directly — don't rely on your own bookkeeping. Paginate all pages — a single `first: 100` query misses threads beyond page 1:

View File

@@ -0,0 +1,245 @@
---
name: pr-polish
description: Alternate /pr-review and /pr-address on a PR until the PR is truly mergeable — no new review findings, zero unresolved inline threads, zero unaddressed top-level reviews or issue comments, all CI checks green, and two consecutive quiet polls after CI settles. Use when the user wants a PR polished to merge-ready without setting a fixed number of rounds.
user-invocable: true
argument-hint: "[PR number or URL] — if omitted, finds PR for current branch."
metadata:
author: autogpt-team
version: "1.0.0"
---
# PR Polish
**Goal.** Drive a PR to merge-ready by alternating `/pr-review` and `/pr-address` until **all** of the following hold:
1. The most recent `/pr-review` produces **zero new findings** (no new inline comments, no new top-level reviews with a non-empty body).
2. Every inline review thread reachable via GraphQL reports `isResolved: true`.
3. Every non-bot, non-author top-level review has been acknowledged (replied-to) OR resolved via a thread it spawned.
4. Every non-bot, non-author issue comment has been acknowledged (replied-to).
5. Every CI check is `conclusion: "success"` or `"skipped"` / `"neutral"` — none `"failure"` or still pending.
6. **Two consecutive post-CI polls** (≥60s apart) stay clean — no new threads, no new non-empty reviews, no new issue comments. Bots (coderabbitai, sentry, autogpt-reviewer) frequently post late after CI settles; a single green snapshot is not sufficient.
**Do not stop at a fixed number of rounds.** If round N introduces new comments, round N+1 is required. Cap at `_MAX_ROUNDS = 10` as a safety valve, but expect 2–5 in practice.
## TodoWrite
Before starting, write two todos so the user can see the loop progression:
- `Round {current}: /pr-review + /pr-address on PR #{N}` — current iteration.
- `Final polish polling: 2 consecutive clean polls, CI green, 0 unresolved` — runs after the last non-empty review round.
Update the `current` round counter at the start of each iteration; mark `completed` only when the round's address step finishes (all new threads addressed + resolved).
## Find the PR
```bash
ARG_PR="${ARG:-}"
# Normalize URL → numeric ID if the skill arg is a pull-request URL.
if [[ "$ARG_PR" =~ ^https?://github\.com/[^/]+/[^/]+/pull/([0-9]+) ]]; then
ARG_PR="${BASH_REMATCH[1]}"
fi
PR="${ARG_PR:-$(gh pr list --head "$(git branch --show-current)" --repo Significant-Gravitas/AutoGPT --json number --jq '.[0].number')}"
if [ -z "$PR" ] || [ "$PR" = "null" ]; then
echo "No PR found for current branch. Provide a PR number or URL as the skill arg."
exit 1
fi
echo "Polishing PR #$PR"
```
## The outer loop
```text
round = 0
while round < _MAX_ROUNDS:
round += 1
baseline = snapshot_state(PR) # see "Snapshotting state" below
invoke_skill("pr-review", PR) # posts findings as inline comments / top-level review
findings = diff_state(PR, baseline)
if findings.total == 0:
break # no new findings → go to polish polling
invoke_skill("pr-address", PR) # resolves every unresolved thread + CI failure
# Post-loop: polish polling (see below).
polish_polling(PR)
```
### Snapshotting state
Before each `/pr-review`, capture a baseline so the diff after the review reflects **only** what the review just added (not pre-existing threads):
```bash
# Inline threads — total count + latest databaseId per thread
gh api graphql -f query="
{
repository(owner: \"Significant-Gravitas\", name: \"AutoGPT\") {
pullRequest(number: ${PR}) {
reviewThreads(first: 100) {
totalCount
nodes {
id
isResolved
comments(last: 1) { nodes { databaseId } }
}
}
}
}
}" > /tmp/baseline_threads.json
# Top-level reviews — count + latest id per non-empty review
gh api "repos/Significant-Gravitas/AutoGPT/pulls/${PR}/reviews" --paginate \
--jq '[.[] | select((.body // "") != "") | {id, user: .user.login, state, submitted_at}]' \
> /tmp/baseline_reviews.json
# Issue comments — count + latest id per non-bot, non-author comment.
# Bots are filtered by User.type == "Bot" (GitHub sets this for app/bot
# accounts like coderabbitai, github-actions, sentry-io). The author is
# filtered by comparing login to the PR author — pass it to jq via --arg
# (gh's built-in --jq does not take jq arguments, so pipe to standalone jq).
AUTHOR=$(gh api "repos/Significant-Gravitas/AutoGPT/pulls/${PR}" --jq '.user.login')
gh api "repos/Significant-Gravitas/AutoGPT/issues/${PR}/comments" --paginate \
  | jq --arg author "$AUTHOR" \
    '[.[] | select(.user.type != "Bot" and .user.login != $author)
      | {id, user: .user.login, created_at}]' \
  > /tmp/baseline_issue_comments.json
```
### Diffing after a review
After `/pr-review` runs, any of the following counts as a "new finding" and means another address round is needed:
- New inline thread `id` not in the baseline.
- An existing thread whose latest comment `databaseId` is higher than the baseline's (new reply on an old thread).
- A new top-level review `id` with a non-empty body.
- A new issue comment `id` from a non-bot, non-author user.
If any of the four buckets is non-empty → not done; invoke `/pr-address` and loop.
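A sketch of the first bucket (new thread IDs), assuming the post-review snapshot was written to `/tmp/current_threads.json` with the same query as the baseline:

```bash
# New inline threads = current IDs minus baseline IDs
jq -n \
  --slurpfile base /tmp/baseline_threads.json \
  --slurpfile cur  /tmp/current_threads.json \
  '($cur[0].data.repository.pullRequest.reviewThreads.nodes | map(.id))
   - ($base[0].data.repository.pullRequest.reviewThreads.nodes | map(.id))'
```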
## Polish polling
Once `/pr-review` produces zero new findings, do **not** exit yet. Bots (coderabbitai, sentry, autogpt-reviewer) commonly post late reviews after CI settles — 30–90 seconds after the final push. Poll at 60-second intervals:
```text
NON_SUCCESS_TERMINAL = {"failure", "cancelled", "timed_out", "action_required", "startup_failure"}
clean_polls = 0
required_clean = 2
while clean_polls < required_clean:
# 1. CI gate — any terminal non-success conclusion (not just "failure")
# must trigger /pr-address. "success", "skipped", "neutral" are clean;
# anything else (including cancelled, timed_out, action_required) is a
# blocker that won't self-resolve.
ci = fetch_check_runs(PR)
if any ci.conclusion in NON_SUCCESS_TERMINAL:
invoke_skill("pr-address", PR) # address failures + any new comments
baseline = snapshot_state(PR) # reset — push during address invalidates old baseline
clean_polls = 0
continue
if any ci.conclusion is None (still in_progress):
sleep 60; continue # wait without counting this as clean
# 2. Comment / thread gate
threads = fetch_unresolved_threads(PR)
new_issue_comments = diff_against_baseline(issue_comments)
new_reviews = diff_against_baseline(reviews)
if threads or new_issue_comments or new_reviews:
invoke_skill("pr-address", PR)
baseline = snapshot_state(PR) # reset — the address loop just dealt with these,
# otherwise they stay "new" relative to the old baseline forever
clean_polls = 0
continue
# 3. Mergeability gate
mergeable = gh api repos/.../pulls/${PR} --jq '.mergeable'
if mergeable == false (CONFLICTING):
resolve_conflicts(PR) # see pr-address skill
clean_polls = 0
continue
if mergeable is null (UNKNOWN):
sleep 60; continue
clean_polls += 1
sleep 60
```
Only after `clean_polls == 2` do you report `ORCHESTRATOR:DONE`.
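A concrete sketch of the CI gate in the loop above, using the REST check-runs endpoint (the jq test exits non-zero when a blocking conclusion is present):

```bash
SHA=$(gh api "repos/Significant-Gravitas/AutoGPT/pulls/${PR}" --jq '.head.sha')
gh api "repos/Significant-Gravitas/AutoGPT/commits/${SHA}/check-runs" --paginate \
  | jq -es '[.[].check_runs[]
             | select(.conclusion != null)
             | select(.conclusion | IN("success","skipped","neutral") | not)
            ] | length == 0' \
  || echo "blocking CI conclusion found → run /pr-address"
```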
### Why 2 clean polls, not 1
A single green snapshot can be misleading — the final CI check often completes ~30s before a bot posts its delayed review. One quiet cycle does not prove the PR is stable; two consecutive cycles with no new threads, reviews, or issue comments arriving gives high confidence nothing else is incoming.
### Why check every source on each poll
`/pr-address` polling inside a single round already re-checks its own comments, but `/pr-polish` sits a level above and must also catch:
- New top-level reviews (autogpt-reviewer sometimes posts structured feedback only after several CI green cycles).
- Issue comments from human reviewers (not caught by inline thread polling).
- Sentry bug predictions that land on new line numbers post-push.
- Merge conflicts introduced by a race between your push and a merge to `dev`.
## Invocation pattern
Delegate to existing skills with the `Skill` tool; do not re-implement the review or address logic inline. This keeps the polish loop focused on orchestration and lets the child skills evolve independently.
```python
Skill(skill="pr-review", args=pr_url)
Skill(skill="pr-address", args=pr_url)
```
After each child invocation, re-query GitHub state directly — never trust a summary for the stop condition. The orchestrator's `ORCHESTRATOR:DONE` is verified against actual GraphQL / REST responses per the rules in `pr-address`'s "Verify actual count before outputting ORCHESTRATOR:DONE" section.
### **Auto-continue: do NOT end your response between child skills**
`/pr-polish` is a single orchestration task — one invocation drives the PR all the way to merge-ready. When a child `Skill()` call returns control to you:
- Do NOT summarize and stop.
- Do NOT wait for user confirmation to continue.
- Immediately, in the same response, perform the next loop step: state diff → decide next action → next `Skill()` call or polling sleep.
The child skill returning is a **loop iteration boundary**, not a conversation turn boundary. You are expected to keep going until one of the exit conditions in the opening section is met (2 consecutive clean polls, `_MAX_ROUNDS` hit, or an unrecoverable error).
If the user needs to approve a risky action mid-loop (e.g., a force-push or a destructive git operation), pause there — but not at the routine "round N finished, round N+1 needed" boundary. Those are silent transitions.
## GitHub rate limits
This skill issues many GraphQL calls (one review-thread query per outer iteration plus per-poll queries inside polish polling). Expect the GraphQL budget to be tight on large PRs. When `gh api rate_limit --jq .resources.graphql.remaining` drops below ~200, back off:
- Fall back to REST for reads (flat `/pulls/{N}/comments`, `/pulls/{N}/reviews`, `/issues/{N}/comments`) per the `pr-address` skill's GraphQL-fallback section.
- Queue thread resolutions (GraphQL-only) until the budget resets; keep making progress on fixes + REST replies meanwhile.
- `sleep 5` between any batch of ≥20 writes to avoid secondary rate limits.
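A sketch of the ~200-remaining threshold check before each GraphQL-heavy poll:

```bash
REMAINING=$(gh api rate_limit --jq '.resources.graphql.remaining')
if [ "${REMAINING:-0}" -lt 200 ]; then
  echo "GraphQL budget low ($REMAINING) — switch to REST reads, queue resolveReviewThread calls"
fi
```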
## Safety valves
- `_MAX_ROUNDS = 10` — if review+address rounds exceed this, stop and escalate to the user with a summary of what's still unresolved. A PR that cannot converge in 10 rounds has systemic issues that need human judgment.
- After each commit, run `poetry run format` / `pnpm format && pnpm lint && pnpm types` per the target codebase's conventions. A failing format check is CI `failure` that will never self-resolve.
- Every `/pr-review` round checks for **duplicate** concerns first (via `pr-review`'s own "Fetch existing review comments" step) so the loop does not re-post the same finding that a prior round already resolved.
## Reporting
When the skill finishes (either via two clean polls or hitting `_MAX_ROUNDS`), produce a compact summary:
```
PR #{N} polish complete ({rounds_completed} rounds):
- {X} inline threads opened and resolved
- {Y} CI failures fixed
- {Z} new commits pushed
Final state: CI green, {total} threads all resolved, mergeable.
```
If exiting via `_MAX_ROUNDS`, flag explicitly:
```
PR #{N} polish stopped at {_MAX_ROUNDS} rounds — NOT merge-ready:
- {N} threads still unresolved: {titles}
- CI status: {summary}
Needs human review.
```
## When to use this skill
Use when the user says any of:
- "polish this PR"
- "keep reviewing and addressing until it's mergeable"
- "loop /pr-review + /pr-address until done"
- "make sure the PR is actually merge-ready"
Do **not** use when:
- User wants just one review pass (→ `/pr-review`).
- User wants to address already-posted comments without further self-review (→ `/pr-address`).
- A fixed round count is explicitly requested (e.g., "do 3 rounds") — honour the count instead of converging.

View File

@@ -5,7 +5,7 @@ user-invocable: true
argument-hint: "[worktree path or PR number] — tests the PR in the given worktree. Optional flags: --fix (auto-fix issues found)"
metadata:
author: autogpt-team
version: "2.0.0"
version: "2.1.0"
---
# Manual E2E Test
@@ -180,6 +180,120 @@ Based on the PR analysis, write a test plan to `$RESULTS_DIR/test-plan.md`:
**Be critical** — include edge cases, error paths, and security checks. Every scenario MUST specify what screenshots to take and what state to verify.
## Step 3.0: Claim the testing lock (coordinate parallel agents)
Multiple worktrees share the same host — Docker infra (postgres, redis, clamav), app ports (3000/8006/…), and the test user. Two agents running `/pr-test` concurrently will corrupt each other's state (connection-pool exhaustion, port binds failing silently, cross-test assertions). Use the root-worktree lock file to take turns.
### Lock file contract
Path (**always** the root worktree so all siblings see it): `$REPO_ROOT/.ign.testing.lock`
Body (one `key=value` per line):
```
holder=<pr-XXXXX-purpose>
pid=<pid-or-"self">
started=<iso8601>
heartbeat=<iso8601, updated every ~2 min>
worktree=<full path>
branch=<branch name>
intent=<one-line description + rough duration>
```
### Claim
```bash
LOCK=$REPO_ROOT/.ign.testing.lock
NOW=$(date -u +%Y-%m-%dT%H:%MZ)
STALE_AFTER_MIN=5
if [ -f "$LOCK" ]; then
HB=$(grep '^heartbeat=' "$LOCK" | cut -d= -f2)
HB_EPOCH=$(date -j -f '%Y-%m-%dT%H:%MZ' "$HB" +%s 2>/dev/null || date -d "$HB" +%s 2>/dev/null || echo 0)
AGE_MIN=$(( ( $(date -u +%s) - HB_EPOCH ) / 60 ))
if [ "$AGE_MIN" -gt "$STALE_AFTER_MIN" ]; then
echo "WARN: stale lock (${AGE_MIN}m old) — reclaiming"
cat "$LOCK" | sed 's/^/ stale: /'
else
echo "Another agent holds the lock:"; cat "$LOCK"
echo "Wait until released or resume after $((STALE_AFTER_MIN - AGE_MIN))m."
exit 1
fi
fi
cat > "$LOCK" <<EOF
holder=pr-${PR_NUMBER}-e2e
pid=self
started=$NOW
heartbeat=$NOW
worktree=$WORKTREE_PATH
branch=$(cd $WORKTREE_PATH && git branch --show-current)
intent=E2E test PR #${PR_NUMBER}, native mode, ~60min
EOF
echo "Lock claimed"
```
### Heartbeat (MUST run in background during the whole test)
Without a heartbeat a crashed agent keeps the lock forever. Run this as a background process right after claim:
```bash
(while true; do
sleep 120
[ -f "$LOCK" ] || exit 0 # lock released → exit heartbeat
perl -i -pe "s/^heartbeat=.*/heartbeat=$(date -u +%Y-%m-%dT%H:%MZ)/" "$LOCK"
done) &
HEARTBEAT_PID=$!
echo "$HEARTBEAT_PID" > /tmp/pr-test-heartbeat.pid
```
### Release (always — even on failure)
```bash
kill "$HEARTBEAT_PID" 2>/dev/null
rm -f "$LOCK" /tmp/pr-test-heartbeat.pid
echo "$(date -u +%Y-%m-%dT%H:%MZ) [pr-${PR_NUMBER}] released lock" \
>> $REPO_ROOT/.ign.testing.log
```
Use a `trap` so release runs even on `exit 1`:
```bash
trap 'kill "$HEARTBEAT_PID" 2>/dev/null; rm -f "$LOCK"' EXIT INT TERM
```
### **Release the lock AS SOON AS the test run is done**
The lock guards **test execution**, not **app lifecycle**. Once Step 5 (record results) and Step 6 (post PR comment) are complete, release the lock IMMEDIATELY — even if:
- The native `poetry run app` / `pnpm dev` processes are still running so the user can keep poking at the app manually.
- You're leaving docker containers up.
- You're tailing logs for a minute or two.
Keeping the lock held past the test run is the single most common way `/pr-test` stalls other agents. **The app staying up is orthogonal to the lock; don't conflate them.** Sibling worktrees running their own `/pr-test` will kill the stray processes and free the ports themselves (Step 3c/3e-native handle that) — they just need the lock file gone.
Concretely, the sequence at the end of every `/pr-test` run (success or failure) is:
```bash
# 1. Write the final report + post PR comment — done above in Step 5/6.
# 2. Release the lock right now, even if the app is still up.
kill "$HEARTBEAT_PID" 2>/dev/null
rm -f "$LOCK" /tmp/pr-test-heartbeat.pid
echo "$(date -u +%Y-%m-%dT%H:%MZ) [pr-${PR_NUMBER}] released lock (app may still be running)" \
>> $REPO_ROOT/.ign.testing.log
# 3. Optionally leave the app running and note it so the user knows:
echo "Native stack still running on :3000 / :8006 for manual poking. Kill with:"
echo " pkill -9 -f 'poetry run app'; pkill -9 -f 'next-server|next dev'"
```
If a sibling agent's `/pr-test` needs to take over, it'll do the kill+rebuild dance from Step 3c/3e-native on its own — your only job is to not hold the lock file past the end of your test.
### Shared status log
`$REPO_ROOT/.ign.testing.log` is an append-only channel any agent can read/write. Use it for "I'm waiting", "I'm done, resources free", or post-run notes:
```bash
echo "$(date -u +%Y-%m-%dT%H:%MZ) [pr-${PR_NUMBER}] <message>" \
>> $REPO_ROOT/.ign.testing.log
```
## Step 3: Environment setup
### 3a. Copy .env files from the root worktree
@@ -248,7 +362,87 @@ docker ps --format "{{.Names}}" | grep -E "rest_server|executor|copilot|websocke
done
```
### 3e. Build and start
**Native mode also:** when running the app natively (see 3e-native), kill any stray host processes and free the app ports before starting — otherwise `poetry run app` and `pnpm dev` will fail to bind.
```bash
# Kill stray native app processes from prior runs
pkill -9 -f "python.*backend" 2>/dev/null || true
pkill -9 -f "poetry run app" 2>/dev/null || true
pkill -9 -f "next-server|next dev" 2>/dev/null || true
# Free app ports (errors per port are ignored — port may simply be unused)
for port in 3000 8006 8001 8002 8005 8008; do
lsof -ti :$port -sTCP:LISTEN | xargs -r kill -9 2>/dev/null || true
done
```
### 3e-native. Run the app natively (PREFERRED for iterative dev)
Native mode runs infra (postgres, supabase, redis, rabbitmq, clamav) in docker but runs the backend and frontend directly on the host. This avoids the 3-8 minute `docker compose build` cycle on every backend change — code edits are picked up on process restart (seconds) instead of a full image rebuild.
**When to prefer native mode (default for this skill):**
- Iterative dev/debug loops where you're editing backend or frontend code between test runs
- Any PR that touches Python/TS source but not Dockerfiles, compose config, or infra images
- Fast repro of a failing scenario — restart `poetry run app` in a couple of seconds
**When to prefer docker mode (3e fallback):**
- Testing changes to `Dockerfile`, `docker-compose.yml`, or base images
- Production-parity smoke tests (exact container env, networking, volumes)
- CI-equivalent runs where you need the exact image that'll ship
**Note on 3b (copilot auth):** no npm install anywhere. `poetry install` pulls in `claude_agent_sdk`, which ships its own Claude CLI binary — available on `PATH` whenever you run commands via `poetry run` (native) OR whenever the copilot_executor container is built from its Poetry lockfile (docker). The OAuth token extraction still applies (same `refresh_claude_token.sh` call).
**Preamble:** before starting native, run the kill-stray + free-ports block from 3c's "Native mode also" subsection.
**1. Start infra only (one-time per session):**
```bash
cd $PLATFORM_DIR && docker compose --profile local up deps --detach --remove-orphans --build
```
This brings up postgres/supabase/redis/rabbitmq/clamav and skips all app services.
**2. Start the backend natively:**
```bash
cd $BACKEND_DIR && (poetry run app 2>&1 | tee .ign.application.logs) &
```
`poetry run app` spawns **all** app subprocesses — `rest_server`, `executor`, `copilot_executor`, `websocket`, `scheduler`, `notification_server`, `database_manager` — inside ONE parent process. No separate containers, no separate terminals. The `.ign.application.logs` prefix is already gitignored.
**3. Wait for the backend on :8006 BEFORE starting the frontend.** This ordering matters — the frontend's `pnpm dev` startup invokes `generate-api-queries`, which fetches `/openapi.json` from the backend. If the backend isn't listening yet, `pnpm dev` fails immediately.
```bash
for i in $(seq 1 60); do
if [ "$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8006/docs 2>/dev/null)" = "200" ]; then
echo "Backend ready"
break
fi
sleep 2
done
```
**4. Start the frontend natively:**
```bash
cd $FRONTEND_DIR && (pnpm dev 2>&1 | tee .ign.frontend.logs) &
```
**5. Wait for the frontend on :3000:**
```bash
for i in $(seq 1 60); do
if [ "$(curl -s -o /dev/null -w '%{http_code}' http://localhost:3000 2>/dev/null)" = "200" ]; then
echo "Frontend ready"
break
fi
sleep 2
done
```
Once both are up, skip 3e/3f and go straight to **3g/3h** (feature flags / test user creation).
### 3e. Build and start (docker — fallback)
```bash
cd $PLATFORM_DIR && docker compose build --no-cache 2>&1 | tail -20
@@ -442,6 +636,22 @@ agent-browser --session-name pr-test snapshot | grep "text:"
### Checking logs
**Native mode:** when running via `poetry run app` + `pnpm dev`, all app logs stream to the `.ign.*.logs` files written by the `tee` pipes in 3e-native. `rest_server`, `executor`, `copilot_executor`, `websocket`, `scheduler`, `notification_server`, and `database_manager` are all subprocesses of the single `poetry run app` parent, so their output is interleaved in `.ign.application.logs`.
```bash
# Backend (all app subprocesses interleaved)
tail -f $BACKEND_DIR/.ign.application.logs
# Frontend (Next.js dev server)
tail -f $FRONTEND_DIR/.ign.frontend.logs
# Filter for errors across either log
grep -iE "error|exception|traceback" $BACKEND_DIR/.ign.application.logs | tail -20
grep -iE "error|exception|traceback" $FRONTEND_DIR/.ign.frontend.logs | tail -20
```
**Docker mode:**
```bash
# Backend REST server
docker logs autogpt_platform-rest_server-1 2>&1 | tail -30
@@ -571,6 +781,19 @@ Upload screenshots to the PR using the GitHub Git API (no local git operations
**CRITICAL — NEVER post a bare directory link like `https://github.com/.../tree/...`.** Every screenshot MUST appear as `![name](raw_url)` inline in the PR comment so reviewers can see them without clicking any links. After posting, the verification step below greps the comment for `![` tags and exits 1 if none are found — the test run is considered incomplete until this passes.
**CRITICAL — NEVER paste absolute local paths into the PR comment.** Strings like `/Users/…`, `/home/…`, `C:\…` are useless to every reviewer except you. Before posting, grep the final body for `/Users/`, `/home/`, `/tmp/`, `/private/`, `C:\`, `~/` and either drop those lines entirely or rewrite them as repo-relative paths (`autogpt_platform/backend/…`). The PR comment is an artifact reviewers on GitHub read — it must be self-contained on github.com. Keep local paths in `$RESULTS_DIR/test-report.md` for yourself; only copy the *content* they reference (excerpts, test names, log lines) into the PR comment, not the path.
**Pre-post sanity check** (paste after building the comment body, before `gh api ... comments`):
```bash
# Reject any local-looking absolute path or home-dir shortcut in the body
if grep -nE '(^|[^A-Za-z])(/Users/|/home/|/tmp/|/private/|C:\\|~/)[A-Za-z0-9]' "$COMMENT_FILE" ; then
echo "ABORT: local filesystem paths detected in PR comment body."
echo "Remove or rewrite as repo-relative (autogpt_platform/...) before posting."
exit 1
fi
```
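A sketch of the matching image-tag check mentioned above (same pre-post stage; `$COMMENT_FILE` is the comment body being built):

```bash
# Require at least one inline ![name](raw_url) image before posting
if ! grep -q '!\[' "$COMMENT_FILE"; then
  echo "ABORT: no inline screenshots found in the PR comment body."
  exit 1
fi
```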
```bash
# Upload screenshots via GitHub Git API (creates blobs, tree, commit, and ref remotely)
REPO="Significant-Gravitas/AutoGPT"
@@ -876,9 +1099,15 @@ test scenario → find issue (bug OR UX problem) → screenshot broken state
### Problem: Frontend shows cookie banner blocking interaction
**Fix:** `agent-browser click 'text=Accept All'` before other interactions.
### Problem: Container loses npm packages after rebuild
**Cause:** `docker compose up --build` rebuilds the image, losing runtime installs.
**Fix:** Add packages to the Dockerfile instead of installing at runtime.
### Problem: Claude CLI not found in copilot_executor container
**Symptom:** Copilot logs say `claude: command not found` or similar when starting an SDK turn.
**Cause:** Image was built without `poetry install` (stale base layer, or Dockerfile bypass). The SDK CLI ships inside the `claude_agent_sdk` Poetry dep — it is NOT an npm package.
**Fix:** Rebuild the image cleanly: `docker compose build --no-cache copilot_executor && docker compose up -d copilot_executor`. Do NOT `docker exec ... npm install -g @anthropic-ai/claude-code` — that is outdated guidance and will pollute the container with a second CLI that the SDK won't use.
### Problem: agent-browser screenshot hangs / times out
**Symptom:** `agent-browser screenshot` exits with code 124 even on `about:blank`.
**Cause:** Stuck CDP connection or Chromium process tree. Seen on macOS when a prior `/pr-test` left a zombie Chrome for Testing.
**Fix:** `pkill -9 -f "agent-browser|chromium|Chrome for Testing" && sleep 2`, then reopen the browser with a fresh `--session-name`. If still failing, verify via `agent-browser eval` + `agent-browser snapshot` (DOM state) instead of relying on PNGs — the feature under test is the same.
### Problem: Services not starting after `docker compose up`
**Fix:** Wait and check health: `docker compose ps`. Common cause: migration hasn't finished. Check: `docker logs autogpt_platform-migrate-1 2>&1 | tail -5`. If supabase-db isn't healthy: `docker restart supabase-db && sleep 10`.

1
.gitignore vendored
View File

@@ -195,3 +195,4 @@ test.db
# Implementation plans (generated by AI agents)
plans/
.claude/worktrees/
test-results/

View File

@@ -267,7 +267,7 @@
"filename": "autogpt_platform/backend/backend/blocks/replicate/replicate_block.py",
"hashed_secret": "8bbdd6f26368f58ea4011d13d7f763cb662e66f0",
"is_verified": false,
"line_number": 55
"line_number": 67
}
],
"autogpt_platform/backend/backend/blocks/slant3d/webhook.py": [
@@ -467,5 +467,5 @@
}
]
},
"generated_at": "2026-04-09T14:20:23Z"
"generated_at": "2026-04-24T16:42:44Z"
}

View File

@@ -1,3 +1,6 @@
*.ignore.*
*.ign.*
.application.logs
# Claude Code local settings only — the rest of .claude/ is shared (skills etc.)
.claude/settings.local.json

View File

@@ -59,6 +59,8 @@ class OAuthState(BaseModel):
code_verifier: Optional[str] = None
scopes: list[str]
"""Unix timestamp (seconds) indicating when this OAuth state expires"""
credential_id: Optional[str] = None
"""If set, this OAuth flow upgrades an existing credential's scopes."""
class UserMetadata(BaseModel):

View File

@@ -60,7 +60,8 @@ NVIDIA_API_KEY=
# Graphiti Temporal Knowledge Graph Memory
# Rollout controlled by LaunchDarkly flag "graphiti-memory"
# LLM/embedder keys fall back to OPEN_ROUTER_API_KEY and OPENAI_API_KEY when empty.
# LLM key falls back to CHAT_API_KEY (AutoPilot), then OPEN_ROUTER_API_KEY.
# Embedder key falls back to CHAT_OPENAI_API_KEY (AutoPilot), then OPENAI_API_KEY.
GRAPHITI_FALKORDB_HOST=localhost
GRAPHITI_FALKORDB_PORT=6380
GRAPHITI_FALKORDB_PASSWORD=
@@ -178,6 +179,9 @@ MEM0_API_KEY=
OPENWEATHERMAP_API_KEY=
GOOGLE_MAPS_API_KEY=
# Platform Bot Linking
PLATFORM_LINK_BASE_URL=http://localhost:3000/link
# Communication Services
DISCORD_BOT_TOKEN=
MEDIUM_API_KEY=

View File

@@ -0,0 +1,932 @@
import asyncio
import logging
from typing import List
from autogpt_libs.auth import requires_admin_user
from autogpt_libs.auth.models import User as AuthUser
from fastapi import APIRouter, HTTPException, Security
from prisma.enums import AgentExecutionStatus
from pydantic import BaseModel
from backend.api.features.admin.model import (
AgentDiagnosticsResponse,
ExecutionDiagnosticsResponse,
)
from backend.data.diagnostics import (
FailedExecutionDetail,
OrphanedScheduleDetail,
RunningExecutionDetail,
ScheduleDetail,
ScheduleHealthMetrics,
cleanup_all_stuck_queued_executions,
cleanup_orphaned_executions_bulk,
cleanup_orphaned_schedules_bulk,
get_agent_diagnostics,
get_all_orphaned_execution_ids,
get_all_schedules_details,
get_all_stuck_queued_execution_ids,
get_execution_diagnostics,
get_failed_executions_count,
get_failed_executions_details,
get_invalid_executions_details,
get_long_running_executions_details,
get_orphaned_executions_details,
get_orphaned_schedules_details,
get_running_executions_details,
get_schedule_health_metrics,
get_stuck_queued_executions_details,
stop_all_long_running_executions,
)
from backend.data.execution import get_graph_executions
from backend.executor.utils import add_graph_execution, stop_graph_execution
logger = logging.getLogger(__name__)
router = APIRouter(
prefix="/admin",
tags=["diagnostics", "admin"],
dependencies=[Security(requires_admin_user)],
)
class RunningExecutionsListResponse(BaseModel):
"""Response model for list of running executions"""
executions: List[RunningExecutionDetail]
total: int
class FailedExecutionsListResponse(BaseModel):
"""Response model for list of failed executions"""
executions: List[FailedExecutionDetail]
total: int
class StopExecutionRequest(BaseModel):
"""Request model for stopping a single execution"""
execution_id: str
class StopExecutionsRequest(BaseModel):
"""Request model for stopping multiple executions"""
execution_ids: List[str]
class StopExecutionResponse(BaseModel):
"""Response model for stop execution operations"""
success: bool
stopped_count: int = 0
message: str
class RequeueExecutionResponse(BaseModel):
"""Response model for requeue execution operations"""
success: bool
requeued_count: int = 0
message: str
@router.get(
"/diagnostics/executions",
response_model=ExecutionDiagnosticsResponse,
summary="Get Execution Diagnostics",
)
async def get_execution_diagnostics_endpoint():
"""
Get comprehensive diagnostic information about execution status.
Returns all execution metrics including:
- Current state (running, queued)
- Orphaned executions (>24h old, likely not in executor)
- Failure metrics (1h, 24h, rate)
- Long-running detection (stuck >1h, >24h)
- Stuck queued detection
- Throughput metrics (completions/hour)
- RabbitMQ queue depths
"""
logger.info("Getting execution diagnostics")
diagnostics = await get_execution_diagnostics()
response = ExecutionDiagnosticsResponse(
running_executions=diagnostics.running_count,
queued_executions_db=diagnostics.queued_db_count,
queued_executions_rabbitmq=diagnostics.rabbitmq_queue_depth,
cancel_queue_depth=diagnostics.cancel_queue_depth,
orphaned_running=diagnostics.orphaned_running,
orphaned_queued=diagnostics.orphaned_queued,
failed_count_1h=diagnostics.failed_count_1h,
failed_count_24h=diagnostics.failed_count_24h,
failure_rate_24h=diagnostics.failure_rate_24h,
stuck_running_24h=diagnostics.stuck_running_24h,
stuck_running_1h=diagnostics.stuck_running_1h,
oldest_running_hours=diagnostics.oldest_running_hours,
stuck_queued_1h=diagnostics.stuck_queued_1h,
queued_never_started=diagnostics.queued_never_started,
invalid_queued_with_start=diagnostics.invalid_queued_with_start,
invalid_running_without_start=diagnostics.invalid_running_without_start,
completed_1h=diagnostics.completed_1h,
completed_24h=diagnostics.completed_24h,
throughput_per_hour=diagnostics.throughput_per_hour,
timestamp=diagnostics.timestamp,
)
logger.info(
f"Execution diagnostics: running={diagnostics.running_count}, "
f"queued_db={diagnostics.queued_db_count}, "
f"orphaned={diagnostics.orphaned_running + diagnostics.orphaned_queued}, "
f"failed_24h={diagnostics.failed_count_24h}"
)
return response
@router.get(
"/diagnostics/agents",
response_model=AgentDiagnosticsResponse,
summary="Get Agent Diagnostics",
)
async def get_agent_diagnostics_endpoint():
"""
Get diagnostic information about agents.
Returns:
- agents_with_active_executions: Number of unique agents with running/queued executions
- timestamp: Current timestamp
"""
logger.info("Getting agent diagnostics")
diagnostics = await get_agent_diagnostics()
response = AgentDiagnosticsResponse(
agents_with_active_executions=diagnostics.agents_with_active_executions,
timestamp=diagnostics.timestamp,
)
logger.info(
f"Agent diagnostics: with_active_executions={diagnostics.agents_with_active_executions}"
)
return response
@router.get(
"/diagnostics/executions/running",
response_model=RunningExecutionsListResponse,
summary="List Running Executions",
)
async def list_running_executions(
limit: int = 100,
offset: int = 0,
):
"""
Get detailed list of running and queued executions (recent, likely active).
Args:
limit: Maximum number of executions to return (default 100)
offset: Number of executions to skip (default 0)
Returns:
List of running executions with details
"""
logger.info(f"Listing running executions (limit={limit}, offset={offset})")
executions = await get_running_executions_details(limit=limit, offset=offset)
# Get total count for pagination
diagnostics = await get_execution_diagnostics()
total = diagnostics.running_count + diagnostics.queued_db_count
return RunningExecutionsListResponse(executions=executions, total=total)
@router.get(
"/diagnostics/executions/orphaned",
response_model=RunningExecutionsListResponse,
summary="List Orphaned Executions",
)
async def list_orphaned_executions(
limit: int = 100,
offset: int = 0,
):
"""
Get detailed list of orphaned executions (>24h old, likely not in executor).
Args:
limit: Maximum number of executions to return (default 100)
offset: Number of executions to skip (default 0)
Returns:
List of orphaned executions with details
"""
logger.info(f"Listing orphaned executions (limit={limit}, offset={offset})")
executions = await get_orphaned_executions_details(limit=limit, offset=offset)
# Get total count for pagination
diagnostics = await get_execution_diagnostics()
total = diagnostics.orphaned_running + diagnostics.orphaned_queued
return RunningExecutionsListResponse(executions=executions, total=total)
@router.get(
"/diagnostics/executions/failed",
response_model=FailedExecutionsListResponse,
summary="List Failed Executions",
)
async def list_failed_executions(
limit: int = 100,
offset: int = 0,
hours: int = 24,
):
"""
Get detailed list of failed executions.
Args:
limit: Maximum number of executions to return (default 100)
offset: Number of executions to skip (default 0)
hours: Number of hours to look back (default 24)
Returns:
List of failed executions with error details
"""
logger.info(
f"Listing failed executions (limit={limit}, offset={offset}, hours={hours})"
)
executions = await get_failed_executions_details(
limit=limit, offset=offset, hours=hours
)
# Get total count for pagination
# Always count actual total for given hours parameter
total = await get_failed_executions_count(hours=hours)
return FailedExecutionsListResponse(executions=executions, total=total)
@router.get(
"/diagnostics/executions/long-running",
response_model=RunningExecutionsListResponse,
summary="List Long-Running Executions",
)
async def list_long_running_executions(
limit: int = 100,
offset: int = 0,
):
"""
Get detailed list of long-running executions (RUNNING status >24h).
Args:
limit: Maximum number of executions to return (default 100)
offset: Number of executions to skip (default 0)
Returns:
List of long-running executions with details
"""
logger.info(f"Listing long-running executions (limit={limit}, offset={offset})")
executions = await get_long_running_executions_details(limit=limit, offset=offset)
# Get total count for pagination
diagnostics = await get_execution_diagnostics()
total = diagnostics.stuck_running_24h
return RunningExecutionsListResponse(executions=executions, total=total)
@router.get(
"/diagnostics/executions/stuck-queued",
response_model=RunningExecutionsListResponse,
summary="List Stuck Queued Executions",
)
async def list_stuck_queued_executions(
limit: int = 100,
offset: int = 0,
):
"""
Get detailed list of stuck queued executions (QUEUED >1h, never started).
Args:
limit: Maximum number of executions to return (default 100)
offset: Number of executions to skip (default 0)
Returns:
List of stuck queued executions with details
"""
logger.info(f"Listing stuck queued executions (limit={limit}, offset={offset})")
executions = await get_stuck_queued_executions_details(limit=limit, offset=offset)
# Get total count for pagination
diagnostics = await get_execution_diagnostics()
total = diagnostics.stuck_queued_1h
return RunningExecutionsListResponse(executions=executions, total=total)
@router.get(
"/diagnostics/executions/invalid",
response_model=RunningExecutionsListResponse,
summary="List Invalid Executions",
)
async def list_invalid_executions(
limit: int = 100,
offset: int = 0,
):
"""
Get detailed list of executions in invalid states (READ-ONLY).
Invalid states indicate data corruption and require manual investigation:
- QUEUED but has startedAt (impossible - can't start while queued)
- RUNNING but no startedAt (impossible - can't run without starting)
⚠️ NO BULK ACTIONS PROVIDED - These need case-by-case investigation.
Each invalid execution likely has a different root cause (crashes, race conditions,
DB corruption). Investigate the execution history and logs to determine appropriate
action (manual cleanup, status fix, or leave as-is if system recovered).
Args:
limit: Maximum number of executions to return (default 100)
offset: Number of executions to skip (default 0)
Returns:
List of invalid state executions with details
"""
logger.info(f"Listing invalid state executions (limit={limit}, offset={offset})")
executions = await get_invalid_executions_details(limit=limit, offset=offset)
# Get total count for pagination
diagnostics = await get_execution_diagnostics()
total = (
diagnostics.invalid_queued_with_start
+ diagnostics.invalid_running_without_start
)
return RunningExecutionsListResponse(executions=executions, total=total)
@router.post(
"/diagnostics/executions/requeue",
response_model=RequeueExecutionResponse,
summary="Requeue Stuck Execution",
)
async def requeue_single_execution(
request: StopExecutionRequest, # Reuse same request model (has execution_id)
user: AuthUser = Security(requires_admin_user),
):
"""
Requeue a stuck QUEUED execution (admin only).
Uses add_graph_execution with existing graph_exec_id to requeue.
⚠️ WARNING: Only use for stuck executions. This will re-execute and may cost credits.
Args:
request: Contains execution_id to requeue
Returns:
Success status and message
"""
logger.info(f"Admin {user.user_id} requeueing execution {request.execution_id}")
# Get the execution (validation - must be QUEUED)
executions = await get_graph_executions(
graph_exec_id=request.execution_id,
statuses=[AgentExecutionStatus.QUEUED],
)
if not executions:
raise HTTPException(
status_code=404,
detail="Execution not found or not in QUEUED status",
)
execution = executions[0]
# Use add_graph_execution in requeue mode
await add_graph_execution(
graph_id=execution.graph_id,
user_id=execution.user_id,
graph_version=execution.graph_version,
graph_exec_id=request.execution_id, # Requeue existing execution
)
return RequeueExecutionResponse(
success=True,
requeued_count=1,
message="Execution requeued successfully",
)
@router.post(
"/diagnostics/executions/requeue-bulk",
response_model=RequeueExecutionResponse,
summary="Requeue Multiple Stuck Executions",
)
async def requeue_multiple_executions(
request: StopExecutionsRequest, # Reuse same request model (has execution_ids)
user: AuthUser = Security(requires_admin_user),
):
"""
Requeue multiple stuck QUEUED executions (admin only).
Uses add_graph_execution with existing graph_exec_id to requeue.
⚠️ WARNING: Only use for stuck executions. This will re-execute and may cost credits.
Args:
request: Contains list of execution_ids to requeue
Returns:
Number of executions requeued and success message
"""
logger.info(
f"Admin {user.user_id} requeueing {len(request.execution_ids)} executions"
)
# Get executions by ID list (must be QUEUED)
executions = await get_graph_executions(
execution_ids=request.execution_ids,
statuses=[AgentExecutionStatus.QUEUED],
)
if not executions:
return RequeueExecutionResponse(
success=False,
requeued_count=0,
message="No QUEUED executions found to requeue",
)
# Requeue all executions in parallel using add_graph_execution
async def requeue_one(exec) -> bool:
try:
await add_graph_execution(
graph_id=exec.graph_id,
user_id=exec.user_id,
graph_version=exec.graph_version,
graph_exec_id=exec.id, # Requeue existing
)
return True
except Exception as e:
logger.error(f"Failed to requeue {exec.id}: {e}")
return False
results = await asyncio.gather(
*[requeue_one(exec) for exec in executions], return_exceptions=False
)
requeued_count = sum(1 for success in results if success)
return RequeueExecutionResponse(
success=requeued_count > 0,
requeued_count=requeued_count,
message=f"Requeued {requeued_count} of {len(request.execution_ids)} executions",
)
@router.post(
"/diagnostics/executions/stop",
response_model=StopExecutionResponse,
summary="Stop Single Execution",
)
async def stop_single_execution(
request: StopExecutionRequest,
user: AuthUser = Security(requires_admin_user),
):
"""
Stop a single execution (admin only).
Uses robust stop_graph_execution which cascades to children and waits for termination.
Args:
request: Contains execution_id to stop
Returns:
Success status and message
"""
logger.info(f"Admin {user.user_id} stopping execution {request.execution_id}")
# Get the execution to find its owner user_id (required by stop_graph_execution)
executions = await get_graph_executions(
graph_exec_id=request.execution_id,
)
if not executions:
raise HTTPException(status_code=404, detail="Execution not found")
execution = executions[0]
# Use robust stop_graph_execution (cascades to children, waits for termination)
await stop_graph_execution(
user_id=execution.user_id,
graph_exec_id=request.execution_id,
wait_timeout=15.0,
cascade=True,
)
return StopExecutionResponse(
success=True,
stopped_count=1,
message="Execution stopped successfully",
)
@router.post(
"/diagnostics/executions/stop-bulk",
response_model=StopExecutionResponse,
summary="Stop Multiple Executions",
)
async def stop_multiple_executions(
request: StopExecutionsRequest,
user: AuthUser = Security(requires_admin_user),
):
"""
Stop multiple active executions (admin only).
Uses robust stop_graph_execution which cascades to children and waits for termination.
Args:
request: Contains list of execution_ids to stop
Returns:
Number of executions stopped and success message
"""
logger.info(
f"Admin {user.user_id} stopping {len(request.execution_ids)} executions"
)
# Get executions by ID list
executions = await get_graph_executions(
execution_ids=request.execution_ids,
)
if not executions:
return StopExecutionResponse(
success=False,
stopped_count=0,
message="No executions found",
)
# Stop all executions in parallel using robust stop_graph_execution
async def stop_one(exec) -> bool:
try:
await stop_graph_execution(
user_id=exec.user_id,
graph_exec_id=exec.id,
wait_timeout=15.0,
cascade=True,
)
return True
except Exception as e:
logger.error(f"Failed to stop execution {exec.id}: {e}")
return False
results = await asyncio.gather(
*[stop_one(exec) for exec in executions], return_exceptions=False
)
stopped_count = sum(1 for success in results if success)
return StopExecutionResponse(
success=stopped_count > 0,
stopped_count=stopped_count,
message=f"Stopped {stopped_count} of {len(request.execution_ids)} executions",
)
@router.post(
"/diagnostics/executions/cleanup-orphaned",
response_model=StopExecutionResponse,
summary="Cleanup Orphaned Executions",
)
async def cleanup_orphaned_executions(
request: StopExecutionsRequest,
user: AuthUser = Security(requires_admin_user),
):
"""
Cleanup orphaned executions by directly updating DB status (admin only).
For executions in DB but not actually running in executor (old/stale records).
Args:
request: Contains list of execution_ids to cleanup
Returns:
Number of executions cleaned up and success message
"""
logger.info(
f"Admin {user.user_id} cleaning up {len(request.execution_ids)} orphaned executions"
)
cleaned_count = await cleanup_orphaned_executions_bulk(
request.execution_ids, user.user_id
)
return StopExecutionResponse(
success=cleaned_count > 0,
stopped_count=cleaned_count,
message=f"Cleaned up {cleaned_count} of {len(request.execution_ids)} orphaned executions",
)
# ============================================================================
# SCHEDULE DIAGNOSTICS ENDPOINTS
# ============================================================================
class SchedulesListResponse(BaseModel):
"""Response model for list of schedules"""
schedules: List[ScheduleDetail]
total: int
class OrphanedSchedulesListResponse(BaseModel):
"""Response model for list of orphaned schedules"""
schedules: List[OrphanedScheduleDetail]
total: int
class ScheduleCleanupRequest(BaseModel):
"""Request model for cleaning up schedules"""
schedule_ids: List[str]
class ScheduleCleanupResponse(BaseModel):
"""Response model for schedule cleanup operations"""
success: bool
deleted_count: int = 0
message: str
@router.get(
"/diagnostics/schedules",
response_model=ScheduleHealthMetrics,
summary="Get Schedule Diagnostics",
)
async def get_schedule_diagnostics_endpoint():
"""
Get comprehensive diagnostic information about schedule health.
Returns schedule metrics including:
- Total schedules (user vs system)
- Orphaned schedules by category
- Upcoming executions
"""
logger.info("Getting schedule diagnostics")
diagnostics = await get_schedule_health_metrics()
logger.info(
f"Schedule diagnostics: total={diagnostics.total_schedules}, "
f"user={diagnostics.user_schedules}, "
f"orphaned={diagnostics.total_orphaned}"
)
return diagnostics
@router.get(
"/diagnostics/schedules/all",
response_model=SchedulesListResponse,
summary="List All User Schedules",
)
async def list_all_schedules(
limit: int = 100,
offset: int = 0,
):
"""
Get detailed list of all user schedules (excludes system monitoring jobs).
Args:
limit: Maximum number of schedules to return (default 100)
offset: Number of schedules to skip (default 0)
Returns:
List of schedules with details
"""
logger.info(f"Listing all schedules (limit={limit}, offset={offset})")
schedules = await get_all_schedules_details(limit=limit, offset=offset)
# Get total count
diagnostics = await get_schedule_health_metrics()
total = diagnostics.user_schedules
return SchedulesListResponse(schedules=schedules, total=total)
@router.get(
"/diagnostics/schedules/orphaned",
response_model=OrphanedSchedulesListResponse,
summary="List Orphaned Schedules",
)
async def list_orphaned_schedules():
"""
Get detailed list of orphaned schedules with orphan reasons.
Returns:
List of orphaned schedules categorized by orphan type
"""
logger.info("Listing orphaned schedules")
schedules = await get_orphaned_schedules_details()
return OrphanedSchedulesListResponse(schedules=schedules, total=len(schedules))
@router.post(
"/diagnostics/schedules/cleanup-orphaned",
response_model=ScheduleCleanupResponse,
summary="Cleanup Orphaned Schedules",
)
async def cleanup_orphaned_schedules(
request: ScheduleCleanupRequest,
user: AuthUser = Security(requires_admin_user),
):
"""
Cleanup orphaned schedules by deleting from scheduler (admin only).
Args:
request: Contains list of schedule_ids to delete
Returns:
Number of schedules deleted and success message
"""
logger.info(
f"Admin {user.user_id} cleaning up {len(request.schedule_ids)} orphaned schedules"
)
deleted_count = await cleanup_orphaned_schedules_bulk(
request.schedule_ids, user.user_id
)
return ScheduleCleanupResponse(
success=deleted_count > 0,
deleted_count=deleted_count,
message=f"Deleted {deleted_count} of {len(request.schedule_ids)} orphaned schedules",
)
@router.post(
"/diagnostics/executions/stop-all-long-running",
response_model=StopExecutionResponse,
summary="Stop ALL Long-Running Executions",
)
async def stop_all_long_running_executions_endpoint(
user: AuthUser = Security(requires_admin_user),
):
"""
Stop ALL long-running executions (RUNNING >24h) by sending cancel signals (admin only).
Operates on entire dataset, not limited to pagination.
Returns:
Number of executions stopped and success message
"""
logger.info(f"Admin {user.user_id} stopping ALL long-running executions")
stopped_count = await stop_all_long_running_executions(user.user_id)
return StopExecutionResponse(
success=stopped_count > 0,
stopped_count=stopped_count,
message=f"Stopped {stopped_count} long-running executions",
)
@router.post(
"/diagnostics/executions/cleanup-all-orphaned",
response_model=StopExecutionResponse,
summary="Cleanup ALL Orphaned Executions",
)
async def cleanup_all_orphaned_executions(
user: AuthUser = Security(requires_admin_user),
):
"""
Cleanup ALL orphaned executions (>24h old) by directly updating DB status.
Operates on all executions, not just paginated results.
Returns:
Number of executions cleaned up and success message
"""
logger.info(f"Admin {user.user_id} cleaning up ALL orphaned executions")
# Fetch all orphaned execution IDs
execution_ids = await get_all_orphaned_execution_ids()
if not execution_ids:
return StopExecutionResponse(
success=True,
stopped_count=0,
message="No orphaned executions to cleanup",
)
cleaned_count = await cleanup_orphaned_executions_bulk(execution_ids, user.user_id)
return StopExecutionResponse(
success=cleaned_count > 0,
stopped_count=cleaned_count,
message=f"Cleaned up {cleaned_count} orphaned executions",
)
@router.post(
"/diagnostics/executions/cleanup-all-stuck-queued",
response_model=StopExecutionResponse,
summary="Cleanup ALL Stuck Queued Executions",
)
async def cleanup_all_stuck_queued_executions_endpoint(
user: AuthUser = Security(requires_admin_user),
):
"""
Cleanup ALL stuck queued executions (QUEUED >1h) by updating DB status (admin only).
Operates on entire dataset, not limited to pagination.
Returns:
Number of executions cleaned up and success message
"""
logger.info(f"Admin {user.user_id} cleaning up ALL stuck queued executions")
cleaned_count = await cleanup_all_stuck_queued_executions(user.user_id)
return StopExecutionResponse(
success=cleaned_count > 0,
stopped_count=cleaned_count,
message=f"Cleaned up {cleaned_count} stuck queued executions",
)
@router.post(
"/diagnostics/executions/requeue-all-stuck",
response_model=RequeueExecutionResponse,
summary="Requeue ALL Stuck Queued Executions",
)
async def requeue_all_stuck_executions(
user: AuthUser = Security(requires_admin_user),
):
"""
Requeue ALL stuck queued executions (QUEUED >1h) by publishing to RabbitMQ.
Operates on all executions, not just paginated results.
Uses add_graph_execution with existing graph_exec_id to requeue.
⚠️ WARNING: This will re-execute ALL stuck executions and may cost significant credits.
Returns:
Number of executions requeued and success message
"""
logger.info(f"Admin {user.user_id} requeueing ALL stuck queued executions")
# Fetch all stuck queued execution IDs
execution_ids = await get_all_stuck_queued_execution_ids()
if not execution_ids:
return RequeueExecutionResponse(
success=True,
requeued_count=0,
message="No stuck queued executions to requeue",
)
# Get stuck executions by ID list (must be QUEUED)
executions = await get_graph_executions(
execution_ids=execution_ids,
statuses=[AgentExecutionStatus.QUEUED],
)
# Requeue all in parallel using add_graph_execution
async def requeue_one(exec) -> bool:
try:
await add_graph_execution(
graph_id=exec.graph_id,
user_id=exec.user_id,
graph_version=exec.graph_version,
graph_exec_id=exec.id, # Requeue existing
)
return True
except Exception as e:
logger.error(f"Failed to requeue {exec.id}: {e}")
return False
results = await asyncio.gather(
*[requeue_one(exec) for exec in executions], return_exceptions=False
)
requeued_count = sum(1 for success in results if success)
return RequeueExecutionResponse(
success=requeued_count > 0,
requeued_count=requeued_count,
message=f"Requeued {requeued_count} stuck executions",
)
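# Illustrative sketch (not an endpoint): the two call shapes of add_graph_execution
# as used above. Passing the existing execution id requeues that run; calling it
# without graph_exec_id is assumed here to start a brand-new execution instead.
async def _example_requeue_vs_fresh(exec_meta) -> None:
    # Requeue mode: re-publish the existing graph_exec_id to the queue, as the
    # requeue endpoints above do for stuck QUEUED executions.
    await add_graph_execution(
        graph_id=exec_meta.graph_id,
        user_id=exec_meta.user_id,
        graph_version=exec_meta.graph_version,
        graph_exec_id=exec_meta.id,
    )
    # Fresh run (assumption): omitting graph_exec_id enqueues a new execution.
    await add_graph_execution(
        graph_id=exec_meta.graph_id,
        user_id=exec_meta.user_id,
        graph_version=exec_meta.graph_version,
    )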

View File

@@ -0,0 +1,889 @@
from datetime import datetime, timezone
from unittest.mock import AsyncMock
import fastapi
import fastapi.testclient
import pytest
import pytest_mock
from autogpt_libs.auth.jwt_utils import get_jwt_payload
from prisma.enums import AgentExecutionStatus
import backend.api.features.admin.diagnostics_admin_routes as diagnostics_admin_routes
from backend.data.diagnostics import (
AgentDiagnosticsSummary,
ExecutionDiagnosticsSummary,
FailedExecutionDetail,
OrphanedScheduleDetail,
RunningExecutionDetail,
ScheduleDetail,
ScheduleHealthMetrics,
)
from backend.data.execution import GraphExecutionMeta
app = fastapi.FastAPI()
app.include_router(diagnostics_admin_routes.router)
client = fastapi.testclient.TestClient(app)
@pytest.fixture(autouse=True)
def setup_app_admin_auth(mock_jwt_admin):
"""Setup admin auth overrides for all tests in this module"""
app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
yield
app.dependency_overrides.clear()
def test_get_execution_diagnostics_success(
mocker: pytest_mock.MockFixture,
):
"""Test fetching execution diagnostics with invalid state detection"""
mock_diagnostics = ExecutionDiagnosticsSummary(
running_count=10,
queued_db_count=5,
rabbitmq_queue_depth=3,
cancel_queue_depth=0,
orphaned_running=2,
orphaned_queued=1,
failed_count_1h=5,
failed_count_24h=20,
failure_rate_24h=0.83,
stuck_running_24h=1,
stuck_running_1h=3,
oldest_running_hours=26.5,
stuck_queued_1h=2,
queued_never_started=1,
invalid_queued_with_start=1, # New invalid state
invalid_running_without_start=1, # New invalid state
completed_1h=50,
completed_24h=1200,
throughput_per_hour=50.0,
timestamp=datetime.now(timezone.utc).isoformat(),
)
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_execution_diagnostics",
return_value=mock_diagnostics,
)
response = client.get("/admin/diagnostics/executions")
assert response.status_code == 200
data = response.json()
# Verify new invalid state fields are included
assert data["invalid_queued_with_start"] == 1
assert data["invalid_running_without_start"] == 1
# Verify all expected fields present
assert "running_executions" in data
assert "orphaned_running" in data
assert "failed_count_24h" in data
def test_list_invalid_executions(
mocker: pytest_mock.MockFixture,
):
"""Test listing executions in invalid states (read-only endpoint)"""
mock_invalid_executions = [
RunningExecutionDetail(
execution_id="exec-invalid-1",
graph_id="graph-123",
graph_name="Test Graph",
graph_version=1,
user_id="user-123",
user_email="test@example.com",
status="QUEUED",
created_at=datetime.now(timezone.utc),
started_at=datetime.now(
timezone.utc
), # QUEUED but has startedAt - INVALID!
queue_status=None,
),
RunningExecutionDetail(
execution_id="exec-invalid-2",
graph_id="graph-456",
graph_name="Another Graph",
graph_version=2,
user_id="user-456",
user_email="user@example.com",
status="RUNNING",
created_at=datetime.now(timezone.utc),
started_at=None, # RUNNING but no startedAt - INVALID!
queue_status=None,
),
]
mock_diagnostics = ExecutionDiagnosticsSummary(
running_count=10,
queued_db_count=5,
rabbitmq_queue_depth=3,
cancel_queue_depth=0,
orphaned_running=0,
orphaned_queued=0,
failed_count_1h=0,
failed_count_24h=0,
failure_rate_24h=0.0,
stuck_running_24h=0,
stuck_running_1h=0,
oldest_running_hours=None,
stuck_queued_1h=0,
queued_never_started=0,
invalid_queued_with_start=1,
invalid_running_without_start=1,
completed_1h=0,
completed_24h=0,
throughput_per_hour=0.0,
timestamp=datetime.now(timezone.utc).isoformat(),
)
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_invalid_executions_details",
return_value=mock_invalid_executions,
)
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_execution_diagnostics",
return_value=mock_diagnostics,
)
response = client.get("/admin/diagnostics/executions/invalid?limit=100&offset=0")
assert response.status_code == 200
data = response.json()
assert data["total"] == 2 # Sum of both invalid state types
assert len(data["executions"]) == 2
# Verify both types of invalid states are returned
assert data["executions"][0]["execution_id"] in [
"exec-invalid-1",
"exec-invalid-2",
]
assert data["executions"][1]["execution_id"] in [
"exec-invalid-1",
"exec-invalid-2",
]
def test_requeue_single_execution_with_add_graph_execution(
mocker: pytest_mock.MockFixture,
admin_user_id: str,
):
"""Test requeueing uses add_graph_execution in requeue mode"""
mock_exec_meta = GraphExecutionMeta(
id="exec-stuck-123",
user_id="user-123",
graph_id="graph-456",
graph_version=1,
inputs=None,
credential_inputs=None,
nodes_input_masks=None,
preset_id=None,
status=AgentExecutionStatus.QUEUED,
started_at=datetime.now(timezone.utc),
ended_at=datetime.now(timezone.utc),
stats=None,
)
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_graph_executions",
return_value=[mock_exec_meta],
)
mock_add_graph_execution = mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.add_graph_execution",
return_value=AsyncMock(),
)
response = client.post(
"/admin/diagnostics/executions/requeue",
json={"execution_id": "exec-stuck-123"},
)
assert response.status_code == 200
data = response.json()
assert data["success"] is True
assert data["requeued_count"] == 1
# Verify it used add_graph_execution in requeue mode
mock_add_graph_execution.assert_called_once()
call_kwargs = mock_add_graph_execution.call_args.kwargs
assert call_kwargs["graph_exec_id"] == "exec-stuck-123" # Requeue mode!
assert call_kwargs["graph_id"] == "graph-456"
assert call_kwargs["user_id"] == "user-123"
def test_stop_single_execution_with_stop_graph_execution(
mocker: pytest_mock.MockFixture,
admin_user_id: str,
):
"""Test stopping uses robust stop_graph_execution"""
mock_exec_meta = GraphExecutionMeta(
id="exec-running-123",
user_id="user-789",
graph_id="graph-999",
graph_version=2,
inputs=None,
credential_inputs=None,
nodes_input_masks=None,
preset_id=None,
status=AgentExecutionStatus.RUNNING,
started_at=datetime.now(timezone.utc),
ended_at=datetime.now(timezone.utc),
stats=None,
)
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_graph_executions",
return_value=[mock_exec_meta],
)
mock_stop_graph_execution = mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.stop_graph_execution",
return_value=AsyncMock(),
)
response = client.post(
"/admin/diagnostics/executions/stop",
json={"execution_id": "exec-running-123"},
)
assert response.status_code == 200
data = response.json()
assert data["success"] is True
assert data["stopped_count"] == 1
# Verify it used stop_graph_execution with cascade
mock_stop_graph_execution.assert_called_once()
call_kwargs = mock_stop_graph_execution.call_args.kwargs
assert call_kwargs["graph_exec_id"] == "exec-running-123"
assert call_kwargs["user_id"] == "user-789"
assert call_kwargs["cascade"] is True # Stops children too!
assert call_kwargs["wait_timeout"] == 15.0
def test_requeue_not_queued_execution_fails(
mocker: pytest_mock.MockFixture,
):
"""Test that requeue fails if execution is not in QUEUED status"""
# Mock an execution that's RUNNING (not QUEUED)
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_graph_executions",
return_value=[], # No QUEUED executions found
)
response = client.post(
"/admin/diagnostics/executions/requeue",
json={"execution_id": "exec-running-123"},
)
assert response.status_code == 404
assert "not found or not in QUEUED status" in response.json()["detail"]
def test_list_invalid_executions_no_bulk_actions(
mocker: pytest_mock.MockFixture,
):
"""Verify invalid executions endpoint is read-only (no bulk actions)"""
# This is a documentation test - the endpoint exists but should not
# have corresponding cleanup/stop/requeue endpoints
# These endpoints should NOT exist for invalid states:
invalid_bulk_endpoints = [
"/admin/diagnostics/executions/cleanup-invalid",
"/admin/diagnostics/executions/stop-invalid",
"/admin/diagnostics/executions/requeue-invalid",
]
for endpoint in invalid_bulk_endpoints:
response = client.post(endpoint, json={"execution_ids": ["test"]})
assert response.status_code == 404, f"{endpoint} should not exist (read-only)"
def test_execution_ids_filter_efficiency(
mocker: pytest_mock.MockFixture,
):
"""Test that bulk operations use efficient execution_ids filter"""
mock_exec_metas = [
GraphExecutionMeta(
id=f"exec-{i}",
user_id=f"user-{i}",
graph_id="graph-123",
graph_version=1,
inputs=None,
credential_inputs=None,
nodes_input_masks=None,
preset_id=None,
status=AgentExecutionStatus.QUEUED,
started_at=datetime.now(timezone.utc),
ended_at=datetime.now(timezone.utc),
stats=None,
)
for i in range(3)
]
mock_get_graph_executions = mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_graph_executions",
return_value=mock_exec_metas,
)
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.add_graph_execution",
return_value=AsyncMock(),
)
response = client.post(
"/admin/diagnostics/executions/requeue-bulk",
json={"execution_ids": ["exec-0", "exec-1", "exec-2"]},
)
assert response.status_code == 200
# Verify it used execution_ids filter (not fetching all queued)
mock_get_graph_executions.assert_called_once()
call_kwargs = mock_get_graph_executions.call_args.kwargs
assert "execution_ids" in call_kwargs
assert call_kwargs["execution_ids"] == ["exec-0", "exec-1", "exec-2"]
assert call_kwargs["statuses"] == [AgentExecutionStatus.QUEUED]
# ---------------------------------------------------------------------------
# Helper: reusable mock diagnostics summary
# ---------------------------------------------------------------------------
def _make_mock_diagnostics(**overrides) -> ExecutionDiagnosticsSummary:
defaults = dict(
running_count=10,
queued_db_count=5,
rabbitmq_queue_depth=3,
cancel_queue_depth=0,
orphaned_running=2,
orphaned_queued=1,
failed_count_1h=5,
failed_count_24h=20,
failure_rate_24h=0.83,
stuck_running_24h=3,
stuck_running_1h=5,
oldest_running_hours=26.5,
stuck_queued_1h=2,
queued_never_started=1,
invalid_queued_with_start=1,
invalid_running_without_start=1,
completed_1h=50,
completed_24h=1200,
throughput_per_hour=50.0,
timestamp=datetime.now(timezone.utc).isoformat(),
)
defaults.update(overrides)
return ExecutionDiagnosticsSummary(**defaults)
_SENTINEL = object()
def _make_mock_execution(
exec_id: str = "exec-1",
status: str = "RUNNING",
started_at: datetime | None | object = _SENTINEL,
) -> RunningExecutionDetail:
return RunningExecutionDetail(
execution_id=exec_id,
graph_id="graph-123",
graph_name="Test Graph",
graph_version=1,
user_id="user-123",
user_email="test@example.com",
status=status,
created_at=datetime.now(timezone.utc),
started_at=(
datetime.now(timezone.utc) if started_at is _SENTINEL else started_at
),
queue_status=None,
)
def _make_mock_failed_execution(
exec_id: str = "exec-fail-1",
) -> FailedExecutionDetail:
return FailedExecutionDetail(
execution_id=exec_id,
graph_id="graph-123",
graph_name="Test Graph",
graph_version=1,
user_id="user-123",
user_email="test@example.com",
status="FAILED",
created_at=datetime.now(timezone.utc),
started_at=datetime.now(timezone.utc),
failed_at=datetime.now(timezone.utc),
error_message="Something went wrong",
)
def _make_mock_schedule_health(**overrides) -> ScheduleHealthMetrics:
defaults = dict(
total_schedules=15,
user_schedules=10,
system_schedules=5,
orphaned_deleted_graph=2,
orphaned_no_library_access=1,
orphaned_invalid_credentials=0,
orphaned_validation_failed=0,
total_orphaned=3,
schedules_next_hour=4,
schedules_next_24h=8,
total_runs_next_hour=12,
total_runs_next_24h=48,
timestamp=datetime.now(timezone.utc).isoformat(),
)
defaults.update(overrides)
return ScheduleHealthMetrics(**defaults)
# ---------------------------------------------------------------------------
# GET endpoints: execution list variants
# ---------------------------------------------------------------------------
def test_list_running_executions(mocker: pytest_mock.MockFixture):
mock_execs = [
_make_mock_execution("exec-run-1"),
_make_mock_execution("exec-run-2"),
]
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_running_executions_details",
return_value=mock_execs,
)
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_execution_diagnostics",
return_value=_make_mock_diagnostics(),
)
response = client.get("/admin/diagnostics/executions/running?limit=50&offset=0")
assert response.status_code == 200
data = response.json()
assert data["total"] == 15 # running_count(10) + queued_db_count(5)
assert len(data["executions"]) == 2
assert data["executions"][0]["execution_id"] == "exec-run-1"
def test_list_orphaned_executions(mocker: pytest_mock.MockFixture):
mock_execs = [_make_mock_execution("exec-orphan-1", status="RUNNING")]
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_orphaned_executions_details",
return_value=mock_execs,
)
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_execution_diagnostics",
return_value=_make_mock_diagnostics(),
)
response = client.get("/admin/diagnostics/executions/orphaned?limit=50&offset=0")
assert response.status_code == 200
data = response.json()
assert data["total"] == 3 # orphaned_running(2) + orphaned_queued(1)
assert len(data["executions"]) == 1
def test_list_failed_executions(mocker: pytest_mock.MockFixture):
mock_execs = [_make_mock_failed_execution("exec-fail-1")]
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_failed_executions_details",
return_value=mock_execs,
)
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_failed_executions_count",
return_value=42,
)
response = client.get(
"/admin/diagnostics/executions/failed?limit=50&offset=0&hours=24"
)
assert response.status_code == 200
data = response.json()
assert data["total"] == 42
assert len(data["executions"]) == 1
assert data["executions"][0]["error_message"] == "Something went wrong"
def test_list_long_running_executions(mocker: pytest_mock.MockFixture):
mock_execs = [_make_mock_execution("exec-long-1")]
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_long_running_executions_details",
return_value=mock_execs,
)
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_execution_diagnostics",
return_value=_make_mock_diagnostics(),
)
response = client.get(
"/admin/diagnostics/executions/long-running?limit=50&offset=0"
)
assert response.status_code == 200
data = response.json()
assert data["total"] == 3 # stuck_running_24h
assert len(data["executions"]) == 1
def test_list_stuck_queued_executions(mocker: pytest_mock.MockFixture):
mock_execs = [
_make_mock_execution("exec-stuck-1", status="QUEUED", started_at=None)
]
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_stuck_queued_executions_details",
return_value=mock_execs,
)
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_execution_diagnostics",
return_value=_make_mock_diagnostics(),
)
response = client.get(
"/admin/diagnostics/executions/stuck-queued?limit=50&offset=0"
)
assert response.status_code == 200
data = response.json()
assert data["total"] == 2 # stuck_queued_1h
assert len(data["executions"]) == 1
# ---------------------------------------------------------------------------
# GET endpoints: agent + schedule diagnostics
# ---------------------------------------------------------------------------
def test_get_agent_diagnostics(mocker: pytest_mock.MockFixture):
mock_diag = AgentDiagnosticsSummary(
agents_with_active_executions=7,
timestamp=datetime.now(timezone.utc).isoformat(),
)
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_agent_diagnostics",
return_value=mock_diag,
)
response = client.get("/admin/diagnostics/agents")
assert response.status_code == 200
data = response.json()
assert data["agents_with_active_executions"] == 7
def test_get_schedule_diagnostics(mocker: pytest_mock.MockFixture):
mock_metrics = _make_mock_schedule_health()
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_schedule_health_metrics",
return_value=mock_metrics,
)
response = client.get("/admin/diagnostics/schedules")
assert response.status_code == 200
data = response.json()
assert data["user_schedules"] == 10
assert data["total_orphaned"] == 3
assert data["total_runs_next_hour"] == 12
def test_list_all_schedules(mocker: pytest_mock.MockFixture):
mock_schedules = [
ScheduleDetail(
schedule_id="sched-1",
schedule_name="Daily Run",
graph_id="graph-1",
graph_name="My Agent",
graph_version=1,
user_id="user-1",
user_email="alice@example.com",
cron="0 9 * * *",
timezone="UTC",
next_run_time=datetime.now(timezone.utc).isoformat(),
),
]
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_all_schedules_details",
return_value=mock_schedules,
)
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_schedule_health_metrics",
return_value=_make_mock_schedule_health(),
)
response = client.get("/admin/diagnostics/schedules/all?limit=50&offset=0")
assert response.status_code == 200
data = response.json()
assert data["total"] == 10
assert len(data["schedules"]) == 1
assert data["schedules"][0]["schedule_name"] == "Daily Run"
def test_list_orphaned_schedules(mocker: pytest_mock.MockFixture):
mock_orphans = [
OrphanedScheduleDetail(
schedule_id="sched-orphan-1",
schedule_name="Ghost Schedule",
graph_id="graph-deleted",
graph_version=1,
user_id="user-1",
orphan_reason="deleted_graph",
error_detail=None,
next_run_time=datetime.now(timezone.utc).isoformat(),
),
]
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_orphaned_schedules_details",
return_value=mock_orphans,
)
response = client.get("/admin/diagnostics/schedules/orphaned")
assert response.status_code == 200
data = response.json()
assert data["total"] == 1
assert data["schedules"][0]["orphan_reason"] == "deleted_graph"
# ---------------------------------------------------------------------------
# POST endpoints: bulk stop, cleanup, requeue
# ---------------------------------------------------------------------------
def test_stop_multiple_executions(mocker: pytest_mock.MockFixture):
mock_exec_metas = [
GraphExecutionMeta(
id=f"exec-{i}",
user_id=f"user-{i}",
graph_id="graph-123",
graph_version=1,
inputs=None,
credential_inputs=None,
nodes_input_masks=None,
preset_id=None,
status=AgentExecutionStatus.RUNNING,
started_at=datetime.now(timezone.utc),
ended_at=None,
stats=None,
)
for i in range(2)
]
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_graph_executions",
return_value=mock_exec_metas,
)
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.stop_graph_execution",
return_value=AsyncMock(),
)
response = client.post(
"/admin/diagnostics/executions/stop-bulk",
json={"execution_ids": ["exec-0", "exec-1"]},
)
assert response.status_code == 200
data = response.json()
assert data["success"] is True
assert data["stopped_count"] == 2
def test_stop_multiple_executions_none_found(mocker: pytest_mock.MockFixture):
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_graph_executions",
return_value=[],
)
response = client.post(
"/admin/diagnostics/executions/stop-bulk",
json={"execution_ids": ["nonexistent"]},
)
assert response.status_code == 200
data = response.json()
assert data["success"] is False
assert data["stopped_count"] == 0
def test_cleanup_orphaned_executions(mocker: pytest_mock.MockFixture):
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.cleanup_orphaned_executions_bulk",
return_value=3,
)
response = client.post(
"/admin/diagnostics/executions/cleanup-orphaned",
json={"execution_ids": ["exec-1", "exec-2", "exec-3"]},
)
assert response.status_code == 200
data = response.json()
assert data["success"] is True
assert data["stopped_count"] == 3
def test_cleanup_orphaned_schedules(mocker: pytest_mock.MockFixture):
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.cleanup_orphaned_schedules_bulk",
return_value=2,
)
response = client.post(
"/admin/diagnostics/schedules/cleanup-orphaned",
json={"schedule_ids": ["sched-1", "sched-2"]},
)
assert response.status_code == 200
data = response.json()
assert data["success"] is True
assert data["deleted_count"] == 2
def test_stop_all_long_running_executions(mocker: pytest_mock.MockFixture):
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.stop_all_long_running_executions",
return_value=5,
)
response = client.post("/admin/diagnostics/executions/stop-all-long-running")
assert response.status_code == 200
data = response.json()
assert data["success"] is True
assert data["stopped_count"] == 5
def test_cleanup_all_orphaned_executions(mocker: pytest_mock.MockFixture):
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_all_orphaned_execution_ids",
return_value=["exec-1", "exec-2"],
)
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.cleanup_orphaned_executions_bulk",
return_value=2,
)
response = client.post("/admin/diagnostics/executions/cleanup-all-orphaned")
assert response.status_code == 200
data = response.json()
assert data["success"] is True
assert data["stopped_count"] == 2
def test_cleanup_all_orphaned_executions_none(mocker: pytest_mock.MockFixture):
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_all_orphaned_execution_ids",
return_value=[],
)
response = client.post("/admin/diagnostics/executions/cleanup-all-orphaned")
assert response.status_code == 200
data = response.json()
assert data["success"] is True
assert data["stopped_count"] == 0
assert "No orphaned" in data["message"]
def test_cleanup_all_stuck_queued_executions(mocker: pytest_mock.MockFixture):
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.cleanup_all_stuck_queued_executions",
return_value=4,
)
response = client.post("/admin/diagnostics/executions/cleanup-all-stuck-queued")
assert response.status_code == 200
data = response.json()
assert data["success"] is True
assert data["stopped_count"] == 4
def test_requeue_all_stuck_executions(mocker: pytest_mock.MockFixture):
mock_exec_metas = [
GraphExecutionMeta(
id=f"exec-stuck-{i}",
user_id=f"user-{i}",
graph_id="graph-123",
graph_version=1,
inputs=None,
credential_inputs=None,
nodes_input_masks=None,
preset_id=None,
status=AgentExecutionStatus.QUEUED,
started_at=None,
ended_at=None,
stats=None,
)
for i in range(3)
]
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_all_stuck_queued_execution_ids",
return_value=["exec-stuck-0", "exec-stuck-1", "exec-stuck-2"],
)
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_graph_executions",
return_value=mock_exec_metas,
)
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.add_graph_execution",
return_value=AsyncMock(),
)
response = client.post("/admin/diagnostics/executions/requeue-all-stuck")
assert response.status_code == 200
data = response.json()
assert data["success"] is True
assert data["requeued_count"] == 3
def test_requeue_all_stuck_executions_none(mocker: pytest_mock.MockFixture):
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_all_stuck_queued_execution_ids",
return_value=[],
)
response = client.post("/admin/diagnostics/executions/requeue-all-stuck")
assert response.status_code == 200
data = response.json()
assert data["success"] is True
assert data["requeued_count"] == 0
assert "No stuck" in data["message"]
def test_requeue_bulk_none_found(mocker: pytest_mock.MockFixture):
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_graph_executions",
return_value=[],
)
response = client.post(
"/admin/diagnostics/executions/requeue-bulk",
json={"execution_ids": ["nonexistent"]},
)
assert response.status_code == 200
data = response.json()
assert data["success"] is False
assert data["requeued_count"] == 0
def test_stop_single_execution_not_found(mocker: pytest_mock.MockFixture):
mocker.patch(
"backend.api.features.admin.diagnostics_admin_routes.get_graph_executions",
return_value=[],
)
response = client.post(
"/admin/diagnostics/executions/stop",
json={"execution_id": "nonexistent"},
)
assert response.status_code == 404
assert "not found" in response.json()["detail"]

View File

@@ -14,3 +14,70 @@ class UserHistoryResponse(BaseModel):
class AddUserCreditsResponse(BaseModel):
new_balance: int
transaction_key: str
class ExecutionDiagnosticsResponse(BaseModel):
"""Response model for execution diagnostics"""
# Current execution state
running_executions: int
queued_executions_db: int
queued_executions_rabbitmq: int
cancel_queue_depth: int
# Orphaned execution detection
orphaned_running: int
orphaned_queued: int
# Failure metrics
failed_count_1h: int
failed_count_24h: int
failure_rate_24h: float
# Long-running detection
stuck_running_24h: int
stuck_running_1h: int
oldest_running_hours: float | None
# Stuck queued detection
stuck_queued_1h: int
queued_never_started: int
# Invalid state detection (data corruption - no auto-actions)
invalid_queued_with_start: int
invalid_running_without_start: int
# Throughput metrics
completed_1h: int
completed_24h: int
throughput_per_hour: float
timestamp: str
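# Illustrative sketch (an assumption about usage, not part of the response models):
# how a monitoring job consuming ExecutionDiagnosticsResponse might turn the grouped
# counters above into alerts. The 50% threshold is made up.
def _example_execution_alerts(diag: ExecutionDiagnosticsResponse) -> list[str]:
    alerts: list[str] = []
    # Orphan detection: executions the scheduler / queue no longer tracks.
    if diag.orphaned_running or diag.orphaned_queued:
        alerts.append("orphaned executions present")
    # Failure metrics: flag an elevated 24h failure rate.
    if diag.failure_rate_24h > 0.5:
        alerts.append("24h failure rate above 50%")
    # Invalid states signal data corruption; they are surfaced read-only (no auto-actions).
    if diag.invalid_queued_with_start or diag.invalid_running_without_start:
        alerts.append("executions in invalid states")
    return alerts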
class AgentDiagnosticsResponse(BaseModel):
"""Response model for agent diagnostics"""
agents_with_active_executions: int
timestamp: str
class ScheduleHealthMetrics(BaseModel):
"""Response model for schedule diagnostics"""
total_schedules: int
user_schedules: int
system_schedules: int
# Orphan detection
orphaned_deleted_graph: int
orphaned_no_library_access: int
orphaned_invalid_credentials: int
orphaned_validation_failed: int
total_orphaned: int
# Upcoming
schedules_next_hour: int
schedules_next_24h: int
timestamp: str

View File

@@ -43,6 +43,7 @@ async def get_cost_dashboard(
model: str | None = Query(None),
block_name: str | None = Query(None),
tracking_type: str | None = Query(None),
graph_exec_id: str | None = Query(None),
):
logger.info("Admin %s fetching platform cost dashboard", admin_user_id)
return await get_platform_cost_dashboard(
@@ -53,6 +54,7 @@ async def get_cost_dashboard(
model=model,
block_name=block_name,
tracking_type=tracking_type,
graph_exec_id=graph_exec_id,
)
@@ -72,6 +74,7 @@ async def get_cost_logs(
model: str | None = Query(None),
block_name: str | None = Query(None),
tracking_type: str | None = Query(None),
graph_exec_id: str | None = Query(None),
):
logger.info("Admin %s fetching platform cost logs", admin_user_id)
logs, total = await get_platform_cost_logs(
@@ -84,6 +87,7 @@ async def get_cost_logs(
model=model,
block_name=block_name,
tracking_type=tracking_type,
graph_exec_id=graph_exec_id,
)
total_pages = (total + page_size - 1) // page_size
return PlatformCostLogsResponse(
@@ -117,6 +121,7 @@ async def export_cost_logs(
model: str | None = Query(None),
block_name: str | None = Query(None),
tracking_type: str | None = Query(None),
graph_exec_id: str | None = Query(None),
):
logger.info("Admin %s exporting platform cost logs", admin_user_id)
logs, truncated = await get_platform_cost_logs_for_export(
@@ -127,6 +132,7 @@ async def export_cost_logs(
model=model,
block_name=block_name,
tracking_type=tracking_type,
graph_exec_id=graph_exec_id,
)
return PlatformCostExportResponse(
logs=logs,

View File

@@ -32,10 +32,10 @@ router = APIRouter(
class UserRateLimitResponse(BaseModel):
user_id: str
user_email: Optional[str] = None
daily_token_limit: int
weekly_token_limit: int
daily_tokens_used: int
weekly_tokens_used: int
daily_cost_limit_microdollars: int
weekly_cost_limit_microdollars: int
daily_cost_used_microdollars: int
weekly_cost_used_microdollars: int
tier: SubscriptionTier
@@ -101,17 +101,19 @@ async def get_user_rate_limit(
logger.info("Admin %s checking rate limit for user %s", admin_user_id, resolved_id)
daily_limit, weekly_limit, tier = await get_global_rate_limits(
resolved_id, config.daily_token_limit, config.weekly_token_limit
resolved_id,
config.daily_cost_limit_microdollars,
config.weekly_cost_limit_microdollars,
)
usage = await get_usage_status(resolved_id, daily_limit, weekly_limit, tier=tier)
return UserRateLimitResponse(
user_id=resolved_id,
user_email=resolved_email,
daily_token_limit=daily_limit,
weekly_token_limit=weekly_limit,
daily_tokens_used=usage.daily.used,
weekly_tokens_used=usage.weekly.used,
daily_cost_limit_microdollars=daily_limit,
weekly_cost_limit_microdollars=weekly_limit,
daily_cost_used_microdollars=usage.daily.used,
weekly_cost_used_microdollars=usage.weekly.used,
tier=tier,
)
@@ -141,7 +143,9 @@ async def reset_user_rate_limit(
raise HTTPException(status_code=500, detail="Failed to reset usage") from e
daily_limit, weekly_limit, tier = await get_global_rate_limits(
user_id, config.daily_token_limit, config.weekly_token_limit
user_id,
config.daily_cost_limit_microdollars,
config.weekly_cost_limit_microdollars,
)
usage = await get_usage_status(user_id, daily_limit, weekly_limit, tier=tier)
@@ -154,10 +158,10 @@ async def reset_user_rate_limit(
return UserRateLimitResponse(
user_id=user_id,
user_email=resolved_email,
daily_token_limit=daily_limit,
weekly_token_limit=weekly_limit,
daily_tokens_used=usage.daily.used,
weekly_tokens_used=usage.weekly.used,
daily_cost_limit_microdollars=daily_limit,
weekly_cost_limit_microdollars=weekly_limit,
daily_cost_used_microdollars=usage.daily.used,
weekly_cost_used_microdollars=usage.weekly.used,
tier=tier,
)
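# Unit note (an assumption, for readers of the renamed fields above): "microdollars"
# is read here as millionths of a US dollar, so a daily_cost_limit_microdollars of
# 2_500_000 corresponds to $2.50. Minimal conversion helper, for illustration only:
def _microdollars_to_usd(amount_microdollars: int) -> float:
    return amount_microdollars / 1_000_000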

View File

@@ -57,7 +57,7 @@ def _patch_rate_limit_deps(
mocker.patch(
f"{_MOCK_MODULE}.get_global_rate_limits",
new_callable=AsyncMock,
return_value=(2_500_000, 12_500_000, SubscriptionTier.FREE),
return_value=(2_500_000, 12_500_000, SubscriptionTier.BASIC),
)
mocker.patch(
f"{_MOCK_MODULE}.get_usage_status",
@@ -85,11 +85,11 @@ def test_get_rate_limit(
data = response.json()
assert data["user_id"] == target_user_id
assert data["user_email"] == _TARGET_EMAIL
assert data["daily_token_limit"] == 2_500_000
assert data["weekly_token_limit"] == 12_500_000
assert data["daily_tokens_used"] == 500_000
assert data["weekly_tokens_used"] == 3_000_000
assert data["tier"] == "FREE"
assert data["daily_cost_limit_microdollars"] == 2_500_000
assert data["weekly_cost_limit_microdollars"] == 12_500_000
assert data["daily_cost_used_microdollars"] == 500_000
assert data["weekly_cost_used_microdollars"] == 3_000_000
assert data["tier"] == "BASIC"
configured_snapshot.assert_match(
json.dumps(data, indent=2, sort_keys=True) + "\n",
@@ -117,7 +117,7 @@ def test_get_rate_limit_by_email(
data = response.json()
assert data["user_id"] == target_user_id
assert data["user_email"] == _TARGET_EMAIL
assert data["daily_token_limit"] == 2_500_000
assert data["daily_cost_limit_microdollars"] == 2_500_000
def test_get_rate_limit_by_email_not_found(
@@ -160,10 +160,10 @@ def test_reset_user_usage_daily_only(
assert response.status_code == 200
data = response.json()
assert data["daily_tokens_used"] == 0
assert data["daily_cost_used_microdollars"] == 0
# Weekly is untouched
assert data["weekly_tokens_used"] == 3_000_000
assert data["tier"] == "FREE"
assert data["weekly_cost_used_microdollars"] == 3_000_000
assert data["tier"] == "BASIC"
mock_reset.assert_awaited_once_with(target_user_id, reset_weekly=False)
@@ -192,9 +192,9 @@ def test_reset_user_usage_daily_and_weekly(
assert response.status_code == 200
data = response.json()
assert data["daily_tokens_used"] == 0
assert data["weekly_tokens_used"] == 0
assert data["tier"] == "FREE"
assert data["daily_cost_used_microdollars"] == 0
assert data["weekly_cost_used_microdollars"] == 0
assert data["tier"] == "BASIC"
mock_reset.assert_awaited_once_with(target_user_id, reset_weekly=True)
@@ -231,7 +231,7 @@ def test_get_rate_limit_email_lookup_failure(
mocker.patch(
f"{_MOCK_MODULE}.get_global_rate_limits",
new_callable=AsyncMock,
return_value=(2_500_000, 12_500_000, SubscriptionTier.FREE),
return_value=(2_500_000, 12_500_000, SubscriptionTier.BASIC),
)
mocker.patch(
f"{_MOCK_MODULE}.get_usage_status",
@@ -324,7 +324,7 @@ def test_set_user_tier(
mocker.patch(
f"{_MOCK_MODULE}.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.FREE,
return_value=SubscriptionTier.BASIC,
)
mock_set = mocker.patch(
f"{_MOCK_MODULE}.set_user_tier",
@@ -347,7 +347,7 @@ def test_set_user_tier_downgrade(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test downgrading a user's tier from PRO to FREE."""
"""Test downgrading a user's tier from PRO to BASIC."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
@@ -365,14 +365,14 @@ def test_set_user_tier_downgrade(
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": target_user_id, "tier": "FREE"},
json={"user_id": target_user_id, "tier": "BASIC"},
)
assert response.status_code == 200
data = response.json()
assert data["user_id"] == target_user_id
assert data["tier"] == "FREE"
mock_set.assert_awaited_once_with(target_user_id, SubscriptionTier.FREE)
assert data["tier"] == "BASIC"
mock_set.assert_awaited_once_with(target_user_id, SubscriptionTier.BASIC)
def test_set_user_tier_invalid_tier(
@@ -456,7 +456,7 @@ def test_set_user_tier_db_failure(
mocker.patch(
f"{_MOCK_MODULE}.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.FREE,
return_value=SubscriptionTier.BASIC,
)
mocker.patch(
f"{_MOCK_MODULE}.set_user_tier",

View File

@@ -2,19 +2,18 @@
import asyncio
import logging
import re
from collections.abc import AsyncGenerator
from typing import Annotated
from uuid import uuid4
from autogpt_libs import auth
from fastapi import APIRouter, HTTPException, Query, Response, Security
from fastapi.responses import StreamingResponse
from prisma.models import UserWorkspaceFile
from fastapi.responses import JSONResponse, StreamingResponse
from pydantic import BaseModel, ConfigDict, Field, field_validator
from backend.copilot import service as chat_service
from backend.copilot import stream_registry
from backend.copilot.builder_context import resolve_session_permissions
from backend.copilot.config import ChatConfig, CopilotLlmModel, CopilotMode
from backend.copilot.db import get_chat_messages_paginated
from backend.copilot.executor.utils import enqueue_cancel_task, enqueue_copilot_turn
@@ -26,11 +25,18 @@ from backend.copilot.model import (
create_chat_session,
delete_chat_session,
get_chat_session,
get_or_create_builder_session,
get_user_sessions,
update_session_title,
)
from backend.copilot.pending_message_helpers import (
QueuePendingMessageResponse,
is_turn_in_flight,
queue_pending_for_http,
)
from backend.copilot.pending_messages import peek_pending_messages
from backend.copilot.rate_limit import (
CoPilotUsageStatus,
CoPilotUsagePublic,
RateLimitExceeded,
acquire_reset_lock,
check_rate_limit,
@@ -42,7 +48,7 @@ from backend.copilot.rate_limit import (
reset_daily_usage,
)
from backend.copilot.response_model import StreamError, StreamFinish, StreamHeartbeat
from backend.copilot.service import strip_user_context_prefix
from backend.copilot.service import strip_injected_context_for_display
from backend.copilot.tools.e2b_sandbox import kill_sandbox
from backend.copilot.tools.models import (
AgentDetailsResponse,
@@ -61,17 +67,22 @@ from backend.copilot.tools.models import (
InputValidationErrorResponse,
MCPToolOutputResponse,
MCPToolsDiscoveredResponse,
MemoryForgetCandidatesResponse,
MemoryForgetConfirmResponse,
MemorySearchResponse,
MemoryStoreResponse,
NeedLoginResponse,
NoResultsResponse,
SetupRequirementsResponse,
SuggestedGoalResponse,
TodoWriteResponse,
UnderstandingUpdatedResponse,
)
from backend.copilot.tracking import track_user_message
from backend.data.credit import UsageTransactionMetadata, get_user_credit_model
from backend.data.redis_client import get_redis_async
from backend.data.understanding import get_business_understanding
from backend.data.workspace import get_or_create_workspace
from backend.data.workspace import build_files_block, resolve_workspace_files
from backend.util.exceptions import InsufficientBalanceError, NotFoundError
from backend.util.settings import Settings
@@ -81,10 +92,6 @@ logger = logging.getLogger(__name__)
config = ChatConfig()
_UUID_RE = re.compile(
r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I
)
async def _validate_and_get_session(
session_id: str,
@@ -103,21 +110,22 @@ router = APIRouter(
def _strip_injected_context(message: dict) -> dict:
"""Hide the server-side `<user_context>` prefix from the API response.
"""Hide server-injected context blocks from the API response.
Returns a **shallow copy** of *message* with the prefix removed from
``content`` (if applicable). The original dict is never mutated, so
callers can safely pass live session dicts without risking side-effects.
Returns a **shallow copy** of *message* with all server-injected XML
blocks removed from ``content`` (if applicable). The original dict is
never mutated, so callers can safely pass live session dicts without
risking side-effects.
The strip is delegated to ``strip_user_context_prefix`` in
``backend.copilot.service`` so the on-the-wire format stays in lockstep
with ``inject_user_context`` (the writer). Only ``user``-role messages
with string content are touched; assistant / multimodal blocks pass
through unchanged.
Handles all three injected block types — ``<memory_context>``,
``<env_context>``, and ``<user_context>`` — regardless of the order they
appear at the start of the message. Only ``user``-role messages with
string content are touched; assistant / multimodal blocks pass through
unchanged.
"""
if message.get("role") == "user" and isinstance(message.get("content"), str):
result = message.copy()
result["content"] = strip_user_context_prefix(message["content"])
result["content"] = strip_injected_context_for_display(message["content"])
return result
return message
@@ -128,7 +136,7 @@ def _strip_injected_context(message: dict) -> dict:
class StreamChatRequest(BaseModel):
"""Request model for streaming chat with optional context."""
message: str
message: str = Field(max_length=64_000)
is_user_message: bool = True
context: dict[str, str] | None = None # {url: str, content: str}
file_ids: list[str] | None = Field(
@@ -146,16 +154,45 @@ class StreamChatRequest(BaseModel):
)
class CreateSessionRequest(BaseModel):
"""Request model for creating a new chat session.
class PeekPendingMessagesResponse(BaseModel):
"""Response for the pending-message peek (GET) endpoint.
Returns a read-only view of the pending buffer — messages are NOT
consumed. The frontend uses this to restore the queued-message
indicator after a page refresh and to decide when to clear it once
a turn has ended.
"""
messages: list[str]
count: int
class CreateSessionRequest(BaseModel):
"""Request model for creating (or get-or-creating) a chat session.
Two modes, selected by the body:
- Default: create a fresh session. ``dry_run`` is a **top-level**
field — do not nest it inside ``metadata``.
- Builder-bound: when ``builder_graph_id`` is set, the endpoint
switches to **get-or-create** keyed on
``(user_id, builder_graph_id)``. The builder panel calls this on
mount so the chat persists across refreshes. Graph ownership is
validated inside :func:`get_or_create_builder_session`. Write-side
scope is enforced per-tool (``edit_agent`` / ``run_agent`` reject
any ``agent_id`` other than the bound graph) and a small blacklist
hides tools that conflict with the panel's scope
(``create_agent`` / ``customize_agent`` / ``get_agent_building_guide``
— see :data:`BUILDER_BLOCKED_TOOLS`). Read-side lookups
(``find_block``, ``find_agent``, ``search_docs``, …) stay open.
``dry_run`` is a **top-level** field — do not nest it inside ``metadata``.
Extra/unknown fields are rejected (422) to prevent silent mis-use.
"""
model_config = ConfigDict(extra="forbid")
dry_run: bool = False
builder_graph_id: str | None = Field(default=None, max_length=128)
class CreateSessionResponse(BaseModel):
@@ -300,29 +337,43 @@ async def create_session(
user_id: Annotated[str, Security(auth.get_user_id)],
request: CreateSessionRequest | None = None,
) -> CreateSessionResponse:
"""
Create a new chat session.
"""Create (or get-or-create) a chat session.
Initiates a new chat session for the authenticated user.
Two modes, selected by the request body:
- Default: create a fresh session for the user. ``dry_run=True`` forces
run_block and run_agent calls to use dry-run simulation.
- Builder-bound: when ``builder_graph_id`` is set, get-or-create keyed
on ``(user_id, builder_graph_id)``. Returns the existing session for
that graph or creates one locked to it. Graph ownership is validated
inside :func:`get_or_create_builder_session`; raises 404 on
unauthorized access. Write-side scope is enforced per-tool
(``edit_agent`` / ``run_agent`` reject any ``agent_id`` other than
the bound graph) and a small blacklist hides tools that conflict
with the panel's scope (see :data:`BUILDER_BLOCKED_TOOLS`).
Args:
user_id: The authenticated user ID parsed from the JWT (required).
request: Optional request body. When provided, ``dry_run=True``
forces run_block and run_agent calls to use dry-run simulation.
request: Optional request body with ``dry_run`` and/or
``builder_graph_id``.
Returns:
CreateSessionResponse: Details of the created session.
CreateSessionResponse: Details of the resulting session.
"""
dry_run = request.dry_run if request else False
builder_graph_id = request.builder_graph_id if request else None
logger.info(
f"Creating session with user_id: "
f"...{user_id[-8:] if len(user_id) > 8 else '<redacted>'}"
f"{', dry_run=True' if dry_run else ''}"
f"{f', builder_graph_id={builder_graph_id}' if builder_graph_id else ''}"
)
session = await create_chat_session(user_id, dry_run=dry_run)
if builder_graph_id:
session = await get_or_create_builder_session(user_id, builder_graph_id)
else:
session = await create_chat_session(user_id, dry_run=dry_run)
return CreateSessionResponse(
id=session.session_id,
@@ -381,6 +432,31 @@ async def delete_session(
return Response(status_code=204)
@router.delete(
"/sessions/{session_id}/stream",
dependencies=[Security(auth.requires_user)],
status_code=204,
)
async def disconnect_session_stream(
session_id: str,
user_id: Annotated[str, Security(auth.get_user_id)],
) -> Response:
"""Disconnect all active SSE listeners for a session.
Called by the frontend when the user switches away from a chat so the
backend releases XREAD listeners immediately rather than waiting for
the 5-10 s timeout.
"""
session = await get_chat_session(session_id, user_id)
if not session:
raise HTTPException(
status_code=404,
detail=f"Session {session_id} not found or access denied",
)
await stream_registry.disconnect_all_listeners(session_id)
return Response(status_code=204)
@router.patch(
"/sessions/{session_id}/title",
summary="Update session title",
@@ -432,22 +508,13 @@ async def get_session(
Supports cursor-based pagination via ``limit`` and ``before_sequence``.
When no pagination params are provided, returns the most recent messages.
Args:
session_id: The unique identifier for the desired chat session.
user_id: The authenticated user's ID.
limit: Maximum number of messages to return (1-200, default 50).
before_sequence: Return messages with sequence < this value (cursor).
Returns:
SessionDetailResponse: Details for the requested session, including
active_stream info and pagination metadata.
"""
page = await get_chat_messages_paginated(
session_id, limit, before_sequence, user_id=user_id
)
if page is None:
raise NotFoundError(f"Session {session_id} not found.")
messages = [
_strip_injected_context(message.model_dump()) for message in page.messages
]
@@ -458,10 +525,6 @@ async def get_session(
active_session, last_message_id = await stream_registry.get_active_session(
session_id, user_id
)
logger.info(
f"[GET_SESSION] session={session_id}, active_session={active_session is not None}, "
f"msg_count={len(messages)}, last_role={messages[-1].get('role') if messages else 'none'}"
)
if active_session:
active_stream_info = ActiveStreamInfo(
turn_id=active_session.turn_id,
@@ -506,23 +569,27 @@ async def get_session(
)
async def get_copilot_usage(
user_id: Annotated[str, Security(auth.get_user_id)],
) -> CoPilotUsageStatus:
) -> CoPilotUsagePublic:
"""Get CoPilot usage status for the authenticated user.
Returns current token usage vs limits for daily and weekly windows.
Global defaults sourced from LaunchDarkly (falling back to config).
Includes the user's rate-limit tier.
Returns the percentage of the daily/weekly allowance used — not the
raw spend or cap — so clients cannot derive per-turn cost or platform
margins. Global defaults sourced from LaunchDarkly (falling back to
config). Includes the user's rate-limit tier.
"""
daily_limit, weekly_limit, tier = await get_global_rate_limits(
user_id, config.daily_token_limit, config.weekly_token_limit
user_id,
config.daily_cost_limit_microdollars,
config.weekly_cost_limit_microdollars,
)
return await get_usage_status(
status = await get_usage_status(
user_id=user_id,
daily_token_limit=daily_limit,
weekly_token_limit=weekly_limit,
daily_cost_limit=daily_limit,
weekly_cost_limit=weekly_limit,
rate_limit_reset_cost=config.rate_limit_reset_cost,
tier=tier,
)
return CoPilotUsagePublic.from_status(status)
class RateLimitResetResponse(BaseModel):
@@ -531,7 +598,9 @@ class RateLimitResetResponse(BaseModel):
success: bool
credits_charged: int = Field(description="Credits charged (in cents)")
remaining_balance: int = Field(description="Credit balance after charge (in cents)")
usage: CoPilotUsageStatus = Field(description="Updated usage status after reset")
usage: CoPilotUsagePublic = Field(
description="Updated usage status after reset (percentages only)"
)
@router.post(
@@ -555,7 +624,7 @@ async def reset_copilot_usage(
) -> RateLimitResetResponse:
"""Reset the daily CoPilot rate limit by spending credits.
Allows users who have hit their daily token limit to spend credits
Allows users who have hit their daily cost limit to spend credits
to reset their daily usage counter and continue working.
Returns 400 if the feature is disabled or the user is not over the limit.
Returns 402 if the user has insufficient credits.
@@ -574,7 +643,9 @@ async def reset_copilot_usage(
)
daily_limit, weekly_limit, tier = await get_global_rate_limits(
user_id, config.daily_token_limit, config.weekly_token_limit
user_id,
config.daily_cost_limit_microdollars,
config.weekly_cost_limit_microdollars,
)
if daily_limit <= 0:
@@ -611,8 +682,8 @@ async def reset_copilot_usage(
# used for limit checks, not returned to the client.)
usage_status = await get_usage_status(
user_id=user_id,
daily_token_limit=daily_limit,
weekly_token_limit=weekly_limit,
daily_cost_limit=daily_limit,
weekly_cost_limit=weekly_limit,
tier=tier,
)
if daily_limit > 0 and usage_status.daily.used < daily_limit:
@@ -647,7 +718,7 @@ async def reset_copilot_usage(
# Reset daily usage in Redis. If this fails, refund the credits
# so the user is not charged for a service they did not receive.
if not await reset_daily_usage(user_id, daily_token_limit=daily_limit):
if not await reset_daily_usage(user_id, daily_cost_limit=daily_limit):
# Compensate: refund the charged credits.
refunded = False
try:
@@ -683,11 +754,11 @@ async def reset_copilot_usage(
finally:
await release_reset_lock(user_id)
# Return updated usage status.
# Return updated usage status (public schema — percentages only).
updated_usage = await get_usage_status(
user_id=user_id,
daily_token_limit=daily_limit,
weekly_token_limit=weekly_limit,
daily_cost_limit=daily_limit,
weekly_cost_limit=weekly_limit,
rate_limit_reset_cost=config.rate_limit_reset_cost,
tier=tier,
)
@@ -696,7 +767,7 @@ async def reset_copilot_usage(
success=True,
credits_charged=cost,
remaining_balance=remaining,
usage=updated_usage,
usage=CoPilotUsagePublic.from_status(updated_usage),
)
@@ -747,36 +818,52 @@ async def cancel_session_task(
@router.post(
"/sessions/{session_id}/stream",
responses={
202: {
"model": QueuePendingMessageResponse,
"description": (
"Session has a turn in flight — message queued into the pending "
"buffer and will be picked up between tool-call rounds by the "
"executor currently processing the turn."
),
},
404: {"description": "Session not found or access denied"},
429: {"description": "Cost rate-limit or call-frequency cap exceeded"},
},
)
async def stream_chat_post(
session_id: str,
request: StreamChatRequest,
user_id: str = Security(auth.get_user_id),
):
"""
Stream chat responses for a session (POST with context support).
"""Start a new turn OR queue a follow-up — decided server-side.
Streams the AI/completion responses in real time over Server-Sent Events (SSE), including:
- Text fragments as they are generated
- Tool call UI elements (if invoked)
- Tool execution results
- **Session idle**: starts a turn. Returns an SSE stream (``text/event-stream``)
with Vercel AI SDK chunks (text fragments, tool-call UI, tool results).
The generation runs in a background task that survives client disconnects;
reconnect via ``GET /sessions/{session_id}/stream`` to resume.
The AI generation runs in a background task that continues even if the client disconnects.
All chunks are written to a per-turn Redis stream for reconnection support. If the client
disconnects, they can reconnect using GET /sessions/{session_id}/stream to resume.
- **Session has a turn in flight**: pushes the message into the per-session
pending buffer and returns ``202 application/json`` with
``QueuePendingMessageResponse``. The executor running the current turn
drains the buffer between tool-call rounds (baseline) or at the start of
the next turn (SDK). Clients should detect the 202 and surface the
message as a queued-chip in the UI.
Args:
session_id: The chat session identifier to associate with the streamed messages.
request: Request body containing message, is_user_message, and optional context.
session_id: The chat session identifier.
request: Request body with message, is_user_message, and optional context.
user_id: Authenticated user ID.
Returns:
StreamingResponse: SSE-formatted response chunks.
"""
import asyncio
import time
stream_start_time = time.perf_counter()
# Wall-clock arrival time, propagated to the executor so the turn-start
# drain can order pending messages relative to this request (pending
# pushed BEFORE this instant were typed earlier; pending pushed AFTER
# are race-path follow-ups typed while /stream was still processing).
request_arrival_at = time.time()
log_meta = {"component": "ChatStream", "session_id": session_id, "user_id": user_id}
logger.info(
@@ -784,7 +871,28 @@ async def stream_chat_post(
f"user={user_id}, message_len={len(request.message)}",
extra={"json_fields": log_meta},
)
await _validate_and_get_session(session_id, user_id)
session = await _validate_and_get_session(session_id, user_id)
builder_permissions = resolve_session_permissions(session)
# Self-defensive queue-fallback: if a turn is already running, don't race
# it on the cluster lock — drop the message into the pending buffer and
# return 202 so the caller can render a chip. Both UI chips and autopilot
# block follow-ups route through this path; keeping the decision on the
# server means every caller gets uniform behaviour.
if (
request.is_user_message
and request.message
and await is_turn_in_flight(session_id)
):
response = await queue_pending_for_http(
session_id=session_id,
user_id=user_id,
message=request.message,
context=request.context,
file_ids=request.file_ids,
)
return JSONResponse(status_code=202, content=response.model_dump())
logger.info(
f"[TIMING] session validated in {(time.perf_counter() - stream_start_time) * 1000:.1f}ms",
extra={
@@ -795,18 +903,20 @@ async def stream_chat_post(
},
)
# Pre-turn rate limit check (token-based).
# Pre-turn rate limit check (cost-based, microdollars).
# check_rate_limit short-circuits internally when both limits are 0.
# Global defaults sourced from LaunchDarkly, falling back to config.
if user_id:
try:
daily_limit, weekly_limit, _ = await get_global_rate_limits(
user_id, config.daily_token_limit, config.weekly_token_limit
user_id,
config.daily_cost_limit_microdollars,
config.weekly_cost_limit_microdollars,
)
await check_rate_limit(
user_id=user_id,
daily_token_limit=daily_limit,
weekly_token_limit=weekly_limit,
daily_cost_limit=daily_limit,
weekly_cost_limit=weekly_limit,
)
except RateLimitExceeded as e:
raise HTTPException(status_code=429, detail=str(e)) from e
@@ -815,89 +925,75 @@ async def stream_chat_post(
# Also sanitise file_ids so only validated, workspace-scoped IDs are
# forwarded downstream (e.g. to the executor via enqueue_copilot_turn).
sanitized_file_ids: list[str] | None = None
if request.file_ids and user_id:
# Filter to valid UUIDs only to prevent DB abuse
valid_ids = [fid for fid in request.file_ids if _UUID_RE.match(fid)]
if valid_ids:
workspace = await get_or_create_workspace(user_id)
# Batch query instead of N+1
files = await UserWorkspaceFile.prisma().find_many(
where={
"id": {"in": valid_ids},
"workspaceId": workspace.id,
"isDeleted": False,
}
)
# Only keep IDs that actually exist in the user's workspace
sanitized_file_ids = [wf.id for wf in files] or None
file_lines: list[str] = [
f"- {wf.name} ({wf.mimeType}, {round(wf.sizeBytes / 1024, 1)} KB), file_id={wf.id}"
for wf in files
]
if file_lines:
files_block = (
"\n\n[Attached files]\n"
+ "\n".join(file_lines)
+ "\nUse read_workspace_file with the file_id to access file contents."
)
request.message += files_block
if request.file_ids:
files = await resolve_workspace_files(user_id, request.file_ids)
sanitized_file_ids = [wf.id for wf in files] or None
request.message += build_files_block(files)
# Atomically append user message to session BEFORE creating task to avoid
# race condition where GET_SESSION sees task as "running" but message isn't
# saved yet. append_and_save_message re-fetches inside a lock to prevent
# message loss from concurrent requests.
# saved yet. append_and_save_message returns None when a duplicate is
# detected — in that case skip enqueue to avoid processing the message twice.
is_duplicate_message = False
if request.message:
message = ChatMessage(
role="user" if request.is_user_message else "assistant",
content=request.message,
)
if request.is_user_message:
logger.info(f"[STREAM] Saving user message to session {session_id}")
is_duplicate_message = (
await append_and_save_message(session_id, message)
) is None
logger.info(f"[STREAM] User message saved for session {session_id}")
if not is_duplicate_message and request.is_user_message:
track_user_message(
user_id=user_id,
session_id=session_id,
message_length=len(request.message),
)
logger.info(f"[STREAM] Saving user message to session {session_id}")
await append_and_save_message(session_id, message)
logger.info(f"[STREAM] User message saved for session {session_id}")
# Create a task in the stream registry for reconnection support
turn_id = str(uuid4())
log_meta["turn_id"] = turn_id
session_create_start = time.perf_counter()
await stream_registry.create_session(
session_id=session_id,
user_id=user_id,
tool_call_id="chat_stream",
tool_name="chat",
turn_id=turn_id,
)
logger.info(
f"[TIMING] create_session completed in {(time.perf_counter() - session_create_start) * 1000:.1f}ms",
extra={
"json_fields": {
**log_meta,
"duration_ms": (time.perf_counter() - session_create_start) * 1000,
}
},
)
# Per-turn stream is always fresh (unique turn_id), subscribe from beginning
subscribe_from_id = "0-0"
await enqueue_copilot_turn(
session_id=session_id,
user_id=user_id,
message=request.message,
turn_id=turn_id,
is_user_message=request.is_user_message,
context=request.context,
file_ids=sanitized_file_ids,
mode=request.mode,
model=request.model,
)
# Create a task in the stream registry for reconnection support.
# For duplicate messages, skip create_session entirely so the infra-retry
# client subscribes to the *existing* turn's Redis stream and receives the
# in-progress executor output rather than an empty stream.
turn_id = ""
if not is_duplicate_message:
turn_id = str(uuid4())
log_meta["turn_id"] = turn_id
session_create_start = time.perf_counter()
await stream_registry.create_session(
session_id=session_id,
user_id=user_id,
tool_call_id="chat_stream",
tool_name="chat",
turn_id=turn_id,
)
logger.info(
f"[TIMING] create_session completed in {(time.perf_counter() - session_create_start) * 1000:.1f}ms",
extra={
"json_fields": {
**log_meta,
"duration_ms": (time.perf_counter() - session_create_start) * 1000,
}
},
)
await enqueue_copilot_turn(
session_id=session_id,
user_id=user_id,
message=request.message,
turn_id=turn_id,
is_user_message=request.is_user_message,
context=request.context,
file_ids=sanitized_file_ids,
mode=request.mode,
model=request.model,
permissions=builder_permissions,
request_arrival_at=request_arrival_at,
)
else:
logger.info(
f"[STREAM] Duplicate message detected for session {session_id}, skipping enqueue"
)
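# Illustrative sketch of the append contract the duplicate check above depends
# on: append_and_save_message re-fetches the session under a lock and returns
# None when the incoming message is already the last stored message, so the
# caller skips create_session and enqueue for that turn. session_lock,
# load_session and save_session are hypothetical stand-ins.
async def _illustrative_append(session_id: str, message: ChatMessage):
    async with session_lock(session_id):          # hypothetical lock helper
        session = await load_session(session_id)  # hypothetical
        last = session.messages[-1] if session.messages else None
        if last and last.role == message.role and last.content == message.content:
            return None  # duplicate: caller skips create_session/enqueue
        session.messages.append(message)
        await save_session(session)               # hypothetical
        return message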
setup_time = (time.perf_counter() - stream_start_time) * 1000
logger.info(
@@ -905,6 +1001,9 @@ async def stream_chat_post(
extra={"json_fields": {**log_meta, "setup_time_ms": setup_time}},
)
# Per-turn stream is always fresh (unique turn_id), subscribe from beginning
subscribe_from_id = "0-0"
# SSE endpoint that subscribes to the task's stream
async def event_generator() -> AsyncGenerator[str, None]:
import time as time_module
@@ -929,7 +1028,6 @@ async def stream_chat_post(
if subscriber_queue is None:
yield StreamFinish().to_sse()
yield "data: [DONE]\n\n"
return
# Read from the subscriber queue and yield to SSE
@@ -959,7 +1057,6 @@ async def stream_chat_post(
yield chunk.to_sse()
# Check for finish signal
if isinstance(chunk, StreamFinish):
total_time = time_module.perf_counter() - event_gen_start
logger.info(
@@ -974,6 +1071,7 @@ async def stream_chat_post(
},
)
break
except asyncio.TimeoutError:
yield StreamHeartbeat().to_sse()
@@ -988,7 +1086,6 @@ async def stream_chat_post(
}
},
)
pass # Client disconnected - background task continues
except Exception as e:
elapsed = (time_module.perf_counter() - event_gen_start) * 1000
logger.error(
@@ -1042,6 +1139,31 @@ async def stream_chat_post(
)
@router.get(
"/sessions/{session_id}/messages/pending",
response_model=PeekPendingMessagesResponse,
responses={
404: {"description": "Session not found or access denied"},
},
)
async def get_pending_messages(
session_id: str,
user_id: str = Security(auth.get_user_id),
):
"""Peek at the pending-message buffer without consuming it.
Returns the current contents of the session's pending message buffer
so the frontend can restore the queued-message indicator after a page
refresh and clear it correctly once a turn drains the buffer.
"""
await _validate_and_get_session(session_id, user_id)
pending = await peek_pending_messages(session_id)
return PeekPendingMessagesResponse(
messages=[m.content for m in pending],
count=len(pending),
)
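# Minimal client-side sketch of how this endpoint is meant to be consumed.
# Field names mirror PeekPendingMessagesResponse above; the HTTP client,
# base path and UI hook are hypothetical.
async def _illustrative_restore_queue_indicator(client, session_id: str) -> None:
    resp = await client.get(f"/sessions/{session_id}/messages/pending")
    payload = resp.json()
    if payload["count"]:
        render_queued_indicator(payload["messages"])  # hypothetical UI hook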
@router.get(
"/sessions/{session_id}/stream",
)
@@ -1294,6 +1416,11 @@ ToolResponseUnion = (
| DocPageResponse
| MCPToolsDiscoveredResponse
| MCPToolOutputResponse
| MemoryStoreResponse
| MemorySearchResponse
| MemoryForgetCandidatesResponse
| MemoryForgetConfirmResponse
| TodoWriteResponse
)

View File

@@ -7,6 +7,7 @@ allowing frontend code generators like Orval to create corresponding TypeScript
from pydantic import BaseModel, Field
from backend.data.model import CredentialsType
from backend.integrations.providers import ProviderName
from backend.sdk.registry import AutoRegistry
@@ -47,6 +48,57 @@ class ProviderNamesResponse(BaseModel):
)
class ProviderMetadata(BaseModel):
"""Display metadata for a provider, shown in the settings integrations UI."""
name: str = Field(description="Provider slug (e.g. ``github``)")
description: str | None = Field(
default=None,
description=(
"One-line human-readable summary of what the provider does. "
"Declared via ``ProviderBuilder.with_description(...)`` in the "
"provider's ``_config.py``. ``None`` if not set."
),
)
supported_auth_types: list[CredentialsType] = Field(
default_factory=list,
description=(
"Credential types this provider accepts. Drives which connection "
"tabs the settings UI renders for the provider. Empty list means "
"no auth types declared."
),
)
def get_supported_auth_types(name: str) -> list[CredentialsType]:
"""Return the provider's supported credential types from :class:`AutoRegistry`.
Populated by :meth:`ProviderBuilder.with_supported_auth_types` (or by
``with_oauth`` / ``with_api_key`` / ``with_user_password`` when the provider
uses the full builder chain). Returns an empty list for providers with no
auth types declared.
"""
provider = AutoRegistry.get_provider(name)
if provider is None:
return []
return sorted(provider.supported_auth_types)
def get_provider_description(name: str) -> str | None:
"""Return the provider's description from :class:`AutoRegistry`.
Descriptions are declared via ``ProviderBuilder.with_description(...)`` in
the provider's ``_config.py`` (SDK path) or in
``blocks/_static_provider_configs.py`` (for providers that don't yet have
their own directory). Returns ``None`` for providers with no registered
description.
"""
provider = AutoRegistry.get_provider(name)
if provider is None:
return None
return provider.description
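# Illustrative usage of the two helpers above. Assumes the SDK registry has
# been populated by importing the block modules (the providers router does this
# lazily via backend.blocks.load_all_blocks()); the provider names and printed
# results are examples, not guaranteed values.
def _illustrative_dump_provider_metadata() -> None:
    from backend.blocks import load_all_blocks

    load_all_blocks()
    for name in ("github", "google"):
        print(name, get_supported_auth_types(name), get_provider_description(name))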
class ProviderConstants(BaseModel):
"""
Model that exposes all provider names as a constant in the OpenAPI schema.

View File

@@ -14,7 +14,7 @@ from fastapi import (
Security,
status,
)
from pydantic import BaseModel, Field, SecretStr, model_validator
from pydantic import BaseModel, Field, model_validator
from starlette.status import HTTP_500_INTERNAL_SERVER_ERROR, HTTP_502_BAD_GATEWAY
from backend.api.features.library.db import set_preset_webhook, update_preset
@@ -29,15 +29,14 @@ from backend.data.integrations import (
wait_for_webhook_event,
)
from backend.data.model import (
APIKeyCredentials,
Credentials,
CredentialsType,
HostScopedCredentials,
OAuth2Credentials,
UserIntegrations,
is_sdk_default,
)
from backend.data.onboarding import OnboardingStep, complete_onboarding_step
from backend.data.user import get_user_integrations
from backend.executor.utils import add_graph_execution
from backend.integrations.ayrshare import AyrshareClient, SocialPlatform
from backend.integrations.credentials_store import (
@@ -48,7 +47,14 @@ from backend.integrations.creds_manager import (
IntegrationCredentialsManager,
create_mcp_oauth_handler,
)
from backend.integrations.managed_credentials import ensure_managed_credentials
from backend.integrations.managed_credentials import (
ensure_managed_credential,
ensure_managed_credentials,
)
from backend.integrations.managed_providers.ayrshare import AyrshareManagedProvider
from backend.integrations.managed_providers.ayrshare import (
settings_available as ayrshare_settings_available,
)
from backend.integrations.oauth import CREDENTIALS_BY_PROVIDER, HANDLERS_BY_NAME
from backend.integrations.providers import ProviderName
from backend.integrations.webhooks import get_webhook_manager
@@ -60,7 +66,14 @@ from backend.util.exceptions import (
)
from backend.util.settings import Settings
from .models import ProviderConstants, ProviderNamesResponse, get_all_provider_names
from .models import (
ProviderConstants,
ProviderMetadata,
ProviderNamesResponse,
get_all_provider_names,
get_provider_description,
get_supported_auth_types,
)
if TYPE_CHECKING:
from backend.integrations.oauth import BaseOAuthHandler
@@ -87,14 +100,23 @@ async def login(
scopes: Annotated[
str, Query(title="Comma-separated list of authorization scopes")
] = "",
credential_id: Annotated[
str | None,
Query(title="ID of existing credential to upgrade scopes for"),
] = None,
) -> LoginResponse:
handler = _get_provider_oauth_handler(request, provider)
requested_scopes = scopes.split(",") if scopes else []
if credential_id:
requested_scopes = await _prepare_scope_upgrade(
user_id, provider, credential_id, requested_scopes
)
# Generate and store a secure random state token along with the scopes
state_token, code_challenge = await creds_manager.store.store_state_token(
user_id, provider, requested_scopes
user_id, provider, requested_scopes, credential_id=credential_id
)
login_url = handler.get_login_url(
requested_scopes, state_token, code_challenge=code_challenge
@@ -216,7 +238,9 @@ async def callback(
)
# TODO: Allow specifying `title` to set on `credentials`
await creds_manager.create(user_id, credentials)
credentials = await _merge_or_create_credential(
user_id, provider, credentials, valid_state.credential_id
)
logger.debug(
f"Successfully processed OAuth callback for user {user_id} "
@@ -226,13 +250,38 @@ async def callback(
return to_meta_response(credentials)
# Bound the first-time sweep so a slow upstream (e.g. Ayrshare) can't hang
# the credential-list endpoint. On timeout we still kick off a fire-and-
# forget sweep so provisioning eventually completes; the user just won't
# see the managed cred until the next refresh.
_MANAGED_PROVISION_TIMEOUT_S = 10.0
async def _ensure_managed_credentials_bounded(user_id: str) -> None:
try:
await asyncio.wait_for(
ensure_managed_credentials(user_id, creds_manager.store),
timeout=_MANAGED_PROVISION_TIMEOUT_S,
)
except asyncio.TimeoutError:
logger.warning(
"Managed credential sweep exceeded %.1fs for user=%s; "
"continuing without it — provisioning will complete in background",
_MANAGED_PROVISION_TIMEOUT_S,
user_id,
)
asyncio.create_task(ensure_managed_credentials(user_id, creds_manager.store))
@router.get("/credentials", summary="List Credentials")
async def list_credentials(
user_id: Annotated[str, Security(get_user_id)],
) -> list[CredentialsMetaResponse]:
# Fire-and-forget: provision missing managed credentials in the background.
# The credential appears on the next page load; listing is never blocked.
asyncio.create_task(ensure_managed_credentials(user_id, creds_manager.store))
# Block on provisioning so managed credentials appear on the first load
# instead of after a refresh, but with a timeout so a slow upstream
# can't hang the endpoint. `_provisioned_users` short-circuits on
# repeat calls.
await _ensure_managed_credentials_bounded(user_id)
credentials = await creds_manager.store.get_all_creds(user_id)
return [
@@ -247,7 +296,7 @@ async def list_credentials_by_provider(
],
user_id: Annotated[str, Security(get_user_id)],
) -> list[CredentialsMetaResponse]:
asyncio.create_task(ensure_managed_credentials(user_id, creds_manager.store))
await _ensure_managed_credentials_bounded(user_id)
credentials = await creds_manager.store.get_creds_by_provider(user_id, provider)
return [
@@ -281,6 +330,115 @@ async def get_credential(
return to_meta_response(credential)
class PickerTokenResponse(BaseModel):
"""Short-lived OAuth access token shipped to the browser for rendering a
provider-hosted picker UI (e.g. Google Drive Picker). Deliberately narrow:
only the fields the client needs to initialize the picker widget. Issued
from the user's own stored credential so ownership and scope gating are
enforced by the credential lookup."""
access_token: str = Field(
description="OAuth access token suitable for the picker SDK call."
)
access_token_expires_at: int | None = Field(
default=None,
description="Unix timestamp at which the access token expires, if known.",
)
# Allowlist of (provider, scopes) tuples that may mint picker tokens. Only
# Drive-picker-capable scopes qualify so a caller can't use this endpoint to
# extract a GitHub / other-provider OAuth token for unrelated purposes. If a
# future provider integrates a hosted picker that needs a raw access token,
# add its specific picker-relevant scopes here.
_PICKER_TOKEN_ALLOWED_SCOPES: dict[ProviderName, frozenset[str]] = {
ProviderName.GOOGLE: frozenset(
[
"https://www.googleapis.com/auth/drive.file",
"https://www.googleapis.com/auth/drive.readonly",
"https://www.googleapis.com/auth/drive",
]
),
}
@router.post(
"/{provider}/credentials/{cred_id}/picker-token",
summary="Issue a short-lived access token for a provider-hosted picker",
operation_id="postV1GetPickerToken",
)
async def get_picker_token(
provider: Annotated[
ProviderName, Path(title="The provider that owns the credentials")
],
cred_id: Annotated[
str, Path(title="The ID of the OAuth2 credentials to mint a token from")
],
user_id: Annotated[str, Security(get_user_id)],
) -> PickerTokenResponse:
"""Return the raw access token for an OAuth2 credential so the frontend
can initialize a provider-hosted picker (e.g. Google Drive Picker).
`GET /{provider}/credentials/{cred_id}` deliberately strips secrets (see
`CredentialsMetaResponse` + `TestGetCredentialReturnsMetaOnly` in
`router_test.py`). That hardening broke the Drive picker, which needs the
raw access token to call `google.picker.Builder.setOAuthToken(...)`. This
endpoint carves a narrow, explicit hole: the caller must own the
credential, it must be OAuth2, and the endpoint returns only the access
token + its expiry — nothing else about the credential. SDK-default
credentials are excluded for the same reason as `get_credential`.
"""
if is_sdk_default(cred_id):
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
)
credential = await creds_manager.get(user_id, cred_id)
if not credential:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
)
if not provider_matches(credential.provider, provider):
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
)
if not isinstance(credential, OAuth2Credentials):
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Picker tokens are only available for OAuth2 credentials",
)
if not credential.access_token:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Credential has no access token; reconnect the account",
)
# Gate on provider+scope: only credentials that actually grant access to
# a provider-hosted picker flow may mint a token through this endpoint.
# Prevents using this path to extract bearer tokens for unrelated OAuth
# integrations (e.g. GitHub) that happen to be stored under the same user.
allowed_scopes = _PICKER_TOKEN_ALLOWED_SCOPES.get(provider)
if not allowed_scopes:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail=(f"Picker tokens are not available for provider '{provider.value}'"),
)
cred_scopes = set(credential.scopes or [])
if cred_scopes.isdisjoint(allowed_scopes):
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail=(
"Credential does not grant any scope eligible for the picker. "
"Reconnect with the appropriate scope."
),
)
return PickerTokenResponse(
access_token=credential.access_token.get_secret_value(),
access_token_expires_at=credential.access_token_expires_at,
)
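# Minimal sketch of the intended consumption path. The HTTP client and base
# path are assumptions; the real caller is the frontend, which hands the token
# to google.picker.Builder.setOAuthToken and never persists it.
async def _illustrative_fetch_picker_token(client, provider: str, cred_id: str) -> str:
    resp = await client.post(f"/{provider}/credentials/{cred_id}/picker-token")
    resp.raise_for_status()
    return resp.json()["access_token"]  # short-lived; use immediately, don't store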
@router.post("/{provider}/credentials", status_code=201, summary="Create Credentials")
async def create_credentials(
user_id: Annotated[str, Security(get_user_id)],
@@ -574,6 +732,186 @@ async def _execute_webhook_preset_trigger(
# Continue processing - webhook should be resilient to individual failures
# -------------------- INCREMENTAL AUTH HELPERS -------------------- #
async def _prepare_scope_upgrade(
user_id: str,
provider: ProviderName,
credential_id: str,
requested_scopes: list[str],
) -> list[str]:
"""Validate an existing credential for scope upgrade and compute scopes.
For providers without native incremental auth (e.g. GitHub), returns the
union of existing + requested scopes. For providers that handle merging
server-side (e.g. Google with ``include_granted_scopes``), returns the
requested scopes unchanged.
Raises HTTPException on validation failure.
"""
# Platform-owned system credentials must never be upgraded — scope
# changes here would leak across every user that shares them.
if is_system_credential(credential_id):
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="System credentials cannot be upgraded",
)
existing = await creds_manager.store.get_creds_by_id(user_id, credential_id)
if not existing:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Credential to upgrade not found",
)
if not isinstance(existing, OAuth2Credentials):
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Only OAuth2 credentials can be upgraded",
)
if not provider_matches(existing.provider, provider.value):
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Credential provider does not match the requested provider",
)
if existing.is_managed:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Managed credentials cannot be upgraded",
)
# Google handles scope merging via include_granted_scopes; others need
# the union of existing + new scopes in the login URL.
if provider != ProviderName.GOOGLE:
requested_scopes = list(set(requested_scopes) | set(existing.scopes))
return requested_scopes
async def _merge_or_create_credential(
user_id: str,
provider: ProviderName,
credentials: OAuth2Credentials,
credential_id: str | None,
) -> OAuth2Credentials:
"""Either upgrade an existing credential or create a new one.
When *credential_id* is set (explicit upgrade), merges scopes and updates
the existing credential. Otherwise, checks for an implicit merge (same
provider + username) before falling back to creating a new credential.
"""
if credential_id:
return await _upgrade_existing_credential(user_id, credential_id, credentials)
# Implicit merge: check for existing credential with same provider+username.
# Skip managed/system credentials and require a non-None username on both
# sides so we never accidentally merge unrelated credentials.
if credentials.username is None:
await creds_manager.create(user_id, credentials)
return credentials
existing_creds = await creds_manager.store.get_creds_by_provider(user_id, provider)
matching = next(
(
c
for c in existing_creds
if isinstance(c, OAuth2Credentials)
and not c.is_managed
and not is_system_credential(c.id)
and c.username is not None
and c.username == credentials.username
),
None,
)
if matching:
# Only merge into the existing credential when the new token
# already covers every scope we're about to advertise on it.
# Without this guard we'd overwrite ``matching.access_token`` with
# a narrower token while storing a wider ``scopes`` list — the
# record would claim authorizations the token does not grant, and
# blocks using the lost scopes would fail with opaque 401/403s
# until the user hits re-auth. On a narrowing login, keep the
# two credentials separate instead.
if set(credentials.scopes).issuperset(set(matching.scopes)):
return await _upgrade_existing_credential(user_id, matching.id, credentials)
await creds_manager.create(user_id, credentials)
return credentials
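# Worked example of the superset guard above (pure set arithmetic, no I/O;
# scope strings are illustrative):
def _illustrative_superset_guard() -> None:
    existing = {"https://www.googleapis.com/auth/drive.readonly"}
    widening = existing | {"https://www.googleapis.com/auth/gmail.readonly"}
    narrowing = {"https://www.googleapis.com/auth/gmail.readonly"}
    assert widening.issuperset(existing)        # widening login: merge into existing
    assert not narrowing.issuperset(existing)   # narrowing login: keep creds separate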
async def _upgrade_existing_credential(
user_id: str,
existing_cred_id: str,
new_credentials: OAuth2Credentials,
) -> OAuth2Credentials:
"""Merge scopes from *new_credentials* into an existing credential."""
# Defense-in-depth: re-check system and provider invariants right before
# the write. The login-time check in `_prepare_scope_upgrade` can go stale
# by the time the callback runs, and the implicit-merge path bypasses
# login-time validation entirely, so every write path must enforce these
# on its own.
if is_system_credential(existing_cred_id):
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="System credentials cannot be upgraded",
)
existing = await creds_manager.store.get_creds_by_id(user_id, existing_cred_id)
if not existing or not isinstance(existing, OAuth2Credentials):
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Credential to upgrade not found",
)
if existing.is_managed:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Managed credentials cannot be upgraded",
)
if not provider_matches(existing.provider, new_credentials.provider):
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Credential provider does not match the requested provider",
)
if (
existing.username
and new_credentials.username
and existing.username != new_credentials.username
):
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Username mismatch: authenticated as a different user",
)
# Operate on a copy so the caller's ``new_credentials`` object is not
# mutated out from under them. Every caller today immediately discards
# or replaces its reference, but the implicit-merge path in
# ``_merge_or_create_credential`` reads ``credentials.scopes`` before
# calling into us — a future reader after the call would otherwise
# silently see the overwritten values.
merged = new_credentials.model_copy(deep=True)
merged.id = existing.id
merged.title = existing.title
merged.scopes = list(set(existing.scopes) | set(new_credentials.scopes))
merged.metadata = {
**(existing.metadata or {}),
**(new_credentials.metadata or {}),
}
# Preserve the existing refresh_token and username if the incremental
# response doesn't carry them. Providers like Google only return a
# refresh_token on first authorization — dropping it here would orphan
# the credential on the next access-token expiry, forcing the user to
# re-auth from scratch. Username is similarly sticky: if we've already
# resolved it for this credential, keep it rather than silently
# blanking it on an incremental upgrade.
if not merged.refresh_token and existing.refresh_token:
merged.refresh_token = existing.refresh_token
merged.refresh_token_expires_at = existing.refresh_token_expires_at
if not merged.username and existing.username:
merged.username = existing.username
await creds_manager.update(user_id, merged)
return merged
# --------------------------- UTILITIES ---------------------------- #
@@ -784,12 +1122,21 @@ def _get_provider_oauth_handler(
async def get_ayrshare_sso_url(
user_id: Annotated[str, Security(get_user_id)],
) -> AyrshareSSOResponse:
"""
Generate an SSO URL for Ayrshare social media integration.
"""Generate a JWT SSO URL so the user can link their social accounts.
Returns:
dict: Contains the SSO URL for Ayrshare integration
The per-user Ayrshare profile key is provisioned and persisted as a
standard ``is_managed=True`` credential by
:class:`~backend.integrations.managed_providers.ayrshare.AyrshareManagedProvider`.
This endpoint only signs a short-lived JWT pointing at the Ayrshare-
hosted social-linking page; all profile lifecycle logic lives with the
managed provider.
"""
if not ayrshare_settings_available():
raise HTTPException(
status_code=HTTP_500_INTERNAL_SERVER_ERROR,
detail="Ayrshare integration is not configured",
)
try:
client = AyrshareClient()
except MissingConfigError:
@@ -798,66 +1145,63 @@ async def get_ayrshare_sso_url(
detail="Ayrshare integration is not configured",
)
# Ayrshare profile key is stored in the credentials store.
# It is generated when creating a new profile; if there is no profile key,
# we create a new profile and store the profile key in the credentials store.
user_integrations: UserIntegrations = await get_user_integrations(user_id)
profile_key = user_integrations.managed_credentials.ayrshare_profile_key
if not profile_key:
logger.debug(f"Creating new Ayrshare profile for user {user_id}")
try:
profile = await client.create_profile(
title=f"User {user_id}", messaging_active=True
)
profile_key = profile.profileKey
await creds_manager.store.set_ayrshare_profile_key(user_id, profile_key)
except Exception as e:
logger.error(f"Error creating Ayrshare profile for user {user_id}: {e}")
raise HTTPException(
status_code=HTTP_502_BAD_GATEWAY,
detail="Failed to create Ayrshare profile",
)
else:
logger.debug(f"Using existing Ayrshare profile for user {user_id}")
profile_key_str = (
profile_key.get_secret_value()
if isinstance(profile_key, SecretStr)
else str(profile_key)
# On-demand provisioning: AyrshareManagedProvider opts out of the
# credentials sweep (profile quota is per-user subscription-bound). This
# endpoint is the only trigger that provisions a profile — one Ayrshare
# profile per user who actually opens the connect flow, not one for
# every authenticated user.
provisioned = await ensure_managed_credential(
user_id, creds_manager.store, AyrshareManagedProvider()
)
if not provisioned:
raise HTTPException(
status_code=HTTP_502_BAD_GATEWAY,
detail="Failed to provision Ayrshare profile",
)
ayrshare_creds = [
c
for c in await creds_manager.store.get_creds_by_provider(user_id, "ayrshare")
if c.is_managed and isinstance(c, APIKeyCredentials)
]
if not ayrshare_creds:
logger.error(
"Ayrshare credential provisioning did not produce a credential "
"for user %s",
user_id,
)
raise HTTPException(
status_code=HTTP_502_BAD_GATEWAY,
detail="Failed to provision Ayrshare profile",
)
profile_key_str = ayrshare_creds[0].api_key.get_secret_value()
private_key = settings.secrets.ayrshare_jwt_key
# Ayrshare JWT expiry is 2880 minutes (48 hours)
# Ayrshare JWT max lifetime is 2880 minutes (48 h).
max_expiry_minutes = 2880
try:
logger.debug(f"Generating Ayrshare JWT for user {user_id}")
jwt_response = await client.generate_jwt(
private_key=private_key,
profile_key=profile_key_str,
# `allowed_social` is the set of networks the Ayrshare-hosted
# social-linking page will *offer* the user to connect. Blocks
# exist for more platforms than are listed here; the list is
# deliberately narrower so the rollout can verify each network
# end-to-end before widening the user-visible surface. Keep
# in sync with tested platforms — extend as each is verified
# against the block + Ayrshare's network-specific quirks.
allowed_social=[
# NOTE: We are enabling platforms one at a time
# to speed up the development process
# SocialPlatform.FACEBOOK,
SocialPlatform.TWITTER,
SocialPlatform.LINKEDIN,
SocialPlatform.INSTAGRAM,
SocialPlatform.YOUTUBE,
# SocialPlatform.REDDIT,
# SocialPlatform.TELEGRAM,
# SocialPlatform.GOOGLE_MY_BUSINESS,
# SocialPlatform.PINTEREST,
SocialPlatform.TIKTOK,
# SocialPlatform.BLUESKY,
# SocialPlatform.SNAPCHAT,
# SocialPlatform.THREADS,
],
expires_in=max_expiry_minutes,
verify=True,
)
except Exception as e:
logger.error(f"Error generating Ayrshare JWT for user {user_id}: {e}")
except Exception as exc:
logger.error("Error generating Ayrshare JWT for user %s: %s", user_id, exc)
raise HTTPException(
status_code=HTTP_502_BAD_GATEWAY, detail="Failed to generate JWT"
)
@@ -867,20 +1211,37 @@ async def get_ayrshare_sso_url(
# === PROVIDER DISCOVERY ENDPOINTS ===
@router.get("/providers", response_model=List[str])
async def list_providers() -> List[str]:
@router.get("/providers", response_model=List[ProviderMetadata])
async def list_providers() -> List[ProviderMetadata]:
"""
Get a list of all available provider names.
Get metadata for every available provider.
Returns both statically defined providers (from ProviderName enum)
and dynamically registered providers (from SDK decorators).
Returns both statically defined providers (from ``ProviderName`` enum) and
dynamically registered providers (from SDK decorators). Each entry includes
a ``description`` declared via ``ProviderBuilder.with_description(...)`` in
the provider's ``_config.py``.
Note: The complete list of provider names is also available as a constant
in the generated TypeScript client via PROVIDER_NAMES.
"""
# Get all providers at runtime
# Ensure all block modules (and therefore every provider's _config.py) are
# imported before we read from AutoRegistry. Cached on first call.
try:
from backend.blocks import load_all_blocks
load_all_blocks()
except Exception as e:
logger.warning(f"Failed to load blocks for provider metadata: {e}")
all_providers = get_all_provider_names()
return all_providers
return [
ProviderMetadata(
name=name,
description=get_provider_description(name),
supported_auth_types=get_supported_auth_types(name),
)
for name in all_providers
]
@router.get("/providers/system", response_model=List[str])

View File

@@ -393,7 +393,7 @@ class TestEnsureManagedCredentials:
_PROVIDERS.update(saved)
_provisioned_users.pop("user-1", None)
provider.provision.assert_awaited_once_with("user-1")
provider.provision.assert_awaited_once_with("user-1", store)
store.add_managed_credential.assert_awaited_once_with("user-1", cred)
@pytest.mark.asyncio
@@ -568,3 +568,181 @@ class TestCleanupManagedCredentials:
_PROVIDERS.update(saved)
# No exception raised — cleanup failure is swallowed.
class TestGetPickerToken:
"""POST /{provider}/credentials/{cred_id}/picker-token must:
1. Return the access token for OAuth2 creds the caller owns.
2. 404 for non-owned, non-existent, or wrong-provider creds.
3. 400 for non-OAuth2 creds (API key, host-scoped, user/password).
4. 404 for SDK default creds (same hardening as get_credential).
5. Preserve the `TestGetCredentialReturnsMetaOnly` contract — the
existing meta-only endpoint must still strip secrets even after
this picker-token endpoint exists."""
def test_oauth2_owner_gets_access_token(self):
# Use a Google cred with a drive.file scope — only picker-eligible
# (provider, scope) pairs can mint a token. GitHub-style creds are
# explicitly rejected; see `test_non_picker_provider_rejected_as_400`.
cred = _make_oauth2_cred(
cred_id="cred-gdrive",
provider="google",
)
cred.scopes = ["https://www.googleapis.com/auth/drive.file"]
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.get = AsyncMock(return_value=cred)
resp = client.post("/google/credentials/cred-gdrive/picker-token")
assert resp.status_code == 200
data = resp.json()
# The whole point of this endpoint: the access token IS returned here.
assert data["access_token"] == "ghp_secret_token"
# Only the two declared fields come back — nothing else leaks.
assert set(data.keys()) <= {"access_token", "access_token_expires_at"}
def test_non_picker_provider_rejected_as_400(self):
"""Provider allowlist: even with a valid OAuth2 credential, a
non-picker provider (GitHub, etc.) cannot mint a picker token.
Stops this endpoint from being used as a generic bearer-token
extraction path for any stored OAuth cred under the same user."""
cred = _make_oauth2_cred(provider="github")
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.get = AsyncMock(return_value=cred)
resp = client.post("/github/credentials/cred-456/picker-token")
assert resp.status_code == 400
assert "not available for provider" in resp.json()["detail"]
assert "ghp_secret_token" not in str(resp.json())
def test_google_oauth_without_drive_scope_rejected(self):
"""Scope allowlist: a Google OAuth2 cred that only carries non-picker
scopes (e.g. gmail.readonly, calendar) cannot mint a picker token.
Forces the frontend to reconnect with a Drive scope before the
picker is available."""
cred = _make_oauth2_cred(provider="google")
cred.scopes = [
"https://www.googleapis.com/auth/gmail.readonly",
"https://www.googleapis.com/auth/calendar",
]
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.get = AsyncMock(return_value=cred)
resp = client.post("/google/credentials/cred-456/picker-token")
assert resp.status_code == 400
assert "picker" in resp.json()["detail"].lower()
def test_api_key_credential_rejected_as_400(self):
cred = _make_api_key_cred()
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.get = AsyncMock(return_value=cred)
resp = client.post("/openai/credentials/cred-123/picker-token")
assert resp.status_code == 400
# API keys must not silently fall through to a 200 response of some
# other shape — the client should see an explicit 400 rejection.
body = str(resp.json())
assert "sk-secret-key-value" not in body
def test_user_password_credential_rejected_as_400(self):
cred = _make_user_password_cred()
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.get = AsyncMock(return_value=cred)
resp = client.post("/openai/credentials/cred-789/picker-token")
assert resp.status_code == 400
body = str(resp.json())
assert "s3cret-pass" not in body
assert "admin" not in body
def test_host_scoped_credential_rejected_as_400(self):
cred = _make_host_scoped_cred()
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.get = AsyncMock(return_value=cred)
resp = client.post("/openai/credentials/cred-host/picker-token")
assert resp.status_code == 400
assert "top-secret" not in str(resp.json())
def test_missing_credential_returns_404(self):
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.get = AsyncMock(return_value=None)
resp = client.post("/github/credentials/nonexistent/picker-token")
assert resp.status_code == 404
assert resp.json()["detail"] == "Credentials not found"
def test_wrong_provider_returns_404(self):
"""Symmetric with get_credential: provider mismatch is a generic
404, not a 400, so we don't leak existence of a credential the
caller doesn't own on that provider."""
cred = _make_oauth2_cred(provider="github")
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.get = AsyncMock(return_value=cred)
resp = client.post("/google/credentials/cred-456/picker-token")
assert resp.status_code == 404
assert resp.json()["detail"] == "Credentials not found"
def test_sdk_default_returns_404(self):
"""SDK defaults are invisible to the user-facing API — picker-token
must not mint a token for them either."""
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.get = AsyncMock()
resp = client.post("/openai/credentials/openai-default/picker-token")
assert resp.status_code == 404
mock_mgr.get.assert_not_called()
def test_oauth2_without_access_token_returns_400(self):
"""A stored OAuth2 cred whose access_token is missing can't satisfy
a picker init. Surface a clear reconnect instruction rather than
returning an empty string."""
cred = _make_oauth2_cred()
# Simulate a cred that lost its access token
object.__setattr__(cred, "access_token", None)
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.get = AsyncMock(return_value=cred)
resp = client.post("/github/credentials/cred-456/picker-token")
assert resp.status_code == 400
assert "reconnect" in resp.json()["detail"].lower()
def test_meta_only_endpoint_still_strips_access_token(self):
"""Regression guard for the coexistence contract: the new
picker-token endpoint must NOT accidentally leak the token through
the meta-only GET endpoint. TestGetCredentialReturnsMetaOnly
covers this more broadly; this is a fast sanity check co-located
with the new endpoint's tests."""
cred = _make_oauth2_cred()
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.get = AsyncMock(return_value=cred)
resp = client.get("/github/credentials/cred-456")
assert resp.status_code == 200
body = resp.json()
assert "access_token" not in body
assert "refresh_token" not in body
assert "ghp_secret_token" not in str(body)

View File

@@ -12,6 +12,7 @@ import prisma.models
import backend.api.features.library.model as library_model
import backend.data.graph as graph_db
from backend.api.features.library.db import _fetch_schedule_info
from backend.data.graph import GraphModel, GraphSettings
from backend.data.includes import library_agent_include
from backend.util.exceptions import NotFoundError
@@ -117,4 +118,5 @@ async def add_graph_to_library(
f"for store listing version #{store_listing_version_id} "
f"to library for user #{user_id}"
)
return library_model.LibraryAgent.from_db(added_agent)
schedule_info = await _fetch_schedule_info(user_id, graph_id=graph_model.id)
return library_model.LibraryAgent.from_db(added_agent, schedule_info=schedule_info)

View File

@@ -21,13 +21,17 @@ async def test_add_graph_to_library_create_new_agent() -> None:
"backend.api.features.library._add_to_library.library_model.LibraryAgent.from_db",
return_value=converted_agent,
) as mock_from_db,
patch(
"backend.api.features.library._add_to_library._fetch_schedule_info",
new=AsyncMock(return_value={}),
),
):
mock_prisma.return_value.create = AsyncMock(return_value=created_agent)
result = await add_graph_to_library("slv-id", graph_model, "user-id")
assert result is converted_agent
mock_from_db.assert_called_once_with(created_agent)
mock_from_db.assert_called_once_with(created_agent, schedule_info={})
# Verify create was called with correct data
create_call = mock_prisma.return_value.create.call_args
create_data = create_call.kwargs["data"]
@@ -54,6 +58,10 @@ async def test_add_graph_to_library_unique_violation_updates_existing() -> None:
"backend.api.features.library._add_to_library.library_model.LibraryAgent.from_db",
return_value=converted_agent,
) as mock_from_db,
patch(
"backend.api.features.library._add_to_library._fetch_schedule_info",
new=AsyncMock(return_value={}),
),
):
mock_prisma.return_value.create = AsyncMock(
side_effect=prisma.errors.UniqueViolationError(
@@ -65,7 +73,7 @@ async def test_add_graph_to_library_unique_violation_updates_existing() -> None:
result = await add_graph_to_library("slv-id", graph_model, "user-id")
assert result is converted_agent
mock_from_db.assert_called_once_with(updated_agent)
mock_from_db.assert_called_once_with(updated_agent, schedule_info={})
# Verify update was called with correct where and data
update_call = mock_prisma.return_value.update.call_args
assert update_call.kwargs["where"] == {

View File

@@ -1,6 +1,7 @@
import asyncio
import itertools
import logging
from datetime import datetime, timezone
from typing import Literal, Optional
import fastapi
@@ -43,6 +44,65 @@ config = Config()
integration_creds_manager = IntegrationCredentialsManager()
async def _fetch_execution_counts(user_id: str, graph_ids: list[str]) -> dict[str, int]:
"""Fetch execution counts per graph in a single batched query."""
if not graph_ids:
return {}
rows = await prisma.models.AgentGraphExecution.prisma().group_by(
by=["agentGraphId"],
where={
"userId": user_id,
"agentGraphId": {"in": graph_ids},
"isDeleted": False,
},
count=True,
)
return {
row["agentGraphId"]: int((row.get("_count") or {}).get("_all") or 0)
for row in rows
}
async def _fetch_schedule_info(
user_id: str, graph_id: Optional[str] = None
) -> dict[str, str]:
"""Fetch a map of graph_id → earliest next_run_time ISO string.
When `graph_id` is provided, the scheduler query is narrowed to that graph,
which is cheaper for single-agent lookups (detail page, post-update, etc.).
"""
try:
scheduler_client = get_scheduler_client()
schedules = await scheduler_client.get_execution_schedules(
graph_id=graph_id,
user_id=user_id,
)
earliest: dict[str, tuple[datetime, str]] = {}
for s in schedules:
parsed = _parse_iso_datetime(s.next_run_time)
if parsed is None:
continue
current = earliest.get(s.graph_id)
if current is None or parsed < current[0]:
earliest[s.graph_id] = (parsed, s.next_run_time)
return {graph_id: iso for graph_id, (_, iso) in earliest.items()}
except Exception:
logger.warning("Failed to fetch schedules for library agents", exc_info=True)
return {}
def _parse_iso_datetime(value: str) -> Optional[datetime]:
"""Parse an ISO 8601 datetime, tolerating `Z` and naive forms (assumed UTC)."""
try:
parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
except ValueError:
logger.warning("Failed to parse schedule next_run_time: %s", value)
return None
if parsed.tzinfo is None:
parsed = parsed.replace(tzinfo=timezone.utc)
return parsed
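# Worked example of the earliest-run reduction above (timestamps are made up;
# the tuples stand in for the two schedule fields the loop reads):
def _illustrative_earliest_reduction() -> dict[str, str]:
    samples = [
        ("g1", "2026-05-01T12:00:00Z"),
        ("g1", "2026-05-01T10:00:00+00:00"),  # earlier, so it wins for g1
        ("g2", "2026-05-02T09:00:00"),        # naive form, treated as UTC
    ]
    earliest: dict[str, tuple[datetime, str]] = {}
    for gid, iso in samples:
        parsed = _parse_iso_datetime(iso)
        if parsed is None:
            continue
        current = earliest.get(gid)
        if current is None or parsed < current[0]:
            earliest[gid] = (parsed, iso)
    return {gid: iso for gid, (_, iso) in earliest.items()}
    # -> {"g1": "2026-05-01T10:00:00+00:00", "g2": "2026-05-02T09:00:00"}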
async def list_library_agents(
user_id: str,
search_term: Optional[str] = None,
@@ -137,12 +197,22 @@ async def list_library_agents(
logger.debug(f"Retrieved {len(library_agents)} library agents for user #{user_id}")
graph_ids = [a.agentGraphId for a in library_agents if a.agentGraphId]
execution_counts, schedule_info = await asyncio.gather(
_fetch_execution_counts(user_id, graph_ids),
_fetch_schedule_info(user_id),
)
# Only pass valid agents to the response
valid_library_agents: list[library_model.LibraryAgent] = []
for agent in library_agents:
try:
library_agent = library_model.LibraryAgent.from_db(agent)
library_agent = library_model.LibraryAgent.from_db(
agent,
execution_count_override=execution_counts.get(agent.agentGraphId),
schedule_info=schedule_info,
)
valid_library_agents.append(library_agent)
except Exception as e:
# Skip this agent if there was an error
@@ -214,12 +284,22 @@ async def list_favorite_library_agents(
f"Retrieved {len(library_agents)} favorite library agents for user #{user_id}"
)
graph_ids = [a.agentGraphId for a in library_agents if a.agentGraphId]
execution_counts, schedule_info = await asyncio.gather(
_fetch_execution_counts(user_id, graph_ids),
_fetch_schedule_info(user_id),
)
# Only pass valid agents to the response
valid_library_agents: list[library_model.LibraryAgent] = []
for agent in library_agents:
try:
library_agent = library_model.LibraryAgent.from_db(agent)
library_agent = library_model.LibraryAgent.from_db(
agent,
execution_count_override=execution_counts.get(agent.agentGraphId),
schedule_info=schedule_info,
)
valid_library_agents.append(library_agent)
except Exception as e:
# Skip this agent if there was an error
@@ -285,6 +365,12 @@ async def get_library_agent(id: str, user_id: str) -> library_model.LibraryAgent
where={"userId": store_listing.owningUserId}
)
schedule_info = (
await _fetch_schedule_info(user_id, graph_id=library_agent.AgentGraph.id)
if library_agent.AgentGraph
else {}
)
return library_model.LibraryAgent.from_db(
library_agent,
sub_graphs=(
@@ -294,6 +380,7 @@ async def get_library_agent(id: str, user_id: str) -> library_model.LibraryAgent
),
store_listing=store_listing,
profile=profile,
schedule_info=schedule_info,
)
@@ -329,7 +416,10 @@ async def get_library_agent_by_store_version_id(
},
include=library_agent_include(user_id),
)
return library_model.LibraryAgent.from_db(agent) if agent else None
if not agent:
return None
schedule_info = await _fetch_schedule_info(user_id, graph_id=agent.agentGraphId)
return library_model.LibraryAgent.from_db(agent, schedule_info=schedule_info)
async def get_library_agent_by_graph_id(
@@ -358,7 +448,10 @@ async def get_library_agent_by_graph_id(
assert agent.AgentGraph # make type checker happy
# Include sub-graphs so we can make a full credentials input schema
sub_graphs = await graph_db.get_sub_graphs(agent.AgentGraph)
return library_model.LibraryAgent.from_db(agent, sub_graphs=sub_graphs)
schedule_info = await _fetch_schedule_info(user_id, graph_id=agent.agentGraphId)
return library_model.LibraryAgent.from_db(
agent, sub_graphs=sub_graphs, schedule_info=schedule_info
)
async def add_generated_agent_image(
@@ -500,7 +593,11 @@ async def create_library_agent(
for agent, graph in zip(library_agents, graph_entries):
asyncio.create_task(add_generated_agent_image(graph, user_id, agent.id))
return [library_model.LibraryAgent.from_db(agent) for agent in library_agents]
schedule_info = await _fetch_schedule_info(user_id)
return [
library_model.LibraryAgent.from_db(agent, schedule_info=schedule_info)
for agent in library_agents
]
async def update_agent_version_in_library(
@@ -562,7 +659,8 @@ async def update_agent_version_in_library(
f"Failed to update library agent for {agent_graph_id} v{agent_graph_version}"
)
return library_model.LibraryAgent.from_db(lib)
schedule_info = await _fetch_schedule_info(user_id, graph_id=agent_graph_id)
return library_model.LibraryAgent.from_db(lib, schedule_info=schedule_info)
async def create_graph_in_library(
@@ -645,6 +743,7 @@ async def update_library_agent_version_and_settings(
graph=agent_graph,
hitl_safe_mode=library.settings.human_in_the_loop_safe_mode,
sensitive_action_safe_mode=library.settings.sensitive_action_safe_mode,
builder_chat_session_id=library.settings.builder_chat_session_id,
)
if updated_settings != library.settings:
library = await update_library_agent(
@@ -1467,7 +1566,11 @@ async def bulk_move_agents_to_folder(
),
)
return [library_model.LibraryAgent.from_db(agent) for agent in agents]
schedule_info = await _fetch_schedule_info(user_id)
return [
library_model.LibraryAgent.from_db(agent, schedule_info=schedule_info)
for agent in agents
]
def collect_tree_ids(
@@ -1701,7 +1804,7 @@ async def create_preset_from_graph_execution(
raise NotFoundError(
f"Graph #{graph_execution.graph_id} not found or accessible"
)
elif len(graph.aggregate_credentials_inputs()) > 0:
elif len(graph.regular_credentials_inputs) > 0:
raise ValueError(
f"Graph execution #{graph_exec_id} can't be turned into a preset "
"because it was run before this feature existed "

View File

@@ -65,6 +65,11 @@ async def test_get_library_agents(mocker):
)
mock_library_agent.return_value.count = mocker.AsyncMock(return_value=1)
mocker.patch(
"backend.api.features.library.db._fetch_execution_counts",
new=mocker.AsyncMock(return_value={}),
)
# Call function
result = await db.list_library_agents("test-user")
@@ -353,3 +358,136 @@ async def test_create_library_agent_uses_upsert():
# Verify update branch restores soft-deleted/archived agents
assert data["update"]["isDeleted"] is False
assert data["update"]["isArchived"] is False
@pytest.mark.asyncio
async def test_list_favorite_library_agents(mocker):
mock_library_agents = [
prisma.models.LibraryAgent(
id="fav1",
userId="test-user",
agentGraphId="agent-fav",
settings="{}", # type: ignore
agentGraphVersion=1,
isCreatedByUser=False,
isDeleted=False,
isArchived=False,
createdAt=datetime.now(),
updatedAt=datetime.now(),
isFavorite=True,
useGraphIsActiveVersion=True,
AgentGraph=prisma.models.AgentGraph(
id="agent-fav",
version=1,
name="Favorite Agent",
description="My Favorite",
userId="other-user",
isActive=True,
createdAt=datetime.now(),
),
)
]
mock_library_agent = mocker.patch("prisma.models.LibraryAgent.prisma")
mock_library_agent.return_value.find_many = mocker.AsyncMock(
return_value=mock_library_agents
)
mock_library_agent.return_value.count = mocker.AsyncMock(return_value=1)
mocker.patch(
"backend.api.features.library.db._fetch_execution_counts",
new=mocker.AsyncMock(return_value={"agent-fav": 7}),
)
result = await db.list_favorite_library_agents("test-user")
assert len(result.agents) == 1
assert result.agents[0].id == "fav1"
assert result.agents[0].name == "Favorite Agent"
assert result.agents[0].graph_id == "agent-fav"
assert result.pagination.total_items == 1
assert result.pagination.total_pages == 1
assert result.pagination.current_page == 1
assert result.pagination.page_size == 50
@pytest.mark.asyncio
async def test_list_library_agents_skips_failed_agent(mocker):
"""Agents that fail parsing should be skipped — covers the except branch."""
mock_library_agents = [
prisma.models.LibraryAgent(
id="ua-bad",
userId="test-user",
agentGraphId="agent-bad",
settings="{}", # type: ignore
agentGraphVersion=1,
isCreatedByUser=False,
isDeleted=False,
isArchived=False,
createdAt=datetime.now(),
updatedAt=datetime.now(),
isFavorite=False,
useGraphIsActiveVersion=True,
AgentGraph=prisma.models.AgentGraph(
id="agent-bad",
version=1,
name="Bad Agent",
description="",
userId="other-user",
isActive=True,
createdAt=datetime.now(),
),
)
]
mock_library_agent = mocker.patch("prisma.models.LibraryAgent.prisma")
mock_library_agent.return_value.find_many = mocker.AsyncMock(
return_value=mock_library_agents
)
mock_library_agent.return_value.count = mocker.AsyncMock(return_value=1)
mocker.patch(
"backend.api.features.library.db._fetch_execution_counts",
new=mocker.AsyncMock(return_value={}),
)
mocker.patch(
"backend.api.features.library.model.LibraryAgent.from_db",
side_effect=Exception("parse error"),
)
result = await db.list_library_agents("test-user")
assert len(result.agents) == 0
assert result.pagination.total_items == 1
@pytest.mark.asyncio
async def test_fetch_execution_counts_empty_graph_ids():
result = await db._fetch_execution_counts("user-1", [])
assert result == {}
@pytest.mark.asyncio
async def test_fetch_execution_counts_uses_group_by(mocker):
mock_prisma = mocker.patch("prisma.models.AgentGraphExecution.prisma")
mock_prisma.return_value.group_by = mocker.AsyncMock(
return_value=[
{"agentGraphId": "graph-1", "_count": {"_all": 5}},
{"agentGraphId": "graph-2", "_count": {"_all": 2}},
]
)
result = await db._fetch_execution_counts(
"user-1", ["graph-1", "graph-2", "graph-3"]
)
assert result == {"graph-1": 5, "graph-2": 2}
mock_prisma.return_value.group_by.assert_called_once_with(
by=["agentGraphId"],
where={
"userId": "user-1",
"agentGraphId": {"in": ["graph-1", "graph-2", "graph-3"]},
"isDeleted": False,
},
count=True,
)

View File

@@ -214,6 +214,14 @@ class LibraryAgent(pydantic.BaseModel):
folder_name: str | None = None # Denormalized for display
recommended_schedule_cron: str | None = None
is_scheduled: bool = pydantic.Field(
default=False,
description="Whether this agent has active execution schedules",
)
next_scheduled_run: str | None = pydantic.Field(
default=None,
description="ISO 8601 timestamp of the next scheduled run, if any",
)
settings: GraphSettings = pydantic.Field(default_factory=GraphSettings)
marketplace_listing: Optional["MarketplaceListing"] = None
@@ -223,6 +231,8 @@ class LibraryAgent(pydantic.BaseModel):
sub_graphs: Optional[list[prisma.models.AgentGraph]] = None,
store_listing: Optional[prisma.models.StoreListing] = None,
profile: Optional[prisma.models.Profile] = None,
execution_count_override: Optional[int] = None,
schedule_info: Optional[dict[str, str]] = None,
) -> "LibraryAgent":
"""
Factory method that constructs a LibraryAgent from a Prisma LibraryAgent
@@ -258,10 +268,14 @@ class LibraryAgent(pydantic.BaseModel):
status = status_result.status
new_output = status_result.new_output
execution_count = len(executions)
execution_count = (
execution_count_override
if execution_count_override is not None
else len(executions)
)
success_rate: float | None = None
avg_correctness_score: float | None = None
if execution_count > 0:
if executions and execution_count > 0:
success_count = sum(
1
for e in executions
@@ -354,6 +368,10 @@ class LibraryAgent(pydantic.BaseModel):
folder_id=agent.folderId,
folder_name=agent.Folder.name if agent.Folder else None,
recommended_schedule_cron=agent.AgentGraph.recommendedScheduleCron,
is_scheduled=bool(schedule_info and agent.agentGraphId in schedule_info),
next_scheduled_run=(
schedule_info.get(agent.agentGraphId) if schedule_info else None
),
settings=_parse_settings(agent.settings),
marketplace_listing=marketplace_listing_data,
)

View File

@@ -1,11 +1,66 @@
import datetime
import prisma.enums
import prisma.models
import pytest
from . import model as library_model
def _make_library_agent(
*,
graph_id: str = "g1",
executions: list | None = None,
) -> prisma.models.LibraryAgent:
return prisma.models.LibraryAgent(
id="la1",
userId="u1",
agentGraphId=graph_id,
settings="{}", # type: ignore
agentGraphVersion=1,
isCreatedByUser=True,
isDeleted=False,
isArchived=False,
createdAt=datetime.datetime.now(),
updatedAt=datetime.datetime.now(),
isFavorite=False,
useGraphIsActiveVersion=True,
AgentGraph=prisma.models.AgentGraph(
id=graph_id,
version=1,
name="Agent",
description="Desc",
userId="u1",
isActive=True,
createdAt=datetime.datetime.now(),
Executions=executions,
),
)
def test_from_db_execution_count_override_covers_success_rate():
"""Covers execution_count_override is not None branch and executions/count > 0 block."""
now = datetime.datetime.now(datetime.timezone.utc)
exec1 = prisma.models.AgentGraphExecution(
id="exec-1",
agentGraphId="g1",
agentGraphVersion=1,
userId="u1",
executionStatus=prisma.enums.AgentExecutionStatus.COMPLETED,
createdAt=now,
updatedAt=now,
isDeleted=False,
isShared=False,
)
agent = _make_library_agent(executions=[exec1])
result = library_model.LibraryAgent.from_db(agent, execution_count_override=1)
assert result.execution_count == 1
assert result.success_rate is not None
assert result.success_rate == 100.0
@pytest.mark.asyncio
async def test_agent_preset_from_db(test_user_id: str):
# Create mock DB agent

View File

@@ -0,0 +1 @@
"""Platform bot linking — user-facing REST routes."""

View File

@@ -0,0 +1,158 @@
"""User-facing platform_linking REST routes (JWT auth)."""
import logging
from typing import Annotated
from autogpt_libs import auth
from fastapi import APIRouter, HTTPException, Path, Security
from backend.data.db_accessors import platform_linking_db
from backend.platform_linking.models import (
ConfirmLinkResponse,
ConfirmUserLinkResponse,
DeleteLinkResponse,
LinkTokenInfoResponse,
PlatformLinkInfo,
PlatformUserLinkInfo,
)
from backend.util.exceptions import (
LinkAlreadyExistsError,
LinkFlowMismatchError,
LinkTokenExpiredError,
NotAuthorizedError,
NotFoundError,
)
logger = logging.getLogger(__name__)
router = APIRouter()
TokenPath = Annotated[
str,
Path(max_length=64, pattern=r"^[A-Za-z0-9_-]+$"),
]
def _translate(exc: Exception) -> HTTPException:
if isinstance(exc, NotFoundError):
return HTTPException(status_code=404, detail=str(exc))
if isinstance(exc, NotAuthorizedError):
return HTTPException(status_code=403, detail=str(exc))
if isinstance(exc, LinkAlreadyExistsError):
return HTTPException(status_code=409, detail=str(exc))
if isinstance(exc, LinkTokenExpiredError):
return HTTPException(status_code=410, detail=str(exc))
if isinstance(exc, LinkFlowMismatchError):
return HTTPException(status_code=400, detail=str(exc))
return HTTPException(status_code=500, detail="Internal error.")
@router.get(
"/tokens/{token}/info",
response_model=LinkTokenInfoResponse,
dependencies=[Security(auth.requires_user)],
summary="Get display info for a link token",
)
async def get_link_token_info_route(token: TokenPath) -> LinkTokenInfoResponse:
try:
return await platform_linking_db().get_link_token_info(token)
except (NotFoundError, LinkTokenExpiredError) as exc:
raise _translate(exc) from exc
@router.post(
"/tokens/{token}/confirm",
response_model=ConfirmLinkResponse,
dependencies=[Security(auth.requires_user)],
summary="Confirm a SERVER link token (user must be authenticated)",
)
async def confirm_link_token(
token: TokenPath,
user_id: Annotated[str, Security(auth.get_user_id)],
) -> ConfirmLinkResponse:
try:
return await platform_linking_db().confirm_server_link(token, user_id)
except (
NotFoundError,
LinkFlowMismatchError,
LinkTokenExpiredError,
LinkAlreadyExistsError,
) as exc:
raise _translate(exc) from exc
@router.post(
"/user-tokens/{token}/confirm",
response_model=ConfirmUserLinkResponse,
dependencies=[Security(auth.requires_user)],
summary="Confirm a USER link token (user must be authenticated)",
)
async def confirm_user_link_token(
token: TokenPath,
user_id: Annotated[str, Security(auth.get_user_id)],
) -> ConfirmUserLinkResponse:
try:
return await platform_linking_db().confirm_user_link(token, user_id)
except (
NotFoundError,
LinkFlowMismatchError,
LinkTokenExpiredError,
LinkAlreadyExistsError,
) as exc:
raise _translate(exc) from exc
@router.get(
"/links",
response_model=list[PlatformLinkInfo],
dependencies=[Security(auth.requires_user)],
summary="List all platform servers linked to the authenticated user",
)
async def list_my_links(
user_id: Annotated[str, Security(auth.get_user_id)],
) -> list[PlatformLinkInfo]:
return await platform_linking_db().list_server_links(user_id)
@router.get(
"/user-links",
response_model=list[PlatformUserLinkInfo],
dependencies=[Security(auth.requires_user)],
summary="List all DM links for the authenticated user",
)
async def list_my_user_links(
user_id: Annotated[str, Security(auth.get_user_id)],
) -> list[PlatformUserLinkInfo]:
return await platform_linking_db().list_user_links(user_id)
@router.delete(
"/links/{link_id}",
response_model=DeleteLinkResponse,
dependencies=[Security(auth.requires_user)],
summary="Unlink a platform server",
)
async def delete_link(
link_id: str,
user_id: Annotated[str, Security(auth.get_user_id)],
) -> DeleteLinkResponse:
try:
return await platform_linking_db().delete_server_link(link_id, user_id)
except (NotFoundError, NotAuthorizedError) as exc:
raise _translate(exc) from exc
@router.delete(
"/user-links/{link_id}",
response_model=DeleteLinkResponse,
dependencies=[Security(auth.requires_user)],
summary="Unlink a DM / user link",
)
async def delete_user_link_route(
link_id: str,
user_id: Annotated[str, Security(auth.get_user_id)],
) -> DeleteLinkResponse:
try:
return await platform_linking_db().delete_user_link(link_id, user_id)
except (NotFoundError, NotAuthorizedError) as exc:
raise _translate(exc) from exc

View File

@@ -0,0 +1,264 @@
"""Route tests: domain exceptions → HTTPException status codes."""
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from fastapi import HTTPException
from backend.util.exceptions import (
LinkAlreadyExistsError,
LinkFlowMismatchError,
LinkTokenExpiredError,
NotAuthorizedError,
NotFoundError,
)
def _db_mock(**method_configs):
"""Return a mock of the accessor's return value with the given AsyncMocks."""
db = MagicMock()
for name, mock in method_configs.items():
setattr(db, name, mock)
return db
class TestTokenInfoRouteTranslation:
@pytest.mark.asyncio
async def test_not_found_maps_to_404(self):
from backend.api.features.platform_linking.routes import (
get_link_token_info_route,
)
db = _db_mock(
get_link_token_info=AsyncMock(side_effect=NotFoundError("missing"))
)
with patch(
"backend.api.features.platform_linking.routes.platform_linking_db",
return_value=db,
):
with pytest.raises(HTTPException) as exc:
await get_link_token_info_route(token="abc")
assert exc.value.status_code == 404
@pytest.mark.asyncio
async def test_expired_maps_to_410(self):
from backend.api.features.platform_linking.routes import (
get_link_token_info_route,
)
db = _db_mock(
get_link_token_info=AsyncMock(side_effect=LinkTokenExpiredError("expired"))
)
with patch(
"backend.api.features.platform_linking.routes.platform_linking_db",
return_value=db,
):
with pytest.raises(HTTPException) as exc:
await get_link_token_info_route(token="abc")
assert exc.value.status_code == 410
class TestConfirmLinkRouteTranslation:
@pytest.mark.asyncio
@pytest.mark.parametrize(
"exc,expected_status",
[
(NotFoundError("missing"), 404),
(LinkFlowMismatchError("wrong flow"), 400),
(LinkTokenExpiredError("expired"), 410),
(LinkAlreadyExistsError("already"), 409),
],
)
async def test_translation(self, exc: Exception, expected_status: int):
from backend.api.features.platform_linking.routes import confirm_link_token
db = _db_mock(confirm_server_link=AsyncMock(side_effect=exc))
with patch(
"backend.api.features.platform_linking.routes.platform_linking_db",
return_value=db,
):
with pytest.raises(HTTPException) as ctx:
await confirm_link_token(token="abc", user_id="u1")
assert ctx.value.status_code == expected_status
class TestConfirmUserLinkRouteTranslation:
@pytest.mark.asyncio
@pytest.mark.parametrize(
"exc,expected_status",
[
(NotFoundError("missing"), 404),
(LinkFlowMismatchError("wrong flow"), 400),
(LinkTokenExpiredError("expired"), 410),
(LinkAlreadyExistsError("already"), 409),
],
)
async def test_translation(self, exc: Exception, expected_status: int):
from backend.api.features.platform_linking.routes import confirm_user_link_token
db = _db_mock(confirm_user_link=AsyncMock(side_effect=exc))
with patch(
"backend.api.features.platform_linking.routes.platform_linking_db",
return_value=db,
):
with pytest.raises(HTTPException) as ctx:
await confirm_user_link_token(token="abc", user_id="u1")
assert ctx.value.status_code == expected_status
class TestDeleteLinkRouteTranslation:
@pytest.mark.asyncio
async def test_not_found_maps_to_404(self):
from backend.api.features.platform_linking.routes import delete_link
db = _db_mock(
delete_server_link=AsyncMock(side_effect=NotFoundError("missing"))
)
with patch(
"backend.api.features.platform_linking.routes.platform_linking_db",
return_value=db,
):
with pytest.raises(HTTPException) as exc:
await delete_link(link_id="x", user_id="u1")
assert exc.value.status_code == 404
@pytest.mark.asyncio
async def test_not_owned_maps_to_403(self):
from backend.api.features.platform_linking.routes import delete_link
db = _db_mock(
delete_server_link=AsyncMock(side_effect=NotAuthorizedError("nope"))
)
with patch(
"backend.api.features.platform_linking.routes.platform_linking_db",
return_value=db,
):
with pytest.raises(HTTPException) as exc:
await delete_link(link_id="x", user_id="u1")
assert exc.value.status_code == 403
class TestDeleteUserLinkRouteTranslation:
@pytest.mark.asyncio
async def test_not_found_maps_to_404(self):
from backend.api.features.platform_linking.routes import delete_user_link_route
db = _db_mock(delete_user_link=AsyncMock(side_effect=NotFoundError("missing")))
with patch(
"backend.api.features.platform_linking.routes.platform_linking_db",
return_value=db,
):
with pytest.raises(HTTPException) as exc:
await delete_user_link_route(link_id="x", user_id="u1")
assert exc.value.status_code == 404
@pytest.mark.asyncio
async def test_not_owned_maps_to_403(self):
from backend.api.features.platform_linking.routes import delete_user_link_route
db = _db_mock(
delete_user_link=AsyncMock(side_effect=NotAuthorizedError("nope"))
)
with patch(
"backend.api.features.platform_linking.routes.platform_linking_db",
return_value=db,
):
with pytest.raises(HTTPException) as exc:
await delete_user_link_route(link_id="x", user_id="u1")
assert exc.value.status_code == 403
# ── Adversarial: malformed token path params ──────────────────────────
class TestAdversarialTokenPath:
# TokenPath enforces `^[A-Za-z0-9_-]+$` + max_length=64.
@pytest.fixture
def client(self):
import fastapi
from autogpt_libs.auth import get_user_id, requires_user
from fastapi.testclient import TestClient
import backend.api.features.platform_linking.routes as routes_mod
app = fastapi.FastAPI()
app.dependency_overrides[requires_user] = lambda: None
app.dependency_overrides[get_user_id] = lambda: "caller-user"
app.include_router(routes_mod.router, prefix="/api/platform-linking")
return TestClient(app)
def test_rejects_token_with_special_chars(self, client):
response = client.get("/api/platform-linking/tokens/bad%24token/info")
assert response.status_code == 422
def test_rejects_token_with_path_traversal(self, client):
for probe in ("..%2F..", "foo..bar", "foo%2Fbar"):
response = client.get(f"/api/platform-linking/tokens/{probe}/info")
assert response.status_code in (
404,
422,
), f"path-traversal probe {probe!r} returned {response.status_code}"
def test_rejects_token_too_long(self, client):
long_token = "a" * 65
response = client.get(f"/api/platform-linking/tokens/{long_token}/info")
assert response.status_code == 422
def test_accepts_token_at_max_length(self, client):
token = "a" * 64
db = _db_mock(
get_link_token_info=AsyncMock(side_effect=NotFoundError("missing"))
)
with patch(
"backend.api.features.platform_linking.routes.platform_linking_db",
return_value=db,
):
response = client.get(f"/api/platform-linking/tokens/{token}/info")
assert response.status_code == 404
def test_accepts_urlsafe_b64_token_shape(self, client):
db = _db_mock(
get_link_token_info=AsyncMock(side_effect=NotFoundError("missing"))
)
with patch(
"backend.api.features.platform_linking.routes.platform_linking_db",
return_value=db,
):
response = client.get("/api/platform-linking/tokens/abc-_XYZ123-_abc/info")
assert response.status_code == 404
def test_confirm_rejects_malformed_token(self, client):
response = client.post("/api/platform-linking/tokens/bad%24token/confirm")
assert response.status_code == 422
class TestAdversarialDeleteLinkId:
"""DELETE link_id has no regex — ensure weird values are handled via
NotFoundError (no crash, no cross-user leak)."""
@pytest.fixture
def client(self):
import fastapi
from autogpt_libs.auth import get_user_id, requires_user
from fastapi.testclient import TestClient
import backend.api.features.platform_linking.routes as routes_mod
app = fastapi.FastAPI()
app.dependency_overrides[requires_user] = lambda: None
app.dependency_overrides[get_user_id] = lambda: "caller-user"
app.include_router(routes_mod.router, prefix="/api/platform-linking")
return TestClient(app)
def test_weird_link_id_returns_404(self, client):
db = _db_mock(
delete_server_link=AsyncMock(side_effect=NotFoundError("missing"))
)
with patch(
"backend.api.features.platform_linking.routes.platform_linking_db",
return_value=db,
):
for link_id in ("'; DROP TABLE links;--", "../../etc/passwd", ""):
response = client.delete(f"/api/platform-linking/links/{link_id}")
assert response.status_code in (404, 405)

View File

@@ -189,7 +189,7 @@ async def test_create_store_submission(mocker):
notifyOnAgentApproved=True,
notifyOnAgentRejected=True,
timezone="Europe/Delft",
subscriptionTier=prisma.enums.SubscriptionTier.FREE, # type: ignore[reportCallIssue,reportAttributeAccessIssue]
subscriptionTier=prisma.enums.SubscriptionTier.BASIC, # type: ignore[reportCallIssue,reportAttributeAccessIssue]
)
mock_agent = prisma.models.AgentGraph(
id="agent-id",

View File

@@ -5,7 +5,8 @@ import time
import uuid
from collections import defaultdict
from datetime import datetime, timezone
from typing import Annotated, Any, Literal, Sequence, get_args
from typing import Annotated, Any, Literal, Sequence, cast, get_args
from urllib.parse import urlparse
import pydantic
import stripe
@@ -25,10 +26,11 @@ from fastapi import (
)
from fastapi.concurrency import run_in_threadpool
from prisma.enums import SubscriptionTier
from pydantic import BaseModel
from pydantic import BaseModel, Field
from starlette.status import HTTP_204_NO_CONTENT, HTTP_404_NOT_FOUND
from typing_extensions import Optional, TypedDict
from backend.api.features.workspace.routes import create_file_download_response
from backend.api.model import (
CreateAPIKeyRequest,
CreateAPIKeyResponse,
@@ -42,23 +44,31 @@ from backend.api.model import (
UploadFileResponse,
)
from backend.blocks import get_block, get_blocks
from backend.copilot.rate_limit import get_tier_multipliers
from backend.data import execution as execution_db
from backend.data import graph as graph_db
from backend.data.auth import api_key as api_key_db
from backend.data.block import BlockInput, CompletedBlockOutput
from backend.data.credit import (
AutoTopUpConfig,
PendingChangeUnknown,
RefundRequest,
TransactionHistory,
UserCredit,
cancel_stripe_subscription,
create_subscription_checkout,
get_auto_top_up,
get_pending_subscription_change,
get_proration_credit_cents,
get_subscription_price_id,
get_user_credit_model,
handle_subscription_payment_failure,
modify_stripe_subscription_for_tier,
release_pending_subscription_schedule,
set_auto_top_up,
set_subscription_tier,
sync_subscription_from_stripe,
sync_subscription_schedule_from_stripe,
)
from backend.data.graph import GraphSettings
from backend.data.model import CredentialsMetaInput, UserOnboarding
@@ -88,6 +98,7 @@ from backend.data.user import (
update_user_notification_preference,
update_user_timezone,
)
from backend.data.workspace import get_workspace_file_by_id
from backend.executor import scheduler
from backend.executor import utils as execution_utils
from backend.integrations.webhooks.graph_lifecycle_hooks import (
@@ -689,19 +700,97 @@ async def get_user_auto_top_up(
class SubscriptionTierRequest(BaseModel):
tier: Literal["FREE", "PRO", "BUSINESS"]
tier: Literal["BASIC", "PRO", "MAX", "BUSINESS"]
success_url: str = ""
cancel_url: str = ""
class SubscriptionCheckoutResponse(BaseModel):
url: str
class SubscriptionStatusResponse(BaseModel):
tier: str
monthly_cost: int
tier_costs: dict[str, int]
tier: Literal["BASIC", "PRO", "MAX", "BUSINESS", "ENTERPRISE"]
monthly_cost: int # amount in cents (Stripe convention)
tier_costs: dict[str, int] # tier name -> amount in cents
tier_multipliers: dict[str, float] = Field(
default_factory=dict,
description=(
"Tier → rate-limit multiplier. Covers the same tiers listed in"
" ``tier_costs`` so the frontend can render rate-limit badges"
" relative to the lowest visible tier without knowing backend"
" defaults."
),
)
proration_credit_cents: int # unused portion of current sub to convert on upgrade
pending_tier: Optional[Literal["BASIC", "PRO", "MAX", "BUSINESS"]] = None
pending_tier_effective_at: Optional[datetime] = None
url: str = Field(
default="",
description=(
"Populated only when POST /credits/subscription starts a Stripe Checkout"
" Session (BASIC → paid upgrade). Empty string in all other branches —"
" the client redirects to this URL when non-empty."
),
)
def _validate_checkout_redirect_url(url: str) -> bool:
"""Return True if `url` matches the configured frontend origin.
Prevents open-redirect: attackers must not be able to supply arbitrary
success_url/cancel_url that Stripe will redirect users to after checkout.
Pre-parse rejection rules (applied before urlparse):
- Backslashes (``\\``) are normalised differently across parsers/browsers.
- Control characters (U+0000–U+001F) are not valid in URLs and may confuse
some URL-parsing implementations.
"""
# Reject characters that can confuse URL parsers before any parsing.
if "\\" in url:
return False
if any(ord(c) < 0x20 for c in url):
return False
allowed = settings.config.frontend_base_url or settings.config.platform_base_url
if not allowed:
# No configured origin — refuse to validate rather than allow arbitrary URLs.
return False
try:
parsed = urlparse(url)
allowed_parsed = urlparse(allowed)
except ValueError:
return False
if parsed.scheme not in ("http", "https"):
return False
# Reject ``user:pass@host`` authority tricks — ``@`` in the netloc component
# can trick browsers into connecting to a different host than displayed.
# ``@`` in query/fragment is harmless and must be allowed.
if "@" in parsed.netloc:
return False
return (
parsed.scheme == allowed_parsed.scheme
and parsed.netloc == allowed_parsed.netloc
)
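# Hedged illustration of the rules above (assumes frontend_base_url is
# "https://app.example.com"; the values are examples, not configuration):
#   _validate_checkout_redirect_url("https://app.example.com/billing?ok=1")  -> True
#   _validate_checkout_redirect_url("https://evil.example/phish")            -> False  (foreign origin)
#   _validate_checkout_redirect_url("https://app.example.com@evil.example/") -> False  ("@" in netloc)
#   _validate_checkout_redirect_url("javascript:alert(1)")                   -> False  (scheme not http/https)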
@cached(ttl_seconds=300, maxsize=32, cache_none=False)
async def _get_stripe_price_amount(price_id: str) -> int | None:
"""Return the unit_amount (cents) for a Stripe Price ID, cached for 5 minutes.
Returns ``None`` on transient Stripe errors. ``cache_none=False`` opts out
of caching the ``None`` sentinel so the next request retries Stripe instead
of being served a stale "no price" for the rest of the TTL window. Callers
should treat ``None`` as an unknown price and fall back to 0.
Stripe prices rarely change; caching avoids a ~200-600 ms Stripe round-trip on
every GET /credits/subscription page load and reduces quota consumption.
"""
try:
price = await run_in_threadpool(stripe.Price.retrieve, price_id)
return price.unit_amount or 0
except stripe.StripeError:
logger.warning(
"Failed to retrieve Stripe price %s — returning None (not cached)",
price_id,
)
return None
@v1_router.get(
@@ -715,34 +804,80 @@ async def get_subscription_status(
user_id: Annotated[str, Security(get_user_id)],
) -> SubscriptionStatusResponse:
user = await get_user_by_id(user_id)
tier = user.subscription_tier or SubscriptionTier.FREE
tier = user.subscription_tier or SubscriptionTier.BASIC
paid_tiers = [SubscriptionTier.PRO, SubscriptionTier.BUSINESS]
priceable_tiers = [
SubscriptionTier.BASIC,
SubscriptionTier.PRO,
SubscriptionTier.MAX,
SubscriptionTier.BUSINESS,
]
price_ids = await asyncio.gather(
*[get_subscription_price_id(t) for t in paid_tiers]
*[get_subscription_price_id(t) for t in priceable_tiers]
)
tier_costs: dict[str, int] = {"FREE": 0, "ENTERPRISE": 0}
for t, price_id in zip(paid_tiers, price_ids):
cost = 0
if price_id:
try:
price = await run_in_threadpool(stripe.Price.retrieve, price_id)
cost = price.unit_amount or 0
except stripe.StripeError:
pass
tier_costs[t.value] = cost
async def _cost(pid: str | None) -> int:
return (await _get_stripe_price_amount(pid) or 0) if pid else 0
return SubscriptionStatusResponse(
costs = await asyncio.gather(*[_cost(pid) for pid in price_ids])
tier_costs: dict[str, int] = {}
for t, pid, cost in zip(priceable_tiers, price_ids, costs):
if pid:
tier_costs[t.value] = cost
# Expose the effective rate-limit multipliers alongside prices so the
# frontend can render "Nx rate limits" relative to the lowest visible
# tier without hard-coding backend defaults. Only emit entries for tiers
# that land in ``tier_costs`` — rows hidden at the price layer must stay
# hidden in the multiplier layer too.
multipliers = await get_tier_multipliers()
tier_multipliers: dict[str, float] = {
t.value: multipliers.get(t, 1.0)
for t in priceable_tiers
if t.value in tier_costs
}
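# Worked example (hypothetical values, for illustration only): with
# tier_multipliers == {"BASIC": 1.0, "PRO": 3.0, "MAX": 10.0, "BUSINESS": 20.0}
# the frontend can label PRO as "3x" by dividing each entry by the smallest
# visible multiplier, without baking backend defaults into the UI.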
current_monthly_cost = tier_costs.get(tier.value, 0)
proration_credit = await get_proration_credit_cents(user_id, current_monthly_cost)
try:
pending = await get_pending_subscription_change(user_id)
except (stripe.StripeError, PendingChangeUnknown):
# Swallow Stripe-side failures (rate limits, transient network) AND
# PendingChangeUnknown (LaunchDarkly price-id lookup failed). Both
# propagate past the cache so the next request retries fresh instead
# of serving a stale None for the TTL window. Let real bugs (KeyError,
# AttributeError, etc.) propagate so they surface in Sentry.
logger.exception(
"get_subscription_status: failed to resolve pending change for user %s",
user_id,
)
pending = None
response = SubscriptionStatusResponse(
tier=tier.value,
monthly_cost=tier_costs.get(tier.value, 0),
monthly_cost=current_monthly_cost,
tier_costs=tier_costs,
tier_multipliers=tier_multipliers,
proration_credit_cents=proration_credit,
)
if pending is not None:
pending_tier_enum, pending_effective_at = pending
if pending_tier_enum in (
SubscriptionTier.BASIC,
SubscriptionTier.PRO,
SubscriptionTier.MAX,
SubscriptionTier.BUSINESS,
):
response.pending_tier = pending_tier_enum.value
response.pending_tier_effective_at = pending_effective_at
return response
@v1_router.post(
path="/credits/subscription",
summary="Start a Stripe Checkout session to upgrade subscription tier",
summary="Update subscription tier or start a Stripe Checkout session",
operation_id="updateSubscriptionTier",
tags=["credits"],
dependencies=[Security(requires_user)],
@@ -750,40 +885,155 @@ async def get_subscription_status(
async def update_subscription_tier(
request: SubscriptionTierRequest,
user_id: Annotated[str, Security(get_user_id)],
) -> SubscriptionCheckoutResponse:
# Pydantic validates tier is one of FREE/PRO/BUSINESS via Literal type.
) -> SubscriptionStatusResponse:
# Pydantic validates tier is one of BASIC/PRO/MAX/BUSINESS via Literal type.
tier = SubscriptionTier(request.tier)
# ENTERPRISE tier is admin-managed — block self-service changes from ENTERPRISE users.
user = await get_user_by_id(user_id)
if (user.subscription_tier or SubscriptionTier.FREE) == SubscriptionTier.ENTERPRISE:
if (
user.subscription_tier or SubscriptionTier.BASIC
) == SubscriptionTier.ENTERPRISE:
raise HTTPException(
status_code=403,
detail="ENTERPRISE subscription changes must be managed by an administrator",
)
# Same-tier request = "stay on my current tier" = cancel any pending
# scheduled change (paid→paid downgrade or paid→BASIC cancel). This is the
# collapsed behaviour that replaces the old /credits/subscription/cancel-pending
# route. Safe when no pending change exists: release_pending_subscription_schedule
# returns False and we simply return the current status.
if (user.subscription_tier or SubscriptionTier.BASIC) == tier:
try:
await release_pending_subscription_schedule(user_id)
except stripe.StripeError as e:
logger.exception(
"Stripe error releasing pending subscription change for user %s: %s",
user_id,
e,
)
raise HTTPException(
status_code=502,
detail=(
"Unable to cancel the pending subscription change right now. "
"Please try again or contact support."
),
)
return await get_subscription_status(user_id)
payment_enabled = await is_feature_enabled(
Flag.ENABLE_PLATFORM_PAYMENT, user_id, default=False
)
# Downgrade to FREE: cancel active Stripe subscription, then update the DB tier.
if tier == SubscriptionTier.FREE:
current_tier = user.subscription_tier or SubscriptionTier.BASIC
target_price_id, current_tier_price_id = await asyncio.gather(
get_subscription_price_id(tier),
get_subscription_price_id(current_tier),
)
# Legacy cancel: target BASIC + stripe-price-id-basic unset. Schedule Stripe
# cancellation at period end; cancel_at_period_end=True lets the webhook flip
# the DB tier. No active sub (admin-granted) or payment disabled → DB flip.
# Once stripe-price-id-basic is configured, BASIC becomes a real sub and falls
# through to the modify/checkout flow below.
if tier == SubscriptionTier.BASIC and target_price_id is None:
if payment_enabled:
await cancel_stripe_subscription(user_id)
try:
had_subscription = await cancel_stripe_subscription(user_id)
except stripe.StripeError as e:
logger.exception(
"Stripe error cancelling subscription for user %s: %s",
user_id,
e,
)
raise HTTPException(
status_code=502,
detail=(
"Unable to cancel your subscription right now. "
"Please try again or contact support."
),
)
if not had_subscription:
await set_subscription_tier(user_id, tier)
return await get_subscription_status(user_id)
await set_subscription_tier(user_id, tier)
return SubscriptionCheckoutResponse(url="")
return await get_subscription_status(user_id)
# Beta users (payment not enabled) → update tier directly without Stripe.
if not payment_enabled:
await set_subscription_tier(user_id, tier)
return SubscriptionCheckoutResponse(url="")
raise HTTPException(
status_code=422,
detail=f"Subscription not available for tier {tier.value}",
)
# Paid upgrade → create Stripe Checkout Session.
# Target has no LD price — not provisionable (matches the GET hiding).
if target_price_id is None:
raise HTTPException(
status_code=422,
detail=f"Subscription not available for tier {tier.value}",
)
# User has an active Stripe subscription (current tier has an LD price):
# modify it in-place. modify_stripe_subscription_for_tier returns False when no
# active sub exists — that's only a "DB-only flip is OK" signal for admin-granted
# paid tiers (PRO/BUSINESS with no Stripe record). Priced-BASIC users without a
# sub must still go through Checkout so they set up payment.
if current_tier_price_id is not None:
try:
modified = await modify_stripe_subscription_for_tier(user_id, tier)
if modified:
return await get_subscription_status(user_id)
if current_tier != SubscriptionTier.BASIC:
await set_subscription_tier(user_id, tier)
return await get_subscription_status(user_id)
except ValueError as e:
raise HTTPException(status_code=422, detail=str(e))
except stripe.StripeError as e:
logger.exception(
"Stripe error modifying subscription for user %s: %s", user_id, e
)
raise HTTPException(
status_code=502,
detail=(
"Unable to update your subscription right now. "
"Please try again or contact support."
),
)
# No active Stripe subscription → create Stripe Checkout Session.
if not request.success_url or not request.cancel_url:
raise HTTPException(
status_code=422,
detail="success_url and cancel_url are required for paid tier upgrades",
)
# Open-redirect protection: both URLs must point to the configured frontend
# origin, otherwise an attacker could use our Stripe integration as a
# redirector to arbitrary phishing sites.
#
# Fail early with a clear 503 if the server is misconfigured (neither
# frontend_base_url nor platform_base_url set), so operators get an
# actionable error instead of the misleading "must match the platform
# frontend origin" 422 that _validate_checkout_redirect_url would otherwise
# produce when `allowed` is empty.
if not (settings.config.frontend_base_url or settings.config.platform_base_url):
logger.error(
"update_subscription_tier: neither frontend_base_url nor "
"platform_base_url is configured; cannot validate checkout redirect URLs"
)
raise HTTPException(
status_code=503,
detail=(
"Payment redirect URLs cannot be validated: "
"frontend_base_url or platform_base_url must be set on the server."
),
)
if not _validate_checkout_redirect_url(
request.success_url
) or not _validate_checkout_redirect_url(request.cancel_url):
raise HTTPException(
status_code=422,
detail="success_url and cancel_url must match the platform frontend origin",
)
try:
url = await create_subscription_checkout(
user_id=user_id,
@@ -791,54 +1041,113 @@ async def update_subscription_tier(
success_url=request.success_url,
cancel_url=request.cancel_url,
)
except (ValueError, stripe.StripeError) as e:
except ValueError as e:
raise HTTPException(status_code=422, detail=str(e))
except stripe.StripeError as e:
logger.exception(
"Stripe error creating checkout session for user %s: %s", user_id, e
)
raise HTTPException(
status_code=502,
detail=(
"Unable to start checkout right now. "
"Please try again or contact support."
),
)
return SubscriptionCheckoutResponse(url=url)
status = await get_subscription_status(user_id)
status.url = url
return status
@v1_router.post(
path="/credits/stripe_webhook", summary="Handle Stripe webhooks", tags=["credits"]
)
async def stripe_webhook(request: Request):
webhook_secret = settings.secrets.stripe_webhook_secret
if not webhook_secret:
# Guard: an empty secret allows HMAC forgery (attacker can compute a valid
# signature over the same empty key). Reject all webhook calls when unconfigured.
logger.error(
"stripe_webhook: STRIPE_WEBHOOK_SECRET is not configured — "
"rejecting request to prevent signature bypass"
)
raise HTTPException(status_code=503, detail="Webhook not configured")
# Get the raw request body
payload = await request.body()
# Get the signature header
sig_header = request.headers.get("stripe-signature")
try:
event = stripe.Webhook.construct_event(
payload, sig_header, settings.secrets.stripe_webhook_secret
)
except ValueError as e:
event = stripe.Webhook.construct_event(payload, sig_header, webhook_secret)
except ValueError:
# Invalid payload
raise HTTPException(
status_code=400, detail=f"Invalid payload: {str(e) or type(e).__name__}"
)
except stripe.SignatureVerificationError as e:
raise HTTPException(status_code=400, detail="Invalid payload")
except stripe.SignatureVerificationError:
# Invalid signature
raise HTTPException(
status_code=400, detail=f"Invalid signature: {str(e) or type(e).__name__}"
raise HTTPException(status_code=400, detail="Invalid signature")
# Defensive payload extraction. A malformed payload (missing/non-dict
# `data.object`, missing `id`) would otherwise raise KeyError/TypeError
# AFTER signature verification — which Stripe interprets as a delivery
# failure and retries forever, while spamming Sentry with no useful info.
# Acknowledge with 200 and a warning so Stripe stops retrying.
event_type = event.get("type", "")
event_data = event.get("data") or {}
data_object = event_data.get("object") if isinstance(event_data, dict) else None
if not isinstance(data_object, dict):
logger.warning(
"stripe_webhook: %s missing or non-dict data.object; ignoring",
event_type,
)
return Response(status_code=200)
if (
event["type"] == "checkout.session.completed"
or event["type"] == "checkout.session.async_payment_succeeded"
if event_type in (
"checkout.session.completed",
"checkout.session.async_payment_succeeded",
):
await UserCredit().fulfill_checkout(session_id=event["data"]["object"]["id"])
session_id = data_object.get("id")
if not session_id:
logger.warning(
"stripe_webhook: %s missing data.object.id; ignoring", event_type
)
return Response(status_code=200)
await UserCredit().fulfill_checkout(session_id=session_id)
if event["type"] in (
if event_type in (
"customer.subscription.created",
"customer.subscription.updated",
"customer.subscription.deleted",
):
await sync_subscription_from_stripe(event["data"]["object"])
await sync_subscription_from_stripe(data_object)
if event["type"] == "charge.dispute.created":
await UserCredit().handle_dispute(event["data"]["object"])
# `subscription_schedule.updated` is deliberately omitted: our own
# `SubscriptionSchedule.create` + `.modify` calls in
# `_schedule_downgrade_at_period_end` would fire that event right back at us
# and loop redundant traffic through this handler. We only care about state
# transitions (released / completed); phase advance to the new price is
# already covered by `customer.subscription.updated`.
if event_type in (
"subscription_schedule.released",
"subscription_schedule.completed",
):
await sync_subscription_schedule_from_stripe(data_object)
if event["type"] == "refund.created" or event["type"] == "charge.dispute.closed":
await UserCredit().deduct_credits(event["data"]["object"])
if event_type == "invoice.payment_failed":
await handle_subscription_payment_failure(data_object)
# `handle_dispute` and `deduct_credits` expect Stripe SDK typed objects
# (Dispute/Refund). The Stripe webhook payload's `data.object` is a
# StripeObject (a dict subclass) carrying that runtime shape, so we cast
# to satisfy the type checker without changing runtime behaviour.
if event_type == "charge.dispute.created":
await UserCredit().handle_dispute(cast(stripe.Dispute, data_object))
if event_type == "refund.created" or event_type == "charge.dispute.closed":
await UserCredit().deduct_credits(
cast("stripe.Refund | stripe.Dispute", data_object)
)
return Response(status_code=200)
@@ -1422,6 +1731,10 @@ async def enable_execution_sharing(
# Generate a unique share token
share_token = str(uuid.uuid4())
# Remove stale allowlist records before updating the token — prevents a
# window where old records + new token could coexist.
await execution_db.delete_shared_execution_files(execution_id=graph_exec_id)
# Update the execution with share info
await execution_db.update_graph_execution_share_status(
execution_id=graph_exec_id,
@@ -1431,6 +1744,14 @@ async def enable_execution_sharing(
shared_at=datetime.now(timezone.utc),
)
# Create allowlist of workspace files referenced in outputs
await execution_db.create_shared_execution_files(
execution_id=graph_exec_id,
share_token=share_token,
user_id=user_id,
outputs=execution.outputs,
)
# Return the share URL
frontend_url = settings.config.frontend_base_url or "http://localhost:3000"
share_url = f"{frontend_url}/share/{share_token}"
@@ -1456,6 +1777,9 @@ async def disable_execution_sharing(
if not execution:
raise HTTPException(status_code=404, detail="Execution not found")
# Remove shared file allowlist records
await execution_db.delete_shared_execution_files(execution_id=graph_exec_id)
# Remove share info
await execution_db.update_graph_execution_share_status(
execution_id=graph_exec_id,
@@ -1481,6 +1805,43 @@ async def get_shared_execution(
return execution
@v1_router.get(
"/public/shared/{share_token}/files/{file_id}/download",
summary="Download a file from a shared execution",
operation_id="download_shared_file",
tags=["graphs"],
)
async def download_shared_file(
share_token: Annotated[
str,
Path(pattern=r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"),
],
file_id: Annotated[
str,
Path(pattern=r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"),
],
) -> Response:
"""Download a workspace file from a shared execution (no auth required).
Validates that the file was explicitly exposed when sharing was enabled.
Returns a uniform 404 for all failure modes to prevent enumeration attacks.
"""
# Single-query validation against the allowlist
execution_id = await execution_db.get_shared_execution_file(
share_token=share_token, file_id=file_id
)
if not execution_id:
raise HTTPException(status_code=404, detail="Not found")
# Look up the actual file (no workspace scoping needed — the allowlist
# already validated that this file belongs to the shared execution)
file = await get_workspace_file_by_id(file_id)
if not file:
raise HTTPException(status_code=404, detail="Not found")
return await create_file_download_response(file, inline=True)
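# Example request shape (illustrative; the /api prefix matches how the tests
# below mount v1_router and may differ per deployment):
#   GET /api/public/shared/550e8400-e29b-41d4-a716-446655440000/files/
#       6ba7b810-9dad-11d1-80b4-00c04fd430c8/download
# returns 200 with Content-Disposition: inline on success, or a uniform 404
# when the token is unknown, the file is not allow-listed, or the file is gone.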
########################################################
##################### Schedules ########################
########################################################

View File

@@ -0,0 +1,157 @@
"""Tests for the public shared file download endpoint."""
from datetime import datetime, timezone
from unittest.mock import AsyncMock, patch
import pytest
from fastapi import FastAPI
from fastapi.testclient import TestClient
from starlette.responses import Response
from backend.api.features.v1 import v1_router
from backend.data.workspace import WorkspaceFile
app = FastAPI()
app.include_router(v1_router, prefix="/api")
VALID_TOKEN = "550e8400-e29b-41d4-a716-446655440000"
VALID_FILE_ID = "6ba7b810-9dad-11d1-80b4-00c04fd430c8"
def _make_workspace_file(**overrides) -> WorkspaceFile:
defaults = {
"id": VALID_FILE_ID,
"workspace_id": "ws-001",
"created_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
"updated_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
"name": "image.png",
"path": "/image.png",
"storage_path": "local://uploads/image.png",
"mime_type": "image/png",
"size_bytes": 4,
"checksum": None,
"is_deleted": False,
"deleted_at": None,
"metadata": {},
}
defaults.update(overrides)
return WorkspaceFile(**defaults)
def _mock_download_response(**kwargs):
"""Return an AsyncMock that resolves to a Response with inline disposition."""
async def _handler(file, *, inline=False):
return Response(
content=b"\x89PNG",
media_type="image/png",
headers={
"Content-Disposition": (
'inline; filename="image.png"'
if inline
else 'attachment; filename="image.png"'
),
"Content-Length": "4",
},
)
return _handler
class TestDownloadSharedFile:
"""Tests for GET /api/public/shared/{token}/files/{id}/download."""
@pytest.fixture(autouse=True)
def _client(self):
self.client = TestClient(app, raise_server_exceptions=False)
def test_valid_token_and_file_returns_inline_content(self):
with (
patch(
"backend.api.features.v1.execution_db.get_shared_execution_file",
new_callable=AsyncMock,
return_value="exec-123",
),
patch(
"backend.api.features.v1.get_workspace_file_by_id",
new_callable=AsyncMock,
return_value=_make_workspace_file(),
),
patch(
"backend.api.features.v1.create_file_download_response",
side_effect=_mock_download_response(),
),
):
response = self.client.get(
f"/api/public/shared/{VALID_TOKEN}/files/{VALID_FILE_ID}/download"
)
assert response.status_code == 200
assert response.content == b"\x89PNG"
assert "inline" in response.headers["Content-Disposition"]
def test_invalid_token_format_returns_422(self):
response = self.client.get(
f"/api/public/shared/not-a-uuid/files/{VALID_FILE_ID}/download"
)
assert response.status_code == 422
def test_token_not_in_allowlist_returns_404(self):
with patch(
"backend.api.features.v1.execution_db.get_shared_execution_file",
new_callable=AsyncMock,
return_value=None,
):
response = self.client.get(
f"/api/public/shared/{VALID_TOKEN}/files/{VALID_FILE_ID}/download"
)
assert response.status_code == 404
def test_file_missing_from_workspace_returns_404(self):
with (
patch(
"backend.api.features.v1.execution_db.get_shared_execution_file",
new_callable=AsyncMock,
return_value="exec-123",
),
patch(
"backend.api.features.v1.get_workspace_file_by_id",
new_callable=AsyncMock,
return_value=None,
),
):
response = self.client.get(
f"/api/public/shared/{VALID_TOKEN}/files/{VALID_FILE_ID}/download"
)
assert response.status_code == 404
def test_uniform_404_prevents_enumeration(self):
"""Both failure modes produce identical 404 — no information leak."""
with patch(
"backend.api.features.v1.execution_db.get_shared_execution_file",
new_callable=AsyncMock,
return_value=None,
):
resp_no_allow = self.client.get(
f"/api/public/shared/{VALID_TOKEN}/files/{VALID_FILE_ID}/download"
)
with (
patch(
"backend.api.features.v1.execution_db.get_shared_execution_file",
new_callable=AsyncMock,
return_value="exec-123",
),
patch(
"backend.api.features.v1.get_workspace_file_by_id",
new_callable=AsyncMock,
return_value=None,
),
):
resp_no_file = self.client.get(
f"/api/public/shared/{VALID_TOKEN}/files/{VALID_FILE_ID}/download"
)
assert resp_no_allow.status_code == 404
assert resp_no_file.status_code == 404
assert resp_no_allow.json() == resp_no_file.json()

View File

@@ -29,7 +29,9 @@ from backend.util.workspace import WorkspaceManager
from backend.util.workspace_storage import get_workspace_storage
def _sanitize_filename_for_header(filename: str) -> str:
def _sanitize_filename_for_header(
filename: str, disposition: str = "attachment"
) -> str:
"""
Sanitize filename for Content-Disposition header to prevent header injection.
@@ -44,11 +46,11 @@ def _sanitize_filename_for_header(filename: str) -> str:
# Check if filename has non-ASCII characters
try:
sanitized.encode("ascii")
return f'attachment; filename="{sanitized}"'
return f'{disposition}; filename="{sanitized}"'
except UnicodeEncodeError:
# Use RFC5987 encoding for UTF-8 filenames
encoded = quote(sanitized, safe="")
return f"attachment; filename*=UTF-8''{encoded}"
return f"{disposition}; filename*=UTF-8''{encoded}"
logger = logging.getLogger(__name__)
@@ -58,19 +60,26 @@ router = fastapi.APIRouter(
)
def _create_streaming_response(content: bytes, file: WorkspaceFile) -> Response:
def _create_streaming_response(
content: bytes, file: WorkspaceFile, *, inline: bool = False
) -> Response:
"""Create a streaming response for file content."""
disposition = _sanitize_filename_for_header(
file.name, disposition="inline" if inline else "attachment"
)
return Response(
content=content,
media_type=file.mime_type,
headers={
"Content-Disposition": _sanitize_filename_for_header(file.name),
"Content-Disposition": disposition,
"Content-Length": str(len(content)),
},
)
async def _create_file_download_response(file: WorkspaceFile) -> Response:
async def create_file_download_response(
file: WorkspaceFile, *, inline: bool = False
) -> Response:
"""
Create a download response for a workspace file.
@@ -82,7 +91,7 @@ async def _create_file_download_response(file: WorkspaceFile) -> Response:
# For local storage, stream the file directly
if file.storage_path.startswith("local://"):
content = await storage.retrieve(file.storage_path)
return _create_streaming_response(content, file)
return _create_streaming_response(content, file, inline=inline)
# For GCS, try to redirect to signed URL, fall back to streaming
try:
@@ -90,7 +99,7 @@ async def _create_file_download_response(file: WorkspaceFile) -> Response:
# If we got back an API path (fallback), stream directly instead
if url.startswith("/api/"):
content = await storage.retrieve(file.storage_path)
return _create_streaming_response(content, file)
return _create_streaming_response(content, file, inline=inline)
return fastapi.responses.RedirectResponse(url=url, status_code=302)
except Exception as e:
# Log the signed URL failure with context
@@ -102,7 +111,7 @@ async def _create_file_download_response(file: WorkspaceFile) -> Response:
# Fall back to streaming directly from GCS
try:
content = await storage.retrieve(file.storage_path)
return _create_streaming_response(content, file)
return _create_streaming_response(content, file, inline=inline)
except Exception as fallback_error:
logger.error(
f"Fallback streaming also failed for file {file.id} "
@@ -169,7 +178,7 @@ async def download_file(
if file is None:
raise fastapi.HTTPException(status_code=404, detail="File not found")
return await _create_file_download_response(file)
return await create_file_download_response(file)
@router.delete(

View File

@@ -600,3 +600,221 @@ def test_list_files_offset_is_echoed_back(mock_manager_cls, mock_get_workspace):
mock_instance.list_files.assert_called_once_with(
limit=11, offset=50, include_all_sessions=True
)
# -- _sanitize_filename_for_header tests --
class TestSanitizeFilenameForHeader:
def test_simple_ascii_attachment(self):
from backend.api.features.workspace.routes import _sanitize_filename_for_header
assert _sanitize_filename_for_header("report.pdf") == (
'attachment; filename="report.pdf"'
)
def test_inline_disposition(self):
from backend.api.features.workspace.routes import _sanitize_filename_for_header
assert _sanitize_filename_for_header("image.png", disposition="inline") == (
'inline; filename="image.png"'
)
def test_strips_cr_lf_null(self):
from backend.api.features.workspace.routes import _sanitize_filename_for_header
result = _sanitize_filename_for_header("a\rb\nc\x00d.txt")
assert "\r" not in result
assert "\n" not in result
assert "\x00" not in result
assert 'filename="abcd.txt"' in result
def test_escapes_quotes(self):
from backend.api.features.workspace.routes import _sanitize_filename_for_header
result = _sanitize_filename_for_header('file"name.txt')
assert 'filename="file\\"name.txt"' in result
def test_header_injection_blocked(self):
from backend.api.features.workspace.routes import _sanitize_filename_for_header
result = _sanitize_filename_for_header("evil.txt\r\nX-Injected: true")
# CR/LF stripped — the remaining text is safely inside the quoted value
assert "\r" not in result
assert "\n" not in result
assert result == 'attachment; filename="evil.txtX-Injected: true"'
def test_unicode_uses_rfc5987(self):
from backend.api.features.workspace.routes import _sanitize_filename_for_header
result = _sanitize_filename_for_header("日本語.pdf")
assert "filename*=UTF-8''" in result
assert "attachment" in result
def test_unicode_inline(self):
from backend.api.features.workspace.routes import _sanitize_filename_for_header
result = _sanitize_filename_for_header("图片.png", disposition="inline")
assert result.startswith("inline; filename*=UTF-8''")
def test_empty_filename(self):
from backend.api.features.workspace.routes import _sanitize_filename_for_header
result = _sanitize_filename_for_header("")
assert result == 'attachment; filename=""'
# -- _create_streaming_response tests --
class TestCreateStreamingResponse:
def test_attachment_disposition_by_default(self):
from backend.api.features.workspace.routes import _create_streaming_response
file = _make_file(name="data.bin", mime_type="application/octet-stream")
response = _create_streaming_response(b"binary-data", file)
assert (
response.headers["Content-Disposition"] == 'attachment; filename="data.bin"'
)
assert response.headers["Content-Type"] == "application/octet-stream"
assert response.headers["Content-Length"] == "11"
assert response.body == b"binary-data"
def test_inline_disposition(self):
from backend.api.features.workspace.routes import _create_streaming_response
file = _make_file(name="photo.png", mime_type="image/png")
response = _create_streaming_response(b"\x89PNG", file, inline=True)
assert response.headers["Content-Disposition"] == 'inline; filename="photo.png"'
assert response.headers["Content-Type"] == "image/png"
def test_inline_sanitizes_filename(self):
from backend.api.features.workspace.routes import _create_streaming_response
file = _make_file(name='evil"\r\n.txt', mime_type="text/plain")
response = _create_streaming_response(b"data", file, inline=True)
assert "\r" not in response.headers["Content-Disposition"]
assert "\n" not in response.headers["Content-Disposition"]
assert "inline" in response.headers["Content-Disposition"]
def test_content_length_matches_body(self):
from backend.api.features.workspace.routes import _create_streaming_response
content = b"x" * 1000
file = _make_file(name="big.bin", mime_type="application/octet-stream")
response = _create_streaming_response(content, file)
assert response.headers["Content-Length"] == "1000"
# -- create_file_download_response tests --
class TestCreateFileDownloadResponse:
@pytest.mark.asyncio
async def test_local_storage_returns_streaming_response(self, mocker):
from backend.api.features.workspace.routes import create_file_download_response
mock_storage = AsyncMock()
mock_storage.retrieve.return_value = b"file contents"
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_storage",
return_value=mock_storage,
)
file = _make_file(
storage_path="local://uploads/test.txt",
mime_type="text/plain",
)
response = await create_file_download_response(file)
assert response.status_code == 200
assert response.body == b"file contents"
assert "attachment" in response.headers["Content-Disposition"]
@pytest.mark.asyncio
async def test_local_storage_inline(self, mocker):
from backend.api.features.workspace.routes import create_file_download_response
mock_storage = AsyncMock()
mock_storage.retrieve.return_value = b"\x89PNG"
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_storage",
return_value=mock_storage,
)
file = _make_file(
storage_path="local://uploads/photo.png",
mime_type="image/png",
name="photo.png",
)
response = await create_file_download_response(file, inline=True)
assert "inline" in response.headers["Content-Disposition"]
@pytest.mark.asyncio
async def test_gcs_redirect(self, mocker):
from backend.api.features.workspace.routes import create_file_download_response
mock_storage = AsyncMock()
mock_storage.get_download_url.return_value = (
"https://storage.googleapis.com/signed-url"
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_storage",
return_value=mock_storage,
)
file = _make_file(storage_path="gcs://bucket/file.pdf")
response = await create_file_download_response(file)
assert response.status_code == 302
assert (
response.headers["location"] == "https://storage.googleapis.com/signed-url"
)
@pytest.mark.asyncio
async def test_gcs_api_fallback_streams_directly(self, mocker):
from backend.api.features.workspace.routes import create_file_download_response
mock_storage = AsyncMock()
mock_storage.get_download_url.return_value = "/api/fallback"
mock_storage.retrieve.return_value = b"fallback content"
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_storage",
return_value=mock_storage,
)
file = _make_file(storage_path="gcs://bucket/file.txt")
response = await create_file_download_response(file)
assert response.status_code == 200
assert response.body == b"fallback content"
@pytest.mark.asyncio
async def test_gcs_signed_url_failure_falls_back_to_streaming(self, mocker):
from backend.api.features.workspace.routes import create_file_download_response
mock_storage = AsyncMock()
mock_storage.get_download_url.side_effect = RuntimeError("GCS error")
mock_storage.retrieve.return_value = b"streamed"
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_storage",
return_value=mock_storage,
)
file = _make_file(storage_path="gcs://bucket/file.txt")
response = await create_file_download_response(file)
assert response.status_code == 200
assert response.body == b"streamed"
@pytest.mark.asyncio
async def test_gcs_total_failure_raises(self, mocker):
from backend.api.features.workspace.routes import create_file_download_response
mock_storage = AsyncMock()
mock_storage.get_download_url.side_effect = RuntimeError("GCS error")
mock_storage.retrieve.side_effect = RuntimeError("Also failed")
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_storage",
return_value=mock_storage,
)
file = _make_file(storage_path="gcs://bucket/file.txt")
with pytest.raises(RuntimeError, match="Also failed"):
await create_file_download_response(file)

View File

@@ -17,6 +17,7 @@ from fastapi.routing import APIRoute
from prisma.errors import PrismaError
import backend.api.features.admin.credit_admin_routes
import backend.api.features.admin.diagnostics_admin_routes
import backend.api.features.admin.execution_analytics_routes
import backend.api.features.admin.platform_cost_routes
import backend.api.features.admin.rate_limit_admin_routes
@@ -31,6 +32,7 @@ import backend.api.features.library.routes
import backend.api.features.mcp.routes as mcp_routes
import backend.api.features.oauth
import backend.api.features.otto.routes
import backend.api.features.platform_linking.routes
import backend.api.features.postmark.postmark
import backend.api.features.store.model
import backend.api.features.store.routes
@@ -320,6 +322,11 @@ app.include_router(
tags=["v2", "admin"],
prefix="/api/credits",
)
app.include_router(
backend.api.features.admin.diagnostics_admin_routes.router,
tags=["v2", "admin"],
prefix="/api",
)
app.include_router(
backend.api.features.admin.execution_analytics_routes.router,
tags=["v2", "admin"],
@@ -372,6 +379,11 @@ app.include_router(
tags=["oauth"],
prefix="/api/oauth",
)
app.include_router(
backend.api.features.platform_linking.routes.router,
tags=["platform-linking"],
prefix="/api/platform-linking",
)
app.mount("/external-api", external_api)

View File

@@ -42,11 +42,13 @@ def main(**kwargs):
from backend.data.db_manager import DatabaseManager
from backend.executor import ExecutionManager, Scheduler
from backend.notifications import NotificationManager
from backend.platform_linking.manager import PlatformLinkingManager
run_processes(
DatabaseManager().set_log_level("warning"),
Scheduler(),
NotificationManager(),
PlatformLinkingManager(),
WebsocketServer(),
AgentServer(),
ExecutionManager(),

View File

@@ -96,27 +96,64 @@ class BlockCategory(Enum):
class BlockCostType(str, Enum):
RUN = "run" # cost X credits per run
BYTE = "byte" # cost X credits per byte
SECOND = "second" # cost X credits per second
# RUN : cost_amount credits per run.
# BYTE : cost_amount credits per byte of input data.
# SECOND : cost_amount credits per cost_divisor walltime seconds.
# ITEMS : cost_amount credits per cost_divisor items (from stats).
# COST_USD : cost_amount credits per USD of stats.provider_cost.
# TOKENS : per-(model, provider) rate table lookup; see TOKEN_COST.
RUN = "run"
BYTE = "byte"
SECOND = "second"
ITEMS = "items"
COST_USD = "cost_usd"
TOKENS = "tokens"
@property
def is_dynamic(self) -> bool:
"""Real charge is computed post-flight from stats.
Dynamic types (SECOND/ITEMS/COST_USD/TOKENS) return 0 pre-flight and
settle against stats via charge_reconciled_usage once the block runs.
"""
return self in _DYNAMIC_COST_TYPES
_DYNAMIC_COST_TYPES: frozenset[BlockCostType] = frozenset(
{
BlockCostType.SECOND,
BlockCostType.ITEMS,
BlockCostType.COST_USD,
BlockCostType.TOKENS,
}
)
class BlockCost(BaseModel):
cost_amount: int
cost_filter: BlockInput
cost_type: BlockCostType
# cost_divisor: interpret cost_amount as "credits per cost_divisor units".
# Only meaningful for SECOND / ITEMS. TOKENS routes through TOKEN_COST
# rate tables (per-model input/output/cache pricing) and ignores
# cost_divisor entirely. Defaults to 1 so existing RUN/BYTE entries stay
# point-wise. Example: cost_amount=1, cost_divisor=10 under SECOND means
# "1 credit per 10 seconds of walltime".
cost_divisor: int = 1
def __init__(
self,
cost_amount: int,
cost_type: BlockCostType = BlockCostType.RUN,
cost_filter: Optional[BlockInput] = None,
cost_divisor: int = 1,
**data: Any,
) -> None:
super().__init__(
cost_amount=cost_amount,
cost_filter=cost_filter or {},
cost_type=cost_type,
cost_divisor=max(1, cost_divisor),
**data,
)
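# Rough arithmetic sketch of the cost_divisor example above (the real
# settlement happens in charge_reconciled_usage, and its exact rounding is an
# assumption here): with cost_amount=1 and cost_divisor=10 under SECOND, a
# block that ran for 25 s of walltime settles about 25 / 10 * 1 = 2.5 credits
# of runtime cost, before whatever rounding the billing layer applies.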
@@ -168,9 +205,31 @@ class BlockSchema(BaseModel):
return cls.cached_jsonschema
@classmethod
def validate_data(cls, data: BlockInput) -> str | None:
def validate_data(
cls,
data: BlockInput,
exclude_fields: set[str] | None = None,
) -> str | None:
schema = cls.jsonschema()
if exclude_fields:
# Drop the excluded fields from both the properties and the
# ``required`` list so jsonschema doesn't flag them as missing.
# Used by the dry-run path to skip credentials validation while
# still validating the remaining block inputs.
schema = {
**schema,
"properties": {
k: v
for k, v in schema.get("properties", {}).items()
if k not in exclude_fields
},
"required": [
r for r in schema.get("required", []) if r not in exclude_fields
],
}
data = {k: v for k, v in data.items() if k not in exclude_fields}
return json.validate_with_jsonschema(
schema=cls.jsonschema(),
schema=schema,
data={k: v for k, v in data.items() if v is not None},
)
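# Hypothetical usage sketch (MyBlock is illustrative, not from this change):
#   MyBlock.Input.validate_data({"prompt": "hi"}, exclude_fields={"credentials"})
# returns None even when "credentials" appears in the schema's ``required``
# list, because the field is dropped from both ``properties`` and ``required``
# before jsonschema validation runs.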
@@ -311,6 +370,8 @@ class BlockSchema(BaseModel):
"credentials_provider": [config.get("provider", "google")],
"credentials_types": [config.get("type", "oauth2")],
"credentials_scopes": config.get("scopes"),
"is_auto_credential": True,
"input_field_name": info["field_name"],
}
result[kwarg_name] = CredentialsFieldInfo.model_validate(
auto_schema, by_alias=True
@@ -421,19 +482,6 @@ class BlockWebhookConfig(BlockManualWebhookConfig):
class Block(ABC, Generic[BlockSchemaInputType, BlockSchemaOutputType]):
_optimized_description: ClassVar[str | None] = None
def extra_runtime_cost(self, execution_stats: NodeExecutionStats) -> int:
"""Return extra runtime cost to charge after this block run completes.
Called by the executor after a block finishes with COMPLETED status.
The return value is the number of additional base-cost credits to
charge beyond the single credit already collected by charge_usage
at the start of execution. Defaults to 0 (no extra charges).
Override in blocks (e.g. OrchestratorBlock) that make multiple LLM
calls within one run and should be billed per call.
"""
return 0
def __init__(
self,
id: str = "",
@@ -717,11 +765,16 @@ class Block(ABC, Generic[BlockSchemaInputType, BlockSchemaOutputType]):
# (e.g. AgentExecutorBlock) get proper input validation.
is_dry_run = getattr(kwargs.get("execution_context"), "dry_run", False)
if is_dry_run:
# Credential fields may be absent (LLM-built agents often skip
# wiring them) or nullified earlier in the pipeline. Validate
# the non-credential inputs against a schema with those fields
# excluded — stripping only the data while keeping them in the
# ``required`` list would falsely report ``'credentials' is a
# required property``.
cred_field_names = set(self.input_schema.get_credentials_fields().keys())
non_cred_data = {
k: v for k, v in input_data.items() if k not in cred_field_names
}
if error := self.input_schema.validate_data(non_cred_data):
if error := self.input_schema.validate_data(
input_data, exclude_fields=cred_field_names
):
raise BlockInputError(
message=f"Unable to execute block with invalid input data: {error}",
block_name=self.name,
@@ -735,6 +788,61 @@ class Block(ABC, Generic[BlockSchemaInputType, BlockSchemaOutputType]):
block_id=self.id,
)
# Ensure auto-credential kwargs are present before we hand off to
# run(). A missing auto-credential means the upstream field (e.g.
# a Google Drive picker) didn't embed a _credentials_id, or the
# executor couldn't resolve it. Without this guard, run() would
# crash with a TypeError (missing required kwarg) or an opaque
# AttributeError deep inside the provider SDK.
#
# Only raise when the field is ALSO not populated in input_data.
# ``_acquire_auto_credentials`` intentionally skips setting the
# kwarg in two legitimate cases — ``_credentials_id`` is ``None``
# (chained from upstream) or the field is missing from
# ``input_data`` at prep time (connected from upstream block).
# In both cases the upstream block is expected to populate the
# field value by execute time; raising here would break the
# documented ``AgentGoogleDriveFileInputBlock`` chaining pattern.
# Dry-run skips because the executor intentionally runs blocks
# without resolved creds for schema validation.
if not is_dry_run:
for (
kwarg_name,
info,
) in self.input_schema.get_auto_credentials_fields().items():
kwargs.setdefault(kwarg_name, None)
if kwargs[kwarg_name] is not None:
continue
# Upstream-chained pattern: the field was populated by a
# prior node (e.g. AgentGoogleDriveFileInputBlock) whose
# output carries a resolved ``_credentials_id``.
# ``_acquire_auto_credentials`` deliberately doesn't set
# the kwarg in that case because the value isn't available
# at prep time; the executor fills it in before we reach
# ``_execute``. Trust it if the ``_credentials_id`` KEY
# is present — its value may be explicitly ``None`` in
# the chained case (see sentry thread
# PRRT_kwDOJKSTjM58sJfA). Checking truthiness here would
# falsely preempt run() for every valid chained graph
# that ships ``_credentials_id=None`` in the picker
# object. Mirror ``_acquire_auto_credentials``'s own
# skip rule, which treats ``cred_id is None`` as a
# chained-skip signal.
field_name = info["field_name"]
field_value = input_data.get(field_name)
if isinstance(field_value, dict) and "_credentials_id" in field_value:
continue
raise BlockExecutionError(
message=(
f"Missing credentials for '{kwarg_name}'. "
"Select a file via the picker (which carries "
"its credentials), or connect credentials for "
"this block."
),
block_name=self.name,
block_id=self.id,
)
# Use the validated input data
async for output_name, output_data in self.run(
self.input_schema(**{k: v for k, v in input_data.items() if v is not None}),

View File

@@ -0,0 +1,56 @@
"""Provider descriptions for services that don't yet have their own ``_config.py``.
Every provider in ``_STATIC_PROVIDER_CONFIGS`` below is declared here because
its block code currently lives either in a single shared file (e.g. the 8 LLM
providers in ``blocks/llm.py``) or in a single-file block that has no dedicated
directory (e.g. ``blocks/reddit.py``).
This file gets loaded by the block auto-loader in ``blocks/__init__.py``
(``rglob("*.py")`` picks it up) so the ``ProviderBuilder(...).build()`` calls
run at startup and populate ``AutoRegistry`` before the first API request.
**Migration path:** when a provider graduates into its own directory with a
proper ``_config.py`` (following the SDK pattern, e.g. ``blocks/linear/_config.py``),
delete its entry here. The metadata will still be served by
``GET /integrations/providers`` — it just moves to live next to the provider's
auth and webhook config.
"""
from backend.data.model import CredentialsType
from backend.sdk import ProviderBuilder
_STATIC_PROVIDER_CONFIGS: dict[str, tuple[str, tuple[CredentialsType, ...]]] = {
# LLM providers that share blocks/llm.py
"aiml_api": ("Unified access to 100+ AI models", ("api_key",)),
"anthropic": ("Claude language models", ("api_key",)),
"groq": ("Fast LLM inference", ("api_key",)),
"llama_api": ("Llama model hosting", ("api_key",)),
"ollama": ("Run open-source LLMs locally", ("api_key",)),
"open_router": ("One API for every LLM", ("api_key",)),
"openai": ("GPT models and embeddings", ("api_key",)),
"v0": ("AI-generated UI components", ("api_key",)),
# Single-file providers (one provider per standalone blocks/*.py file)
"d_id": ("AI avatar and video generation", ("api_key",)),
"e2b": ("Sandboxed code execution", ("api_key",)),
"google_maps": ("Places, directions, geocoding", ("api_key",)),
"http": ("Generic HTTP requests", ("api_key", "host_scoped")),
"ideogram": ("Text-to-image generation", ("api_key",)),
"medium": ("Publish stories and posts", ("api_key",)),
"mem0": ("Long-term memory for agents", ("api_key",)),
"openweathermap": ("Weather data and forecasts", ("api_key",)),
"pinecone": ("Managed vector database", ("api_key",)),
"reddit": ("Subreddits, posts, and comments", ("oauth2",)),
"revid": ("AI-generated short-form video", ("api_key",)),
"screenshotone": ("Automated website screenshots", ("api_key",)),
"smtp": ("Send email via SMTP", ("user_password",)),
"unreal_speech": ("Low-cost text-to-speech", ("api_key",)),
"webshare_proxy": ("Rotating proxies for scraping", ("api_key",)),
}
for _name, (_description, _auth_types) in _STATIC_PROVIDER_CONFIGS.items():
(
ProviderBuilder(_name)
.with_description(_description)
.with_supported_auth_types(*_auth_types)
.build()
)
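# Illustrative sketch of the migration target described in the docstring
# above (assumed shape, modeled on the dedicated _config.py files elsewhere
# in this changeset, e.g. the Airtable and Exa configs): once a provider
# graduates into its own directory, its entry above is deleted and the same
# metadata moves into a builder chain like this.
#
#     # blocks/reddit/_config.py (hypothetical)
#     from backend.sdk import BlockCostType, ProviderBuilder
#
#     reddit = (
#         ProviderBuilder("reddit")
#         .with_description("Subreddits, posts, and comments")
#         .with_base_cost(1, BlockCostType.RUN)
#         .build()
#     )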

View File

@@ -171,7 +171,10 @@ class AgentExecutorBlock(Block):
)
self.merge_stats(
NodeExecutionStats(
extra_cost=event.stats.cost if event.stats else 0,
# Sub-graph already debited each of its own nodes; we
# roll up its total so graph_stats.cost reflects the
# full sub-graph spend.
reconciled_cost_delta=(event.stats.cost if event.stats else 0),
extra_steps=event.stats.node_exec_count if event.stats else 0,
)
)
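# Illustrative reading of the rollup above (assumed semantics, inferred from
# the rename away from extra_cost): if the sub-graph's own nodes were debited
# 12 + 5 + 3 = 20 credits, the parent AgentExecutorBlock records
# reconciled_cost_delta=20 so the parent's graph_stats.cost reflects the
# full 20 without debiting the wallet a second time.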

View File

@@ -4,11 +4,17 @@ Shared configuration for all AgentMail blocks.
from agentmail import AsyncAgentMail
from backend.sdk import APIKeyCredentials, ProviderBuilder, SecretStr
from backend.sdk import APIKeyCredentials, BlockCostType, ProviderBuilder, SecretStr
# AgentMail is in beta with no published paid tier yet, but its ~37 blocks
# have no BLOCK_COSTS entry and therefore currently execute wallet-free.
# 1 cr/call is a conservative interim floor so no AgentMail work leaks
# past billing. Revisit once AgentMail publishes usage-based pricing.
agent_mail = (
ProviderBuilder("agent_mail")
.with_description("Managed email accounts for agents")
.with_api_key("AGENTMAIL_API_KEY", "AgentMail API Key")
.with_base_cost(1, BlockCostType.RUN)
.build()
)

View File

@@ -10,6 +10,7 @@ from ._webhook import AirtableWebhookManager
# Configure the Airtable provider with API key authentication
airtable = (
ProviderBuilder("airtable")
.with_description("Bases, tables, and records")
.with_api_key("AIRTABLE_API_KEY", "Airtable Personal Access Token")
.with_webhook_manager(AirtableWebhookManager)
.with_base_cost(1, BlockCostType.RUN)

View File

@@ -0,0 +1,15 @@
"""Provider registration for Apollo.
Registers the provider description shown in the settings integrations UI.
Apollo doesn't use a full :class:`ProviderBuilder` chain (auth is set up in
``_auth.py``), so this file only declares metadata.
"""
from backend.sdk import ProviderBuilder
apollo = (
ProviderBuilder("apollo")
.with_description("Sales intelligence and prospecting")
.with_supported_auth_types("api_key")
.build()
)

View File

@@ -23,6 +23,7 @@ from backend.copilot.permissions import (
validate_block_identifiers,
)
from backend.data.model import SchemaField
from backend.util.exceptions import BlockExecutionError
if TYPE_CHECKING:
from backend.data.execution import ExecutionContext
@@ -32,9 +33,36 @@ logger = logging.getLogger(__name__)
# Block ID shared between autopilot.py and copilot prompting.py.
AUTOPILOT_BLOCK_ID = "c069dc6b-c3ed-4c12-b6e5-d47361e64ce6"
# Identifiers used when registering an AutoPilotBlock turn with the
# stream registry; they distinguish block-originated turns from
# sub-session and HTTP SSE turns in logs / observability.
_AUTOPILOT_TOOL_CALL_ID = "autopilot_block"
_AUTOPILOT_TOOL_NAME = "autopilot_block"
class SubAgentRecursionError(RuntimeError):
"""Raised when the sub-agent nesting depth limit is exceeded."""
# Ceiling on how long AutoPilotBlock.execute_copilot will wait for the
# enqueued turn's terminal event. Graph blocks run synchronously from
# the caller's perspective so we wait effectively as long as needed; 6h
# matches the previous abandoned-task cap and is much longer than any
# legitimate AutoPilot turn.
_AUTOPILOT_BLOCK_MAX_WAIT_SECONDS = 6 * 60 * 60 # 6 hours
class SubAgentRecursionError(BlockExecutionError):
"""Raised when the AutoPilot sub-agent nesting depth limit is exceeded.
Inherits :class:`BlockExecutionError` — this is a known, handled
runtime failure at the block level (caller nested AutoPilotBlocks
beyond the configured limit). Surfaces with the block_name /
block_id the block framework expects, instead of being wrapped in
``BlockUnknownError``.
"""
def __init__(self, message: str) -> None:
super().__init__(
message=message,
block_name="AutoPilotBlock",
block_id=AUTOPILOT_BLOCK_ID,
)
class ToolCallEntry(TypedDict):
@@ -268,11 +296,15 @@ class AutoPilotBlock(Block):
user_id: str,
permissions: "CopilotPermissions | None" = None,
) -> tuple[str, list[ToolCallEntry], str, str, TokenUsage]:
"""Invoke the copilot and collect all stream results.
"""Invoke the copilot on the copilot_executor queue and aggregate the
result.
Delegates to :func:`collect_copilot_response` — the shared helper that
consumes ``stream_chat_completion_sdk`` without wrapping it in an
``asyncio.timeout`` (the SDK manages its own heartbeat-based timeouts).
Delegates to :func:`run_copilot_turn_via_queue` — the shared
primitive used by ``run_sub_session`` too — which creates the
stream_registry meta record, enqueues the job, and waits on the
Redis stream for the terminal event. Any available
copilot_executor worker picks up the job, so this call survives
the graph-executor worker dying mid-turn (RabbitMQ redelivers).
Args:
prompt: The user task/instruction.
@@ -285,8 +317,8 @@ class AutoPilotBlock(Block):
Returns:
A tuple of (response_text, tool_calls, history_json, session_id, usage).
"""
from backend.copilot.sdk.collect import (
collect_copilot_response, # avoid circular import
from backend.copilot.sdk.session_waiter import (
run_copilot_turn_via_queue, # avoid circular import
)
tokens = _check_recursion(max_recursion_depth)
@@ -299,14 +331,35 @@ class AutoPilotBlock(Block):
if system_context:
effective_prompt = f"[System Context: {system_context}]\n\n{prompt}"
result = await collect_copilot_response(
outcome, result = await run_copilot_turn_via_queue(
session_id=session_id,
message=effective_prompt,
user_id=user_id,
message=effective_prompt,
# Graph block execution is synchronous from the caller's
# perspective — wait effectively as long as needed. The
# SDK enforces its own idle-based timeout inside the
# stream_registry pipeline.
timeout=_AUTOPILOT_BLOCK_MAX_WAIT_SECONDS,
permissions=effective_permissions,
tool_call_id=_AUTOPILOT_TOOL_CALL_ID,
tool_name=_AUTOPILOT_TOOL_NAME,
)
if outcome == "failed":
raise RuntimeError(
"AutoPilot turn failed — see the session's transcript"
)
if outcome == "running":
raise RuntimeError(
"AutoPilot turn did not complete within "
f"{_AUTOPILOT_BLOCK_MAX_WAIT_SECONDS}s — session "
f"{session_id}"
)
# Build a lightweight conversation summary from streamed data.
# Build a lightweight conversation summary from the aggregated data.
# When ``result.queued`` is True the prompt rode on an already-
# in-flight turn (``run_copilot_turn_via_queue`` queued it and
# waited on the existing turn's stream); the aggregated result
# is still valid, so the same rendering path applies.
turn_messages: list[dict[str, Any]] = [
{"role": "user", "content": effective_prompt},
]
@@ -315,7 +368,7 @@ class AutoPilotBlock(Block):
{
"role": "assistant",
"content": result.response_text,
"tool_calls": result.tool_calls,
"tool_calls": [tc.model_dump() for tc in result.tool_calls],
}
)
else:
@@ -326,11 +379,11 @@ class AutoPilotBlock(Block):
tool_calls: list[ToolCallEntry] = [
{
"tool_call_id": tc["tool_call_id"],
"tool_name": tc["tool_name"],
"input": tc["input"],
"output": tc["output"],
"success": tc["success"],
"tool_call_id": tc.tool_call_id,
"tool_name": tc.tool_name,
"input": tc.input,
"output": tc.output,
"success": tc.success,
}
for tc in result.tool_calls
]

View File

@@ -0,0 +1,26 @@
"""Shared provider config for Ayrshare social-media blocks.
The "credential" exposed to blocks is the **per-user Ayrshare profile key**,
not the org-level ``AYRSHARE_API_KEY``. Profile keys are provisioned per
user by :class:`~backend.integrations.managed_providers.ayrshare.AyrshareManagedProvider`
and stored in the normal credentials list with ``is_managed=True``, so every
Ayrshare block fits the standard credential flow:
credentials: CredentialsMetaInput = ayrshare.credentials_field(...)
``run_block`` / ``resolve_block_credentials`` take care of the rest.
``with_managed_api_key()`` registers ``api_key`` as a supported auth type
without the env-var-backed default credential that ``with_api_key()`` would
create — the org-level ``AYRSHARE_API_KEY`` is the admin key and must never
reach a block as a "profile key".
"""
from backend.sdk import ProviderBuilder
ayrshare = (
ProviderBuilder("ayrshare")
.with_description("Post to every social network")
.with_managed_api_key()
.build()
)

View File

@@ -0,0 +1,18 @@
from backend.sdk import BlockCost, BlockCostType
# Ayrshare is a subscription proxy ($149/mo Business). Per-post credit charges
# prevent a single heavy user from absorbing the fixed cost and align with the
# upload cost of each post variant.
# cost_filter matches on input_data.is_video BEFORE run() executes, so the flag
# has to be correct at input-eval time. Video-only platforms (YouTube, Snapchat)
# override the base default to True; platforms that accept both (TikTok, etc.)
# rely on the caller setting is_video explicitly for accurate billing.
# First match wins in block_usage_cost, so list the video tier first.
AYRSHARE_POST_COSTS = (
BlockCost(
cost_amount=5, cost_type=BlockCostType.RUN, cost_filter={"is_video": True}
),
BlockCost(
cost_amount=2, cost_type=BlockCostType.RUN, cost_filter={"is_video": False}
),
)
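# Minimal sketch of the first-match-wins selection described above (assumed
# semantics of cost_filter matching; the real logic lives in
# block_usage_cost, not in this file):
#
#     def pick_post_cost(input_data: dict) -> int:
#         for entry in AYRSHARE_POST_COSTS:
#             if all(input_data.get(k) == v for k, v in entry.cost_filter.items()):
#                 return entry.cost_amount
#         return 0
#
#     pick_post_cost({"is_video": True})   # -> 5 (video tier)
#     pick_post_cost({"is_video": False})  # -> 2 (image tier)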

View File

@@ -4,22 +4,25 @@ from typing import Optional
from pydantic import BaseModel, Field
from backend.blocks._base import BlockSchemaInput
from backend.data.model import SchemaField, UserIntegrations
from backend.data.model import CredentialsMetaInput, SchemaField
from backend.integrations.ayrshare import AyrshareClient
from backend.util.clients import get_database_manager_async_client
from backend.util.exceptions import MissingConfigError
async def get_profile_key(user_id: str):
user_integrations: UserIntegrations = (
await get_database_manager_async_client().get_user_integrations(user_id)
)
return user_integrations.managed_credentials.ayrshare_profile_key
from ._config import ayrshare
class BaseAyrshareInput(BlockSchemaInput):
"""Base input model for Ayrshare social media posts with common fields."""
credentials: CredentialsMetaInput = ayrshare.credentials_field(
description=(
"Ayrshare profile credential. AutoGPT provisions this managed "
"credential automatically — the user does not create it. After "
"it's in place, the user links each social account via the "
"Ayrshare SSO popup in the Builder."
),
)
post: str = SchemaField(
description="The post text to be published", default="", advanced=False
)
@@ -29,7 +32,9 @@ class BaseAyrshareInput(BlockSchemaInput):
advanced=False,
)
is_video: bool = SchemaField(
description="Whether the media is a video", default=False, advanced=True
description="Whether the media is a video. Set to True when uploading a video so billing applies the video tier.",
default=False,
advanced=True,
)
schedule_date: Optional[datetime] = SchemaField(
description="UTC datetime for scheduling (YYYY-MM-DDThh:mm:ssZ)",

View File

@@ -1,16 +1,20 @@
from backend.integrations.ayrshare import PostIds, PostResponse, SocialPlatform
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaOutput,
BlockType,
SchemaField,
cost,
)
from ._util import BaseAyrshareInput, create_ayrshare_client, get_profile_key
from ._cost import AYRSHARE_POST_COSTS
from ._util import BaseAyrshareInput, create_ayrshare_client
@cost(*AYRSHARE_POST_COSTS)
class PostToBlueskyBlock(Block):
"""Block for posting to Bluesky with Bluesky-specific options."""
@@ -57,16 +61,10 @@ class PostToBlueskyBlock(Block):
self,
input_data: "PostToBlueskyBlock.Input",
*,
user_id: str,
credentials: APIKeyCredentials,
**kwargs,
) -> BlockOutput:
"""Post to Bluesky with Bluesky-specific options."""
profile_key = await get_profile_key(user_id)
if not profile_key:
yield "error", "Please link a social account via Ayrshare"
return
client = create_ayrshare_client()
if not client:
yield "error", "Ayrshare integration is not configured. Please set up the AYRSHARE_API_KEY."
@@ -106,7 +104,7 @@ class PostToBlueskyBlock(Block):
random_media_url=input_data.random_media_url,
notes=input_data.notes,
bluesky_options=bluesky_options if bluesky_options else None,
profile_key=profile_key.get_secret_value(),
profile_key=credentials.api_key.get_secret_value(),
)
yield "post_result", response
if response.postIds:

View File

@@ -1,21 +1,20 @@
from backend.integrations.ayrshare import PostIds, PostResponse, SocialPlatform
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaOutput,
BlockType,
SchemaField,
cost,
)
from ._util import (
BaseAyrshareInput,
CarouselItem,
create_ayrshare_client,
get_profile_key,
)
from ._cost import AYRSHARE_POST_COSTS
from ._util import BaseAyrshareInput, CarouselItem, create_ayrshare_client
@cost(*AYRSHARE_POST_COSTS)
class PostToFacebookBlock(Block):
"""Block for posting to Facebook with Facebook-specific options."""
@@ -120,15 +119,10 @@ class PostToFacebookBlock(Block):
self,
input_data: "PostToFacebookBlock.Input",
*,
user_id: str,
credentials: APIKeyCredentials,
**kwargs,
) -> BlockOutput:
"""Post to Facebook with Facebook-specific options."""
profile_key = await get_profile_key(user_id)
if not profile_key:
yield "error", "Please link a social account via Ayrshare"
return
client = create_ayrshare_client()
if not client:
yield "error", "Ayrshare integration is not configured. Please set up the AYRSHARE_API_KEY."
@@ -204,7 +198,7 @@ class PostToFacebookBlock(Block):
random_media_url=input_data.random_media_url,
notes=input_data.notes,
facebook_options=facebook_options if facebook_options else None,
profile_key=profile_key.get_secret_value(),
profile_key=credentials.api_key.get_secret_value(),
)
yield "post_result", response
if response.postIds:

View File

@@ -1,16 +1,20 @@
from backend.integrations.ayrshare import PostIds, PostResponse, SocialPlatform
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaOutput,
BlockType,
SchemaField,
cost,
)
from ._util import BaseAyrshareInput, create_ayrshare_client, get_profile_key
from ._cost import AYRSHARE_POST_COSTS
from ._util import BaseAyrshareInput, create_ayrshare_client
@cost(*AYRSHARE_POST_COSTS)
class PostToGMBBlock(Block):
"""Block for posting to Google My Business with GMB-specific options."""
@@ -110,14 +114,13 @@ class PostToGMBBlock(Block):
)
async def run(
self, input_data: "PostToGMBBlock.Input", *, user_id: str, **kwargs
self,
input_data: "PostToGMBBlock.Input",
*,
credentials: APIKeyCredentials,
**kwargs
) -> BlockOutput:
"""Post to Google My Business with GMB-specific options."""
profile_key = await get_profile_key(user_id)
if not profile_key:
yield "error", "Please link a social account via Ayrshare"
return
client = create_ayrshare_client()
if not client:
yield "error", "Ayrshare integration is not configured. Please set up the AYRSHARE_API_KEY."
@@ -202,7 +205,7 @@ class PostToGMBBlock(Block):
random_media_url=input_data.random_media_url,
notes=input_data.notes,
gmb_options=gmb_options if gmb_options else None,
profile_key=profile_key.get_secret_value(),
profile_key=credentials.api_key.get_secret_value(),
)
yield "post_result", response
if response.postIds:

View File

@@ -2,22 +2,21 @@ from typing import Any
from backend.integrations.ayrshare import PostIds, PostResponse, SocialPlatform
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaOutput,
BlockType,
SchemaField,
cost,
)
from ._util import (
BaseAyrshareInput,
InstagramUserTag,
create_ayrshare_client,
get_profile_key,
)
from ._cost import AYRSHARE_POST_COSTS
from ._util import BaseAyrshareInput, InstagramUserTag, create_ayrshare_client
@cost(*AYRSHARE_POST_COSTS)
class PostToInstagramBlock(Block):
"""Block for posting to Instagram with Instagram-specific options."""
@@ -112,15 +111,10 @@ class PostToInstagramBlock(Block):
self,
input_data: "PostToInstagramBlock.Input",
*,
user_id: str,
credentials: APIKeyCredentials,
**kwargs,
) -> BlockOutput:
"""Post to Instagram with Instagram-specific options."""
profile_key = await get_profile_key(user_id)
if not profile_key:
yield "error", "Please link a social account via Ayrshare"
return
client = create_ayrshare_client()
if not client:
yield "error", "Ayrshare integration is not configured. Please set up the AYRSHARE_API_KEY."
@@ -241,7 +235,7 @@ class PostToInstagramBlock(Block):
random_media_url=input_data.random_media_url,
notes=input_data.notes,
instagram_options=instagram_options if instagram_options else None,
profile_key=profile_key.get_secret_value(),
profile_key=credentials.api_key.get_secret_value(),
)
yield "post_result", response
if response.postIds:

View File

@@ -1,16 +1,20 @@
from backend.integrations.ayrshare import PostIds, PostResponse, SocialPlatform
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaOutput,
BlockType,
SchemaField,
cost,
)
from ._util import BaseAyrshareInput, create_ayrshare_client, get_profile_key
from ._cost import AYRSHARE_POST_COSTS
from ._util import BaseAyrshareInput, create_ayrshare_client
@cost(*AYRSHARE_POST_COSTS)
class PostToLinkedInBlock(Block):
"""Block for posting to LinkedIn with LinkedIn-specific options."""
@@ -112,15 +116,10 @@ class PostToLinkedInBlock(Block):
self,
input_data: "PostToLinkedInBlock.Input",
*,
user_id: str,
credentials: APIKeyCredentials,
**kwargs,
) -> BlockOutput:
"""Post to LinkedIn with LinkedIn-specific options."""
profile_key = await get_profile_key(user_id)
if not profile_key:
yield "error", "Please link a social account via Ayrshare"
return
client = create_ayrshare_client()
if not client:
yield "error", "Ayrshare integration is not configured. Please set up the AYRSHARE_API_KEY."
@@ -214,7 +213,7 @@ class PostToLinkedInBlock(Block):
random_media_url=input_data.random_media_url,
notes=input_data.notes,
linkedin_options=linkedin_options if linkedin_options else None,
profile_key=profile_key.get_secret_value(),
profile_key=credentials.api_key.get_secret_value(),
)
yield "post_result", response
if response.postIds:

View File

@@ -1,21 +1,20 @@
from backend.integrations.ayrshare import PostIds, PostResponse, SocialPlatform
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaOutput,
BlockType,
SchemaField,
cost,
)
from ._util import (
BaseAyrshareInput,
PinterestCarouselOption,
create_ayrshare_client,
get_profile_key,
)
from ._cost import AYRSHARE_POST_COSTS
from ._util import BaseAyrshareInput, PinterestCarouselOption, create_ayrshare_client
@cost(*AYRSHARE_POST_COSTS)
class PostToPinterestBlock(Block):
"""Block for posting to Pinterest with Pinterest-specific options."""
@@ -92,15 +91,10 @@ class PostToPinterestBlock(Block):
self,
input_data: "PostToPinterestBlock.Input",
*,
user_id: str,
credentials: APIKeyCredentials,
**kwargs,
) -> BlockOutput:
"""Post to Pinterest with Pinterest-specific options."""
profile_key = await get_profile_key(user_id)
if not profile_key:
yield "error", "Please link a social account via Ayrshare"
return
client = create_ayrshare_client()
if not client:
yield "error", "Ayrshare integration is not configured. Please set up the AYRSHARE_API_KEY."
@@ -206,7 +200,7 @@ class PostToPinterestBlock(Block):
random_media_url=input_data.random_media_url,
notes=input_data.notes,
pinterest_options=pinterest_options if pinterest_options else None,
profile_key=profile_key.get_secret_value(),
profile_key=credentials.api_key.get_secret_value(),
)
yield "post_result", response
if response.postIds:

View File

@@ -1,16 +1,20 @@
from backend.integrations.ayrshare import PostIds, PostResponse, SocialPlatform
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaOutput,
BlockType,
SchemaField,
cost,
)
from ._util import BaseAyrshareInput, create_ayrshare_client, get_profile_key
from ._cost import AYRSHARE_POST_COSTS
from ._util import BaseAyrshareInput, create_ayrshare_client
@cost(*AYRSHARE_POST_COSTS)
class PostToRedditBlock(Block):
"""Block for posting to Reddit."""
@@ -35,12 +39,12 @@ class PostToRedditBlock(Block):
)
async def run(
self, input_data: "PostToRedditBlock.Input", *, user_id: str, **kwargs
self,
input_data: "PostToRedditBlock.Input",
*,
credentials: APIKeyCredentials,
**kwargs
) -> BlockOutput:
profile_key = await get_profile_key(user_id)
if not profile_key:
yield "error", "Please link a social account via Ayrshare"
return
client = create_ayrshare_client()
if not client:
yield "error", "Ayrshare integration is not configured."
@@ -61,7 +65,7 @@ class PostToRedditBlock(Block):
random_post=input_data.random_post,
random_media_url=input_data.random_media_url,
notes=input_data.notes,
profile_key=profile_key.get_secret_value(),
profile_key=credentials.api_key.get_secret_value(),
)
yield "post_result", response
if response.postIds:

View File

@@ -1,16 +1,20 @@
from backend.integrations.ayrshare import PostIds, PostResponse, SocialPlatform
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaOutput,
BlockType,
SchemaField,
cost,
)
from ._util import BaseAyrshareInput, create_ayrshare_client, get_profile_key
from ._cost import AYRSHARE_POST_COSTS
from ._util import BaseAyrshareInput, create_ayrshare_client
@cost(*AYRSHARE_POST_COSTS)
class PostToSnapchatBlock(Block):
"""Block for posting to Snapchat with Snapchat-specific options."""
@@ -31,6 +35,14 @@ class PostToSnapchatBlock(Block):
advanced=False,
)
# Snapchat is video-only; override the base default so the @cost filter
# selects the 5-credit video tier instead of the 2-credit image tier.
is_video: bool = SchemaField(
description="Whether the media is a video (always True for Snapchat)",
default=True,
advanced=True,
)
# Snapchat-specific options
story_type: str = SchemaField(
description="Type of Snapchat content: 'story' (24-hour Stories), 'saved_story' (Saved Stories), or 'spotlight' (Spotlight posts)",
@@ -62,15 +74,10 @@ class PostToSnapchatBlock(Block):
self,
input_data: "PostToSnapchatBlock.Input",
*,
user_id: str,
credentials: APIKeyCredentials,
**kwargs,
) -> BlockOutput:
"""Post to Snapchat with Snapchat-specific options."""
profile_key = await get_profile_key(user_id)
if not profile_key:
yield "error", "Please link a social account via Ayrshare"
return
client = create_ayrshare_client()
if not client:
yield "error", "Ayrshare integration is not configured. Please set up the AYRSHARE_API_KEY."
@@ -121,7 +128,7 @@ class PostToSnapchatBlock(Block):
random_media_url=input_data.random_media_url,
notes=input_data.notes,
snapchat_options=snapchat_options if snapchat_options else None,
profile_key=profile_key.get_secret_value(),
profile_key=credentials.api_key.get_secret_value(),
)
yield "post_result", response
if response.postIds:

View File

@@ -1,16 +1,20 @@
from backend.integrations.ayrshare import PostIds, PostResponse, SocialPlatform
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaOutput,
BlockType,
SchemaField,
cost,
)
from ._util import BaseAyrshareInput, create_ayrshare_client, get_profile_key
from ._cost import AYRSHARE_POST_COSTS
from ._util import BaseAyrshareInput, create_ayrshare_client
@cost(*AYRSHARE_POST_COSTS)
class PostToTelegramBlock(Block):
"""Block for posting to Telegram with Telegram-specific options."""
@@ -57,15 +61,10 @@ class PostToTelegramBlock(Block):
self,
input_data: "PostToTelegramBlock.Input",
*,
user_id: str,
credentials: APIKeyCredentials,
**kwargs,
) -> BlockOutput:
"""Post to Telegram with Telegram-specific validation."""
profile_key = await get_profile_key(user_id)
if not profile_key:
yield "error", "Please link a social account via Ayrshare"
return
client = create_ayrshare_client()
if not client:
yield "error", "Ayrshare integration is not configured. Please set up the AYRSHARE_API_KEY."
@@ -108,7 +107,7 @@ class PostToTelegramBlock(Block):
random_post=input_data.random_post,
random_media_url=input_data.random_media_url,
notes=input_data.notes,
profile_key=profile_key.get_secret_value(),
profile_key=credentials.api_key.get_secret_value(),
)
yield "post_result", response
if response.postIds:

View File

@@ -1,16 +1,20 @@
from backend.integrations.ayrshare import PostIds, PostResponse, SocialPlatform
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaOutput,
BlockType,
SchemaField,
cost,
)
from ._util import BaseAyrshareInput, create_ayrshare_client, get_profile_key
from ._cost import AYRSHARE_POST_COSTS
from ._util import BaseAyrshareInput, create_ayrshare_client
@cost(*AYRSHARE_POST_COSTS)
class PostToThreadsBlock(Block):
"""Block for posting to Threads with Threads-specific options."""
@@ -50,15 +54,10 @@ class PostToThreadsBlock(Block):
self,
input_data: "PostToThreadsBlock.Input",
*,
user_id: str,
credentials: APIKeyCredentials,
**kwargs,
) -> BlockOutput:
"""Post to Threads with Threads-specific validation."""
profile_key = await get_profile_key(user_id)
if not profile_key:
yield "error", "Please link a social account via Ayrshare"
return
client = create_ayrshare_client()
if not client:
yield "error", "Ayrshare integration is not configured. Please set up the AYRSHARE_API_KEY."
@@ -103,7 +102,7 @@ class PostToThreadsBlock(Block):
random_media_url=input_data.random_media_url,
notes=input_data.notes,
threads_options=threads_options if threads_options else None,
profile_key=profile_key.get_secret_value(),
profile_key=credentials.api_key.get_secret_value(),
)
yield "post_result", response
if response.postIds:

View File

@@ -2,15 +2,18 @@ from enum import Enum
from backend.integrations.ayrshare import PostIds, PostResponse, SocialPlatform
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaOutput,
BlockType,
SchemaField,
cost,
)
from ._util import BaseAyrshareInput, create_ayrshare_client, get_profile_key
from ._cost import AYRSHARE_POST_COSTS
from ._util import BaseAyrshareInput, create_ayrshare_client
class TikTokVisibility(str, Enum):
@@ -19,6 +22,7 @@ class TikTokVisibility(str, Enum):
FOLLOWERS = "followers"
@cost(*AYRSHARE_POST_COSTS)
class PostToTikTokBlock(Block):
"""Block for posting to TikTok with TikTok-specific options."""
@@ -113,14 +117,13 @@ class PostToTikTokBlock(Block):
)
async def run(
self, input_data: "PostToTikTokBlock.Input", *, user_id: str, **kwargs
self,
input_data: "PostToTikTokBlock.Input",
*,
credentials: APIKeyCredentials,
**kwargs,
) -> BlockOutput:
"""Post to TikTok with TikTok-specific validation and options."""
profile_key = await get_profile_key(user_id)
if not profile_key:
yield "error", "Please link a social account via Ayrshare"
return
client = create_ayrshare_client()
if not client:
yield "error", "Ayrshare integration is not configured. Please set up the AYRSHARE_API_KEY."
@@ -235,7 +238,7 @@ class PostToTikTokBlock(Block):
random_media_url=input_data.random_media_url,
notes=input_data.notes,
tiktok_options=tiktok_options if tiktok_options else None,
profile_key=profile_key.get_secret_value(),
profile_key=credentials.api_key.get_secret_value(),
)
yield "post_result", response
if response.postIds:

View File

@@ -1,16 +1,20 @@
from backend.integrations.ayrshare import PostIds, PostResponse, SocialPlatform
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaOutput,
BlockType,
SchemaField,
cost,
)
from ._util import BaseAyrshareInput, create_ayrshare_client, get_profile_key
from ._cost import AYRSHARE_POST_COSTS
from ._util import BaseAyrshareInput, create_ayrshare_client
@cost(*AYRSHARE_POST_COSTS)
class PostToXBlock(Block):
"""Block for posting to X / Twitter with Twitter-specific options."""
@@ -115,15 +119,10 @@ class PostToXBlock(Block):
self,
input_data: "PostToXBlock.Input",
*,
user_id: str,
credentials: APIKeyCredentials,
**kwargs,
) -> BlockOutput:
"""Post to X / Twitter with enhanced X-specific options."""
profile_key = await get_profile_key(user_id)
if not profile_key:
yield "error", "Please link a social account via Ayrshare"
return
client = create_ayrshare_client()
if not client:
yield "error", "Ayrshare integration is not configured. Please set up the AYRSHARE_API_KEY."
@@ -233,7 +232,7 @@ class PostToXBlock(Block):
random_media_url=input_data.random_media_url,
notes=input_data.notes,
twitter_options=twitter_options if twitter_options else None,
profile_key=profile_key.get_secret_value(),
profile_key=credentials.api_key.get_secret_value(),
)
yield "post_result", response
if response.postIds:

View File

@@ -3,15 +3,18 @@ from typing import Any
from backend.integrations.ayrshare import PostIds, PostResponse, SocialPlatform
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaOutput,
BlockType,
SchemaField,
cost,
)
from ._util import BaseAyrshareInput, create_ayrshare_client, get_profile_key
from ._cost import AYRSHARE_POST_COSTS
from ._util import BaseAyrshareInput, create_ayrshare_client
class YouTubeVisibility(str, Enum):
@@ -20,6 +23,7 @@ class YouTubeVisibility(str, Enum):
UNLISTED = "unlisted"
@cost(*AYRSHARE_POST_COSTS)
class PostToYouTubeBlock(Block):
"""Block for posting to YouTube with YouTube-specific options."""
@@ -39,6 +43,14 @@ class PostToYouTubeBlock(Block):
advanced=False,
)
# YouTube is video-only; override the base default so the @cost filter
# selects the 5-credit video tier instead of the 2-credit image tier.
is_video: bool = SchemaField(
description="Whether the media is a video (always True for YouTube)",
default=True,
advanced=True,
)
# YouTube-specific required options
title: str = SchemaField(
description="Video title (max 100 chars, required). Cannot contain < or > characters.",
@@ -137,16 +149,10 @@ class PostToYouTubeBlock(Block):
self,
input_data: "PostToYouTubeBlock.Input",
*,
user_id: str,
credentials: APIKeyCredentials,
**kwargs,
) -> BlockOutput:
"""Post to YouTube with YouTube-specific validation and options."""
profile_key = await get_profile_key(user_id)
if not profile_key:
yield "error", "Please link a social account via Ayrshare"
return
client = create_ayrshare_client()
if not client:
yield "error", "Ayrshare integration is not configured. Please set up the AYRSHARE_API_KEY."
@@ -302,7 +308,7 @@ class PostToYouTubeBlock(Block):
random_media_url=input_data.random_media_url,
notes=input_data.notes,
youtube_options=youtube_options,
profile_key=profile_key.get_secret_value(),
profile_key=credentials.api_key.get_secret_value(),
)
yield "post_result", response
if response.postIds:

View File

@@ -7,6 +7,7 @@ from backend.sdk import BlockCostType, ProviderBuilder
# Configure the Meeting BaaS provider with API key authentication
baas = (
ProviderBuilder("baas")
.with_description("Meeting recording and transcription")
.with_api_key("MEETING_BAAS_API_KEY", "Meeting BaaS API Key")
.with_base_cost(5, BlockCostType.RUN) # Higher cost for meeting recording service
.build()

View File

@@ -4,21 +4,34 @@ Meeting BaaS bot (recording) blocks.
from typing import Optional
from backend.data.model import NodeExecutionStats
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockCost,
BlockCostType,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
SchemaField,
cost,
)
from ._api import MeetingBaasAPI
from ._config import baas
# Meeting BaaS recording rate: $0.69 per hour.
_MEETING_BAAS_USD_PER_SECOND = 0.69 / 3600
# Join bills a flat 30 cr commit (covers median short meeting);
# FetchMeetingData bills the duration-scaled remainder from the
# `duration_seconds` field on the API response. Long meetings no
# longer under-bill.
@cost(BlockCost(cost_type=BlockCostType.RUN, cost_amount=30))
class BaasBotJoinMeetingBlock(Block):
"""
Deploy a bot immediately or at a scheduled start_time to join and record a meeting.
@@ -134,6 +147,7 @@ class BaasBotLeaveMeetingBlock(Block):
yield "left", left
@cost(BlockCost(cost_type=BlockCostType.COST_USD, cost_amount=150))
class BaasBotFetchMeetingDataBlock(Block):
"""
Pull MP4 URL, transcript & metadata for a completed meeting.
@@ -176,9 +190,21 @@ class BaasBotFetchMeetingDataBlock(Block):
include_transcripts=input_data.include_transcripts,
)
bot_meta = data.get("bot_data", {}).get("bot", {}) or {}
# Bill recording duration via COST_USD so multi-hour meetings
# scale past the Join block's flat 30 cr deposit.
duration_seconds = float(bot_meta.get("duration_seconds") or 0)
if duration_seconds > 0:
self.merge_stats(
NodeExecutionStats(
provider_cost=duration_seconds * _MEETING_BAAS_USD_PER_SECOND,
provider_cost_type="cost_usd",
)
)
yield "mp4_url", data.get("mp4", "")
yield "transcript", data.get("bot_data", {}).get("transcripts", [])
yield "metadata", data.get("bot_data", {}).get("bot", {})
yield "metadata", bot_meta
class BaasBotDeleteRecordingBlock(Block):

View File

@@ -0,0 +1,86 @@
"""Unit tests for Meeting BaaS duration-based cost emission."""
from unittest.mock import AsyncMock, patch
import pytest
from pydantic import SecretStr
from backend.blocks.baas.bots import (
_MEETING_BAAS_USD_PER_SECOND,
BaasBotFetchMeetingDataBlock,
)
from backend.data.model import APIKeyCredentials, NodeExecutionStats
TEST_CREDENTIALS = APIKeyCredentials(
id="01234567-89ab-cdef-0123-456789abcdef",
provider="baas",
title="Mock BaaS API Key",
api_key=SecretStr("mock-baas-api-key"),
expires_at=None,
)
def test_usd_per_second_derives_from_published_rate():
"""$0.69/hour published rate → ~$0.000192/second."""
assert _MEETING_BAAS_USD_PER_SECOND == pytest.approx(0.69 / 3600)
@pytest.mark.asyncio
@pytest.mark.parametrize(
"duration_seconds, expected_usd",
[
(3600, 0.69), # 1 hour
(1800, 0.345), # 30 min
(0, None), # no recording → no emission
(None, None), # missing duration field → no emission
],
)
async def test_fetch_meeting_data_emits_duration_cost_usd(
duration_seconds, expected_usd
):
"""FetchMeetingData extracts duration_seconds from bot metadata and
emits provider_cost / cost_usd scaled by the published $0.69/hr rate.
Emission is skipped when duration is 0 or missing.
"""
block = BaasBotFetchMeetingDataBlock()
bot_meta = {"id": "bot-xyz"}
if duration_seconds is not None:
bot_meta["duration_seconds"] = duration_seconds
mock_api = AsyncMock()
mock_api.get_meeting_data.return_value = {
"mp4": "https://example/recording.mp4",
"bot_data": {"bot": bot_meta, "transcripts": []},
}
captured: list[NodeExecutionStats] = []
with (
patch("backend.blocks.baas.bots.MeetingBaasAPI", return_value=mock_api),
patch.object(block, "merge_stats", side_effect=captured.append),
):
outputs = []
async for name, val in block.run(
block.input_schema(
credentials={
"id": TEST_CREDENTIALS.id,
"provider": TEST_CREDENTIALS.provider,
"type": TEST_CREDENTIALS.type,
},
bot_id="bot-xyz",
include_transcripts=False,
),
credentials=TEST_CREDENTIALS,
):
outputs.append((name, val))
# Always yields the 3 outputs regardless of duration.
names = [n for n, _ in outputs]
assert "mp4_url" in names and "metadata" in names
if expected_usd is None:
assert captured == []
else:
assert len(captured) == 1
assert captured[0].provider_cost == pytest.approx(expected_usd)
assert captured[0].provider_cost_type == "cost_usd"

View File

@@ -2,7 +2,8 @@ from backend.sdk import BlockCostType, ProviderBuilder
bannerbear = (
ProviderBuilder("bannerbear")
.with_description("Auto-generate images and videos")
.with_api_key("BANNERBEAR_API_KEY", "Bannerbear API Key")
.with_base_cost(1, BlockCostType.RUN)
.with_base_cost(3, BlockCostType.RUN)
.build()
)

View File

@@ -433,7 +433,7 @@ class TestJinaEmbeddingBlockCostTracking:
class TestUnrealTextToSpeechBlockCostTracking:
@pytest.mark.asyncio
async def test_merge_stats_called_with_character_count(self):
"""provider_cost equals len(text) with type='characters'."""
"""provider_cost = len(text) * $0.000016 with type='cost_usd'."""
from backend.blocks.text_to_speech_block import TEST_CREDENTIALS as TTS_CREDS
from backend.blocks.text_to_speech_block import (
TEST_CREDENTIALS_INPUT as TTS_CREDS_INPUT,
@@ -461,12 +461,12 @@ class TestUnrealTextToSpeechBlockCostTracking:
mock_merge.assert_called_once()
stats = mock_merge.call_args[0][0]
assert stats.provider_cost == float(len(test_text))
assert stats.provider_cost_type == "characters"
assert stats.provider_cost == pytest.approx(len(test_text) * 0.000016)
assert stats.provider_cost_type == "cost_usd"
@pytest.mark.asyncio
async def test_empty_text_gives_zero_characters(self):
"""An empty text string results in provider_cost=0.0."""
"""An empty text string results in provider_cost=0.0 (cost_usd)."""
from backend.blocks.text_to_speech_block import TEST_CREDENTIALS as TTS_CREDS
from backend.blocks.text_to_speech_block import (
TEST_CREDENTIALS_INPUT as TTS_CREDS_INPUT,
@@ -494,7 +494,7 @@ class TestUnrealTextToSpeechBlockCostTracking:
mock_merge.assert_called_once()
stats = mock_merge.call_args[0][0]
assert stats.provider_cost == 0.0
assert stats.provider_cost_type == "characters"
assert stats.provider_cost_type == "cost_usd"
# ---------------------------------------------------------------------------

View File

@@ -17,6 +17,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -431,6 +432,7 @@ class ClaudeCodeBlock(Block):
# The JSON output contains the result
output_data = json.loads(raw_output)
response = output_data.get("result", raw_output)
self._record_cli_cost(output_data)
# Build conversation history entry
turn_entry = f"User: {prompt}\nClaude: {response}"
@@ -484,6 +486,23 @@ class ClaudeCodeBlock(Block):
escaped = prompt.replace("'", "'\"'\"'")
return f"'{escaped}'"
def _record_cli_cost(self, output_data: dict) -> None:
"""Feed Claude Code CLI's `total_cost_usd` to the COST_USD resolver.
The CLI rolls up Anthropic LLM + internal tool-call spend into
``total_cost_usd`` on its JSON response; piping it through
``merge_stats`` lets the wallet reflect real spend.
"""
total_cost_usd = output_data.get("total_cost_usd")
if total_cost_usd is None:
return
self.merge_stats(
NodeExecutionStats(
provider_cost=float(total_cost_usd),
provider_cost_type="cost_usd",
)
)
async def run(
self,
input_data: Input,

View File

@@ -0,0 +1,106 @@
"""Unit tests for ClaudeCodeBlock COST_USD billing migration.
Verifies:
- Block emits provider_cost / cost_usd when Claude Code CLI returns
total_cost_usd.
- block_usage_cost resolves the COST_USD entry to the expected ceil(usd *
cost_amount) credit charge.
- Missing total_cost_usd gracefully produces provider_cost=None (no bill).
"""
from unittest.mock import MagicMock, patch
import pytest
from backend.blocks._base import BlockCostType
from backend.blocks.claude_code import ClaudeCodeBlock
from backend.data.block_cost_config import BLOCK_COSTS
from backend.data.model import NodeExecutionStats
from backend.executor.utils import block_usage_cost
def test_claude_code_registered_as_cost_usd_150():
"""Sanity: BLOCK_COSTS holds the COST_USD, 150 cr/$ entry."""
entries = BLOCK_COSTS[ClaudeCodeBlock]
assert len(entries) == 1
entry = entries[0]
assert entry.cost_type == BlockCostType.COST_USD
assert entry.cost_amount == 150
@pytest.mark.parametrize(
"total_cost_usd, expected_credits",
[
(0.50, 75), # $0.50 × 150 = 75 cr
(1.00, 150), # $1.00 × 150 = 150 cr
(0.0134, 3), # ceil(0.0134 × 150) = ceil(2.01) = 3
(2.00, 300), # $2 × 150 = 300 cr
(0.001, 1), # ceil(0.001 × 150) = ceil(0.15) = 1 — no 0-cr leak on
# sub-cent runs
],
)
def test_cost_usd_resolver_applies_150_multiplier(total_cost_usd, expected_credits):
"""block_usage_cost with cost_usd stats returns ceil(usd * 150)."""
block = ClaudeCodeBlock()
# cost_filter requires matching e2b_credentials; supply the ones the
# registration uses so _is_cost_filter_match accepts the input.
entry = BLOCK_COSTS[ClaudeCodeBlock][0]
input_data = {"e2b_credentials": entry.cost_filter["e2b_credentials"]}
stats = NodeExecutionStats(
provider_cost=total_cost_usd,
provider_cost_type="cost_usd",
)
cost, matching_filter = block_usage_cost(
block=block, input_data=input_data, stats=stats
)
assert cost == expected_credits
assert matching_filter == entry.cost_filter
def test_cost_usd_resolver_returns_zero_when_stats_missing_cost():
"""Pre-flight (no stats) or unbilled run (provider_cost None) → 0."""
block = ClaudeCodeBlock()
entry = BLOCK_COSTS[ClaudeCodeBlock][0]
input_data = {"e2b_credentials": entry.cost_filter["e2b_credentials"]}
# No stats at all → pre-flight path, returns 0.
pre_cost, _ = block_usage_cost(block=block, input_data=input_data)
assert pre_cost == 0
# Stats present but no provider_cost → resolver can't bill.
stats = NodeExecutionStats()
post_cost, _ = block_usage_cost(block=block, input_data=input_data, stats=stats)
assert post_cost == 0
def test_record_cli_cost_emits_provider_cost_when_total_cost_present():
"""``_record_cli_cost`` (the helper called from ``execute_claude_code``)
must emit a single ``merge_stats`` with provider_cost + cost_usd tag
when the CLI JSON payload carries ``total_cost_usd``.
"""
block = ClaudeCodeBlock()
captured: list[NodeExecutionStats] = []
with patch.object(block, "merge_stats", side_effect=captured.append):
block._record_cli_cost(
{
"result": "hello from claude",
"total_cost_usd": 0.0421,
"usage": {"input_tokens": 1234, "output_tokens": 56},
}
)
assert len(captured) == 1
stats = captured[0]
assert stats.provider_cost == pytest.approx(0.0421)
assert stats.provider_cost_type == "cost_usd"
def test_record_cli_cost_skips_merge_when_total_cost_absent():
"""If the CLI payload lacks ``total_cost_usd`` (legacy / non-JSON
output), ``_record_cli_cost`` must not call ``merge_stats`` — otherwise
we'd pollute telemetry with a ``cost_usd`` emission that has no real
cost attached.
"""
block = ClaudeCodeBlock()
mock = MagicMock()
with patch.object(block, "merge_stats", mock):
block._record_cli_cost({"result": "hello"})
mock.assert_not_called()

View File

@@ -151,6 +151,17 @@ class CodeGenerationBlock(Block):
)
self.execution_stats = NodeExecutionStats()
# GPT-5.1-Codex published pricing: $1.25 / 1M input, $10 / 1M output.
_INPUT_USD_PER_1M = 1.25
_OUTPUT_USD_PER_1M = 10.0
@staticmethod
def _compute_token_usd(input_tokens: int, output_tokens: int) -> float:
return (
input_tokens * CodeGenerationBlock._INPUT_USD_PER_1M
+ output_tokens * CodeGenerationBlock._OUTPUT_USD_PER_1M
) / 1_000_000
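# Worked check of the published rates above (illustrative):
#     100_000 input tokens * $1.25/1M + 10_000 output tokens * $10/1M
#     = 0.125 + 0.100 = 0.225 USD
# which is what _compute_token_usd(100_000, 10_000) returns.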
async def call_codex(
self,
*,
@@ -189,13 +200,15 @@ class CodeGenerationBlock(Block):
response_id = response.id or ""
# Update usage stats
self.execution_stats.input_token_count = (
response.usage.input_tokens if response.usage else 0
)
self.execution_stats.output_token_count = (
response.usage.output_tokens if response.usage else 0
)
input_tokens = response.usage.input_tokens if response.usage else 0
output_tokens = response.usage.output_tokens if response.usage else 0
self.execution_stats.input_token_count = input_tokens
self.execution_stats.output_token_count = output_tokens
self.execution_stats.llm_call_count += 1
self.execution_stats.provider_cost = self._compute_token_usd(
input_tokens, output_tokens
)
self.execution_stats.provider_cost_type = "cost_usd"
return CodexCallResult(
response=text_output,

View File

@@ -0,0 +1,10 @@
"""Provider registration for Compass — metadata only (auth lives elsewhere)."""
from backend.sdk import ProviderBuilder
compass = (
ProviderBuilder("compass")
.with_description("Geospatial context for agents")
.with_supported_auth_types("api_key")
.build()
)

View File

@@ -0,0 +1,226 @@
"""Coverage tests for the cost-leak fixes in this PR.
Each block's ``run()`` / helper emits provider_cost + cost_usd (or items)
via merge_stats so the post-flight resolver bills real provider spend.
Tests here drive that emission path directly so a regression on any one
block surfaces immediately.
"""
from unittest.mock import patch
import pytest
from pydantic import SecretStr
from backend.blocks._base import BlockCostType
from backend.blocks.ai_condition import AIConditionBlock
from backend.data.block_cost_config import BLOCK_COSTS, LLM_COST
from backend.data.model import APIKeyCredentials, NodeExecutionStats
# -------- AIConditionBlock registration --------
def test_ai_condition_registered_under_llm_cost():
"""AIConditionBlock was running wallet-free before this PR; verify it
now resolves through the same per-model LLM_COST table as every other
LLM block.
"""
assert BLOCK_COSTS[AIConditionBlock] is LLM_COST
# -------- Pinecone insert ITEMS emission --------
@pytest.mark.asyncio
async def test_pinecone_insert_emits_items_provider_cost():
from backend.blocks.pinecone import PineconeInsertBlock
block = PineconeInsertBlock()
captured: list[NodeExecutionStats] = []
class _FakeIndex:
def upsert(self, **_):
return None
class _FakePinecone:
def __init__(self, *_, **__):
pass
def Index(self, _name):
return _FakeIndex()
with (
patch("backend.blocks.pinecone.Pinecone", _FakePinecone),
patch.object(block, "merge_stats", side_effect=captured.append),
):
input_data = block.input_schema(
credentials={
"id": "00000000-0000-0000-0000-000000000000",
"provider": "pinecone",
"type": "api_key",
},
index="my-index",
chunks=["alpha", "beta", "gamma"],
embeddings=[[0.1] * 4, [0.2] * 4, [0.3] * 4],
namespace="",
metadata={},
)
creds = APIKeyCredentials(
id="00000000-0000-0000-0000-000000000000",
provider="pinecone",
title="mock",
api_key=SecretStr("mock-key"),
expires_at=None,
)
outputs = [(n, v) async for n, v in block.run(input_data, credentials=creds)]
assert any(name == "upsert_response" for name, _ in outputs)
assert len(captured) == 1
stats = captured[0]
assert stats.provider_cost == pytest.approx(3.0)
assert stats.provider_cost_type == "items"
# -------- Narration model-aware per-char rate --------
@pytest.mark.parametrize(
"model_id, expected_rate_per_char",
[
("eleven_flash_v2_5", 0.000167 * 0.5),
("eleven_turbo_v2_5", 0.000167 * 0.5),
("eleven_multilingual_v2", 0.000167 * 1.0),
("eleven_turbo_v2", 0.000167 * 1.0),
],
)
def test_narration_per_char_rate_scales_with_model(model_id, expected_rate_per_char):
"""Drive VideoNarrationBlock._record_script_cost directly so a regression
that drops the model-aware branching (e.g. hardcoding 1.0 cr/char for
all models) makes this test fail.
"""
from backend.blocks.video.narration import VideoNarrationBlock
block = VideoNarrationBlock()
captured: list[NodeExecutionStats] = []
with patch.object(block, "merge_stats", side_effect=captured.append):
block._record_script_cost("x" * 5000, model_id)
assert len(captured) == 1
stats = captured[0]
assert stats.provider_cost == pytest.approx(5000 * expected_rate_per_char)
assert stats.provider_cost_type == "cost_usd"
# -------- Perplexity None-guard on x-total-cost --------
@pytest.mark.parametrize(
"openrouter_cost, expect_type",
[
(0.0421, "cost_usd"), # concrete positive USD → tagged
(None, None), # header missing → no tag (keeps gap observable)
(0.0, None), # zero → no tag (wouldn't bill anything anyway)
],
)
def test_perplexity_record_openrouter_cost_tags_only_on_concrete_value(
openrouter_cost, expect_type
):
"""Drive PerplexityBlock._record_openrouter_cost directly to verify the
None/0 guard. A regression that tags cost_usd unconditionally would
silently floor the user's bill to 0 via the resolver — this test
would catch it.
"""
from backend.blocks.perplexity import PerplexityBlock
block = PerplexityBlock()
with patch(
"backend.blocks.perplexity.extract_openrouter_cost",
return_value=openrouter_cost,
):
block._record_openrouter_cost(response=object())
assert block.execution_stats.provider_cost == openrouter_cost
assert block.execution_stats.provider_cost_type == expect_type
# -------- Codex COST_USD registration --------
def test_codex_registered_as_cost_usd_150():
from backend.blocks.codex import CodeGenerationBlock
entries = BLOCK_COSTS[CodeGenerationBlock]
assert len(entries) == 1
entry = entries[0]
assert entry.cost_type == BlockCostType.COST_USD
assert entry.cost_amount == 150
@pytest.mark.parametrize(
"input_tokens, output_tokens, expected_usd",
[
# GPT-5.1-Codex: $1.25 / 1M input, $10 / 1M output.
(1_000_000, 0, 1.25),
(0, 1_000_000, 10.0),
(100_000, 10_000, 0.225), # 0.125 + 0.100
(0, 0, 0.0),
],
)
def test_codex_computes_provider_cost_usd_from_token_counts(
input_tokens, output_tokens, expected_usd
):
"""Drive CodeGenerationBlock._compute_token_usd directly. A regression
to the wrong rate constants (e.g. swapping the $1.25 input rate for
GPT-4o's $2.50) would fail this test.
"""
from backend.blocks.codex import CodeGenerationBlock
assert CodeGenerationBlock._compute_token_usd(
input_tokens, output_tokens
) == pytest.approx(expected_usd)
# -------- ClaudeCode COST_USD registration sanity (already tested in claude_code_cost_test.py) --------
# -------- Perplexity COST_USD registration for all 3 tiers --------
def test_perplexity_sonar_all_tiers_registered_as_cost_usd_150():
from backend.blocks.perplexity import PerplexityBlock
entries = BLOCK_COSTS[PerplexityBlock]
# 3 tiers (SONAR, SONAR_PRO, SONAR_DEEP_RESEARCH) all COST_USD 150.
assert len(entries) == 3
for entry in entries:
assert entry.cost_type == BlockCostType.COST_USD
assert entry.cost_amount == 150
# -------- Narration COST_USD registration --------
def test_narration_registered_as_cost_usd_150():
from backend.blocks.video.narration import VideoNarrationBlock
entries = BLOCK_COSTS[VideoNarrationBlock]
assert len(entries) == 1
assert entries[0].cost_type == BlockCostType.COST_USD
assert entries[0].cost_amount == 150
# -------- Pinecone registrations --------
def test_pinecone_registrations():
from backend.blocks.pinecone import (
PineconeInitBlock,
PineconeInsertBlock,
PineconeQueryBlock,
)
assert BLOCK_COSTS[PineconeInitBlock][0].cost_type == BlockCostType.RUN
assert BLOCK_COSTS[PineconeQueryBlock][0].cost_type == BlockCostType.RUN
# Insert scales with item count.
assert BLOCK_COSTS[PineconeInsertBlock][0].cost_type == BlockCostType.ITEMS
assert BLOCK_COSTS[PineconeInsertBlock][0].cost_amount == 1

View File

@@ -19,6 +19,10 @@ class DataForSeoClient:
trusted_origins=["https://api.dataforseo.com"],
raise_for_status=False,
)
# USD cost reported by DataForSEO on the most recent successful call.
# Populated by keyword_suggestions / related_keywords so the caller
# can surface it via NodeExecutionStats.provider_cost for billing.
self.last_cost_usd: float = 0.0
def _get_headers(self) -> Dict[str, str]:
"""Generate the authorization header using Basic Auth."""
@@ -97,6 +101,9 @@ class DataForSeoClient:
if data.get("tasks") and len(data["tasks"]) > 0:
task = data["tasks"][0]
if task.get("status_code") == 20000: # Success code
# DataForSEO reports per-task USD cost; stash it so callers
# can populate NodeExecutionStats.provider_cost.
self.last_cost_usd = float(task.get("cost") or 0.0)
return task.get("result", [])
else:
error_msg = task.get("status_message", "Task failed")
@@ -174,6 +181,9 @@ class DataForSeoClient:
if data.get("tasks") and len(data["tasks"]) > 0:
task = data["tasks"][0]
if task.get("status_code") == 20000: # Success code
# DataForSEO reports per-task USD cost; stash it so callers
# can populate NodeExecutionStats.provider_cost.
self.last_cost_usd = float(task.get("cost") or 0.0)
return task.get("result", [])
else:
error_msg = task.get("status_message", "Task failed")

View File

@@ -7,11 +7,17 @@ from backend.sdk import BlockCostType, ProviderBuilder
# Build the DataForSEO provider with username/password authentication
dataforseo = (
ProviderBuilder("dataforseo")
.with_description("SEO and SERP data")
.with_user_password(
username_env_var="DATAFORSEO_USERNAME",
password_env_var="DATAFORSEO_PASSWORD",
title="DataForSEO Credentials",
)
.with_base_cost(1, BlockCostType.RUN)
# DataForSEO reports USD cost per task (e.g. $0.001/keyword returned).
# DataForSeoClient stashes it on last_cost_usd; each block emits it via
# merge_stats so the COST_USD resolver bills against real spend.
# 1000 platform credits per USD → 1 credit per $0.001 (≈ 1 credit/
# returned keyword on the standard tier).
.with_base_cost(1000, BlockCostType.COST_USD)
.build()
)
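# Worked example (assuming the resolver's ceil(usd * cost_amount) rounding
# seen in the Claude Code cost tests): a task DataForSEO bills at $0.0075
# resolves to ceil(0.0075 * 1000) = 8 platform credits.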

View File

@@ -4,6 +4,7 @@ DataForSEO Google Keyword Suggestions block.
from typing import Any, Dict, List, Optional
from backend.data.model import NodeExecutionStats
from backend.sdk import (
Block,
BlockCategory,
@@ -110,8 +111,10 @@ class DataForSeoKeywordSuggestionsBlock(Block):
test_output=[
(
"suggestion",
lambda x: hasattr(x, "keyword")
and x.keyword == "digital marketing strategy",
lambda x: (
hasattr(x, "keyword")
and x.keyword == "digital marketing strategy"
),
),
("suggestions", lambda x: isinstance(x, list) and len(x) == 1),
("total_count", 1),
@@ -167,6 +170,16 @@ class DataForSeoKeywordSuggestionsBlock(Block):
results = await self._fetch_keyword_suggestions(client, input_data)
# DataForSEO reports per-task USD cost on the response. Feed it
# into NodeExecutionStats so the COST_USD resolver bills the
# real provider spend at reconciliation time.
self.merge_stats(
NodeExecutionStats(
provider_cost=client.last_cost_usd,
provider_cost_type="cost_usd",
)
)
# Process and format the results
suggestions = []
if results and len(results) > 0:

View File

@@ -4,6 +4,7 @@ DataForSEO Google Related Keywords block.
from typing import Any, Dict, List, Optional
from backend.data.model import NodeExecutionStats
from backend.sdk import (
Block,
BlockCategory,
@@ -177,6 +178,16 @@ class DataForSeoRelatedKeywordsBlock(Block):
results = await self._fetch_related_keywords(client, input_data)
# DataForSEO reports per-task USD cost on the response. Feed it
# into NodeExecutionStats so the COST_USD resolver bills the
# real provider spend at reconciliation time.
self.merge_stats(
NodeExecutionStats(
provider_cost=client.last_cost_usd,
provider_cost_type="cost_usd",
)
)
# Process and format the results
related_keywords = []
if results and len(results) > 0:

View File

@@ -0,0 +1,10 @@
"""Provider registration for Discord — metadata only (auth lives in ``_auth.py``)."""
from backend.sdk import ProviderBuilder
discord = (
ProviderBuilder("discord")
.with_description("Messages, channels, and servers")
.with_supported_auth_types("api_key", "oauth2")
.build()
)

View File

@@ -0,0 +1,10 @@
"""Provider registration for ElevenLabs — metadata only (auth lives in ``_auth.py``)."""
from backend.sdk import ProviderBuilder
elevenlabs = (
ProviderBuilder("elevenlabs")
.with_description("Realistic AI voice synthesis")
.with_supported_auth_types("api_key")
.build()
)

View File

@@ -0,0 +1,10 @@
"""Provider registration for Enrichlayer — metadata only (auth lives in ``_auth.py``)."""
from backend.sdk import ProviderBuilder
enrichlayer = (
ProviderBuilder("enrichlayer")
.with_description("Enrich leads with company data")
.with_supported_auth_types("api_key")
.build()
)

View File

@@ -9,8 +9,14 @@ from ._webhook import ExaWebhookManager
# Configure the Exa provider once for all blocks
exa = (
ProviderBuilder("exa")
.with_description("Neural web search")
.with_api_key("EXA_API_KEY", "Exa API Key")
.with_webhook_manager(ExaWebhookManager)
.with_base_cost(1, BlockCostType.RUN)
# Exa returns `cost_dollars.total` on every response and ExaSearchBlock
# (plus ~45 sibling blocks that share this provider config) already
# populates NodeExecutionStats.provider_cost with it. Bill 100 credits
# per USD (~$0.01/credit): cheap searches stay at 12 credits, a Deep
# Research run at $0.20 lands at 20 credits, matching provider spend.
.with_base_cost(100, BlockCostType.COST_USD)
.build()
)

View File

@@ -17,6 +17,7 @@ from backend.sdk import (
)
from ._config import exa
from .helpers import merge_exa_cost
class AnswerCitation(BaseModel):
@@ -111,3 +112,7 @@ class ExaAnswerBlock(Block):
yield "citations", citations
for citation in citations:
yield "citation", citation
# The current SDK AnswerResponse dataclass omits cost_dollars; the helper
# is a no-op today but keeps billing wired for when exa_py adds the field.
merge_exa_cost(self, response)

View File

@@ -9,7 +9,6 @@ from typing import Union
from pydantic import BaseModel
from backend.data.model import NodeExecutionStats
from backend.sdk import (
APIKeyCredentials,
Block,
@@ -23,6 +22,7 @@ from backend.sdk import (
)
from ._config import exa
from .helpers import merge_exa_cost
class CodeContextResponse(BaseModel):
@@ -118,9 +118,5 @@ class ExaCodeContextBlock(Block):
yield "search_time", context.search_time
yield "output_tokens", context.output_tokens
# Parse cost_dollars (API returns as string, e.g. "0.005")
try:
cost_usd = float(context.cost_dollars)
self.merge_stats(NodeExecutionStats(provider_cost=cost_usd))
except (ValueError, TypeError):
pass
# API returns costDollars as a bare numeric string like "0.005".
merge_exa_cost(self, data)

View File

@@ -4,7 +4,6 @@ from typing import Optional
from exa_py import AsyncExa
from pydantic import BaseModel
from backend.data.model import NodeExecutionStats
from backend.sdk import (
APIKeyCredentials,
Block,
@@ -24,6 +23,7 @@ from .helpers import (
HighlightSettings,
LivecrawlTypes,
SummarySettings,
merge_exa_cost,
)
@@ -224,6 +224,4 @@ class ExaContentsBlock(Block):
if response.cost_dollars:
yield "cost_dollars", response.cost_dollars
self.merge_stats(
NodeExecutionStats(provider_cost=response.cost_dollars.total)
)
merge_exa_cost(self, response)

View File

@@ -143,7 +143,9 @@ class TestExaContentsCostTracking:
mock_exa_cls.return_value = mock_exa
async for _ in block.run(
block.Input(urls=["https://example.com"], credentials=TEST_CREDENTIALS_INPUT), # type: ignore[arg-type]
block.Input(
urls=["https://example.com"], credentials=TEST_CREDENTIALS_INPUT
), # type: ignore[arg-type]
credentials=TEST_CREDENTIALS,
):
pass
@@ -172,7 +174,9 @@ class TestExaContentsCostTracking:
mock_exa_cls.return_value = mock_exa
async for _ in block.run(
block.Input(urls=["https://example.com"], credentials=TEST_CREDENTIALS_INPUT), # type: ignore[arg-type]
block.Input(
urls=["https://example.com"], credentials=TEST_CREDENTIALS_INPUT
), # type: ignore[arg-type]
credentials=TEST_CREDENTIALS,
):
pass
@@ -201,7 +205,9 @@ class TestExaContentsCostTracking:
mock_exa_cls.return_value = mock_exa
async for _ in block.run(
block.Input(urls=["https://example.com"], credentials=TEST_CREDENTIALS_INPUT), # type: ignore[arg-type]
block.Input(
urls=["https://example.com"], credentials=TEST_CREDENTIALS_INPUT
), # type: ignore[arg-type]
credentials=TEST_CREDENTIALS,
):
pass
@@ -297,7 +303,9 @@ class TestExaSimilarCostTracking:
mock_exa_cls.return_value = mock_exa
async for _ in block.run(
block.Input(url="https://example.com", credentials=TEST_CREDENTIALS_INPUT), # type: ignore[arg-type]
block.Input(
url="https://example.com", credentials=TEST_CREDENTIALS_INPUT
), # type: ignore[arg-type]
credentials=TEST_CREDENTIALS,
):
pass
@@ -326,7 +334,9 @@ class TestExaSimilarCostTracking:
mock_exa_cls.return_value = mock_exa
async for _ in block.run(
block.Input(url="https://example.com", credentials=TEST_CREDENTIALS_INPUT), # type: ignore[arg-type]
block.Input(
url="https://example.com", credentials=TEST_CREDENTIALS_INPUT
), # type: ignore[arg-type]
credentials=TEST_CREDENTIALS,
):
pass

View File

@@ -1,7 +1,8 @@
from enum import Enum
from typing import Any, Dict, Literal, Optional, Union
from backend.sdk import BaseModel, MediaFileType, SchemaField
from backend.data.model import NodeExecutionStats
from backend.sdk import BaseModel, Block, MediaFileType, SchemaField
class LivecrawlTypes(str, Enum):
@@ -319,7 +320,7 @@ class CostDollars(BaseModel):
# Helper functions for payload processing
def process_text_field(
text: Union[bool, TextEnabled, TextDisabled, TextAdvanced, None]
text: Union[bool, TextEnabled, TextDisabled, TextAdvanced, None],
) -> Optional[Union[bool, Dict[str, Any]]]:
"""Process text field for API payload."""
if text is None:
@@ -400,7 +401,7 @@ def process_contents_settings(contents: Optional[ContentSettings]) -> Dict[str,
def process_context_field(
context: Union[bool, dict, ContextEnabled, ContextDisabled, ContextAdvanced, None]
context: Union[bool, dict, ContextEnabled, ContextDisabled, ContextAdvanced, None],
) -> Optional[Union[bool, Dict[str, int]]]:
"""Process context field for API payload."""
if context is None:
@@ -448,3 +449,65 @@ def add_optional_fields(
payload[api_field] = value.value
else:
payload[api_field] = value
def extract_exa_cost_usd(response: Any) -> Optional[float]:
"""Return ``cost_dollars.total`` (USD) from an Exa SDK response, or None.
Handles dataclass/pydantic responses (``response.cost_dollars.total``),
dicts with camelCase keys (``response["costDollars"]["total"]``), dicts
with snake_case keys, and bare numeric strings. Returns None whenever the
shape is missing cost info — the caller then skips merge_stats.
"""
if response is None:
return None
# Dataclass / pydantic: response.cost_dollars
cost_obj = getattr(response, "cost_dollars", None)
# Dict payloads: try both camelCase and snake_case
if cost_obj is None and isinstance(response, dict):
cost_obj = response.get("costDollars") or response.get("cost_dollars")
if cost_obj is None:
return None
# Already a scalar (code_context endpoint returns a string)
if isinstance(cost_obj, (int, float)):
return max(0.0, float(cost_obj))
if isinstance(cost_obj, str):
try:
return max(0.0, float(cost_obj))
except ValueError:
return None
# Nested object/dict: grab the `total` field
total = getattr(cost_obj, "total", None)
if total is None and isinstance(cost_obj, dict):
total = cost_obj.get("total")
if total is None:
return None
try:
return max(0.0, float(total))
except (TypeError, ValueError):
return None
def merge_exa_cost(block: Block, response: Any) -> None:
"""Pull ``cost_dollars.total`` off an Exa response and merge it into stats.
No-op when the response shape has no cost info (e.g. webset CRUD where
the SDK does not expose per-call pricing) — emission happens only when
Exa actually reports a USD amount.
"""
cost_usd = extract_exa_cost_usd(response)
if cost_usd is None:
return
block.merge_stats(
NodeExecutionStats(
provider_cost=cost_usd,
provider_cost_type="cost_usd",
)
)

View File

@@ -0,0 +1,65 @@
"""Unit tests for exa/helpers cost-extraction + merge helpers."""
from types import SimpleNamespace
from unittest.mock import MagicMock
import pytest
from backend.blocks.exa.helpers import extract_exa_cost_usd, merge_exa_cost
from backend.data.model import NodeExecutionStats
@pytest.mark.parametrize(
"response, expected",
[
# Dataclass / SimpleNamespace with cost_dollars.total
(SimpleNamespace(cost_dollars=SimpleNamespace(total=0.05)), 0.05),
# Dict camelCase
({"costDollars": {"total": 0.10}}, 0.10),
# Dict snake_case
({"cost_dollars": {"total": 0.07}}, 0.07),
# code_context endpoint shape: plain numeric string
(SimpleNamespace(cost_dollars="0.005"), 0.005),
# Scalar float on cost_dollars directly
(SimpleNamespace(cost_dollars=0.02), 0.02),
# Scalar int on cost_dollars
(SimpleNamespace(cost_dollars=3), 3.0),
# Missing cost info — returns None
({}, None),
(SimpleNamespace(other="foo"), None),
(None, None),
# Nested total=None
(SimpleNamespace(cost_dollars=SimpleNamespace(total=None)), None),
# Invalid numeric string
(SimpleNamespace(cost_dollars="not-a-number"), None),
# Negative values clamp to 0
(SimpleNamespace(cost_dollars=SimpleNamespace(total=-1.0)), 0.0),
],
)
def test_extract_exa_cost_usd_handles_all_shapes(response, expected):
assert extract_exa_cost_usd(response) == expected
def test_merge_exa_cost_emits_stats_when_cost_present():
block = MagicMock()
response = SimpleNamespace(cost_dollars=SimpleNamespace(total=0.0421))
merge_exa_cost(block, response)
block.merge_stats.assert_called_once()
stats: NodeExecutionStats = block.merge_stats.call_args.args[0]
assert stats.provider_cost == pytest.approx(0.0421)
assert stats.provider_cost_type == "cost_usd"
def test_merge_exa_cost_noops_when_no_cost():
"""Webset CRUD endpoints don't surface cost_dollars today — the helper
must silently skip instead of emitting a 0-cost telemetry record."""
block = MagicMock()
merge_exa_cost(block, SimpleNamespace(other_field="nothing"))
block.merge_stats.assert_not_called()
def test_merge_exa_cost_noops_when_response_is_none():
block = MagicMock()
merge_exa_cost(block, None)
block.merge_stats.assert_not_called()

View File

@@ -12,7 +12,6 @@ from typing import Any, Dict, List, Optional
from pydantic import BaseModel
from backend.data.model import NodeExecutionStats
from backend.sdk import (
APIKeyCredentials,
Block,
@@ -26,6 +25,7 @@ from backend.sdk import (
)
from ._config import exa
from .helpers import merge_exa_cost
class ResearchModel(str, Enum):
@@ -233,11 +233,7 @@ class ExaCreateResearchBlock(Block):
if research.cost_dollars:
yield "cost_total", research.cost_dollars.total
self.merge_stats(
NodeExecutionStats(
provider_cost=research.cost_dollars.total
)
)
merge_exa_cost(self, research)
return
await asyncio.sleep(check_interval)
@@ -352,9 +348,7 @@ class ExaGetResearchBlock(Block):
yield "cost_searches", research.cost_dollars.num_searches
yield "cost_pages", research.cost_dollars.num_pages
yield "cost_reasoning_tokens", research.cost_dollars.reasoning_tokens
self.merge_stats(
NodeExecutionStats(provider_cost=research.cost_dollars.total)
)
merge_exa_cost(self, research)
yield "error_message", research.error
@@ -441,9 +435,7 @@ class ExaWaitForResearchBlock(Block):
if research.cost_dollars:
yield "cost_total", research.cost_dollars.total
self.merge_stats(
NodeExecutionStats(provider_cost=research.cost_dollars.total)
)
merge_exa_cost(self, research)
return

View File

@@ -4,7 +4,6 @@ from typing import Optional
from exa_py import AsyncExa
from backend.data.model import NodeExecutionStats
from backend.sdk import (
APIKeyCredentials,
Block,
@@ -21,6 +20,7 @@ from .helpers import (
ContentSettings,
CostDollars,
ExaSearchResults,
merge_exa_cost,
process_contents_settings,
)
@@ -207,6 +207,4 @@ class ExaSearchBlock(Block):
if response.cost_dollars:
yield "cost_dollars", response.cost_dollars
self.merge_stats(
NodeExecutionStats(provider_cost=response.cost_dollars.total)
)
merge_exa_cost(self, response)

View File

@@ -3,7 +3,6 @@ from typing import Optional
from exa_py import AsyncExa
from backend.data.model import NodeExecutionStats
from backend.sdk import (
APIKeyCredentials,
Block,
@@ -20,6 +19,7 @@ from .helpers import (
ContentSettings,
CostDollars,
ExaSearchResults,
merge_exa_cost,
process_contents_settings,
)
@@ -168,6 +168,4 @@ class ExaFindSimilarBlock(Block):
if response.cost_dollars:
yield "cost_dollars", response.cost_dollars
self.merge_stats(
NodeExecutionStats(provider_cost=response.cost_dollars.total)
)
merge_exa_cost(self, response)

View File

@@ -39,6 +39,7 @@ from backend.sdk import (
)
from ._config import exa
from .helpers import merge_exa_cost
class SearchEntityType(str, Enum):
@@ -394,6 +395,7 @@ class ExaCreateWebsetBlock(Block):
metadata=input_data.metadata,
)
)
merge_exa_cost(self, webset)
webset_result = Webset.model_validate(webset.model_dump(by_alias=True))
@@ -404,6 +406,7 @@ class ExaCreateWebsetBlock(Block):
timeout=input_data.polling_timeout,
poll_interval=5,
)
merge_exa_cost(self, final_webset)
completion_time = time.time() - start_time
item_count = 0
@@ -479,6 +482,7 @@ class ExaCreateOrFindWebsetBlock(Block):
try:
webset = await aexa.websets.get(id=input_data.external_id)
merge_exa_cost(self, webset)
webset_result = Webset.model_validate(webset.model_dump(by_alias=True))
yield "webset", webset_result
@@ -501,6 +505,7 @@ class ExaCreateOrFindWebsetBlock(Block):
metadata=input_data.metadata,
)
)
merge_exa_cost(self, webset)
webset_result = Webset.model_validate(webset.model_dump(by_alias=True))
@@ -555,6 +560,7 @@ class ExaUpdateWebsetBlock(Block):
payload["metadata"] = input_data.metadata
sdk_webset = await aexa.websets.update(id=input_data.webset_id, params=payload)
merge_exa_cost(self, sdk_webset)
status_str = (
sdk_webset.status.value
@@ -566,8 +572,9 @@ class ExaUpdateWebsetBlock(Block):
yield "status", status_str
yield "external_id", sdk_webset.external_id
yield "metadata", sdk_webset.metadata or {}
yield "updated_at", (
sdk_webset.updated_at.isoformat() if sdk_webset.updated_at else ""
yield (
"updated_at",
(sdk_webset.updated_at.isoformat() if sdk_webset.updated_at else ""),
)
@@ -621,6 +628,7 @@ class ExaListWebsetsBlock(Block):
cursor=input_data.cursor,
limit=input_data.limit,
)
merge_exa_cost(self, response)
websets_data = [
w.model_dump(by_alias=True, exclude_none=True) for w in response.data
@@ -679,6 +687,7 @@ class ExaGetWebsetBlock(Block):
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
sdk_webset = await aexa.websets.get(id=input_data.webset_id)
merge_exa_cost(self, sdk_webset)
status_str = (
sdk_webset.status.value
@@ -706,11 +715,13 @@ class ExaGetWebsetBlock(Block):
yield "enrichments", enrichments_data
yield "monitors", monitors_data
yield "metadata", sdk_webset.metadata or {}
yield "created_at", (
sdk_webset.created_at.isoformat() if sdk_webset.created_at else ""
yield (
"created_at",
(sdk_webset.created_at.isoformat() if sdk_webset.created_at else ""),
)
yield "updated_at", (
sdk_webset.updated_at.isoformat() if sdk_webset.updated_at else ""
yield (
"updated_at",
(sdk_webset.updated_at.isoformat() if sdk_webset.updated_at else ""),
)
@@ -749,6 +760,7 @@ class ExaDeleteWebsetBlock(Block):
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
deleted_webset = await aexa.websets.delete(id=input_data.webset_id)
merge_exa_cost(self, deleted_webset)
status_str = (
deleted_webset.status.value
@@ -799,6 +811,7 @@ class ExaCancelWebsetBlock(Block):
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
canceled_webset = await aexa.websets.cancel(id=input_data.webset_id)
merge_exa_cost(self, canceled_webset)
status_str = (
canceled_webset.status.value
@@ -969,6 +982,7 @@ class ExaPreviewWebsetBlock(Block):
payload["entity"] = entity
sdk_preview = await aexa.websets.preview(params=payload)
merge_exa_cost(self, sdk_preview)
preview = PreviewWebsetModel.from_sdk(sdk_preview)
@@ -1052,6 +1066,7 @@ class ExaWebsetStatusBlock(Block):
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
webset = await aexa.websets.get(id=input_data.webset_id)
merge_exa_cost(self, webset)
status = (
webset.status.value
@@ -1186,6 +1201,7 @@ class ExaWebsetSummaryBlock(Block):
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
webset = await aexa.websets.get(id=input_data.webset_id)
merge_exa_cost(self, webset)
# Extract basic info
webset_id = webset.id
@@ -1214,6 +1230,7 @@ class ExaWebsetSummaryBlock(Block):
items_response = await aexa.websets.items.list(
webset_id=input_data.webset_id, limit=input_data.sample_size
)
merge_exa_cost(self, items_response)
sample_items_data = [
item.model_dump(by_alias=True, exclude_none=True)
for item in items_response.data
@@ -1363,6 +1380,7 @@ class ExaWebsetReadyCheckBlock(Block):
# Get webset details
webset = await aexa.websets.get(id=input_data.webset_id)
merge_exa_cost(self, webset)
status = (
webset.status.value

View File

@@ -25,6 +25,7 @@ from backend.sdk import (
)
from ._config import exa
from .helpers import merge_exa_cost
# Mirrored model for stability
@@ -205,6 +206,7 @@ class ExaCreateEnrichmentBlock(Block):
sdk_enrichment = await aexa.websets.enrichments.create(
webset_id=input_data.webset_id, params=payload
)
merge_exa_cost(self, sdk_enrichment)
enrichment_id = sdk_enrichment.id
status = (
@@ -226,6 +228,7 @@ class ExaCreateEnrichmentBlock(Block):
current_enrich = await aexa.websets.enrichments.get(
webset_id=input_data.webset_id, id=enrichment_id
)
merge_exa_cost(self, current_enrich)
current_status = (
current_enrich.status.value
if hasattr(current_enrich.status, "value")
@@ -235,6 +238,7 @@ class ExaCreateEnrichmentBlock(Block):
if current_status in ["completed", "failed", "cancelled"]:
# Estimate items from webset searches
webset = await aexa.websets.get(id=input_data.webset_id)
merge_exa_cost(self, webset)
if webset.searches:
for search in webset.searches:
if search.progress:
@@ -332,6 +336,7 @@ class ExaGetEnrichmentBlock(Block):
sdk_enrichment = await aexa.websets.enrichments.get(
webset_id=input_data.webset_id, id=input_data.enrichment_id
)
merge_exa_cost(self, sdk_enrichment)
enrichment = WebsetEnrichmentModel.from_sdk(sdk_enrichment)
@@ -425,6 +430,7 @@ class ExaUpdateEnrichmentBlock(Block):
try:
response = await Requests().patch(url, headers=headers, json=payload)
data = response.json()
# PATCH /websets/{id}/enrichments/{id} doesn't return costDollars.
yield "enrichment_id", data.get("id", "")
yield "status", data.get("status", "")
@@ -477,6 +483,7 @@ class ExaDeleteEnrichmentBlock(Block):
deleted_enrichment = await aexa.websets.enrichments.delete(
webset_id=input_data.webset_id, id=input_data.enrichment_id
)
merge_exa_cost(self, deleted_enrichment)
yield "enrichment_id", deleted_enrichment.id
yield "success", "true"
@@ -528,12 +535,14 @@ class ExaCancelEnrichmentBlock(Block):
canceled_enrichment = await aexa.websets.enrichments.cancel(
webset_id=input_data.webset_id, id=input_data.enrichment_id
)
merge_exa_cost(self, canceled_enrichment)
# Try to estimate how many items were enriched before cancellation
items_enriched = 0
items_response = await aexa.websets.items.list(
webset_id=input_data.webset_id, limit=100
)
merge_exa_cost(self, items_response)
for sdk_item in items_response.data:
# Check if this enrichment is present

View File

@@ -29,6 +29,7 @@ from backend.sdk import (
from ._config import exa
from ._test import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT
from .helpers import merge_exa_cost
# Mirrored model for stability - don't use SDK types directly in block outputs
@@ -297,6 +298,7 @@ class ExaCreateImportBlock(Block):
sdk_import = await aexa.websets.imports.create(
params=payload, csv_data=input_data.csv_data
)
merge_exa_cost(self, sdk_import)
import_obj = ImportModel.from_sdk(sdk_import)
@@ -361,6 +363,7 @@ class ExaGetImportBlock(Block):
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
sdk_import = await aexa.websets.imports.get(import_id=input_data.import_id)
merge_exa_cost(self, sdk_import)
import_obj = ImportModel.from_sdk(sdk_import)
@@ -430,6 +433,7 @@ class ExaListImportsBlock(Block):
cursor=input_data.cursor,
limit=input_data.limit,
)
merge_exa_cost(self, response)
# Convert SDK imports to our stable models
imports = [ImportModel.from_sdk(i) for i in response.data]
@@ -477,6 +481,7 @@ class ExaDeleteImportBlock(Block):
deleted_import = await aexa.websets.imports.delete(
import_id=input_data.import_id
)
merge_exa_cost(self, deleted_import)
yield "import_id", deleted_import.id
yield "success", "true"
@@ -599,7 +604,7 @@ class ExaExportWebsetBlock(Block):
try:
all_items = []
# Use SDK's list_all iterator to fetch items
# list_all paginates internally; cost_dollars is not surfaced per-page
item_iterator = aexa.websets.items.list_all(
webset_id=input_data.webset_id, limit=input_data.max_items
)

View File

@@ -30,6 +30,7 @@ from backend.sdk import (
)
from ._config import exa
from .helpers import merge_exa_cost
# Mirrored model for enrichment results
@@ -181,6 +182,7 @@ class ExaGetWebsetItemBlock(Block):
sdk_item = await aexa.websets.items.get(
webset_id=input_data.webset_id, id=input_data.item_id
)
merge_exa_cost(self, sdk_item)
item = WebsetItemModel.from_sdk(sdk_item)
@@ -293,6 +295,7 @@ class ExaListWebsetItemsBlock(Block):
cursor=input_data.cursor,
limit=input_data.limit,
)
merge_exa_cost(self, response)
items = [WebsetItemModel.from_sdk(item) for item in response.data]
@@ -343,6 +346,7 @@ class ExaDeleteWebsetItemBlock(Block):
deleted_item = await aexa.websets.items.delete(
webset_id=input_data.webset_id, id=input_data.item_id
)
merge_exa_cost(self, deleted_item)
yield "item_id", deleted_item.id
yield "success", "true"
@@ -404,6 +408,7 @@ class ExaBulkWebsetItemsBlock(Block):
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
all_items: List[WebsetItemModel] = []
# list_all paginates internally; cost_dollars is not surfaced per-page
item_iterator = aexa.websets.items.list_all(
webset_id=input_data.webset_id, limit=input_data.max_items
)
@@ -476,6 +481,7 @@ class ExaWebsetItemsSummaryBlock(Block):
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
webset = await aexa.websets.get(id=input_data.webset_id)
merge_exa_cost(self, webset)
entity_type = "unknown"
if webset.searches:
@@ -498,6 +504,7 @@ class ExaWebsetItemsSummaryBlock(Block):
items_response = await aexa.websets.items.list(
webset_id=input_data.webset_id, limit=input_data.sample_size
)
merge_exa_cost(self, items_response)
# Convert to our stable models
sample_items = [
WebsetItemModel.from_sdk(item) for item in items_response.data
@@ -574,6 +581,7 @@ class ExaGetNewItemsBlock(Block):
cursor=input_data.since_cursor,
limit=input_data.max_items,
)
merge_exa_cost(self, response)
# Convert SDK items to our stable models
new_items = [WebsetItemModel.from_sdk(item) for item in response.data]

View File

@@ -25,6 +25,7 @@ from backend.sdk import (
from ._config import exa
from ._test import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT
from .helpers import merge_exa_cost
# Mirrored model for stability - don't use SDK types directly in block outputs
@@ -321,6 +322,7 @@ class ExaCreateMonitorBlock(Block):
payload["metadata"] = input_data.metadata
sdk_monitor = await aexa.websets.monitors.create(params=payload)
merge_exa_cost(self, sdk_monitor)
monitor = MonitorModel.from_sdk(sdk_monitor)
@@ -385,6 +387,7 @@ class ExaGetMonitorBlock(Block):
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
sdk_monitor = await aexa.websets.monitors.get(monitor_id=input_data.monitor_id)
merge_exa_cost(self, sdk_monitor)
monitor = MonitorModel.from_sdk(sdk_monitor)
@@ -479,6 +482,7 @@ class ExaUpdateMonitorBlock(Block):
sdk_monitor = await aexa.websets.monitors.update(
monitor_id=input_data.monitor_id, params=payload
)
merge_exa_cost(self, sdk_monitor)
# Convert to our stable model
monitor = MonitorModel.from_sdk(sdk_monitor)
@@ -525,6 +529,7 @@ class ExaDeleteMonitorBlock(Block):
deleted_monitor = await aexa.websets.monitors.delete(
monitor_id=input_data.monitor_id
)
merge_exa_cost(self, deleted_monitor)
yield "monitor_id", deleted_monitor.id
yield "success", "true"
@@ -586,6 +591,7 @@ class ExaListMonitorsBlock(Block):
limit=input_data.limit,
webset_id=input_data.webset_id,
)
merge_exa_cost(self, response)
# Convert SDK monitors to our stable models
monitors = [MonitorModel.from_sdk(m) for m in response.data]

View File

@@ -25,6 +25,7 @@ from backend.sdk import (
)
from ._config import exa
from .helpers import merge_exa_cost
# Import WebsetItemModel for use in enrichment samples
# This is safe as websets_items doesn't import from websets_polling
@@ -126,6 +127,7 @@ class ExaWaitForWebsetBlock(Block):
timeout=input_data.timeout,
poll_interval=input_data.check_interval,
)
merge_exa_cost(self, final_webset)
elapsed = time.time() - start_time
@@ -165,6 +167,7 @@ class ExaWaitForWebsetBlock(Block):
while time.time() - start_time < input_data.timeout:
# Get current webset status
webset = await aexa.websets.get(id=input_data.webset_id)
merge_exa_cost(self, webset)
current_status = (
webset.status.value
if hasattr(webset.status, "value")
@@ -210,6 +213,7 @@ class ExaWaitForWebsetBlock(Block):
# Timeout reached
elapsed = time.time() - start_time
webset = await aexa.websets.get(id=input_data.webset_id)
merge_exa_cost(self, webset)
final_status = (
webset.status.value
if hasattr(webset.status, "value")
@@ -348,6 +352,7 @@ class ExaWaitForSearchBlock(Block):
search = await aexa.websets.searches.get(
webset_id=input_data.webset_id, id=input_data.search_id
)
merge_exa_cost(self, search)
# Extract status
status = (
@@ -404,6 +409,7 @@ class ExaWaitForSearchBlock(Block):
search = await aexa.websets.searches.get(
webset_id=input_data.webset_id, id=input_data.search_id
)
merge_exa_cost(self, search)
final_status = (
search.status.value
if hasattr(search.status, "value")
@@ -506,6 +512,7 @@ class ExaWaitForEnrichmentBlock(Block):
enrichment = await aexa.websets.enrichments.get(
webset_id=input_data.webset_id, id=input_data.enrichment_id
)
merge_exa_cost(self, enrichment)
# Extract status
status = (
@@ -523,16 +530,20 @@ class ExaWaitForEnrichmentBlock(Block):
items_enriched = 0
if input_data.sample_results and status == "completed":
sample_data, items_enriched = (
await self._get_sample_enrichments(
input_data.webset_id, input_data.enrichment_id, aexa
)
(
sample_data,
items_enriched,
) = await self._get_sample_enrichments(
input_data.webset_id, input_data.enrichment_id, aexa
)
yield "enrichment_id", input_data.enrichment_id
yield "final_status", status
yield "items_enriched", items_enriched
yield "enrichment_title", enrichment.title or enrichment.description or ""
yield (
"enrichment_title",
enrichment.title or enrichment.description or "",
)
yield "elapsed_time", elapsed
if input_data.sample_results:
yield "sample_data", sample_data
@@ -551,6 +562,7 @@ class ExaWaitForEnrichmentBlock(Block):
enrichment = await aexa.websets.enrichments.get(
webset_id=input_data.webset_id, id=input_data.enrichment_id
)
merge_exa_cost(self, enrichment)
final_status = (
enrichment.status.value
if hasattr(enrichment.status, "value")
@@ -576,6 +588,7 @@ class ExaWaitForEnrichmentBlock(Block):
"""Get sample enriched data and count."""
# Get a few items to see enrichment results using SDK
response = await aexa.websets.items.list(webset_id=webset_id, limit=5)
merge_exa_cost(self, response)
sample_data: list[SampleEnrichmentModel] = []
enriched_count = 0

View File

@@ -24,6 +24,7 @@ from backend.sdk import (
)
from ._config import exa
from .helpers import merge_exa_cost
# Mirrored model for stability
@@ -320,6 +321,7 @@ class ExaCreateWebsetSearchBlock(Block):
sdk_search = await aexa.websets.searches.create(
webset_id=input_data.webset_id, params=payload
)
merge_exa_cost(self, sdk_search)
search_id = sdk_search.id
status = (
@@ -353,6 +355,7 @@ class ExaCreateWebsetSearchBlock(Block):
current_search = await aexa.websets.searches.get(
webset_id=input_data.webset_id, id=search_id
)
merge_exa_cost(self, current_search)
current_status = (
current_search.status.value
if hasattr(current_search.status, "value")
@@ -445,6 +448,7 @@ class ExaGetWebsetSearchBlock(Block):
sdk_search = await aexa.websets.searches.get(
webset_id=input_data.webset_id, id=input_data.search_id
)
merge_exa_cost(self, sdk_search)
search = WebsetSearchModel.from_sdk(sdk_search)
@@ -526,6 +530,7 @@ class ExaCancelWebsetSearchBlock(Block):
canceled_search = await aexa.websets.searches.cancel(
webset_id=input_data.webset_id, id=input_data.search_id
)
merge_exa_cost(self, canceled_search)
# Extract items found before cancellation
items_found = 0
@@ -605,6 +610,7 @@ class ExaFindOrCreateSearchBlock(Block):
# Get webset to check existing searches
webset = await aexa.websets.get(id=input_data.webset_id)
merge_exa_cost(self, webset)
# Look for existing search with same query
existing_search = None
@@ -639,6 +645,7 @@ class ExaFindOrCreateSearchBlock(Block):
sdk_search = await aexa.websets.searches.create(
webset_id=input_data.webset_id, params=payload
)
merge_exa_cost(self, sdk_search)
search = WebsetSearchModel.from_sdk(sdk_search)

View File

@@ -0,0 +1,10 @@
"""Provider registration for fal — metadata only (auth lives in ``_auth.py``)."""
from backend.sdk import ProviderBuilder
fal = (
ProviderBuilder("fal")
.with_description("Hosted model inference")
.with_supported_auth_types("api_key")
.build()
)

View File

@@ -1,8 +1,15 @@
from backend.sdk import BlockCostType, ProviderBuilder
# Firecrawl bills in its own credits (1 credit ≈ $0.001). Each block's
# run() estimates USD spend from the operation (pages scraped, limit,
# credits_used on ExtractResponse) and merge_stats populates
# NodeExecutionStats.provider_cost before billing reconciliation. 1000
# platform credits per USD means 1 platform credit per Firecrawl credit
# — roughly matches our existing per-call tier for single-page scrape.
firecrawl = (
ProviderBuilder("firecrawl")
.with_description("Web scraping and crawling")
.with_api_key("FIRECRAWL_API_KEY", "Firecrawl API Key")
.with_base_cost(1, BlockCostType.RUN)
.with_base_cost(1000, BlockCostType.COST_USD)
.build()
)

View File
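A small illustration of the mapping described above (not the shipped resolver): one Firecrawl credit is roughly $0.001, so with 1000 platform credits per USD the two credit scales line up one-to-one.

FIRECRAWL_CREDIT_USD = 0.001      # approximate rate quoted in the comment above
PLATFORM_CREDITS_PER_USD = 1000   # the COST_USD base cost configured above

def platform_credits(firecrawl_credits: int) -> int:
    usd = firecrawl_credits * FIRECRAWL_CREDIT_USD
    return round(usd * PLATFORM_CREDITS_PER_USD)

assert platform_credits(1) == 1    # single-page scrape
assert platform_credits(50) == 50  # a 50-page crawl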

@@ -4,6 +4,7 @@ from firecrawl import FirecrawlApp
from firecrawl.v2.types import ScrapeOptions
from backend.blocks.firecrawl._api import ScrapeFormat
from backend.data.model import NodeExecutionStats
from backend.sdk import (
APIKeyCredentials,
Block,
@@ -86,6 +87,14 @@ class FirecrawlCrawlBlock(Block):
wait_for=input_data.wait_for,
),
)
# Firecrawl bills 1 credit (~$0.001) per crawled page. crawl_result.data
# is the list of scraped pages actually returned.
pages = len(crawl_result.data) if crawl_result.data else 0
self.merge_stats(
NodeExecutionStats(
provider_cost=pages * 0.001, provider_cost_type="cost_usd"
)
)
yield "data", crawl_result.data
for data in crawl_result.data:

View File

@@ -2,25 +2,22 @@ from typing import Any
from firecrawl import FirecrawlApp
from backend.data.model import NodeExecutionStats
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockCost,
BlockCostType,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
SchemaField,
cost,
)
from backend.util.exceptions import BlockExecutionError
from ._config import firecrawl
@cost(BlockCost(2, BlockCostType.RUN))
class FirecrawlExtractBlock(Block):
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = firecrawl.credentials_field()
@@ -74,4 +71,13 @@ class FirecrawlExtractBlock(Block):
block_id=self.id,
) from e
# Firecrawl surfaces actual credit spend on extract responses
# (credits_used). 1 Firecrawl credit ≈ $0.001.
credits_used = getattr(extract_result, "credits_used", None) or 0
self.merge_stats(
NodeExecutionStats(
provider_cost=credits_used * 0.001,
provider_cost_type="cost_usd",
)
)
yield "data", extract_result.data

View File

@@ -2,6 +2,7 @@ from typing import Any
from firecrawl import FirecrawlApp
from backend.data.model import NodeExecutionStats
from backend.sdk import (
APIKeyCredentials,
Block,
@@ -50,6 +51,10 @@ class FirecrawlMapWebsiteBlock(Block):
map_result = app.map(
url=input_data.url,
)
# Firecrawl bills 1 credit (~$0.001) per map request.
self.merge_stats(
NodeExecutionStats(provider_cost=0.001, provider_cost_type="cost_usd")
)
# Convert SearchResult objects to dicts
results_data = [

View File

@@ -3,6 +3,7 @@ from typing import Any
from firecrawl import FirecrawlApp
from backend.blocks.firecrawl._api import ScrapeFormat
from backend.data.model import NodeExecutionStats
from backend.sdk import (
APIKeyCredentials,
Block,
@@ -81,6 +82,11 @@ class FirecrawlScrapeBlock(Block):
max_age=input_data.max_age,
wait_for=input_data.wait_for,
)
# Firecrawl bills 1 credit (~$0.001) per scraped page; scrape is a
# single-page operation.
self.merge_stats(
NodeExecutionStats(provider_cost=0.001, provider_cost_type="cost_usd")
)
yield "data", scrape_result
for f in input_data.formats:

Some files were not shown because too many files have changed in this diff.