mirror of
https://github.com/Significant-Gravitas/AutoGPT.git
synced 2026-04-30 03:00:41 -04:00
d3954e2bbd533a6ded12df20fd81f56e9529adc7
8498 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
d3954e2bbd | Merge branch 'dev' into branch10 | ||
|
|
c93dac9043 |
fix(backend/copilot): self-recovery hint on storage-full + close TOCTOU rollback blob leak
WriteWorkspaceFileTool: when write_file rejects with "Storage limit exceeded", append a recovery hint pointing at list_workspace_files + delete_workspace_file. The agent reads this on the next turn and can offer to clean up before falling back to "ask the user to upgrade", so the user doesn't have to leave the chat. Upload route TOCTOU rollback: route the over-quota cleanup through WorkspaceManager.delete_file instead of soft_delete_workspace_file. The latter only flips isDeleted on the DB row and renames the path — the storage backend blob would survive, leaking storage on every concurrent-upload race. delete_file removes the blob first, then soft-deletes the DB record. |
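A minimal sketch of the recovery-hint pattern described above, with hypothetical function and tool names standing in for the real `WriteWorkspaceFileTool` — the hint text and wrapper shape are illustrative, not the actual implementation:

```python
# Hypothetical sketch: when a write fails because the workspace is full,
# return the error together with guidance the agent can act on next turn.
STORAGE_FULL_MARKER = "Storage limit exceeded"

RECOVERY_HINT = (
    "The workspace is out of storage. Call list_workspace_files to see what "
    "exists and delete_workspace_file to free space before asking the user "
    "to upgrade their plan."
)


async def write_workspace_file(manager, path: str, content: bytes) -> str:
    try:
        await manager.write_file(path, content)
        return f"Wrote {len(content)} bytes to {path}"
    except Exception as exc:
        message = str(exc)
        if STORAGE_FULL_MARKER in message:
            # Surface the hint alongside the original error so the agent sees
            # both the failure and a concrete cleanup path on its next turn.
            return f"{message}\n\n{RECOVERY_HINT}"
        raise
```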
||
|
|
bb46059d2c |
refactor(backend): drop noisy 80% storage warning + document write_file raises
The 80% warning fired on every write inside the band, producing repeated log entries during a single CoPilot turn that writes multiple files. The frontend storage bar already conveys usage to the user, and ops alerting belongs on a metric, not a log line that scales with write rate. Also document `VirusDetectedError` / `VirusScanError` in `WorkspaceManager.write_file` so callers can decide between explicit handling and generic `Exception` catches. |
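As a rough illustration of the documentation change (not the actual docstring or exception classes), the `write_file` contract might be spelled out like this so callers can choose between targeted handling and a generic `Exception` catch:

```python
class VirusDetectedError(Exception):
    """Raised when the scanner flags the uploaded content as malicious."""


class VirusScanError(Exception):
    """Raised when the scan itself cannot be completed (scanner down, timeout)."""


class WorkspaceManager:
    async def write_file(self, path: str, content: bytes) -> None:
        """Write `content` to `path` inside the user's workspace.

        Raises:
            VirusDetectedError: the content failed the virus scan.
            VirusScanError: the scan could not be completed.
        """
        ...  # quota check, scan, storage-backend write, DB bookkeeping
```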
||
|
|
4a1741cc15 |
fix(platform): cancel-banner copy + clearer 422 on currency mismatch (#12947)
## Why Two regressions surfaced after [#12933](https://github.com/Significant-Gravitas/AutoGPT/pull/12933) merged to `dev`: 1. **Cancel-pending banner shows wrong copy.** The merged PR moved cancel-at-period-end from `BASIC` → `NO_TIER`, but `PendingChangeBanner.isCancellation` was still keyed on `"BASIC"`. As a result, a user who cancels their sub now sees *"Scheduled to downgrade to No subscription on …"* instead of the intended *"Scheduled to cancel your subscription on …"*. Caught by Sentry on the merged PR. 2. **Currency-mismatch downgrade returns 502 (looks like outage).** A user with an existing GBP-active sub (Max Price has `currency_options.gbp`) tried to downgrade to Pro and got 502. The backend logs show: ``` stripe._error.InvalidRequestError: The price specified only supports `usd`. This doesn't match the expected currency: `gbp`. ``` The Pro Price is USD-only; Stripe rejects `SubscriptionSchedule.modify` because phases must share currency. Wrapping that in a generic 502 hid the real cause and made it read like a Stripe outage. ## What * Frontend: flip `PendingChangeBanner.isCancellation` from `pendingTier === "BASIC"` to `"NO_TIER"`. Update both component and page-level tests that exercised the cancellation branch. * Backend: catch `stripe.InvalidRequestError` whose message mentions `currency` in `update_subscription_tier`, and return **422** with *"Tier change unavailable for your current billing currency. Cancel your subscription and re-subscribe at the target tier, or contact support."* — so users see the actual reason, not a misleading outage message. Other `StripeError` paths still return 502. * New backend test asserts the currency-mismatch branch returns 422 with the new copy. ## How * `PendingChangeBanner.tsx` line 28: 1-char change (`"BASIC"` → `"NO_TIER"`). * `subscription_routes_test.py` and `PendingChangeBanner.test.tsx` updated to use `NO_TIER` for the cancellation fixture. * `v1.py` `update_subscription_tier` adds a typed `except stripe.InvalidRequestError` branch ahead of the generic `StripeError`; only currency-mismatch messages get the special 422, everything else falls through to the existing 502. ## The real fix lives in Stripe config The defensive 422 here is just a clearer error surface. To actually unblock GBP/EUR users from changing tiers, the per-tier Stripe Prices (Pro, and Basic if priced) need `currency_options` for GBP added — Max already has this, which is why Max checkout shows the £/$ toggle. Stripe locks `currency_options` after a Price has been transacted, so the procedure is: create a new Price with USD + GBP from the start → update the `stripe-price-ids` LD flag to the new Price ID. No further code change required; same Price ID stays per tier, multiple currencies inside it. ## Checklist - [x] Component test for new banner copy - [x] Backend test for 422 currency-mismatch branch - [x] Format / lint / types pass - [x] No protected route added — N/A |
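A hedged sketch of the error-mapping idea described above: catch Stripe's `InvalidRequestError`, special-case currency-mismatch messages as a 422, and let every other Stripe failure keep the existing 502. Route and helper names are placeholders, not the real `v1.py` code:

```python
import stripe
from fastapi import HTTPException

CURRENCY_MISMATCH_DETAIL = (
    "Tier change unavailable for your current billing currency. Cancel your "
    "subscription and re-subscribe at the target tier, or contact support."
)


async def update_subscription_tier(user_id: str, target_tier: str) -> dict:
    try:
        # Placeholder for the real SubscriptionSchedule.modify / tier-change logic.
        return await _modify_stripe_subscription(user_id, target_tier)
    except stripe.InvalidRequestError as exc:
        # Only currency-mismatch messages get the self-explanatory 422;
        # any other invalid-request error falls through to the generic 502.
        if "currency" in str(exc).lower():
            raise HTTPException(status_code=422, detail=CURRENCY_MISMATCH_DETAIL)
        raise HTTPException(status_code=502, detail="Stripe request failed")
    except stripe.StripeError:
        raise HTTPException(status_code=502, detail="Stripe request failed")


async def _modify_stripe_subscription(user_id: str, target_tier: str) -> dict:
    ...  # Stripe API calls elided in this sketch
    return {"tier": target_tier}
```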
||
|
|
ea4c617093 |
ci: lower frontend patch coverage target from 80% to 70%
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
dbb4090c52 |
fix(backend): reject zero LD storage values, fix prettier formatting
- Change LD workspace storage validation from `< 0` to `<= 0` to prevent a zero value from silently disabling quota enforcement
- Fix prettier formatting in UsagePanelContentRender test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
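The validation tweak boils down to rejecting non-positive limits so a stray 0 in the flag payload cannot turn quota enforcement off. A minimal hedged sketch (names and fallback value assumed):

```python
_DEFAULT_WORKSPACE_STORAGE_MB = 250  # illustrative fallback, not the real default table


def sanitize_storage_limit_mb(raw_value: object) -> int:
    """Return a usable per-tier limit, falling back when the flag value is unusable.

    Using `<= 0` (rather than `< 0`) is the important part: a zero coming back
    from the flag service would otherwise disable quota enforcement entirely.
    """
    try:
        value = int(raw_value)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        return _DEFAULT_WORKSPACE_STORAGE_MB
    if value <= 0:
        return _DEFAULT_WORKSPACE_STORAGE_MB
    return value
```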
||
|
|
489ccf96f5 |
fix(frontend): isort fix and additional StorageBar test coverage
- Fix import sort order in rate_limit_test.py (isort)
- Add tests: zero limit hiding, 80%+ orange bar, singular file text, tiny usage display, header/tier label, showHeader=false

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
||
|
|
db66f34cf4 |
feat(backend): pull workspace storage limits from LaunchDarkly
- Add _DEFAULT_TIER_WORKSPACE_STORAGE_MB with explicit NO_TIER entry (250 MB)
- Add _fetch_workspace_storage_limits_flag() and get_workspace_storage_limits_mb() mirroring the chat-limit LD pattern
- Add Flag.COPILOT_TIER_WORKSPACE_STORAGE_LIMITS enum entry
- Update get_workspace_storage_limit_bytes() to use LD-backed map
- Add tests: LD resolution, NO_TIER behavior, unsubscribe downgrade, upload rejection when over cap, frontend null-usage-windows rendering

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
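A rough sketch of the flag-backed limit lookup, with a hypothetical flag payload standing in for the real LaunchDarkly client; the per-tier default numbers other than the 250 MB NO_TIER entry are assumptions for illustration:

```python
# Hypothetical shape of an LD-backed per-tier limit map with hard-coded fallbacks.
_DEFAULT_TIER_WORKSPACE_STORAGE_MB: dict[str, int] = {
    "NO_TIER": 250,   # from the PR description
    "BASIC": 250,     # illustrative
    "PRO": 1024,      # illustrative
    "MAX": 4096,      # illustrative
}


def get_workspace_storage_limits_mb(flag_value: dict[str, int] | None) -> dict[str, int]:
    """Merge the flag payload over the defaults, ignoring unusable entries."""
    limits = dict(_DEFAULT_TIER_WORKSPACE_STORAGE_MB)
    for tier, mb in (flag_value or {}).items():
        if isinstance(mb, int) and mb > 0:
            limits[tier] = mb
    return limits


def get_workspace_storage_limit_bytes(tier: str, flag_value: dict[str, int] | None) -> int:
    limits = get_workspace_storage_limits_mb(flag_value)
    return limits.get(tier, limits["NO_TIER"]) * 1024 * 1024
```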
||
|
|
bf044a2634 |
Merge branch 'dev' into pr/12780/branch10
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
||
|
|
c08b9774dc |
fix(backend/push): skip OS push for onboarding payloads (#12944)
## Why

[#12723](https://github.com/Significant-Gravitas/AutoGPT/pull/12723) wired Web Push fanout into `AsyncRedisNotificationEventBus.publish()` so copilot completion events reach users with the tab closed. But the bus is also used by `data/onboarding.py` for in-page step toasts, and those started firing OS-level system notifications (`increment_runs`, `step_completed`, etc.) — unwanted noise.

## What

Smallest possible patch: skip the OS push fanout when `payload.type == "onboarding"`. WebSocket delivery is unchanged.

## How

```python
async def publish(self, event: NotificationEvent) -> None:
    await self.publish_event(event, event.user_id)
    # Skip OS push for onboarding step toasts — those are in-page only.
    # TODO: remove once the onboarding/wallet rework lands.
    if event.payload.model_dump().get("type") == "onboarding":
        return
    ...
```

Five-line addition in `backend/data/notification_bus.py`. Marked `TODO` to remove once the upcoming onboarding/wallet rework decides per-event whether a system notification is desired.

Tests: added `test_publish_skips_web_push_for_onboarding`; existing fanout tests continue to validate the happy path with non-onboarding payloads.

## Test plan

- [x] `poetry run format` (ruff + isort + black + pyright)
- [ ] CI: `poetry run pytest backend/data/notification_bus_test.py`
- [ ] Manual on dev: trigger onboarding step → confirm no OS notification; finish copilot session → confirm OS notification still fires.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

autogpt-platform-beta-v0.6.58 |
||
|
|
fe3d6fb118 |
feat(platform): subscription credit grants + paywall gate + dialog UX + cross-pod cache (#12933)
## Why
Started as a regression fix for admin-granted user downgrades hitting
Stripe Checkout, broadened to close the surrounding gaps in the Stripe
billing flow that surfaced during testing. Three concrete user-facing
problems the PR resolves:
1. **Admin-granted users couldn't change tier in-app** when their
current tier had no `stripe-price-id-*` LD configured — clicking
Downgrade silently routed to a paid-signup Stripe Checkout instead of
just changing the tier.
2. **Subscription payments granted nothing visible to users** — paying
£20–£320/mo gave higher rate-limit multipliers but no AutoPilot credits
in the user's balance, despite a dialog promising "credit to your next
Stripe invoice" (which users naturally read as AutoGPT credits).
3. **Tier oscillated across page refreshes** — `get_user_by_id` was
process-local cached, so dev's 4 server pods each held their own copy.
Tier could read MAX on one pod and BASIC on another for ~5 min after a
webhook update, depending on which pod the request landed on.
Plus three structural improvements caught during review:
4. **No paywall enforcement for paid-cohort users without subscription**
— non-beta users on `BASIC` (no Stripe sub) could freely use AutoPilot.
5. **Upgrade/downgrade dialog copy was misleading** — implied a Stripe
redirect that doesn't happen for existing-sub modifications, used
"credit" ambiguously, and didn't surface the next-invoice date.
6. **Top-up Checkout created an ephemeral Stripe Product per session** —
no canonical Product for dashboard reporting, no way to scope coupons to
top-ups.
## What
### 1. Admin-granted downgrades skip Checkout (price-id-pruning
regression)
`update_subscription_tier()` used to gate its modify-or-DB-flip block on
`current_tier_price_id is not None`. When a tier was pruned from
`stripe-price-ids` LD, that gate skipped the inner DB-flip branch and
the request fell through to Checkout — sending admin-granted users to a
paid-signup flow when they were trying to *reduce* their tier. Drop the
gate and call `modify_stripe_subscription_for_tier()` unconditionally —
the function self-reports `False` when there's no Stripe sub. One
uniform path for everyone now.
### 2. Subscription credit grant on every paid Stripe invoice
New `invoice.payment_succeeded` webhook handler at
[`credit.py:handle_subscription_payment_success`](autogpt_platform/backend/backend/data/credit.py)
adds a `GRANT` transaction equal to `invoice.amount_paid`, keyed by
`INVOICE-{id}` for idempotency (Stripe webhook retries cannot
double-grant). Initial signup, monthly renewal, and prorated upgrade
charges all surface as AutoGPT balance bumps the moment Stripe confirms
the charge. Skipped: non-subscription invoices, $0 invoices, ENTERPRISE
users.
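A hedged sketch of the idempotency idea behind this handler — not the real `credit.py` code: the grant is keyed on the invoice ID, so a retried webhook sees an existing transaction instead of adding a second one. The `credit_store` object and its methods are placeholders for the real transaction layer.

```python
async def handle_subscription_payment_success(invoice: dict, credit_store) -> None:
    """Grant AutoGPT credits for a paid subscription invoice, at most once per invoice."""
    amount_paid = invoice.get("amount_paid", 0)
    subscription_id = invoice.get("subscription")
    if not subscription_id or amount_paid <= 0:
        return  # non-subscription or $0 invoices grant nothing

    # Deterministic key makes Stripe webhook retries harmless.
    transaction_key = f"INVOICE-{invoice['id']}"
    if await credit_store.transaction_exists(transaction_key):
        return  # already granted on a previous delivery attempt
    await credit_store.add_grant(key=transaction_key, amount=amount_paid)
```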
### 3. Cross-pod user cache
[`user.py:31`](autogpt_platform/backend/backend/data/user.py#L31)
`cache_user_lookup = cached(maxsize=1000, ttl_seconds=300,
shared_cache=True)`. Single line — moves the cache to Redis so all
server pods read/write the same key. The existing
`get_user_by_id.cache_delete(user_id)` invalidations now propagate
cross-pod.
### 4. PaywallGate
New
[`PaywallGate`](autogpt_platform/frontend/src/app/(platform)/PaywallGate/PaywallGate.tsx)
wraps the `(platform)/layout.tsx` route group. When
`ENABLE_PLATFORM_PAYMENT === true` (paid cohort) AND `subscription.tier
=== "BASIC"`, redirects to `/profile/credits` where the credits page
shows a "Pick a plan to continue using AutoGPT" banner above the tier
picker.
Notes:
- **Beta cohort skips entirely** (flag off → `useGetSubscriptionStatus`
query disabled, no redirect).
- **Gates on DB tier, not `has_active_stripe_subscription`** — Sentry
caught that a transient Stripe API error in
`get_active_subscription_period_end()` would set `has_active=false` for
paying users, locking them out. The DB tier is set by webhooks and
persists locally; Stripe API hiccups don't flip it.
- **Exempt routes**: `/profile`, `/admin`, `/auth`, `/login`, `/signup`,
`/reset-password`, `/error`, `/unauthorized`, `/health`. Onboarding
lives in the sibling `(no-navbar)` group, so this gate doesn't conflict
with the in-flight onboarding-paywall integration.
### 5. Upgrade/downgrade dialog clarity
`SubscriptionStatusResponse` now exposes
`has_active_stripe_subscription: bool` and `current_period_end: int |
None`, computed via a new
[`get_active_subscription_period_end`](autogpt_platform/backend/backend/data/credit.py)
helper. Frontend dialogs branch on those:
**Upgrade — modify-in-place** (existing sub):
> Your subscription is upgraded to MAX immediately. On your next invoice
on May 21, 2026, your saved card is charged for the upgrade proration
since today plus the next month at the new rate, with the unused portion
of your current plan automatically deducted. Credits matching the paid
amount are added to your AutoGPT balance once Stripe confirms the
charge.
**Upgrade — Checkout** (no sub):
> You'll be redirected to Stripe to enter payment details and start your
MAX subscription. The first invoice's amount is added to your AutoGPT
balance once Stripe confirms the charge.
**Downgrade (paid → paid)**:
> Switching to PRO takes effect at the end of your current billing
period on May 21, 2026 — no charge today. You keep your current plan
until then. From that date your saved card is billed at the PRO rate,
and matching credits are added to your AutoGPT balance with each paid
invoice.
Toast wording on success matches dialog. Tier labels run through
`getTierLabel()` so we render "Pro/Max/Business" not "PRO/MAX/BUSINESS"
(Sentry-flagged in review).
### 6. Top-up Stripe Product ID via LD flag
New `STRIPE_PRODUCT_ID_TOPUP` LD flag. **Unset (default)** → legacy
inline `product_data` (Stripe creates an ephemeral product per Checkout
— backward-compatible with current behavior). **Set to a Stripe Product
ID** → line item references that Product so all top-ups group under one
entity in Stripe Dashboard reporting; per-session amount stays dynamic
via `price_data.unit_amount`. The two paths are mutually exclusive
(Stripe rejects `product` + `product_data` together).
## How
- Backend changes confined to
[`v1.py`](autogpt_platform/backend/backend/api/features/v1.py),
[`credit.py`](autogpt_platform/backend/backend/data/credit.py),
[`user.py`](autogpt_platform/backend/backend/data/user.py),
[`feature_flag.py`](autogpt_platform/backend/backend/util/feature_flag.py).
- Frontend changes: new
[`PaywallGate`](autogpt_platform/frontend/src/app/(platform)/PaywallGate/PaywallGate.tsx)
component + small edits to
[`(platform)/layout.tsx`](autogpt_platform/frontend/src/app/(platform)/layout.tsx),
`SubscriptionTierSection.tsx`, `useSubscriptionTierSection.ts`,
`helpers.ts`.
- Both backend and frontend pass `user.id` to LD context (verified in
[`feature_flag.py:_fetch_user_context_data`](autogpt_platform/backend/backend/util/feature_flag.py)
and
[`feature-flag-provider.tsx`](autogpt_platform/frontend/src/services/feature-flags/feature-flag-provider.tsx))
for proper per-user targeting.
### Out of scope (follow-ups)
- Hard-paywall onboarding integration (Lluis's work — coordinated;
PaywallGate wraps `(platform)/layout.tsx` and onboarding lives in
`(no-navbar)`, so they don't conflict).
- Beta-users-as-Stripe-trial migration.
- Max-cap usage alerting + "Contact us" routing.
- "No Active Subscription" state rename.
- "Your credits" → "Automation Credits" rename + helper tooltip.
- BASIC tier resurface as a free / cancel-subscription option
(deliberately deferred per current product direction).
## Test plan
### Backend (all green in CI)
- [x] `poetry run pytest
backend/api/features/subscription_routes_test.py` — 41 passed.
- [x] `poetry run pytest backend/data/credit_subscription_test.py`
covering: `handle_subscription_payment_success` (grants credits, skips
non-sub/zero/missing-customer/unknown-user/ENTERPRISE, idempotent on
retry), `get_active_subscription_period_end` (happy path, no-customer
short-circuit, Stripe error swallow), top-up Product ID flag both
branches.
- [x] Type-check (3.11/3.12/3.13) — green after explicit
`list[stripe.checkout.Session.CreateParamsLineItem]` typing on top-up
`line_items`.
- [x] Codecov patch — both backend + frontend green.
### Frontend (all green in CI)
- [x] `pnpm test:unit` — 2154/2154 pass, including 5 new PaywallGate
tests (beta-cohort skip, paid-cohort BASIC redirect, no-redirect for
PRO/MAX/BUSINESS, exempt-prefix matrix, loading-state guard) and updated
`formatCost`/dialog-copy assertions.
- [x] `pnpm types`, `pnpm format`, `pnpm lint` — clean.
### Live verification on `dev-builder.agpt.co` (5/5 pass — see PR
comments)
- [x] Login + credits page renders correctly with Pro + Max cards, BASIC
+ BUSINESS hidden, no paywall banner for active subscriber.
- [x] Downgrade dialog shows new copy with concrete date + "no charge
today" + credit-grant explanation.
- [x] PaywallGate does NOT redirect paying users (MAX tier with active
sub).
- [x] PaywallGate REDIRECTS BASIC user (DB-flipped via `kubectl exec`
for testing, restored after) → `/build` redirects to `/profile/credits`,
violet "Pick a plan to continue using AutoGPT" banner displayed.
- [x] Upgrade dialog (modify-in-place) shows the corrected proration
phrasing.
- [ ] Manual: real production-like test of `invoice.payment_succeeded`
granting credits — fires on next billing cycle (2026-05-21 for the dev
test user); not testable today without manipulating Stripe webhook.
|
||
|
|
c6d31f8252 |
feat(frontend): gate onboarding SubscriptionStep behind ENABLE_PLATFORM_PAYMENT (#12943)
### Why / What / How **Why:** The onboarding `SubscriptionStep` (added in #12935) is currently shown to every new user, but the platform payment system is rolled out behind the `ENABLE_PLATFORM_PAYMENT` LaunchDarkly flag. We need the onboarding plan-selection step to honor the same flag so users in flag-off cohorts don't hit a payment surface that the rest of the product won't support. **What:** Conditionally render the `SubscriptionStep` based on `ENABLE_PLATFORM_PAYMENT`. When the flag is off the wizard runs `Welcome → Role → PainPoints → Preparing` (3 user-interactive steps + transition); when on, behavior is unchanged (`Welcome → Role → PainPoints → Subscription → Preparing`). **How:** - `page.tsx` reads the flag, computes `totalSteps` (3 vs. 4) and `preparingStep` (4 vs. 5), and only renders `SubscriptionStep` when the flag is on. - `useOnboardingPage.ts` threads the same `preparingStep` into the URL `parseStep` clamp and into the "submit profile when entering Preparing" effect, so both adapt to the flag state. - The Zustand store is left unchanged — its hard `Math.min(5, …)` clamp is unreachable in flag-off flow because PainPointsStep advances 3 → 4 (Preparing) and that's the terminal step. - `playwright/utils/onboarding.ts`: with `NEXT_PUBLIC_PW_TEST=true` LaunchDarkly returns `defaultFlags` (`ENABLE_PLATFORM_PAYMENT: false`), so the helper now waits up to 2s for the Subscription header and only clicks a plan CTA if the step is actually rendered. ### Changes 🏗️ - `autogpt_platform/frontend/src/app/(no-navbar)/onboarding/page.tsx` — gate `SubscriptionStep` on `ENABLE_PLATFORM_PAYMENT`; derive `totalSteps`/`preparingStep` from the flag. - `autogpt_platform/frontend/src/app/(no-navbar)/onboarding/useOnboardingPage.ts` — make `parseStep` and the profile-submission effect respect the flag-derived `preparingStep`. - `autogpt_platform/frontend/src/playwright/utils/onboarding.ts` — make the Subscription step optional in `completeOnboardingWizard` so E2E works in both flag states. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [ ] I have tested my changes according to the test plan: - [x] Existing onboarding unit tests pass (`pnpm test:unit` — 2447 passed, including `PainPointsStep`, `RoleStep`, `SubscriptionStep`, store) - [x] `pnpm format`, `pnpm lint`, `pnpm types` clean - [ ] Manual: with flag **off**, walk onboarding and confirm wizard goes Welcome → Role → PainPoints → Preparing → /copilot, progress bar shows 3 steps - [ ] Manual: with flag **on** (LD or `NEXT_PUBLIC_FORCE_FLAG_ENABLE_PLATFORM_PAYMENT=true`), walk onboarding and confirm SubscriptionStep is present at step 4, progress bar shows 4 steps - [ ] Manual: with flag **off**, hit `/onboarding?step=5` directly and confirm it clamps back to step 1 (no orphan Subscription state) - [ ] Playwright: `completeOnboardingWizard` E2E flow continues to pass under default `NEXT_PUBLIC_PW_TEST=true` (flag off path) #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes (no config changes — flag already exists in LaunchDarkly + `defaultFlags`) - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (none needed) --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
28ae7ebac8 |
feat(onboarding): add subscription plan selection step (#12935)
## Summary Adds a new **Subscription Step** (Step 4) to the onboarding wizard, allowing users to choose a plan (Pro, Max, or Team) before reaching the "Preparing" step. ## Changes ### New files - **`steps/SubscriptionStep.tsx`** — Full subscription UI with: - Three plan cards (Pro $50/mo, Max $320/mo, Team — coming soon) - Monthly / yearly billing toggle (yearly shows annual total with 20% discount, plus monthly equivalent) - Country selector (28 Stripe-supported countries) that opens upward as a search modal - Localized pricing using live exchange rates - **`steps/countries.ts`** — Currency data module with exchange rates, `formatPrice()` helper, and zero-decimal currency handling (JPY, KRW, HUF, CLP) ### Modified files - **`store.ts`** — Extended `Step` type to `1 | 2 | 3 | 4 | 5`, added `selectedPlan` and `selectedBilling` state/actions - **`page.tsx`** — Wired `SubscriptionStep` as Step 4, moved `PreparingStep` to Step 5, adjusted progress bar and dot indicators - **`useOnboardingPage.ts`** — Updated `parseStep` range to 1–5, profile submission now triggers at Step 5 ## Design decisions - Follows existing component patterns: uses `FadeIn`, `Text`, `Button` atoms, `cn()` utility, Phosphor icons - Country selector opens **upward** to avoid clipping below the viewport - Plan selection advances to Step 5 immediately (Stripe integration is TODO) - Exchange rates are hardcoded for now — should be fetched from an API in production ## TODO - [ ] Integrate with Stripe checkout / backend subscription API - [ ] Fetch live exchange rates instead of hardcoded values - [ ] Add responsive layout for mobile viewports --------- Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co> Co-authored-by: Lluis Agusti <hi@llu.lu> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Ubbe <hi@ubbe.dev> |
||
|
|
e0f9146d54 |
feat(platform): add Web Push notifications via VAPID for background delivery (#12723)
### Why / What / How **Why:** When a user kicks off an AutoPilot task and leaves the platform (closes the tab, switches to another page, or minimizes the browser), they have no way of knowing when it completes unless they come back and check. This breaks the "set it and forget it" promise of automation. **What:** Adds Web Push notifications using the standard Push API (VAPID). Push notifications are delivered through free browser vendor services (Google FCM, Apple APNs, Mozilla Push) to a service worker — even when all AutoGPT tabs are closed, as long as the browser process is running. The system is generic and extensible to all notification types, with copilot session completion as the first integration. **How:** - **Backend:** A new `PushSubscription` Prisma model stores per-user push subscriptions. When a `NotificationEvent` is published to the Redis notification bus, the existing `notification_worker` in `ws_api.py` fires a tracked background `send_push_for_user()` task. This uses `pywebpush` to call the browser push services with VAPID authentication. Includes per-user TTL-bounded debounce (5s), per-user subscription cap (20), 410/404 auto-cleanup, periodic scheduler-driven cleanup of high-failure rows, and route-level SSRF rejection of untrusted endpoints. - **Frontend:** A `push-sw.js` service worker handles `push` events and shows OS notifications via `self.registration.showNotification()`, with click-to-navigate. A `PushNotificationProvider` mounted at the platform layout registers the SW and subscription on all pages, posts the current URL to the SW on every Next.js navigation (since Chrome's `WindowClient.url` is stale for SPA routing), forwards the user's notifications-toggle setting to the SW, and tears down on logout. The copilot in-page notification path defers to the SW when a push subscription is active so users don't get duplicate alerts. ### Behavior — when does an OS notification fire? | Where the user is focused | Notifications toggle | OS notification? | |---|---|---| | Any `/copilot` page (any session, tab visible + browser focused) | on | suppressed — sidebar green check + title badge handle it | | `/library` (or any non-`/copilot` route) | on | **fires** | | `/copilot` but tab hidden (Cmd-Tab away, minimized, different tab) | on | **fires** | | All AutoGPT tabs closed (browser process still running) | on | **fires** | | Any state | off | suppressed | | Anywhere | permission not granted / no push subscription | falls back to in-page `Notification()` if user is away on `/copilot`; nothing otherwise | Click any OS notification → focuses an existing tab and navigates it to `/copilot?sessionId=<id>`, or opens a new window if no AutoGPT tab is open. 
### Test plan #### Setup - [ ] Generate VAPID keys via the snippet in `backend/.env.default` and set `VAPID_PRIVATE_KEY`, `VAPID_PUBLIC_KEY`, `VAPID_CLAIM_EMAIL` in `backend/.env` - [ ] Leave `NEXT_PUBLIC_VAPID_PUBLIC_KEY` unset on the frontend (single source of truth via `/api/push/vapid-key`) - [ ] Start backend + frontend, grant notification permission on the copilot page - [ ] Verify `push-sw.js` is "activated and is running" in DevTools → Application → Service Workers - [ ] Verify `POST /api/push/subscribe` created exactly one DB row in `PushSubscription` for your user #### Notification show / suppress matrix - [ ] Trigger completion **on `/copilot` viewing the same session**, tab visible + focused → no OS notification (sidebar green check appears) - [ ] Trigger completion **on `/copilot` viewing a different session**, tab visible + focused → no OS notification (still considered "in the feature") - [ ] Trigger completion **on `/library`**, tab visible + focused → OS notification fires - [ ] Trigger completion **on `/copilot`** but with the tab hidden (Cmd-Tab to another app) → OS notification fires - [ ] Trigger completion with all AutoGPT tabs closed → OS notification fires (browser must still be running) - [ ] Toggle notifications **off** in the copilot UI → trigger completion → no OS notification - [ ] Toggle notifications **back on** → trigger completion → OS notification fires #### Click behavior - [ ] OS notification → click → focuses an existing AutoGPT tab and navigates to `/copilot?sessionId=<id>` - [ ] OS notification with no AutoGPT tab open → click → opens a new tab on `/copilot?sessionId=<id>` #### Lifecycle - [ ] Logout → DB row removed, browser unsubscribed; no further OS notifications until login + re-subscribe - [ ] Stale subscription (e.g. 
unsubscribed externally) → backend gets 410 from FCM → row auto-deleted; second push attempts no longer fan out to it ### Changes 🏗️ **Backend — New files:** - `backend/data/push_subscription.py` — CRUD for push subscriptions: `upsert` (with `MAX_SUBSCRIPTIONS_PER_USER` cap), `find_many`, `delete`, `increment_fail_count`, `cleanup_failed_subscriptions`, `validate_push_endpoint` (HTTPS + push-service hostname allowlist for SSRF prevention) - `backend/data/push_sender.py` — Fire-and-forget push delivery with `cachetools.TTLCache`-bounded debounce, defense-in-depth re-validation at send time, 410/404 auto-cleanup with regex-based status extraction (covers pywebpush versions where `e.response` is unset) - `backend/api/features/push/routes.py` — 3 endpoints: `GET /api/push/vapid-key`, `POST /api/push/subscribe`, `POST /api/push/unsubscribe` (all with `requires_user` auth and 400 on invalid endpoints) - `backend/api/features/push/model.py` — Pydantic models with `min_length`/`max_length` constraints on endpoint and crypto keys **Backend — Modified files:** - `schema.prisma` — Added `PushSubscription` model + `User` relation - `pyproject.toml` — Added `pywebpush ^2.3` dependency - `backend/util/settings.py` — VAPID key fields on `Secrets`; `push_subscription_cleanup_interval_hours` config - `backend/api/rest_api.py` — Registered push router at `/api/push` - `backend/api/ws_api.py` — Notification worker now fires `send_push_for_user()` as a tracked background task (strong-ref set + done callback so asyncio doesn't GC it mid-run) - `backend/data/db_manager.py` — Exposed push subscription RPC methods on the DB manager async client - `backend/executor/scheduler.py` — Periodic `cleanup_failed_push_subscriptions` job (default 24h) - `backend/.env.default` — VAPID env vars with key generation snippet **Frontend — New files:** - `public/push-sw.js` — Service worker: routes pushes via `NOTIFICATION_MAP`, suppresses when user is on `/copilot`, accepts `CLIENT_URL` and `NOTIFICATIONS_ENABLED` postMessages so SW logic stays in sync with SPA navigation and the toggle, click handler with focus → navigate → openWindow fallback, `pushsubscriptionchange` re-subscribe with `credentials: include` - `src/services/push-notifications/registration.ts`, `api.ts`, `helpers.ts` — SW registration / Push API subscription / backend API helpers - `src/services/push-notifications/usePushNotifications.ts` — Hook that auto-subscribes on login and tears down on logout - `src/services/push-notifications/useReportClientUrl.ts` — Posts current pathname+search to SW on every Next.js route change (works around stale `WindowClient.url`) - `src/services/push-notifications/useReportNotificationsEnabled.ts` — Forwards the user's notifications toggle to the SW - `src/services/push-notifications/PushNotificationProvider.tsx` — Mounts all three hooks at the platform layout level **Frontend — Modified files:** - `src/app/(platform)/layout.tsx` — Mounted `<PushNotificationProvider />` - `src/app/(platform)/copilot/useCopilotNotifications.ts` — Skips in-page `Notification()` when a SW push subscription is active (avoids duplicate alerts) - `src/services/storage/local-storage.ts` — Added `PUSH_SUBSCRIPTION_REGISTERED` key - `frontend/.env.default` — Optional `NEXT_PUBLIC_VAPID_PUBLIC_KEY` (left unset by default to keep `/api/push/vapid-key` as the single source of truth) **Configuration changes:** - New env vars: `VAPID_PRIVATE_KEY`, `VAPID_PUBLIC_KEY`, `VAPID_CLAIM_EMAIL` (backend); optional `NEXT_PUBLIC_VAPID_PUBLIC_KEY` (frontend) - New 
`push_subscription_cleanup_interval_hours` setting (default 24, range 1–168) - New DB migration: `PushSubscription` table (`20260420120000_add_push_subscription`) ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] All blockers and should-fixes from the autogpt-pr-reviewer review have been addressed (see PR thread) - [x] All inline review threads resolved (49 threads addressed) #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under **Changes**) --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
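To make the delivery path concrete, here is a hedged sketch of a VAPID send with `pywebpush`, including the 410/404 cleanup signal described above; the subscription dict shape follows the Push API, while the function itself is illustrative rather than the real `push_sender.py`:

```python
import json

from pywebpush import WebPushException, webpush


def send_push(subscription: dict, payload: dict, vapid_private_key: str, claim_email: str) -> bool:
    """Send one Web Push message; return False if the subscription should be deleted.

    `subscription` is the browser-provided dict with `endpoint` and `keys`
    (p256dh/auth) — exactly what a PushSubscription row would store.
    """
    try:
        webpush(
            subscription_info=subscription,
            data=json.dumps(payload),
            vapid_private_key=vapid_private_key,
            vapid_claims={"sub": f"mailto:{claim_email}"},
        )
        return True
    except WebPushException as exc:
        # e.response may be unset on some pywebpush versions, hence getattr.
        status = getattr(exc.response, "status_code", None)
        if status in (404, 410):
            # The push service says the subscription is gone — the caller
            # should delete the row instead of retrying forever.
            return False
        raise
```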
||
|
|
c3c2737c42 |
feat(platform): copilot-bot (Python / discord.py) (#12618)
## Why AutoPilot needs to reach users on chat platforms — Discord first, Telegram / Slack / Teams / WhatsApp next. This PR adds the bot service that bridges those platforms to the AutoPilot backend via the `PlatformLinkingManager` AppService introduced in #12615. Two independent linking flows (see #12615 for the rationale): - **SERVER links**: first person to run `/setup` in a guild claims it. Anyone in the server can mention the bot; all usage bills to the owner. - **USER links**: an individual DMs the bot, links their personal account, DMs bill to their own AutoPilot. A server owner still has to link their DMs separately. ## What A Python service using `discord.py`, living alongside the rest of the backend. Connects to the platform_linking service via cluster-internal RPC (no shared bearer token) and subscribes to copilot streams directly on Redis (no HTTP SSE proxy). Originally prototyped in Node.js with Vercel's Chat SDK — rewritten in Python after team feedback: the rest of the platform is Python, `discord.py` was already a dependency, and the Chat SDK's streaming-UI abstractions don't apply to a headless chat bot. ### Deployment - **Shares the existing backend Docker image** — no separate Dockerfile, no separate Artifact Registry. A `copilot-bot` poetry script entry lets the same image run with `command: ["copilot-bot"]` in the Helm chart. - **Auto-starts with `poetry run app`** when `AUTOPILOT_BOT_DISCORD_TOKEN` is set, so the full local dev stack includes the bot without extra setup. - **Runs standalone** via `poetry run copilot-bot` for the production pod. Infra PR: [AutoGPT_cloud_infrastructure#310](https://github.com/Significant-Gravitas/AutoGPT_cloud_infrastructure/pull/310). ### File layout ``` backend/copilot/bot/ ├── app.py # CoPilotChatBridge(AppService) + adapter factory + outbound @expose ├── config.py # Shared (platform-agnostic) config ├── handler.py # Core logic: routing, linking, batched streaming ├── platform_api.py # Thin facade over PlatformLinkingManagerClient + stream_registry ├── platform_api_test.py ├── text.py # split_at_boundary + format_batch ├── threads.py # Redis-backed thread subscription tracking ├── README.md └── adapters/ ├── base.py # PlatformAdapter ABC + MessageContext └── discord/ ├── adapter.py # Gateway connection, events, thread creation, buttons ├── commands.py # /setup, /help, /unlink └── config.py # Discord token + message limits ``` **Locality rule:** anything platform-specific lives under `adapters/<platform>/`. `app.py` is the only file that names specific platforms — it's the factory that picks adapters based on which tokens are set. Adding Telegram later = drop in `adapters/telegram/` with the same shape. ### `CoPilotChatBridge` — now an `AppService` Previously `AppProcess`. Now inherits `AppService`, runs its RPC server on `Config.copilot_chat_bridge_port=8010`, and exposes two scaffolding `@expose` methods for the backend→chat-platform direction: - `send_message_to_channel(platform, channel_id, content)` — stub - `send_dm(platform, platform_user_id, content)` — stub Both currently raise `NotImplementedError` — they unlock the architecture for future features (scheduled agent outputs piped to Discord, etc.) without another structural change. A matching `CoPilotChatBridgeClient` + `get_copilot_chat_bridge_client()` factory lets other services call the bot by the same `AppServiceClient` pattern used for `NotificationManager` and `PlatformLinkingManager`. 
### Bot behaviour - `/setup` — server only, ephemeral, returns a "Link Server" button. Rejects DM invocations up front. - `/help` — ephemeral usage info. - `/unlink` — ephemeral, opens a "Settings" button pointing at `AUTOGPT_FRONTEND_URL/profile/settings` (real unlinking needs JWT auth). - **Thread per conversation**: @mentioning the bot in a channel creates a thread and routes the reply there. Subsequent messages in that thread don't need another @mention — thread subscriptions are tracked in Redis with a 7-day TTL. - **Batched follow-ups**: messages arriving mid-stream append to a per-thread pending list; drained as a single follow-up turn when the current stream ends. - **Persistent typing indicator**: 8-second re-fire loop. - **Per-user identity prefix**: every forwarded message tagged `[Message sent by {name} (Discord user ID: ...)]`. - **Platform-aware chunking**: long responses split at paragraph → line → sentence → word boundaries (1900 chars for Discord). - **Link buttons** for DM link prompts and `/setup` / `/unlink` responses. - **Duplicate message guard**: on `DuplicateChatMessageError` the bot stays quiet — no double response. ### Env vars | Variable | Purpose | |----------|---------| | `AUTOPILOT_BOT_DISCORD_TOKEN` | Discord bot token — enables the Discord adapter | | `AUTOGPT_FRONTEND_URL` | Frontend base URL for link confirmation pages | | `REDIS_HOST` / `REDIS_PORT` | Shared with backend — session + thread-subscription state + direct copilot stream subscription | | `PLATFORMLINKINGMANAGER_HOST` | Cluster DNS name of the `PlatformLinkingManager` service (RPC target) | Gone vs. the previous REST design: `AUTOGPT_API_URL`, `PLATFORM_BOT_API_KEY`, `SSE_IDLE_TIMEOUT`. ## How - **Adapter pattern**: `PlatformAdapter` ABC defines `start`, `stop`, `send_message`, `send_link`, `start_typing`, `create_thread`, `max_message_length`, `chunk_flush_at`, etc. Each platform implements the interface; the shared `MessageHandler` calls through it. - **Control plane over RPC**: `PlatformAPI` (~180 lines) is a thin facade over `PlatformLinkingManagerClient` — `resolve_server`, `resolve_user`, `create_link_token`, `create_user_link_token`, `stream_chat`. The bot never constructs HTTP requests or handles an API key. - **Streaming over Redis Streams**: `stream_chat` calls `start_chat_turn` (backend `@expose`), receives a `ChatTurnHandle(session_id, turn_id, user_id, subscribe_from="0-0")`, then subscribes directly via `stream_registry.subscribe_to_session(...)`. Yields text from `StreamTextDelta`, terminates on `StreamFinish`, surfaces `StreamError.errorText` to the user. No SSE parsing, no X-Session-Id header dance. - **Error model**: backend domain exceptions (`NotFoundError`, `LinkAlreadyExistsError`, `DuplicateChatMessageError`) cross the RPC boundary cleanly (all `ValueError`-based, registered in `backend.util.exceptions`). The bot catches them by type instead of inspecting HTTP status codes. - **Cooperative batching**: `TargetState.processing` flag + per-target `pending` list. Messages arriving while `processing=True` append; the running stream's finally block loops to drain the list before releasing. - **Typing helper for `endpoint_to_async`**: added an `@overload` so `async def` `@expose` methods on the server type-check correctly on the client side (the scheduler pattern avoids this by using sync `@expose`, but the new managers are async). ## Tests - `backend/copilot/bot/platform_api_test.py` — new. 
Covers resolve (server + user), create link tokens (success + `LinkAlreadyExistsError` propagation), stream chat (yields deltas, terminates on `StreamFinish`, surfaces `StreamError`, propagates `DuplicateChatMessageError` and `NotFoundError`, handles `subscribe_to_session` returning `None`). - `poetry run pyright backend/copilot/bot/` — clean. - `poetry run ruff check backend/copilot/bot/` — clean. - `poetry run copilot-bot` starts and connects to Discord Gateway, syncs slash commands. - `/setup` in a guild → confirm on frontend → mention bot → AutoPilot streams back in a created thread. - Thread follow-ups work without re-mentioning. - Spamming messages mid-stream produces one batched follow-up. - Long responses chunk at natural boundaries. - DM to unlinked user → "Link Account" button → confirm → DMs stream as that user's AutoPilot. ## Stack - Backend API: #12615 — merge first - Frontend link page: #12624 - Infra: [AutoGPT_cloud_infrastructure#310](https://github.com/Significant-Gravitas/AutoGPT_cloud_infrastructure/pull/310) --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: CodeRabbit <noreply@coderabbit.ai> |
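The boundary-aware chunking is the most self-contained piece of the handler; here is a hedged sketch of the idea (paragraph → line → sentence → word fallback, 1900-character limit for Discord), not the actual `text.py`:

```python
def split_at_boundary(text: str, limit: int = 1900) -> list[str]:
    """Split `text` into chunks of at most `limit` characters, preferring
    paragraph, then line, then sentence, then word boundaries."""
    separators = ["\n\n", "\n", ". ", " "]
    chunks: list[str] = []
    remaining = text
    while len(remaining) > limit:
        window = remaining[:limit]
        for sep in separators:
            cut = window.rfind(sep)
            if cut > 0:
                cut += len(sep)  # keep the separator with the leading chunk
                break
        else:
            cut = limit  # no natural boundary found — hard split
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```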
||
|
|
37f247c795 |
feat(frontend): creator dashboard page for settings v2 (SECRT-2281) (#12934)
### Why / What / How **Why:** The creator dashboard route under settings v2 currently shows a "Coming soon" placeholder. SECRT-2281 fills it in so creators can manage their store submissions from one place. **What:** Implements the full creator dashboard at `/settings/creator-dashboard` — stats overview, desktop submissions table, mobile submissions list, filtering/sorting, selection bar, edit modal, and empty/loading/error states. **How:** Page logic lives in `useCreatorDashboardPage.ts` (data fetch, filter state, modal state, CRUD callbacks); pure transforms in `helpers.ts`; UI broken into colocated `components/*` (one folder per component, each ~200–400 lines). Reuses generated API hooks, `ErrorCard`, and `EditAgentModal` from the design system. Mobile/desktop split via Tailwind `md:` breakpoints rather than runtime detection. ### Changes 🏗️ - Replace placeholder `page.tsx` with the real dashboard, wired to `useCreatorDashboardPage` - Add `useCreatorDashboardPage.ts` (page-level state + handlers) and `helpers.ts` (filter/sort/stat utilities) - Add components: `DashboardHeader`, `DashboardSkeleton`, `EmptyState`, `StatsOverview`, `SubmissionsList` (+ `columns/*`, `useSubmissionSelection`), `SubmissionItem` (+ `useSubmissionItem`), `SubmissionSelectionBar`, `MobileSubmissionsList` (+ `MobileSelectionBar`), `MobileSubmissionItem`, `ColumnFilter` - Set document title to "Creator dashboard – AutoGPT Platform" - Surface fetch errors via `ErrorCard` with retry; show `DashboardSkeleton` while loading; show `EmptyState` when there are no submissions ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [ ] I have tested my changes according to the test plan: - [ ] Loading state renders skeleton until submissions load - [ ] Empty state renders when the creator has no submissions - [ ] Error state renders `ErrorCard` and retry refetches the list - [ ] Stats overview reflects approved/pending/rejected/draft counts - [ ] Desktop list: sort/filter by status and other columns updates the visible rows - [ ] Desktop list: selection bar appears on row select and clears on reset - [ ] Mobile list (≤ md breakpoint): renders mobile items + selection bar - [ ] Edit modal opens for a submission, saves, and refreshes the list on success - [ ] Delete action removes the submission and updates stats - [ ] View action navigates to the submission's public detail - [ ] Submit/publish entry point opens the publish modal - [ ] Document title shows "Creator dashboard – AutoGPT Platform" |
||
|
|
ae4a421620 |
fix(platform): small fixes and stagger animations on settings pages (#12937)
## Why The new Settings v2 surfaces (preferences, api-keys, integrations, profile) shipped with a few rough edges spotted in self-review: - **Timezone saves silently dropped on refresh.** Backend `GET /auth/user/timezone` resolved the user via `get_or_create_user(user_data)` (a 5-min in-process cache keyed by the JWT-payload dict). `update_user_timezone` only invalidates `get_user_by_id`'s cache, so the GET kept returning the pre-save tz until TTL expired — looked exactly like "save did nothing." - **Confusing "Looks like you're in X" CTA on the Time zone card** that did nothing in the common case (server tz already matched the browser tz, so clicking it produced no dirty state). - **Save was disabled out of the gate when server tz was `"not-set"`** — the hook substituted the browser tz into both `formState` and `savedState`, so they were equal and `dirty` was false. - **Lists felt static.** No motion when API keys / integrations mount, and the loading skeletons popped in all at once instead of handing off cleanly to the loaded rows. - **Profile bio textarea** corner clipped against the rounded-3xl border and the scrollbar overflowed the rounded container. ## What ### Bug fixes - `GET /auth/user/timezone` now reads via `get_user_by_id(user_id)` — the same cache `update_user_timezone` already invalidates — so a save followed by refresh shows the new tz immediately. - `usePreferencesPage` now treats the raw server tz (`"not-set"` included) as the saved baseline, while `formState` uses the browser tz only as a *display* fallback. Effect: when the user has never set a tz, Save is enabled on first paint and a single click persists the detected tz. - Frontend save flow swapped `setQueryData` for `invalidateQueries`, mirroring the older `/profile/(user)/settings` page so we always re-read the persisted value. - Removed the auto-detect "Looks like you're in X" button + its dead helpers. ### Animations (per Emil Kowalski's guidelines) Added orchestrated stagger animations that run on both the loaded list **and** its skeleton, so the loading→loaded handoff is continuous in-position: - **API keys list + skeleton:** 280ms ease-out `cubic-bezier(0.16, 1, 0.3, 1)`, 40ms stagger, opacity + 6px translate. - **Integrations list + skeleton:** 300ms ease-out, 80ms stagger, opacity + 16px translate (rows are bigger / fewer). - Both honor `prefers-reduced-motion` via `useReducedMotion`; only `opacity` and `transform` are animated. ### Misc polish - Profile bio textarea: `!rounded-tr-md` so the top-right corner doesn't fight the surrounding `rounded-3xl`, plus a thin styled scrollbar (`scrollbar-thin scrollbar-thumb-zinc-200 hover:scrollbar-thumb-zinc-300`) that lives inside the rounded container instead of breaking out of it. 
## How | File | Change | | --- | --- | | `backend/api/features/v1.py` | `get_user_timezone_route` now uses `get_user_by_id` + `Security(get_user_id)` instead of `get_or_create_user(user_data)` | | `frontend/.../preferences/usePreferencesPage.ts` | Split init into `initialFormState` (browser-fallback display) vs `initialSavedState` (raw server value); swap optimistic `setQueryData` for `invalidateQueries` after tz mutate | | `frontend/.../preferences/components/TimezoneCard/TimezoneCard.tsx` | Drop `initialValue` prop, remove auto-detect button + unused imports | | `frontend/.../preferences/page.tsx` | Drop `savedState`/`initialValue` wiring | | `frontend/.../api-keys/components/APIKeyList/APIKeyList.tsx` | Wrap rows in container `motion.div` with `staggerChildren`; per-row `motion.div` with opacity + y variants | | `frontend/.../api-keys/components/APIKeyListSkeleton/APIKeyListSkeleton.tsx` | Same stagger config so loading→loaded matches | | `frontend/.../integrations/components/IntegrationsList/IntegrationsList.tsx` + `IntegrationsListSkeleton.tsx` | Same pattern for the providers list | | `frontend/.../profile/components/ProfileForm/ProfileForm.tsx` | Tailwind classes only — `!rounded-tr-md` + `scrollbar-thin scrollbar-thumb-zinc-200 hover:scrollbar-thumb-zinc-300` | ## Test plan - [ ] On `/settings/preferences`: pick a different tz → Save → hard-refresh → new tz still selected. - [ ] First-time user (server tz = `not-set`): land on page, Save button should already be enabled; click Save → toast confirms; refresh → tz persists. - [ ] No "Looks like you're in X" button visible. - [ ] On `/settings/api-keys`: rows fade/slide in staggered on first mount; loading skeleton uses the same motion. - [ ] On `/settings/integrations`: provider groups fade/slide in staggered; skeleton matches. - [ ] OS "Reduce motion" enabled → no transforms, content appears instantly on all four surfaces. - [ ] On `/settings/profile`: bio textarea top-right corner is no longer hard-cornered against the card; scrollbar fits inside the rounded shape. - [ ] Existing unit tests still pass: `pnpm test:unit src/app/\(platform\)/settings/preferences` and `.../api-keys`. |
||
|
|
2879528308 |
feat(backend): Redis Cluster client support (#12900)
## Why Pre-launch scaling. Redis is currently a single-master pod — a real SPOF, and not scalable horizontally. To move it to a sharded Redis Cluster (via KubeBlocks in GKE), the backend has to speak the cluster protocol. Keeping both "standalone" and "cluster" code paths would have local dev not reflect prod. Going **cluster-only**. ## What - `backend.data.redis_client` now always constructs `RedisCluster` (sync) / `redis.asyncio.cluster.RedisCluster` (async). Type aliases `RedisClient` / `AsyncRedisClient` point at the cluster classes. - `RedisCluster` uses the existing `REDIS_HOST` / `REDIS_PORT` as a startup node and auto-discovers peers via `CLUSTER SLOTS`. - Classic Redis pub/sub is broadcast cluster-wide and redis-py's async `RedisCluster` has no `.pubsub()`; dedicated `get_redis_pubsub[_async]` helpers return plain `(Async)Redis` clients to the seed node. All pub/sub callers (`event_bus`, `notification_bus`, `copilot.pending_messages`) route through these helpers. - `rate_limit.py` MULTI/EXEC pipelines are split per-counter — daily and weekly counters hash to different slots, which `RedisCluster` correctly rejects as `CrossSlotTransactionError`. Per-counter `INCRBY + EXPIRE` atomicity is preserved; the counters are logically independent budgets. - `util/cache.py` shared-cache client is also `RedisCluster` now. - Pre-existing mock-based unit tests updated; new `redis_client_test.py` covers the swap. ## Local dev `docker-compose.platform.yml` now runs **2-master Redis Cluster** (`redis` + `redis-2`, 16384 slots split 0-8191 / 8192-16383). A one-shot `redis-init` sidecar bootstraps it on first boot via raw `CLUSTER MEET` + `CLUSTER ADDSLOTSRANGE` (bundled `redis-cli --cluster create` enforces a 3-node minimum). This deliberately catches cross-slot bugs on a laptop rather than in prod: ``` >>> ALL SMOKE TESTS PASS <<< [sync] class: RedisCluster [sync] 20 keys across slots: OK [sync] colocated MULTI/EXEC: OK [5, 12, 1] [sync] cross-slot MULTI/EXEC rejected as expected: CrossSlotTransactionError [sync] EVAL single-key: OK [sync] pub/sub (classic, broadcast): OK [async] class: RedisCluster [async] 15 keys across slots: OK [async] colocated pipeline: OK [async] pub/sub: OK ``` `rest_server` `/health` → 200, both shards have connected clients + keys distributed 19/19 under the smoke run. `executor` boots + connects to RabbitMQ + Redis cleanly. For a 3-shard override (6 pods, with replicas) when you want to test real KubeBlocks topology: ``` docker compose -f docker-compose.yml -f docker-compose.redis-cluster.yml up -d ``` ## Deploy order (companion infra PR: [cloud_infrastructure#312](https://github.com/Significant-Gravitas/AutoGPT_cloud_infrastructure/pull/312)) The existing `helm/redis` chart is updated in that PR to run as a 1-shard cluster (backwards-compatible toggle, default on). That rollout must land before this PR's image goes live so the backend's `RedisCluster` client has something to discover. Sequence: 1. Infra: `helm upgrade redis` (1-shard cluster-enabled) 2. Infra: `helm upgrade rabbit-mq` (3-node cluster) 3. Backend: merge + deploy this PR 4. Follow-up: swap to KubeBlocks `redis-cluster` chart (3-shard sharded, already staged in infra PR) ## Caveats / follow-ups - Classic pub/sub via seed node means every node in the cluster sees every message (broadcast). Fine at current volume; if it becomes hot, migrate to `SPUBLISH`/`SSUBSCRIBE` (Redis 7+ sharded pub/sub). 
- Per-user rate-limit counters (daily vs weekly) lost cross-counter transactionality, but per-counter atomicity is preserved — the two counters are independent budgets so no correctness regression. - Local 2-master cluster crashes lose the cluster state; `redis-init` idempotently rebootstraps. ## Checklist - [x] Lint + format pass (`poetry run format` + `poetry run lint`) - [x] Unit tests pass — `redis_client_test`, `redis_helpers_test`, `event_bus_test`, `pending_messages_test`, `rate_limit_test`, `cluster_lock_test` - [x] Live smoke against 2-master cluster — sync + async; MULTI/EXEC; EVAL; pub/sub; cross-slot rejection - [x] Full stack smoke — `rest_server` /health, `executor` boot, keys distributed across both shards - [ ] Dev deploy (pending infra PR merge + manual validation) |
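A hedged sketch of the two client shapes described above — a cluster client for keyspace traffic, and a plain client pinned to the seed node for classic pub/sub — using redis-py. The env var names follow the PR; the helper names are illustrative:

```python
import os

import redis
from redis.cluster import RedisCluster

REDIS_HOST = os.environ.get("REDIS_HOST", "localhost")
REDIS_PORT = int(os.environ.get("REDIS_PORT", "6379"))


def get_redis_cluster() -> RedisCluster:
    # REDIS_HOST/PORT act only as a startup node; the client discovers the
    # remaining masters via CLUSTER SLOTS.
    return RedisCluster(host=REDIS_HOST, port=REDIS_PORT, decode_responses=True)


def get_redis_pubsub() -> redis.client.PubSub:
    # Classic pub/sub is broadcast cluster-wide, and the async cluster client
    # has no .pubsub(), so pub/sub callers talk to a plain client on the seed node.
    plain = redis.Redis(host=REDIS_HOST, port=REDIS_PORT, decode_responses=True)
    return plain.pubsub()
```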
||
|
|
1974ec6260 |
fix(frontend/copilot): fix streaming reconnect races, hydration ordering, and reasoning split (#12813)
## Summary Improves Copilot/AutoPilot streaming reliability across frontend and backend. The diff now covers the original streaming investigation issues plus follow-up CI and review fixes from the latest merge with `dev`. Addresses [SECRT-2240](https://linear.app/autogpt/issue/SECRT-2240), [SECRT-2241](https://linear.app/autogpt/issue/SECRT-2241), and [SECRT-2242](https://linear.app/autogpt/issue/SECRT-2242). ## Changes - Fixes reasoning vs response rendering so action tools such as `run_block` and `run_agent` do not cause assistant response text to be hidden inside the collapsed reasoning section. - Reworks Copilot session lifecycle handling: active-stream hydration, resume ordering, reconnect timeout recovery, wake resync, session deletion, title polling, stop handling, and session-switch stale callback guards. - Adds a per-session Copilot stream store/registry and transport helpers to prevent duplicate resumes, duplicate sends, and cross-session contamination during reconnect or reload flows. - Adds pending follow-up message support and backend pending-message safeguards, including sanitization of queued user content and requeue-on-persist-failure behavior. - Improves backend stream and executor robustness: active stream registry checks, bounded cancellation drain with sync fail-close fallback, Redis helper coverage, and updated SDK response adapter expectations for post-tool status events. - Adds and polishes usage-limit UI, including reset gate behavior, backdrop blending behind the usage-limit card, and usage panel/card coverage. - Fixes a chat input Enter-submit race where Playwright and fast users could fill the textarea and press Enter before React had re-enabled the submit button, causing the visible message not to send. - Refactors the Copilot page into smaller hooks/components and adds focused tests around stream recovery, hydration, pending queueing, rate-limit gates, and message rendering. ## Test plan - [x] `poetry run format` - [x] `poetry run pytest backend/copilot/sdk/response_adapter_test.py backend/copilot/executor/processor_test.py` - [x] `pnpm prettier --write` on touched frontend files - [x] `pnpm vitest run src/app/(platform)/copilot/components/ChatInput/__tests__/useChatInput.test.ts` - [x] `pnpm types` - [x] `pnpm lint` (passes with existing unrelated `next/no-img-element` warnings) - [ ] Full GitHub CI after latest push ## Review notes - The Sentry review thread about unbounded cancellation cleanup is addressed in `375ec9d5f`: cancellation now waits for normal async cleanup but exits after `_CANCEL_GRACE_SECONDS` and falls through to the sync fail-close path. - The previous backend CI failures were stale test expectations around the new `StreamStatus("Analyzing result…")` event after tool output; tests now assert that event explicitly. - The previous full-stack E2E failure was the Copilot input Enter race; the input now submits from the live form value instead of depending on a possibly stale disabled button state. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co> |
||
|
|
932ecd3a07 |
fix(backend/copilot): normalize model name based on actual transport, not config shape (#12932)
## Summary When `CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true` is paired with a populated `CHAT_BASE_URL=https://openrouter.ai/api/v1` (e.g. left over from an earlier OpenRouter setup), the SDK was passing the OpenRouter slug `anthropic/claude-opus-4.7` straight through to the Claude Code CLI subprocess. The CLI uses OAuth and ignores `CHAT_BASE_URL`/`CHAT_API_KEY`, so it rejects the slug: > There's an issue with the selected model (anthropic/claude-opus-4.7). It may not exist or you may not have access to it. The bug was in `_normalize_model_name`, which gated on `config.openrouter_active` (config-shape check) instead of the transport the CLI actually uses for the turn. ## Changes - Add `ChatConfig.effective_transport` property returning `subscription` | `openrouter` | `direct_anthropic`, detected in that priority order. Subscription wins over OpenRouter config because the CLI subprocess uses OAuth and ignores the OpenRouter env vars (see `build_sdk_env` mode 1). - Switch `_normalize_model_name` to gate on `effective_transport`. Subscription and direct-Anthropic transports both produce the CLI-friendly hyphenated form (`claude-opus-4-7`) and reject non-Anthropic vendors loudly. - `_resolve_sdk_model_for_request` already routes any LD-served override through `_normalize_model_name`, so a per-user advanced-tier override under subscription now correctly becomes `claude-opus-4-7` instead of the OpenRouter slug. The standard-tier \"no LD override → return None\" behaviour is preserved. - Update two existing service tests to assert the corrected behaviour (Kimi LD override under subscription falls back to tier default normalised for the CLI; Opus advanced override returns hyphenated form). ## Test plan - [x] `poetry run pytest backend/copilot/sdk/service_helpers_test.py backend/copilot/sdk/service_test.py backend/copilot/config_test.py -v` — 165 passed. - [x] `poetry run pytest backend/copilot/sdk/env_test.py backend/copilot/sdk/p0_guardrails_test.py` — 136 passed (other call sites of `openrouter_active` unchanged). - [x] `poetry run ruff format` + `ruff check` clean on touched files. ### New tests added (service_helpers_test.py) - Subscription transport with OpenRouter base URL set + advanced-tier LD override → returns `claude-opus-4-7` (not the OpenRouter slug, not None). - Subscription transport with OpenRouter base URL set + standard-tier no override → returns None (existing behaviour preserved). - Subscription transport rejects non-Anthropic vendor (`moonshotai/...`) → ValueError. - `effective_transport` returns `subscription` when subscription is on regardless of OpenRouter config; returns `openrouter` when subscription is off and OpenRouter is fully configured; returns `direct_anthropic` otherwise. |
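A hedged sketch of the transport-detection plus normalization shape the PR describes; the config fields, dataclass, and slug handling below are assumptions for illustration, not the actual `ChatConfig`:

```python
from dataclasses import dataclass


@dataclass
class ChatConfig:
    use_claude_code_subscription: bool = False
    chat_base_url: str = ""
    chat_api_key: str = ""

    @property
    def effective_transport(self) -> str:
        # Priority order: the CLI subscription ignores base URL / API key,
        # so it wins even when OpenRouter-style config is still present.
        if self.use_claude_code_subscription:
            return "subscription"
        if "openrouter" in self.chat_base_url and self.chat_api_key:
            return "openrouter"
        return "direct_anthropic"


def normalize_model_name(model: str, config: ChatConfig) -> str:
    """Turn an OpenRouter-style slug into the CLI-friendly hyphenated form
    when the turn actually runs through the CLI or direct Anthropic."""
    if config.effective_transport == "openrouter":
        return model  # slug passes through untouched
    vendor, _, name = model.partition("/")
    if vendor and name and vendor != "anthropic":
        raise ValueError(f"Non-Anthropic model {model!r} is not usable on this transport")
    name = name or vendor  # model may already be un-prefixed
    return name.replace(".", "-")  # e.g. claude-opus-4.7 -> claude-opus-4-7
```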
||
|
|
4a567a55a4 |
fix(backend/copilot): pause idle timer during pending tools (#12927)
## Summary Pause the SDK idle timer while a tool call is pending, with a 2-hour hung-tool cap as backstop. Fixes SECRT-2239 — long-running tools (10+ min, e.g. sub-agent execution) were being silently aborted by the 10-minute idle timeout introduced in #12660. ## What changed (backend only) - `_IDLE_TIMEOUT_SECONDS = 1800` (30 min) — soft cap when no tool pending (raised from 10 min) - `_HUNG_TOOL_CAP_SECONDS = 7200` (2 h) — hard cap when a tool is pending; protects against truly hung tool calls without false-aborting legitimate long-running ones - `_idle_timeout_threshold(adapter)` — returns the appropriate threshold based on whether any tool is currently pending in the adapter Backed by 7 regression tests in `service_test.py::TestIdleTimeoutThreshold`. ## Frontend coordination The original cherry-pick batch included a `useStreamActivityWatchdog` hook for client-side wire-silence detection. That hook is dropped from this PR because it overlaps with Lluis's #12813, which ships the same component as part of a comprehensive copilot streaming refactor. End state on dev: his PR contributes the watchdog, this PR contributes the backend pause + cap. ## Test plan - 7/7 unit tests in `backend/copilot/sdk/service_test.py::TestIdleTimeoutThreshold` pass - pyright clean on `service.py` + `service_test.py` - /pr-test --fix posted with native-stack run + screenshots: https://github.com/Significant-Gravitas/AutoGPT/pull/12927#issuecomment-4328320714 ## Linear SECRT-2239 |
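A compact sketch of the two-threshold decision described above, assuming the adapter exposes some notion of pending tool calls (the attribute name below is made up):

```python
# Sketch of the threshold selection; the adapter attribute name is assumed.
_IDLE_TIMEOUT_SECONDS = 1800      # 30 min soft cap when no tool call is pending
_HUNG_TOOL_CAP_SECONDS = 7200     # 2 h hard cap while a tool call is pending


def _idle_timeout_threshold(adapter) -> int:
    """Pick the idle cutoff for the current moment of the stream.

    While a tool call is in flight only the hung-tool cap applies, so a
    legitimate 10+ minute sub-agent run is no longer aborted by the soft cap.
    """
    has_pending_tool = bool(getattr(adapter, "pending_tool_calls", None))
    return _HUNG_TOOL_CAP_SECONDS if has_pending_tool else _IDLE_TIMEOUT_SECONDS
```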
||
|
|
2b28434786 |
feat(platform/backend): Filter store creators with approved agents (#10014)
Filtering store creators to only show profiles with an approved agent keeps the marketplace focused on usable inventory and prevents empty creator cards. ### Changes 🏗️ - add a `num_agents > 0` filter to `get_store_creators` - add a regression test ensuring we only return creators with approved agents - keep the existing SQL injection regression tests intact after rebasing onto `dev` ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [ ] I have tested my changes according to the test plan: - [ ] python3 -m pytest autogpt_platform/backend/backend/server/v2/store/db_test.py -k get_store_creators_only_returns_approved *(blocked: repo environment lacks pytest and related deps)* <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Filter `get_store_creators` to creators with `num_agents > 0` and add a test to validate the behavior. > > - **Store backend**: > - Update `get_store_creators` in `backend/server/v2/store/db.py` to filter creators by `num_agents > 0`. > - **Tests**: > - Add `test_get_store_creators_only_returns_approved` in `backend/server/v2/store/db_test.py` to verify filtering and pagination count calls. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit c2fca584cce5a8c26dbdadd68696a0033642f193. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Nicholas Tindle <ntindle@users.noreply.github.com> Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: ntindle <8845353+ntindle@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
||
|
|
5d1cdc2bad |
fix(backend/copilot): surface empty-success ResultMessage as stream error (SECRT-2252) (#12926)
## Summary - Detect ghost-finished sessions where the SDK returns a `ResultMessage` with `subtype="success"`, empty `result`, no produced content, and `output_tokens == 0`. - Emit `StreamError(code="empty_completion")` instead of silently calling `StreamFinish`, so the caller (and the user) sees the failure. ## Background Linear: [SECRT-2252](https://linear.app/agpt/issue/SECRT-2252) — SDK silent empty completion not retried, leaving the user with a blank stream (`start -> start-step -> finish-step -> finish`). ## Changes - `response_adapter.py::convert_message`: in the `ResultMessage` branch, check `_is_empty_completion()` before falling through to the existing success path. When matched, close any open step, emit `StreamError`, and skip `StreamFinish`. - `response_adapter.py::_is_empty_completion`: new helper that returns `True` only when `result` is falsy, no text/reasoning was emitted, no tool calls were registered, no tool results were seen, and `usage["output_tokens"]` is `0`. - `response_adapter_test.py`: 4 new unit tests covering empty-success (None and empty-string variants), non-empty success, and the non-empty-tokens-but-empty-result fallthrough. ## Out of scope (per ticket) - Retry-once behavior. This PR only surfaces the error; the caller decides retry semantics. Follow-up work can wire automatic retry on `code="empty_completion"`. ## Test plan - [x] `poetry run pytest backend/copilot/sdk/response_adapter_test.py` — all 58 tests pass (4 new + 54 existing). - [x] `poetry run pyright backend/copilot/sdk/response_adapter.py backend/copilot/sdk/response_adapter_test.py` — clean. ## Checklist - [x] My code follows the style of this project. - [x] I have added tests covering my changes. - [x] I have updated the documentation accordingly. (N/A — internal adapter behavior) |
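A hedged sketch of what an `_is_empty_completion` predicate can look like; the conditions are taken from the description above, while the adapter-state field names are assumptions:

```python
# Conditions mirror the description above; the state field names are assumed.
def _is_empty_completion(result_message, state) -> bool:
    """True only for a 'success' ResultMessage that produced nothing at all."""
    usage = getattr(result_message, "usage", None) or {}
    return (
        not getattr(result_message, "result", None)  # result is None or ""
        and not state.emitted_text                   # no assistant text streamed
        and not state.emitted_reasoning              # no reasoning streamed
        and not state.tool_calls                     # no tool_use registered
        and not state.tool_results                   # no tool results seen
        and usage.get("output_tokens", 0) == 0       # zero output tokens
    )
```

When it returns True, the adapter closes any open step and emits `StreamError(code="empty_completion")` instead of `StreamFinish`.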
||
|
|
3c08b90500 |
feat(frontend): preferences v2 page (SECRT-2279) (#12925)
### Why / What / How **Why** — Settings v2 needs a dedicated Preferences page covering account info (email, password reset), time zone, and notification preferences with a single, predictable save flow. The existing legacy page at `/profile/(user)/settings` mixes concerns and uses `__legacy__` UI primitives we are migrating away from. SECRT-2279. **What** — A new preferences page at `/settings/preferences` built from atomic / molecular design-system components. Three cards (Account, Time zone, Notifications) share one inline Save / Discard bar at the bottom. The page replaces the legacy settings page from a UX standpoint while keeping the same backend mutations. <img width="1511" height="899" alt="Screenshot 2026-04-27 at 6 43 09 PM" src="https://github.com/user-attachments/assets/5762fc41-1654-4764-8fbf-d5dd262e031a" /> **How** - **Page composition (`page.tsx`)** uses a single `usePreferencesPage` hook that owns dirty/saved/form state. Renders `PreferencesHeader`, `AccountCard`, `TimezoneCard`, `NotificationsCard`, and the inline `SaveBar` below the cards. - **Account card** — Email row shows the current address with a compact pencil button that opens a width-constrained `Dialog` for editing (Cancel / Update). Password row is a `NextLink` button that routes to `/reset-password`. - **Time zone card** — Single row inside a card: label + info-icon tooltip on the left; small-size `Select` and the GMT offset chip on the right. The "auto-detect" prompt shows up only when the saved tz differs from the browser tz, rendered as a pill on the right side of the card. - **Notifications card** — Tabs for Agents / Marketplace / Credits; toggling a switch flips a flag in formState and enables Save. - **Save flow** — `usePreferencesPage` keeps a `savedState` snapshot (mirrors the old `react-hook-form` `defaultValues` capture-once semantics) so the dirty check is fully decoupled from any backend GET refetch. After a successful mutation, `savedState ← formState`, and the timezone query cache gets an optimistic `setQueryData` write so the value isn't snapped back by the (cached) GET endpoint. - **Skeleton** — `PreferencesSkeleton` mirrors the real layout — header, Account card with the two row shapes, Time zone single row, Notifications tabs + toggle rows, and the Save / Discard buttons. - **Sidebar** — Renames the entry "Settings" → "Preferences" with `SlidersHorizontalIcon`. Profile and Creator Dashboard get clearer affordances (`UserIcon`, `ChartLineUpIcon`). 
### Changes 🏗️ - New page: `src/app/(platform)/settings/preferences/page.tsx` and `usePreferencesPage.ts` - New components: `AccountCard`, `TimezoneCard`, `NotificationsCard`, `PreferencesHeader`, `PreferencesSkeleton`, `SaveBar` - Helpers: `helpers.ts` (timezones list, GMT-offset formatter, dirty utilities, notification-group definitions) - Sidebar: rename "Settings" → "Preferences", swap to `SlidersHorizontalIcon` + cleaner Profile / Creator Dashboard icons (`SettingsSidebar/helpers.ts`) - Tests: - `preferences/__tests__/main.test.tsx` — page render, edit-email dialog open/cancel, time zone info trigger, Save/Discard disabled-on-clean, notification toggle → save submission, discard revert - Updated `SettingsSidebar.test.tsx` and `SettingsMobileNav.test.tsx` for the "Preferences" label ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [ ] I have tested my changes according to the test plan: - [ ] Sidebar shows "Preferences" with the new icon; clicking routes to `/settings/preferences` - [ ] Account card: pencil button opens a 420px-wide dialog; Update button stays disabled until the email differs and is valid; Cancel closes the dialog - [ ] Password row: "Reset password" button navigates to `/reset-password` - [ ] Time zone: changing the select enables Save; clicking Save persists the value and the form keeps showing the saved value (does not snap back), the inline GMT offset chip updates, info tooltip appears on hover - [ ] Auto-detect prompt appears only when saved tz ≠ browser tz; clicking it sets the select to the browser tz - [ ] Notifications: toggling any switch enables Save; saving flips the flag in the request body; switching tabs preserves toggles; Discard reverts unsaved toggles - [ ] Save / Discard render below the cards on the right and stay disabled until any field is dirty - [ ] Loading state shows the new skeleton that mirrors the layout - [ ] `pnpm test:unit` passes (covers the page-level integration tests above) #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under **Changes**) |
||
|
|
599f370206 |
feat(frontend): add settings v2 profile page (#12924)
### Why / What / How
**Why:** SECRT-2278 — settings v2 needs a Profile page so users can
manage how they appear on the marketplace (display name, handle, bio,
links, avatar) without hopping into the legacy settings UI.
**What:** Adds the `/settings/profile` page end-to-end:
- Form fields for display name, handle, bio, avatar, and up to 5 links —
wired to the `getV2GetUserProfile`, `postV2UpdateUserProfile`, and
`postV2UploadSubmissionMedia` endpoints.
- Bio editor gets a markdown toolbar (bold / italic / strikethrough /
link / bulleted list) with a live preview toggle that renders via
`react-markdown` + `remark-gfm`.
- Save/Discard bar with full validation (handle regex, bio length, dirty
tracking) and toast feedback for success and failure paths.
- Forward refs through the `Input` atom so consumers can target the
underlying `<textarea>` / `<input>` (needed for the toolbar's
selection/cursor manipulation).
- Comprehensive integration tests (Vitest + RTL + MSW) at the page level
plus pure-helper unit tests, in line with the project's
"integration-first" testing strategy. Coverage is reported via
`cobertura` for Codecov.
**How:**
- The toolbar applies markdown syntax by reading `selectionStart` /
`selectionEnd` from a forwarded textarea ref. To avoid the textarea
jumping to the top on click: buttons `preventDefault` on `mousedown` (so
the textarea keeps focus), and the handler captures `scrollTop` before
mutation and restores it (with `focus({ preventScroll: true })`) after
React commits the new value in the next animation frame.
- The preview pane styles markdown elements via Tailwind arbitrary child
selectors (`[&_ul]:list-disc` etc.) instead of pulling in
`@tailwindcss/typography`, since the plugin isn't installed and the
project's `prose` usage was a no-op.
- Profile data hydration tolerates nullish API fields by mapping through
`profileToFormState`, padding `links` to 3 slots so the UI always has
the initial layout.
- Tests use Orval-generated MSW handlers from `store.msw.ts`, mock
`useSupabase` to inject an authenticated user, and assert UI behavior
via Testing Library queries.
### Changes 🏗️
- New: `app/(platform)/settings/profile/__tests__/helpers.test.ts`,
`__tests__/main.test.tsx`
- Updated: `settings/profile/page.tsx`, `useProfilePage.ts`,
`helpers.ts`, plus `ProfileForm`, `ProfileHeader`, `ProfileSkeleton`,
`LinksSection`, `SaveBar`
- Updated: `settings/layout.tsx` (settings v2 chrome adjustments to host
the profile page)
- Atom change: `components/atoms/Input/Input.tsx` now forwards refs
(`HTMLInputElement | HTMLTextAreaElement`) — backward-compatible for
existing consumers
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Open `/settings/profile` and confirm the page hydrates the
existing display name, handle, bio, avatar, and links
- [x] Edit each field and verify validation messages (empty name,
invalid handle with spaces, bio over 280 chars)
- [x] Bio markdown toolbar: select text and click Bold / Italic / Strike
— selection wraps, cursor stays in place, textarea does not scroll to
top
- [x] Bio toolbar with no selection: each button inserts the markdown
placeholder template
- [x] Click Bulleted list twice on the same line — line is prefixed with
`- ` only once
- [x] Toggle Preview — bio renders bullets, bold, italic, strikethrough,
links correctly; toolbar buttons dim and become inert
- [x] Toggle Edit — textarea returns with the same content
- [x] Add link → 4th and 5th slots appear; the 6th attempt is blocked by
the "Limit of 5 reached" button label
- [x] Remove link 1 — the rest reorder correctly
- [x] Avatar upload — happy path replaces the avatar; failure path
surfaces a destructive toast
- [x] Save with valid data → success toast, query invalidates, save
button disables until next edit
- [x] Save with a server 422 → destructive toast, no state corruption
- [x] Discard reverts every field back to the loaded profile
- [x] `pnpm test:unit` passes locally; `coverage/cobertura-coverage.xml`
shows ≥ 80% line coverage for `src/app/(platform)/settings/profile/**`
|
||
|
|
8786c00f9c |
feat(blocks): add Claude Opus 4.7 model support (#12826)
Requested by @Bentlybro Anthropic released [Claude Opus 4.7](https://www.anthropic.com/news/claude-opus-4-7) today. This PR adds it to the platform's supported model list. ## Why Users and developers need access to `claude-opus-4-7` via the platform's LLM block and API. The model is available on Anthropic's API today. ## What - Adds `CLAUDE_4_7_OPUS = "claude-opus-4-7"` to the `LlmModel` enum - Adds corresponding `ModelMetadata` entry: 200k context, 128k output, price tier 3 ($5/M input, $25/M output — same as Opus 4.6) ## How Two lines added to `llm.py`, following the exact same pattern as all other Anthropic model additions. No migrations, no frontend changes needed — the frontend reads model metadata from the backend's JSON schema endpoint automatically. Closes SECRT-2248 --------- Co-authored-by: Bentlybro <Github@bentlybro.com> |
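For illustration, a self-contained sketch of the two additions; the real `LlmModel` / `ModelMetadata` definitions in `llm.py` carry more entries and fields (pricing tier, etc.), and the field names below are assumptions:

```python
# Self-contained illustration; the real enum and metadata class differ in shape.
from dataclasses import dataclass
from enum import Enum


class LlmModel(str, Enum):
    CLAUDE_4_7_OPUS = "claude-opus-4-7"  # the new entry; existing models elided


@dataclass(frozen=True)
class ModelMetadata:  # field names here are assumptions
    provider: str
    context_window: int
    max_output_tokens: int


MODEL_METADATA = {
    LlmModel.CLAUDE_4_7_OPUS: ModelMetadata(
        provider="anthropic",
        context_window=200_000,     # 200k context per the description above
        max_output_tokens=128_000,  # 128k max output per the description above
    ),
}
```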
||
|
|
384cbd3ccd |
fix(frontend): redirect www to non-www with 308 to preserve request method (#9188) (#12920)
## Summary Fixes #9188 — redirects `www.` to non-www to prevent cookie/auth domain mismatch. Uses **308** (Permanent Redirect) instead of 301; unlike 301, a 308 preserves the HTTP method and request body. This is important because the middleware matcher runs on `/auth/authorize` and `/auth/integrations/*`, where OAuth callbacks may use POST. Split from #12895 per reviewer request. ## Changes - Added www→non-www redirect in Next.js middleware using `NextResponse.redirect(url, 308)` --------- Co-authored-by: majdyz <zamil.majdy@agpt.co> |
||
|
|
8be9cf70af |
fix(frontend): filter null query params in buildUrlWithQuery (#11237) (#12921)
## Summary Fixes #11237 — `buildUrlWithQuery` now filters out `null` values in addition to `undefined`, preventing them from being serialized as literal `"null"` strings in URL query parameters. Split from #12895 per reviewer request. ## Changes - Added `value !== null` check alongside existing `value !== undefined` in `buildUrlWithQuery` --------- Co-authored-by: majdyz <zamil.majdy@agpt.co> |
||
|
|
a723966e0b |
feat(platform): settings v2 integrations page + provider description SDK (#12911)
### Why / What / How
**Why:** The settings v2 integrations surface renders from a hardcoded
`MOCK_PROVIDERS` array — no real user credentials, no delete, no way to
connect a new service. Provider display metadata (descriptions +
supported auth types) was scattered across frontend maps with no backend
source of truth, leaving each new provider to be manually registered in
two places.
**What:** Full-featured settings v2 integrations page driven by live
backend data, plus a backend SDK extension so every provider carries a
description **and** declares its supported auth types. Settings UI uses
both to render the connect-a-service dialog: descriptions in the list,
auth types to pick the right tabs in the detail view.
**How:**
- **Credentials list** — single-fetch via `useGetV1ListCredentials`,
grouped client-side by provider, debounced (250 ms) Unicode-normalized
in-memory search (no roundtrip per keystroke), managed/system creds
filtered via the shared `filterSystemCredentials` helper from
`CredentialsInput`. Loading → skeletons that mirror the real accordion
shape, error → `ErrorCard` with retry, empty → `IntegrationsListEmpty`
with custom marquee illustration.
- **Delete flow** — `useDeleteIntegration` returns per-target `succeeded
/ failed / needsConfirmation` so the UI can name failed items and keep
them selected for one-click retry. Single + bulk both gated by
`DeleteConfirmDialog`. Per-row delete button disables + shows a spinner
via `isDeletingId` so double-clicks can't fire two requests. Success
toast names the credential ("Removed GitHub key").
- **Connect-a-service dialog** — backend-driven (`useGetV1ListProviders`
returns `ProviderMetadata[]` with description + supported_auth_types),
Emil-spec animations (150 ms ease-out step swap, 200 ms ease-out height
resize, 180 ms entry fade+slide+blur on tab swap, all respecting
`prefers-reduced-motion`). Detail view picks tab order via deterministic
`TAB_PRIORITY` (oauth → api_key → user_password → host_scoped) and
remembers last-selected tab per provider for the session.
- **OAuth tab** → `openOAuthPopup` + `getV1InitiateOauthFlow` →
`postV1ExchangeOauthCodeForTokens`
- **API key tab** → zod-validated form with
`autoComplete="new-password"` + `spellCheck=false` so browsers don't
autofill the wrong stored key → `postV1CreateCredentials`
- **Provider metadata SDK** — chainable
`ProviderBuilder.with_description(...)` +
`.with_supported_auth_types(...)` (the latter populated automatically by
`with_oauth` / `with_api_key` / `with_managed_api_key` /
`with_user_password`; explicit form reserved for legacy providers whose
auth lives outside the builder chain). `GET /integrations/providers`
upgraded from `List[str]` → `List[ProviderMetadata]` carrying both
fields. A hedged sketch of the builder chain follows at the end of this list.
- **Backward compat** — `BackendAPI.listProviders()` maps the new
`ProviderMetadata[]` shape down to `string[]` so the deprecated
`CredentialsProvider` (used by the builder/library credential pickers)
keeps working without ripple changes.
- **Routing** — page lives at `/settings/integrations` directly. No
feature flag gate (settings v2 layout is already on dev).
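A hedged sketch of the provider-metadata builder chain referenced above; the real `ProviderBuilder` in `backend/sdk/builder.py` has richer signatures, so treat the method and field shapes here as illustrative:

```python
# Assumed shapes throughout — the real builder lives in backend/sdk/builder.py.
from dataclasses import dataclass, field


@dataclass
class Provider:
    name: str
    description: str = ""
    supported_auth_types: set[str] = field(default_factory=set)


class ProviderBuilder:
    def __init__(self, name: str) -> None:
        self._provider = Provider(name=name)

    def with_description(self, description: str) -> "ProviderBuilder":
        self._provider.description = description
        return self

    def with_api_key(self, env_var: str, title: str) -> "ProviderBuilder":
        # Every auth-method chain call auto-populates supported_auth_types,
        # so most providers never declare the types explicitly.
        self._provider.supported_auth_types.add("api_key")
        return self

    def with_supported_auth_types(self, *auth_types: str) -> "ProviderBuilder":
        # Explicit form, reserved for legacy providers whose auth handlers
        # live outside the builder chain.
        self._provider.supported_auth_types.update(auth_types)
        return self

    def build(self) -> Provider:
        return self._provider


# Typical use in a block's _config.py (provider name and copy are examples):
linear = (
    ProviderBuilder("linear")
    .with_description("Issue tracking and project management")
    .with_api_key("LINEAR_API_KEY", "Linear API key")
    .build()
)
```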
### Changes 🏗️
**Backend**
- `backend/sdk/builder.py` — `with_description()` +
`with_supported_auth_types()` chain methods; the latter is
auto-populated by every existing auth-method chain call so explicit
declaration is only needed for legacy providers.
- `backend/sdk/provider.py` — `description` + `supported_auth_types`
fields on `Provider`.
- `backend/api/features/integrations/router.py` — `GET /providers` now
returns `List[ProviderMetadata]`; calls `load_all_blocks()` (cached
`@cached(ttl_seconds=3600)`) before reading `AutoRegistry`.
- `backend/api/features/integrations/models.py` — `ProviderMetadata`
with `name + description + supported_auth_types`;
`get_provider_description` + `get_supported_auth_types` helpers reading
from `AutoRegistry`.
- 13 existing `_config.py` files updated with `.with_description(...)`:
agent_mail, airtable, ayrshare, baas, bannerbear, dataforseo, exa,
firecrawl, linear, stagehand, wordpress, wolfram, generic_webhook.
- 20 new `_config.py` files (one per provider block dir): apollo,
compass, discord, elevenlabs, enrichlayer, fal, github, google, hubspot,
jina, mcp, notion, nvidia, replicate, slant3d, smartlead, telegram,
todoist, twitter, zerobounce. Each declares
`with_supported_auth_types(...)` because their auth handlers live in
`backend/integrations/oauth/` (legacy) or are block-level
`CredentialsMetaInput` declarations — outside the builder chain.
- 1 new `backend/blocks/_static_provider_configs.py` registering
description + auth types for ~24 providers that live in shared files
(openai, anthropic, groq, ollama, open_router, v0, aiml_api, llama_api,
reddit, medium, d_id, e2b, http, ideogram, openweathermap, pinecone,
revid, screenshotone, unreal_speech, webshare_proxy, google_maps, mem0,
smtp, database). Comment documents the migration path (each entry
retires when the provider graduates to its own `_config.py`).
**Frontend**
- `src/app/(platform)/settings/integrations/page.tsx` — replaces mock
page; composes header + list + connect dialog.
-
`src/app/(platform)/settings/integrations/components/IntegrationsList/`
— list + skeleton + selection (Record<string, true> instead of Set) +
delete orchestration hook.
-
`src/app/(platform)/settings/integrations/components/ConnectServiceDialog/`
— split per the 200-line house rule into `ConnectServiceDialog`,
`ListView`, `ProviderRow`, `useMeasuredHeight`. DetailView's nested
helpers extracted to siblings: `MethodPanel`, `UnsupportedNotice`,
`ProviderAvatar`. Tabs render in deterministic priority order;
last-selected tab persisted per provider in module-scope.
-
`src/app/(platform)/settings/integrations/components/DeleteConfirmDialog/`
— new confirm dialog gating single + bulk deletes (shows up to 3 names +
remaining count for bulk).
-
`src/app/(platform)/settings/integrations/components/IntegrationsListEmpty/components/IntegrationsMarquee.tsx`
— switched from `next/image unoptimized` to plain `<img loading="lazy"
decoding="async">` for decorative logos (no LCP candidate, avoids Next
Image runtime overhead).
-
`src/app/(platform)/settings/integrations/components/hooks/useDeleteIntegration.ts`
— bulk delete now returns per-target
`succeeded/failed/needsConfirmation`; failed items stay selected for
retry; per-id pending tracking via `isDeletingId`; toast names the
credential.
-
`src/app/(platform)/settings/integrations/components/hooks/useDebouncedValue.ts`
— small reusable debounce hook (250 ms, used by both list + dialog
search).
- `src/app/(platform)/settings/integrations/helpers.ts` —
`formatProviderName` guarded against non-string input; `filterProviders`
now Unicode-normalized (NFKD + strip combining marks) so accented
queries match.
- `src/providers/agent-credentials/helper.ts` — `toDisplayName` same
`typeof string` guard.
- `src/components/contextual/CredentialsInput/helpers.ts` — loosened
`filterSystemCredentials` / `getSystemCredentials` generic constraint to
accept `title?: string | null` so it consumes `CredentialsMetaResponse`
directly.
- `src/lib/autogpt-server-api/client.ts` — `listProviders()` maps the
new shape to `string[]` for backward compat.
- `src/app/api/openapi.json` — regenerated spec includes
`ProviderMetadata` with `supported_auth_types`.
### PR feedback addressed 🛠️
Round of fixes after the first review pass:
- Bulk delete: per-item names in failure toast, failed items kept
selected for retry.
- Confirmation dialog before any delete (single or bulk) —
`DeleteConfirmDialog`.
- Per-row delete button disabled + spinner while pending (no
double-click double-fire).
- Toast names the credential ("Removed GitHub key") instead of generic
copy.
- API key input: `autoComplete="new-password"` + `spellCheck=false`;
title field `autoComplete="off"`.
- Search debounced 250 ms on both list + dialog; Unicode-normalized so
"Açai" matches "acai".
- `toDisplayName` / `formatProviderName` guarded against non-string
input (`provider.split is not a function` was reproducible).
- Skeleton mirrors the real accordion shape — no layout shift on data
load.
- Selection bar sticky position fixed for <375 px (`top-2 sm:top-0`).
- Last-selected auth tab persisted per provider for the session.
- Tabs ordered deterministically (oauth → api_key → user_password →
host_scoped) instead of insertion order.
- `useMemo` removed from `useIntegrationsList` per project rule (no
measured perf need).
- Selection state migrated from `Set<string>` to `Record<string, true>`
(idiomatic React state shape).
- ConnectServiceDialog 288 LoC → ~130 (extracted `ListView`,
`ProviderRow`, `useMeasuredHeight`); DetailView helpers → siblings.
- `next/image unoptimized` → plain `<img>` for decorative logos in
marquee + provider rows + avatar.
- `with_supported_auth_types(...)` pruned in 11 `_config.py` files where
it was redundant with `with_oauth` / `with_api_key` /
`with_managed_api_key` / `with_user_password`. Kept in legacy ones
(github, discord, google, notion, ...) where the docstring says it's
required because auth handlers live outside the builder chain.
- Tab swap + dialog step animations re-tuned vs Emil Kowalski's
animation rules: ease-out default, under 300 ms,
transform+opacity+filter only, blur-bridge to soften swap, willChange to
dodge the 1px shift, reduced-motion fallbacks via `useReducedMotion`.
- Merged latest `dev` (api-keys SECRT-2273 took dev's version, no
api-keys diff in this PR; settings layout took dev's version, no
`SETTINGS_V2` feature flag in this PR; `scroll-area` took dev's
version).
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] Navigate to `/settings/integrations` — loading skeletons appear,
then user's real credentials grouped by provider.
- [ ] Type in the search box — filters client-side after 250 ms, no new
network requests (DevTools → Network stays quiet). Try "açai" with
diacritics — matches "acai" providers.
- [ ] Connect a service → dialog loads provider list with backend
descriptions; search matches name, slug, and description.
- [ ] Click a provider → dialog tweens height smoothly (no jump); header
shows provider avatar + name + description; tabs render in oauth →
api_key priority; last-used tab restored on reopen.
- [ ] Open `linear` (oauth + api_key) — switching tabs animates with a
quick fade+slide+blur entry, no flash.
- [ ] OAuth tab → "Continue with X" opens popup, completes consent,
popup closes, dialog closes, new credential appears with success toast.
- [ ] API key tab → paste a key (browser does NOT offer to autofill any
stored password), Save → toast names the credential, dialog closes.
- [ ] Delete (single) via trash icon → confirmation dialog → button
disables with spinner during the request → toast names the credential.
- [ ] Delete (bulk) via selection bar → confirmation lists up to 3 names
→ if some fail, failed ones stay selected for retry; toast lists which
failed.
- [ ] Double-click a delete button rapidly — only one request fires.
- [ ] Managed credentials (e.g. "Use Credits for AI/ML API") do **not**
appear in the list.
- [ ] Test on a fresh account (no credentials) — `IntegrationsMarquee`
empty state renders.
- [ ] Throttle network to Slow 3G — skeleton (mirroring real shape)
visible, then list slides in.
- [ ] Block `/api/integrations/credentials` → `ErrorCard` with retry.
- [ ] `curl /api/integrations/providers` returns `[{ name, description,
supported_auth_types }, ...]` with every provider carrying both fields.
- [ ] `prefers-reduced-motion: reduce` set → all motion collapses to
opacity-only fades.
- [ ] On <375 px viewport — selection bar clears the mobile nav.
- [ ] `pnpm format && pnpm lint && pnpm types && pnpm test:unit` all
pass.
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**) — no flag changes in this PR.
|
||
|
|
5b1d9763ed |
fix(backend/copilot): preserve interrupted SDK partial work on final-failure exit (#12918)
## Background [SECRT-2275](https://linear.app/autogpt/issue/SECRT-2275). User report: when a copilot ("autopilot") turn is interrupted by a usage-limit, tool-call-limit, or other run interruption, the user's recent work disappears. User described it as: "my initial message was lost 3 times and it disappeared, then when I would say 'continue' it would do a random old task." Investigation surfaced two distinct failure modes. This PR addresses both. - **Mode 1** — rate-limit (or other pre-stream rejection) at turn start: the user's text only ever lives in the optimistic `useChat` bubble; the backend rejects before the message is persisted, so the bubble is a lie and a refresh / retry would lose the text. - **Mode 2** — long-running turn interrupted mid-stream: the entire turn's progress (assistant text, tool calls, reasoning) vanishes on interruption — what users describe as "the turn is gone." ## Mode 1 — frontend: restore unsent text on 429 Backend can't recover this on its own: `check_rate_limit` raises before `append_and_save_message`, so by the time the 429 surfaces there is no DB row to roll forward. See `autogpt_platform/backend/backend/api/features/chat/routes.py:916-922` (rate-limit check) and `routes.py:945` (later append-and-save). Frontend fix in `autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts`: when `useChat`'s `onError` reports a usage-limit error, we - drop the optimistic user bubble (DB has no record of it, so leaving it would be a phantom), - push `lastSubmittedMsgRef.current` back into the composer via the existing `setInitialPrompt` slot — the same slot URL pre-fills use, so `useChatInput`'s `consumeInitialPrompt` effect picks it up automatically, - clear `lastSubmittedMsgRef` so the dedup guard doesn't block re-send. In-memory only; surviving a hard refresh while rate-limited is a separate follow-up (would need localStorage persistence with TTL). Test: `autogpt_platform/frontend/src/app/(platform)/copilot/__tests__/useCopilotStream.test.ts` — verifies the composer is repopulated and the optimistic bubble is dropped on a 429. ## Mode 2 — backend: preserve interrupted partial in DB ### Root cause The SDK retry loop in `stream_chat_completion_sdk` always rolls back `session.messages` to the pre-attempt watermark on any exception. That rollback is correct **before a retry** so attempt #2 doesn't duplicate attempt #1's content. But it runs **before the retry decision is made**, so when retries are exhausted (or no retry is attempted) the partial work is discarded too. Three branches of the retry loop ended in a final-failure state with side effects worse than just losing the partial: - `_HandledStreamError` non-transient: rollback then add error marker — partial gone - `Exception` with `events_yielded > 0`: rollback then break — **no error marker added either**, so on refresh the chat looks like nothing happened even though the user just watched tokens stream live - `Exception` non-context-non-transient + the while-`else:` exhaustion path: same, no marker - Outer except (cancellation, GeneratorExit cleanup): didn't restore captured partial ### Fix `autogpt_platform/backend/backend/copilot/sdk/service.py`: 1. **`_InterruptedAttempt` dataclass** — holds the rolled-back `partial: list[ChatMessage]` + optional `handled_error: _HandledErrorInfo`. 
Three methods drive the contract: - `capture(session, transcript_builder, transcript_snap, pre_attempt_msg_count)` — slices `session.messages`, restores the transcript, strips trailing error markers to prevent duplicate markers after restore. - `clear()` — drops captured state on a successful retry so outer cleanup paths don't replay pre-retry content. - `finalize(session, state, display_msg, retryable=...) -> list[StreamBaseResponse]` — re-attaches partial, synthesizes `tool_result` rows for orphan `tool_use` blocks, appends the canonical error marker, and returns the flushed events so the caller can yield them to the client (no double-flush). 2. **`_flush_orphan_tool_uses_to_session(session, state) -> list[StreamBaseResponse]`** — synthesizes `tool_result` rows for any `tool_use` that never resolved before the error so the next turn's LLM context stays API-valid (Anthropic rejects orphan tool_use). Uses the public `adapter.flush_unresolved_tool_calls` and returns the events for the caller to yield. 3. **`_classify_final_failure(...) -> _FinalFailure | None`** — picks the display message + stream code + retryable flag for the final-failure exit. One source of truth for the in-history error marker and the client-facing `StreamError` SSE yield so they can't drift. 4. **Consolidated post-loop emit**: the former three scattered blocks (partial restore + redundant re-flush + two separate `yield StreamError` sites) collapsed to one block driven by `_classify_final_failure` → `_FinalFailure` → `finalize()` → yield events + single `StreamError`. 5. **Adapter `flush_unresolved_tool_calls`** (renamed from `_flush_unresolved_tool_calls` to drop the `# noqa: SLF001` suppressors on cross-module callers). Each retry-loop rollback site calls `interrupted.capture(...)`; the success break calls `interrupted.clear()`; the post-loop failure block calls `interrupted.finalize(...)` exactly once. The baseline service already preserves partial work via its existing finally block — no change needed there. ## Tests Backend (`backend/copilot/sdk/interrupted_partial_test.py`, new, 18 tests): - `TestInterruptedAttemptCapture` — slice semantics + stale-marker stripping - `TestInterruptedAttemptFinalize` — appends partial then marker, handles empty partial, no-op on `None` session, flushes unresolved tools between partial and marker, returns flushed events for caller to yield - `TestFlushOrphanToolUses` — synthesizes `tool_result` rows, returns events, no-op on None state / no unresolved - `TestClassifyFinalFailure` — handled_error wins, attempts_exhausted, transient_exhausted, stream_err fallback, returns None on success path - `TestRetryRollbackContract` — end-to-end: capture + finalize yields the exact content the user saw streaming live plus the error marker 1022 total SDK tests pass (baseline + new). Frontend (`useCopilotStream.test.ts`): 1 new test — `restores the unsent text and drops the optimistic user bubble on 429 usage-limit`. ## Out of scope - Frontend rendering tweaks for the interrupted-turn marker (existing error-marker rendering already works). - Refresh-survival of the unsent text in Mode 1 (would require localStorage persistence with TTL) — separate follow-up. - Hard process-kill / OOM where Python `finally` doesn't run — needs a different mechanism (pod-level checkpoint sweeper). 
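A compressed sketch of the capture / clear / finalize contract; the message shapes are simplified, and the transcript restore and orphan-tool flushing that the real `service.py` implementation performs inside `finalize` are omitted here:

```python
# Simplified: real code also restores the transcript snapshot and synthesizes
# tool_result rows for orphan tool_use blocks inside finalize().
from dataclasses import dataclass, field
from typing import Any


@dataclass
class _InterruptedAttempt:
    partial: list[Any] = field(default_factory=list)
    handled_error: Any | None = None

    def capture(self, session, pre_attempt_msg_count: int) -> None:
        # Keep what the rollback is about to discard.
        self.partial = list(session.messages[pre_attempt_msg_count:])

    def clear(self) -> None:
        # A later attempt succeeded; never replay the earlier partial.
        self.partial = []
        self.handled_error = None

    def finalize(self, session, display_msg: str) -> None:
        # Final failure: re-attach what the user already watched stream,
        # then append a single canonical error marker.
        if session is None:
            return
        session.messages.extend(self.partial)
        session.messages.append(
            {"role": "assistant", "content": display_msg, "is_error": True}
        )
```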
## Checklist - [x] My code follows the style guidelines of this project (black/isort/ruff via `poetry run format`) - [x] I have performed a self-review of my own code - [x] I have added relevant unit tests - [x] I have run lint and tests locally (1022 SDK tests pass) ## Test plan - [ ] Verify a long-running turn that hits transient-retry exhaustion preserves partial assistant text + tool results in chat history after refresh - [ ] Verify the next user message after an interrupted turn carries enough context that the model can continue the prior task instead of inventing a new one - [ ] Verify a successful retry (attempt #1 fails, attempt #2 succeeds) shows ONLY attempt #2's content (no leaked partial from #1) - [ ] Verify hitting daily usage limit at turn start re-populates the composer with the unsent text and removes the optimistic user bubble --------- Co-authored-by: Claude <noreply@anthropic.com> |
||
|
|
10ea46663f |
fix(backend/notifications): atomic upsert + drop eager include (#12919)
## Problem
`create_or_add_to_user_notification_batch` does:
1. `find_unique(..., include={"Notifications": True})` — loads ALL
notifications for the batch (thousands for heavy AGENT_RUN users),
causing Postgres `statement_timeout` in dev.
2. Find-then-update is non-atomic — concurrent invocations either hit
`@@unique([userId, type])` violations or drop notifications.
Real Sentry: `canceling statement due to statement timeout` on this
exact query, traced to `database-manager` pod.
## Fix
Use Prisma `upsert` (atomic) and skip the eager include. Only load
notifications if the caller actually needs them (audited — the sole
caller `NotificationManager._should_batch` ignores the returned DTO and
separately fetches the oldest message via
`get_user_notification_oldest_message_in_batch`).
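A hedged sketch of the atomic path using Prisma Client Python's `upsert`; the model and field names follow the description above but may not match the schema exactly:

```python
# Model and field names follow the prose above; adjust to the actual schema.
from prisma.models import UserNotificationBatch


async def create_or_add_to_user_notification_batch(
    user_id: str, type_: str, notification_id: str
):
    # One atomic statement instead of find-then-update, and no eager
    # include={"Notifications": True} load of every row in the batch.
    return await UserNotificationBatch.prisma().upsert(
        where={"userId_type": {"userId": user_id, "type": type_}},
        data={
            "create": {
                "userId": user_id,
                "type": type_,
                "Notifications": {"connect": [{"id": notification_id}]},
            },
            "update": {
                "Notifications": {"connect": [{"id": notification_id}]},
            },
        },
    )
```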
## Tests
- Single + existing-batch upsert paths
- Concurrent race regression
## Out of scope
Unrelated to PR #12900 (Redis Cluster migration). Separate change.
|
||
|
|
06188a86a6 |
refactor(platform/copilot): consolidate 4 model-routing LD flags into 1 JSON flag (#12917)
## What
Replaces 4 string-valued LaunchDarkly flags with a single JSON-valued
flag for copilot model routing:
- ~~`copilot-fast-standard-model`~~
- ~~`copilot-fast-advanced-model`~~
- ~~`copilot-thinking-standard-model`~~
- ~~`copilot-thinking-advanced-model`~~
**New:** `copilot-model-routing` (JSON), keyed `{mode: {tier: model}}`:
```json
{
"fast": { "standard": "anthropic/claude-sonnet-4-6", "advanced": "anthropic/claude-opus-4-6" },
"thinking": { "standard": "moonshotai/kimi-k2.6", "advanced": "anthropic/claude-opus-4-6" }
}
```
## Why
Same pattern as the sibling consolidation in #12915 (pricing /
cost-limits flags) and the merged #12910 (tier-multipliers):
- One flag per config domain — less LD UI clutter, easier audit trail.
- Atomic updates — rotating fast.standard + thinking.standard is a
single save.
- Fewer LD entities to name, version, target, explain.
- Mirrors the now-uniform copilot-* JSON-flag shape.
## How
- `backend/util/feature_flag.py`: drop the four `COPILOT_*_MODEL` enum
values, add `COPILOT_MODEL_ROUTING`.
- `backend/copilot/model_router.py`: rewrite `resolve_model` to fetch
the JSON flag once per call and walk `payload[mode][tier]`. Missing
mode, missing tier-within-mode, non-string cell value, non-dict payload,
or LD failure all fall back to the corresponding `ChatConfig` default
(same user-visible semantics as before). `_FLAG_BY_CELL` removed
entirely; `_config_default` / `ModelMode` / `ModelTier` unchanged (a sketch of the lookup follows at the end of this list).
- Per-user LD targeting preserved — cohorts can still receive different
routing.
- No caching added (preserves existing uncached behaviour).
- Docstring references in `copilot/config.py` + `copilot/sdk/service.py`
updated to point at the new nested key path; one docstring in
`service_test.py` likewise.
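A sketch of the single-flag lookup with per-cell fallback; the LaunchDarkly plumbing is reduced to a pre-fetched `flag_payload` argument, and warnings are trimmed:

```python
# LD plumbing reduced to a pre-fetched flag_payload; warnings trimmed.
import logging

logger = logging.getLogger(__name__)


def resolve_model(mode: str, tier: str, flag_payload, config_default: str) -> str:
    """Resolve (mode, tier) from the copilot-model-routing JSON payload."""
    if not isinstance(flag_payload, dict):
        if flag_payload is not None:
            logger.warning("copilot-model-routing payload is not a dict; using default")
        return config_default
    cell = flag_payload.get(mode)
    if not isinstance(cell, dict):
        return config_default  # missing mode, or mode value is not a dict
    model = cell.get(tier)
    if not isinstance(model, str) or not model:
        return config_default  # missing tier, non-string, or empty string
    return model
```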
## Operator action required BEFORE merging
This PR removes 4 LD flags and introduces 1 replacement.
1. In LaunchDarkly, create `copilot-model-routing` (type: **JSON**,
server-side only). Default variation = union of the current four string
flags, shaped as:
```json
{
"fast": { "standard": "<current copilot-fast-standard-model>",
"advanced": "<current copilot-fast-advanced-model>" },
"thinking": { "standard": "<current copilot-thinking-standard-model>",
"advanced": "<current copilot-thinking-advanced-model>" }
}
```
Omit any cell that's currently unset (its `ChatConfig` default will be
used).
2. Merge this PR.
3. After deploy + smoke, delete the four legacy flags:
- `copilot-fast-standard-model`
- `copilot-fast-advanced-model`
- `copilot-thinking-standard-model`
- `copilot-thinking-advanced-model`
## Testing
- `backend/copilot/model_router_test.py` rewritten — 27 tests pass:
- LD unset / `None` payload → fallback for every cell.
- Full JSON → each cell maps to its value (parametrized).
- Partial JSON (missing mode, missing tier-within-mode, mode value not a
dict).
- Non-dict payloads (str / list / int / bool) → fallback + warning.
- Non-string cell values (number, list, bool, dict) → fallback +
'non-string' warning.
- Empty-string cell → fallback + 'empty string' warning (not
'non-string').
- LD raises → fallback + warning with `exc_info`.
- `user_id=None` → skip LD entirely.
- Single-LD-call regression guard against re-introducing per-cell flag
fan-out.
- `backend/copilot/sdk/service_test.py`: 61 tests still pass (it mocks
`_resolve_thinking_model_for_user`, so the inner flag change is
transparent).
- `black --check` / `ruff check` / `isort --check` all clean.
## Sibling
- #12915 — same consolidation pattern for stripe-price / cost-limits
flags.
## Checklist
- [x] I have read the project's contributing guide.
- [x] I have clearly described what this PR changes and why.
- [x] My code follows the style guidelines of this project.
- [x] I have added tests that prove my fix is effective or that my
feature works.
- [ ] New and existing unit tests pass locally with my changes (CI will
confirm).
|
||
|
|
2deac2073e |
fix(block_cost_config): audit + correct stale LLM/block rates + migrate generic ReplicateModelBlock to COST_USD (#12912)
## Why

PR #12909's pricing refresh was sourced from aggregators (pricepertoken, blog mirrors) instead of provider pricing pages. Follow-up audit against **official provider docs** caught **22 stale entries** — 9 LLM token rates + 12 non-LLM block rates + 1 block that needed a code refactor to bill dynamically. Also flagged by Sentry: Mistral models were sitting on the wrong provider's rate table. Cross-verified JS-rendered pages (docs.x.ai, DeepSeek, Kimi) via agent-browser.

## Corrections applied

### LLM TOKEN_COST (9 entries)

| Model | Old | New | Reason |
|---|---|---|---|
| `GPT5` | 94/750 | **188/1500** | Was OpenAI Batch API rate; Standard is $1.25/$10 |
| `DEEPSEEK_CHAT` | 42/63 | **21/42** | Unified to deepseek-v4-flash at $0.14/$0.28 (Sept 2025) |
| `DEEPSEEK_R1_0528` | 82/329 | **21/42** | Same v4-flash routing |
| `MISTRAL_LARGE_3` | 300/900 | **300/900** (restored after brief 75/225 detour) | Routes via OpenRouter ($2/$6), not Mistral direct |
| `MISTRAL_NEMO` | 3/6 → 23/23 | **5/5** | Routes via OpenRouter ($0.035/$0.035); Mistral-direct $0.15 doesn't apply |
| `KIMI_K2_0905` | 82/330 | **90/375** | Matches K2 family $0.60/$2.50 |
| `KIMI_K2_5` | 90/450 | **66/300** | OpenRouter pass-through $0.44/$2 |
| `KIMI_K2_6` | 143/600 | **112/698** | OpenRouter pass-through $0.7448/$4.655 |
| `META_LLAMA_4_MAVERICK` | 30/90 | **75/116** | Groq $0.50/$0.77 (deprecated 2026-02-20) |

### Non-LLM BLOCK_COSTS — rate corrections (11 entries)

Under-billing fixes:
- `AIVideoGeneratorBlock` (FAL) SECOND 3 → **15 cr/s**
- `CreateTalkingAvatarVideoBlock` (D-ID) RUN 15 → **100 cr**
- Nano Banana Pro/2 across 3 blocks: RUN 14 → **21 cr**
- `UnrealTextToSpeechBlock` RUN 5 → **COST_USD 150 cr/$** (block now emits `chars × $0.000016`)

Over-billing fixes:
- `IdeogramModelBlock` default 16 → **12**, V_3 18 → **14**
- `AIImageEditorBlock` FLUX_KONTEXT_MAX 20 → **12**
- `ValidateEmailsBlock` 250 → **150 cr/$**
- `SearchTheWebBlock` 100 → **150 cr/$**
- `GetLinkedinProfilePictureBlock` 3 → **1 cr**

### Non-LLM BLOCK_COSTS — block refactored for dynamic billing (1 entry)

- **`ReplicateModelBlock`** (the generic "run any Replicate model" wrapper) migrated from flat RUN 10 cr → **COST_USD 150 cr/$**. Block now uses `client.predictions.async_create + async_wait` instead of `async_run(wait=False)` so it can read `prediction.metrics.predict_time` and bill `predict_time × $0.0014/s` (Nvidia L40S mid-tier, where most popular public models run). Additionally (addressing CodeRabbit's critical review on this refactor): `async_wait()` returns normally regardless of terminal status — it doesn't raise on `failed`/`canceled` like the old `async_run` did. The block now explicitly checks `prediction.status` after `async_wait()` and raises `RuntimeError` on `failed` (with `prediction.error` as context) or `canceled` **before** `merge_stats`, so failed runs are never billed for partial compute time. **Why this matters:** flat 10 cr was 10–500× under-billing long video/LLM runs (users could wire in a $50/hr A100 Llama inference and pay us $0.10). It was also 20× over-billing trivial SDXL runs. Now scales with real compute time AND no longer bills failed predictions.

### Documentation-only

- **Grok legacy models** (grok-3, grok-4-0709, grok-4-fast, grok-code-fast-1): dropped from docs.x.ai's public pricing page but still callable via the API. Added inline comment noting this; rates kept at their verified launch pricing.
- **Mistral routing**: added comment explaining why TOKEN_COST for MISTRAL_* is the OpenRouter safety floor (not Mistral-direct) since `ModelMetadata.provider = "open_router"` for all Mistral entries.

## How

- For each entry, opened the **official provider pricing page** directly and computed `our_cr = round(1.5 × provider_usd × 100)`.
- For JS-rendered pages (docs.x.ai, api-docs.deepseek.com), used agent-browser headless to render + extract rates from the DOM.
- Migrated 2 blocks (`UnrealTextToSpeechBlock`, `ReplicateModelBlock`) from flat RUN to COST_USD — the Replicate migration touched the block's SDK interaction.
- Updated 2 FAL-video unit tests that asserted the old `3 cr/s` rate.
- Updated 3 stale test assertions: 2 for Unreal TTS (still on `characters` cost_type) + 1 for ZeroBounce (old 250 cr).

## Known remaining risk (explicitly out of scope)

- **`ReplicateFluxAdvancedModelBlock`** not migrated — bounded to Flux models ($0.04–$0.08), flat 10 cr stays within 1.25–2.5× margin. Separate PR if desired.
- **AgentMail** on free tier (1 RUN). When paid pricing publishes, revisit.
- **Live Replicate API verification**: mitigated via 9 unit tests covering the refactored path (`async_create` version-vs-model branching, metrics-based billing emission, failed/canceled raises, zero/missing-metrics no-emission, `async_wait` ordering), and SDK signature confirmed via `inspect.signature` — but no real API call executed. A smoke test on a cheap model before merge is still recommended.

## Test plan

- [x] `poetry run pytest backend/data/block_cost_config_test.py backend/executor/block_usage_cost_test.py backend/blocks/claude_code_cost_test.py backend/blocks/cost_leak_fixes_test.py backend/blocks/block_cost_tracking_test.py backend/copilot/tools/helpers_test.py backend/blocks/replicate/replicate_block_cost_test.py -q` — all passing (80+ tests).
- [x] Sources: openai.com/api/pricing, claude.com/pricing, api-docs.deepseek.com, mistral.ai/pricing, platform.kimi.ai/docs/pricing, docs.x.ai, groq.com/pricing, replicate.com, fal.ai, d-id.com, ideogram.ai, zerobounce.net, jina.ai, unrealspeech.com, enrichlayer.com.
- [ ] Live Replicate API call to verify `predictions.async_create + async_wait + metrics.predict_time` path. |
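A sketch of the dynamic-billing flow described for `ReplicateModelBlock`; the `predictions.async_create` / `async_wait` calls and the $0.0014/s figure come from the description above, while the surrounding function shape and the `merge_stats` callable are illustrative:

```python
# Function shape and merge_stats callable are illustrative; the replicate
# client calls and the L40S rate mirror the description above.
_L40S_USD_PER_SECOND = 0.0014


async def run_and_bill(client, model_ref: str, block_input: dict, merge_stats):
    prediction = await client.predictions.async_create(model=model_ref, input=block_input)
    await prediction.async_wait()

    # async_wait returns normally on terminal states, so check the status
    # explicitly and never bill a failed or canceled run.
    if prediction.status in ("failed", "canceled"):
        raise RuntimeError(f"Replicate prediction {prediction.status}: {prediction.error}")

    predict_time = (prediction.metrics or {}).get("predict_time") or 0.0
    if predict_time:
        merge_stats(
            provider_cost=predict_time * _L40S_USD_PER_SECOND,
            provider_cost_type="cost_usd",
        )
    return prediction.output
```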
||
|
|
24406dfcec |
refactor(platform): consolidate 6 LD flags into 2 JSON flags (#12915)
## What

Consolidates two groups of LaunchDarkly flags into single JSON-valued flags, matching the pattern established by `copilot-tier-multipliers` (merged in #12910):

**Stripe prices** — 4 string flags → 1 JSON flag:
- ~~`stripe-price-id-basic`~~ / ~~`-pro`~~ / ~~`-max`~~ / ~~`-business`~~
- **New:** `copilot-tier-stripe-prices` (JSON)

```json
{ "PRO": "price_xxx", "MAX": "price_yyy" }
```

**Cost limits** — 2 number flags → 1 JSON flag:
- ~~`copilot-daily-cost-limit-microdollars`~~ / ~~`copilot-weekly-cost-limit-microdollars`~~
- **New:** `copilot-cost-limits` (JSON)

```json
{ "daily": 625000, "weekly": 3125000 }
```

## Why

- One flag to manage per config domain (LD UI less cluttered, easier audit trail).
- Atomic updates — e.g., rotating Pro + Max prices happens in a single save.
- Fewer LD entities to name, version, target, and explain.
- Mirrors the just-merged `copilot-tier-multipliers` shape so the whole pricing/limits config is uniform.

## How

- `get_subscription_price_id(tier)` now parses `copilot-tier-stripe-prices` and looks up `tier.value` — returns `None` when the flag is unset, non-dict, tier key missing, or value isn't a non-empty string.
- `get_global_rate_limits` uses a new sibling `_fetch_cost_limits_flag()` helper (60s cache, `cache_none=False`) that extracts `daily` / `weekly` int keys independently and falls back to the existing `ChatConfig` defaults when any key is missing / non-int / negative. A broken `daily` doesn't wipe out `weekly` (or vice versa).
- Tests rewritten to mock the new JSON shapes + cover partial / invalid / missing-key fallbacks.

## ⚠️ Operator action required BEFORE merging

This PR **removes 6 LD flags** and introduces 2 replacements. To avoid a pricing/rate-limit outage, do this in LaunchDarkly first:

1. Create `copilot-tier-stripe-prices` (type: **JSON**). Default variation = union of the current `stripe-price-id-*` values:
```json
{ "PRO": "<current stripe-price-id-pro>", "MAX": "<current stripe-price-id-max>" }
```
Omit BASIC / BUSINESS if those flags are unset today.
2. Create `copilot-cost-limits` (type: **JSON**). Default variation = the current two flags' values:
```json
{ "daily": <current daily microdollars>, "weekly": <current weekly microdollars> }
```
3. Merge this PR.
4. After deploy + smoke test, delete the six legacy flags:
- `stripe-price-id-{basic,pro,max,business}`
- `copilot-daily-cost-limit-microdollars`
- `copilot-weekly-cost-limit-microdollars`

## Testing

- Backend unit tests: `pytest backend/copilot/rate_limit_test.py backend/data/credit_subscription_test.py backend/api/features/subscription_routes_test.py` — rewritten to exercise the JSON flag shapes + fallback paths; passes locally.
- `black --check` / `ruff check` / `isort --check` — all clean.

## Checklist

- [x] I have read the project's contributing guide.
- [x] I have clearly described what this PR changes and why.
- [x] My code follows the style guidelines of this project.
- [x] I have added tests that prove my fix is effective or that my feature works.
- [ ] New and existing unit tests pass locally with my changes (CI will confirm). |
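A sketch of the independent per-key fallback described for `_fetch_cost_limits_flag`; the cached LaunchDarkly fetch is omitted, leaving just the extraction logic:

```python
# Extraction logic only; the cached LD fetch wrapper is omitted.
def _extract_cost_limits(payload, default_daily: int, default_weekly: int) -> tuple[int, int]:
    """Read daily/weekly microdollar limits from the copilot-cost-limits flag."""
    if not isinstance(payload, dict):
        return default_daily, default_weekly

    def pick(key: str, default: int) -> int:
        value = payload.get(key)
        # Each key falls back independently, so a broken "daily" never wipes
        # out a valid "weekly" (and vice versa).
        if isinstance(value, int) and not isinstance(value, bool) and value >= 0:
            return value
        return default

    return pick("daily", default_daily), pick("weekly", default_weekly)
```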
||
|
|
33eb9e9ad9 |
fix(platform): address review — format_bytes rollup, storage bar visibility
- Rename _format_bytes → format_bytes (public API, used cross-module)
- Add unit boundary rollup (1024 KB → 1.0 MB) matching frontend formatBytes
- Show WorkspaceStorageSection even when token limits are null
- Move import to top level in routes.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
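A minimal sketch of the rollup behaviour, assuming a simple `format_bytes` shape (the real helper may differ):

```python
# Behavioural sketch; the real helper lives alongside the workspace routes.
def format_bytes(num_bytes: int) -> str:
    value = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        # Roll up at the unit boundary so 1024 KB renders as "1.0 MB",
        # matching the frontend formatBytes.
        if value < 1024:
            return f"{int(value)} B" if unit == "B" else f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} TB"
```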
||
|
|
938cf9ea5f |
test(backend): add guard test for tier multiplier enum completeness
Ensures every SubscriptionTier has an entry in _DEFAULT_TIER_MULTIPLIERS. Matches the existing storage limit guard — both fail immediately if someone adds a new tier without updating the mappings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
1fed970a88 |
fix(backend): move mid-file import to top, add tier coverage guard test
- Move _format_bytes import to top of routes.py (was mid-function)
- Add test_every_subscription_tier_has_storage_limit that iterates all SubscriptionTier enum values and asserts each has a TIER_WORKSPACE_STORAGE_MB entry — fails immediately if someone adds a new tier without a storage limit

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
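A sketch of what that guard test can look like; the import paths are assumptions:

```python
# Import paths are assumptions; the intent is the enum-completeness loop.
from backend.data.model import SubscriptionTier                # assumed location
from backend.util.workspace import TIER_WORKSPACE_STORAGE_MB  # assumed location


def test_every_subscription_tier_has_storage_limit():
    missing = [t for t in SubscriptionTier if t not in TIER_WORKSPACE_STORAGE_MB]
    assert not missing, f"Add TIER_WORKSPACE_STORAGE_MB entries for: {missing}"
```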
||
|
|
4d85a68d9a |
fix(platform): resolve merge conflicts with dev, update tier names
- Resolve conflicts in test files and SubscriptionTierSection
- Update TIER_WORKSPACE_STORAGE_MB for renamed tiers: FREE→BASIC, add MAX
- Update tier descriptions in helpers.ts with storage limits
- Update rate_limit_test.py for new tier names

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
000ddb007a |
dx: use $REPO_ROOT in pr-test skill instead of hardcoded absolute path (#12914)
## Summary
- `.claude/skills/pr-test/SKILL.md` referenced
`/Users/majdyz/Code/AutoGPT/.ign.testing.{lock,log}` in 5 places, which
breaks the skill for anyone else who clones the repo.
- Replaced with `$REPO_ROOT`, which is already defined in Step 0 as `git
-C "$WORKTREE_PATH" worktree list | head -1 | awk '{print $1}'`. That
resolves to the main/primary worktree from any sibling worktree,
preserving the original "always pin the lock to the root checkout so all
siblings see the same file" semantics.
- No behavior change for the existing user; repo becomes portable for
everyone else.
## Test plan
- [x] `grep -n "/Users/majdyz" .claude/skills/pr-test/SKILL.md` returns
only the two intentional mentions in the "never paste absolute paths
into PR comments" warning.
- [x] `$REPO_ROOT` is defined in Step 0 before any Step 3.0 usage.
|
||
|
|
408b205515 |
feat(platform): LD-configurable rate-limit multipliers + relative UI display (#12910)
## Summary - **Backend (`copilot/rate_limit`)** — ``TIER_MULTIPLIERS`` is now float-typed and resolvable through a new LaunchDarkly flag ``copilot-tier-multipliers``. The integer defaults live on as ``_DEFAULT_TIER_MULTIPLIERS`` and are merged with whatever LD returns (missing / invalid keys inherit defaults; LD failures fall back to defaults without raising). ``get_global_rate_limits`` now honours the flag per-user and casts ``int(base * multiplier)`` so downstream microdollar math stays integer even when LD hands back a fractional multiplier (e.g. 8.5×). Cached for 60 s via ``@cached(ttl_seconds=60, maxsize=8, cache_none=False)`` to match the pattern in ``get_subscription_price_id``. - **Backend (`api/features/v1`)** — ``SubscriptionStatusResponse`` gains ``tier_multipliers: dict[str, float]``, populated for the same set of tiers that make it into ``tier_costs`` so hidden tiers never get a rendered badge. - **Frontend (`SubscriptionTierSection`)** — drops the hard-coded ``"5x" / "20x"`` strings from ``TIERS`` and introduces ``formatRelativeMultiplier(tierKey, tierMultipliers)``: the lowest *visible* multiplier becomes the baseline (no badge), every other tier renders ``"N.Nx rate limits"`` relative to it. Fractional LD values like 8.5× round to one decimal. The admin rate-limit page (``/admin/rate-limits``) keeps the static ``TIER_MULTIPLIERS`` defaults — it's admin-facing, infrequently viewed, and fine to lag the LD value until next deploy (noted in-code). Related upstream: this PR stacks logically after #12903 (which added the ``MAX`` tier + LD-configurable prices) but does **not** require it — each PR can merge in either order. No schema changes, no migration. ## Test plan - [x] ``poetry run black backend/... --check`` + ``poetry run ruff check backend/...`` pass - [x] ``pnpm format`` pass (modified files unchanged) - [x] New backend tests: ``TestGetTierMultipliers`` (defaults, LD override, invalid JSON, unknown tier / non-positive values, LD failure) — **5 / 5 pass** - [x] New backend test: ``TestGetGlobalRateLimitsWithTiers::test_ld_override_applies_fractional_multiplier`` — **pass** - [x] ``backend/copilot/rate_limit_test.py`` — non-DB subset **72 / 72 pass**; ``TestGetUserTier`` / ``TestSetUserTier`` require the full test-server fixture (Redis + Prisma) and are not run in this worktree — same behaviour on clean ``dev`` - [x] ``backend/api/features/subscription_routes_test.py`` — **40 / 40 pass** (includes new ``test_get_subscription_status_tier_multipliers_ld_override``) - [x] Frontend vitest targeted suite — **51 / 51 pass** - ``helpers.test.ts`` — new ``formatRelativeMultiplier`` cases (lowest-tier null, integer ratio, fractional ratio, hidden-tier null, fractional LD) - ``SubscriptionTierSection.test.tsx`` — three new cases for relative badges, rebasing when the lowest tier is hidden, fractional LD overrides |
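A sketch of the defaults-merge and integer-cast behaviour described for the backend; the default values and the pre-fetched flag payload argument are illustrative:

```python
# Default values and the payload argument are illustrative.
_DEFAULT_TIER_MULTIPLIERS: dict[str, float] = {"BASIC": 1.0, "PRO": 5.0, "MAX": 20.0}


def get_tier_multipliers(ld_payload) -> dict[str, float]:
    multipliers = dict(_DEFAULT_TIER_MULTIPLIERS)
    if isinstance(ld_payload, dict):
        for tier, value in ld_payload.items():
            # Unknown tiers and non-positive / non-numeric values keep defaults.
            if (
                tier in multipliers
                and isinstance(value, (int, float))
                and not isinstance(value, bool)
                and value > 0
            ):
                multipliers[tier] = float(value)
    return multipliers


def scaled_limit(base_microdollars: int, multiplier: float) -> int:
    # Cast back to int so downstream microdollar math stays integer even when
    # LD serves a fractional multiplier like 8.5.
    return int(base_microdollars * multiplier)
```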
||
|
|
f8c123a8c3 |
feat(blocks): dynamic COST_USD billing + close 8 cost-leak surfaces (#12909)
## Why
`ClaudeCodeBlock` was a flat `RUN, 100 cr/run` entry when real cost is
**$0.02–$1.50/run**. Plugging that leak surfaced the question "are other
blocks doing the same?" — an audit found **7 more cost-leak surfaces**.
This PR closes all of them atomically so the cost pipeline is uniform
post-#12894.
## What
### 1. ClaudeCodeBlock → COST_USD 150 cr/$ (the headline)
Claude Code CLI's `--output-format json` already returns
`total_cost_usd` on every call, rolling up Anthropic LLM + internal
tool-call spend. Block now emits it via `merge_stats`:
```python
total_cost_usd = output_data.get("total_cost_usd")
if total_cost_usd is not None:
    self.merge_stats(NodeExecutionStats(
        provider_cost=float(total_cost_usd),
        provider_cost_type="cost_usd",
    ))
```
Registered as `COST_USD, 150 cr/$` — matches the 1.5× margin baked into
every `TOKEN_COST` entry.
### 2. Exa websets — ~40 blocks instrumented
Registered as `COST_USD 100 cr/$` but **never emitted `provider_cost`**
→ ran wallet-free. Added `extract_exa_cost_usd` + `merge_exa_cost`
helpers in `exa/helpers.py` and threaded `merge_exa_cost(self,
response)` through every Exa SDK call across 14 files (59 call sites).
Future-proof: lights up as soon as `exa_py` surfaces `cost_dollars` on
webset response types.
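The helper names `extract_exa_cost_usd` / `merge_exa_cost` are the ones this PR adds; the bodies below are only a sketch, assuming `exa_py` exposes the spend as a `cost_dollars.total` attribute (per #12894) and that `NodeExecutionStats` is used the same way as in the ClaudeCode snippet above.

```python
# Sketch of the exa/helpers.py additions (attribute names assumed).
def extract_exa_cost_usd(response) -> float | None:
    cost = getattr(response, "cost_dollars", None)
    total = getattr(cost, "total", None) if cost is not None else None
    return float(total) if total is not None else None

def merge_exa_cost(block, response) -> None:
    """Fold provider-reported USD spend into the block's execution stats, if present."""
    cost_usd = extract_exa_cost_usd(response)
    if cost_usd is not None:
        block.merge_stats(
            NodeExecutionStats(provider_cost=cost_usd, provider_cost_type="cost_usd")
        )
```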
### 3. AIConditionBlock — now registered under LLM_COST
Full LLM block with token-count instrumentation already in place, but
**no `BLOCK_COSTS` entry at all** → wallet-free. One-line fix: added to
the LLM_COST group next to AIConversationBlock.
### 4. Pinecone × 3 — added BLOCK_COSTS
- `PineconeInitBlock` + `PineconeQueryBlock`: RUN, 1 cr/run (platform
overhead; user pays Pinecone directly).
- `PineconeInsertBlock`: ITEMS scaling with `len(vectors)` emitted via
`merge_stats`.
### 5. Perplexity Sonar (all 3 tiers) → COST_USD 150 cr/$
Block already extracted OpenRouter's `x-total-cost` header into
`execution_stats.provider_cost`; just tagged it `cost_usd` and flipped
the registry. **Deep Research was under-billing up to 30×** ($0.20–$2.00
real vs flat 10 cr).
### 6. CodeGenerationBlock (Codex / GPT-5.1-Codex) → COST_USD 150 cr/$
Block computes USD from `response.usage.input_tokens / output_tokens`
using GPT-5.1-Codex rates ($1.25/M in + $10/M out) and emits `cost_usd`.
Was flat 5 cr for arbitrary-length generations.
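A hedged sketch of that per-token USD computation: the $1.25/M-in and $10/M-out rates are the ones quoted above, while the function shape and the worked numbers are illustrative only.

```python
INPUT_USD_PER_TOKEN = 1.25 / 1_000_000   # $1.25 per 1M input tokens
OUTPUT_USD_PER_TOKEN = 10.0 / 1_000_000  # $10.00 per 1M output tokens

def codegen_cost_usd(input_tokens: int, output_tokens: int) -> float:
    # The block would emit this value as provider_cost with type "cost_usd".
    return input_tokens * INPUT_USD_PER_TOKEN + output_tokens * OUTPUT_USD_PER_TOKEN

# Example: a 20K-in / 4K-out generation costs
# 20_000 * 1.25e-6 + 4_000 * 1.0e-5 = $0.065, i.e. ~10 credits at 150 cr/$,
# versus the old flat 5 cr regardless of length.
```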
### 7. VideoNarrationBlock (ElevenLabs) → COST_USD 150 cr/$
Block computes USD from `len(script) × $0.000167` (Starter tier per-char
price) and emits `cost_usd`. **Was under-billing ~25–30× on long
scripts** (5K-char narration: flat 5 cr vs ~$0.83 real = 125 cr).
### 8. Meeting BaaS FetchMeetingData → COST_USD 150 cr/$
Join block keeps its flat 30 cr commit. FetchMeetingData now extracts
`duration_seconds` from the response metadata, computes USD via
`duration × $0.000192/sec`, and emits `cost_usd`. Long meetings (hours)
are no longer absorbed by the flat 30 cr charge.
## Why 150 cr/$
Matches the **1.5× margin already baked into `TOKEN_COST` for every
direct LLM block**:
| Model | Real | Our rate (per 1M) | Markup |
|---|---|---|---|
| Claude Sonnet 4 | $3/$15 | 450/2250 cr | 1.5× |
| GPT-5 | $2.50/$10 | 375/1500 cr | 1.5× |
| Gemini 2.5 Pro | $1.25/$5 | 187/750 cr | 1.5× |
Applying the same ratio to `total_cost_usd` gives `cost_amount=150` (1 cr ≈
$0.01 → 100 cr/$ pass-through × 1.5 margin = 150 cr/$).
## Test plan
- [x] **Unit**: new `claude_code_cost_test.py` (9 tests) + existing
`exa/cost_tracking_test.py` (16 tests) + full cost pipeline. **119/119
pass**.
- [x] `poetry run ruff format` + `poetry run ruff check backend/` —
clean.
- [ ] Live E2E: real ClaudeCode / Perplexity Deep Research / Codex run
with balance delta verification (post-merge).
## Follow-ups (not in this PR)
- `exa_py` SDK update to surface `cost_dollars` on Webset response types
(upstream) — unlocks real billing for the 40 webset blocks.
- Replicate suite: migrate per-model RUN entries to COST_USD via
`prediction.metrics["predict_time"] × per-model $/sec`.
|
||
|
|
34374dfd55 |
feat(frontend): Settings v2 API keys page (SECRT-2273) (#12907)
### Why / What / How **Why:** The Settings v2 API keys page was a UI-only stub with 100 mock rows, a noop "Create Key" button, noop delete buttons, and no empty/loading states. Users couldn't actually manage their keys from the new Settings UI. Ships SECRT-2273. **What:** Replaces the mock with a working page: paginated list (15/page) with infinite scroll, create flow with one-time plaintext reveal, single + batch revoke with confirmation dialogs, per-key details dialog, skeleton loader, animated empty state, toast + mutation-loading feedback, and responsive header. https://github.com/user-attachments/assets/bc576de3-0369-4e73-b945-c66c142ebfe5 <img width="397" height="860" alt="Screenshot 2026-04-24 at 11 26 53 AM" src="https://github.com/user-attachments/assets/ed8681ea-7d16-40cc-96f7-72d798857229" /> **How:** - **Backend** adds a new `GET /api/api-keys/paginated` route returning `{ items, total_count, page, page_size, has_more }`. The legacy `GET /api/api-keys` is untouched so the existing profile page keeps working. The list fn runs `find_many` + `count` in parallel and filters to `ACTIVE` status by default so revoked keys stay hidden. - **Frontend** fetches via TanStack Query. Right now the hook consumes the legacy endpoint with client-side slicing (15/page) so the page works against staging today; once the paginated route ships we swap to the generated `useGetV1ListUserApiKeysPaginatedInfinite` hook that's already in the regenerated client. - All new UI lives in `src/app/(platform)/settings/api-keys/components/` — no legacy components reused. Shared primitives (Dialog, Form, Toast, Skeleton, InfiniteScroll, BaseTooltip) come from the atoms/molecules design system. - Empty state uses a vertical marquee of ghost key-cards (framer-motion, translateY 0→-50% on a duplicated stack, linear easing, symmetric mask fade). Respects `prefers-reduced-motion`. - Settings layout ScrollArea switched to `h-full` on mobile and `md:h-[calc(100vh-60px)]` on desktop to remove a double scrollbar that appeared when the mobile nav took space above the fixed-height scroll region. ### Changes 🏗️ **Backend** - `GET /api/api-keys/paginated` — new route, page + page_size query params, `ListAPIKeysPaginatedResponse`. - `list_user_api_keys_paginated` — new data fn, gathers find_many + count, default ACTIVE-only filter. - Existing `/api/api-keys` routes untouched. **Frontend (settings/api-keys)** - `page.tsx` + `components/APIKeyList/`, `APIKeyRow/`, `APIKeysHeader/`, `APIKeySelectionBar/` — real-data wiring, drop mock array. - `components/hooks/` — `useAPIKeysList`, `useCreateAPIKey`, `useRevokeAPIKey`. - `components/CreateAPIKeyDialog/` — zod-validated form + success view with copy. - `components/DeleteAPIKeyDialog/` — confirm with loading state; single + batch. - `components/APIKeyInfoDialog/` — shows masked key, scopes, description, created/last_used. - `components/APIKeyListEmpty/` + `APIKeyListEmpty/components/APIKeyMarquee.tsx` — animated empty state. - `components/APIKeyListSkeleton/` — 6-row skeleton. **Other** - `settings/layout.tsx` — responsive ScrollArea height (fixes double-scrollbar on mobile). - `components/ui/scroll-area.tsx` — optional `showScrollToTop` FAB. - `__tests__/placeholder-pages.test.tsx` — drop api-keys from placeholder list. - `AGENTS.md` — Phosphor `-Icon` suffix convention note. - `api/openapi.json` — regenerated with new paginated endpoint. 
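A rough sketch of the paginated list described above: `ListAPIKeysPaginatedResponse`, the 15-per-page default, and the ACTIVE-only filter are from this PR, while the data-layer accessor and field names below are placeholders.

```python
import asyncio
from pydantic import BaseModel

class ListAPIKeysPaginatedResponse(BaseModel):
    items: list[dict]   # masked key summaries; exact item shape per the generated client
    total_count: int
    page: int
    page_size: int
    has_more: bool

async def list_user_api_keys_paginated(
    user_id: str, page: int = 1, page_size: int = 15
) -> ListAPIKeysPaginatedResponse:
    where = {"userId": user_id, "status": "ACTIVE"}  # revoked keys hidden by default
    skip = (page - 1) * page_size
    items, total_count = await asyncio.gather(
        api_keys_db.find_many(where=where, skip=skip, take=page_size),  # placeholder accessor
        api_keys_db.count(where=where),
    )
    return ListAPIKeysPaginatedResponse(
        items=items,
        total_count=total_count,
        page=page,
        page_size=page_size,
        has_more=skip + len(items) < total_count,
    )
```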
### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [ ] I have tested my changes according to the test plan: - [ ] Page loads → skeleton → list with real keys - [ ] Empty state renders with the vertical marquee (and stays static with `prefers-reduced-motion`) - [ ] Create key dialog: name + description + permissions validates; success view shows plaintext once + copy works; closing resets state - [ ] Revoke single key via row trash icon → confirm dialog → toast on success → row disappears - [ ] Batch-revoke via selection bar → confirm dialog → all revoked - [ ] Info icon next to each key opens the details dialog (scopes, timestamps, masked key) - [ ] Infinite scroll loads more rows when scrolling past page 1 (≥16 keys) - [ ] Mobile (<640px): single scrollbar, Create Key button below title at size=small - [ ] Desktop (md+): same layout as before, scroll-to-top FAB appears after scrolling #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under **Changes**) |
||
|
|
2cb52e5d19 |
feat(frontend): add Settings v2 page layout behind SETTINGS_V2 flag (SECRT-2272) (#12885)
### Why / What / How **Why:** The Settings area is getting a redesign (per Figma [Settings-Page](https://www.figma.com/design/YGck0Hb0GEgFzwbX47kSNs/Settings-Page?node-id=1-2)). Ticket SECRT-2272 covers just the shell so content/forms for each section can land in follow-up PRs without blocking on the nav restructure. v1 at `/profile/settings` must stay intact for end users during the rollout. **What:** Adds a new parallel Settings hub at `/settings` (dedicated sidebar + 7 placeholder sub-routes) behind a new `SETTINGS_V2` LaunchDarkly flag. Default `false` so nothing changes for users until the flag flips. Backend is untouched. https://github.com/user-attachments/assets/dd680eaf-3d41-4a9a-87f3-d06d536a2503 **How:** - New `Flag.SETTINGS_V2 = "settings-v2"` added to `use-get-flag.ts` with `defaultFlags[Flag.SETTINGS_V2] = false`. Gate the whole route group at `layout.tsx` via existing `FeatureFlagPage` HOC which redirects to `/profile/settings` when the flag is off. - `SettingsSidebar` replicates the Figma spec (237px, 7 items at 217×38, `gap-[7px]`, rounded-[8px], active `bg-[#EFEFF0]` + text `#1F1F20` Geist Medium, inactive text `#505057` Geist Regular, icon 16px Phosphor light/regular at `#1F1F20`). Colors + typography use the canonical tokens exported by Figma (zinc-50 `#F9F9FA`, zinc-200 `#DADADC` for the right-border, etc.). - `SettingsNavItem` is extracted as its own component and owns its per-item entrance variant. - Per-link loading indicator uses Next.js 15's `useLinkStatus()` hook — spinner appears on the right of the clicked item and clears automatically once the target page renders. - `SettingsMobileNav` (< md breakpoint): sidebar hides; a pill trigger with the current section's icon + label opens a Radix Popover listing all 7 sections. - Entrance animations via framer-motion, tuned to Emil Kowalski's guidelines — `cubic-bezier(0, 0, 0.2, 1)` ease-out, all durations ≤ 280ms, only `transform` and `opacity`, `useReducedMotion` disables movement but keeps fade. Sidebar items stagger in (40ms offset). Main content re-animates on every route change via `key={pathname}`. - All 7 placeholder pages render the section title (Poppins Medium 22/28 via `variant="h4"`, `#1F1F20`) + "Coming soon" copy; they are intentionally client components to avoid hook-order issues with the client-side flag gate in the layout. ### Changes 🏗️ - `src/services/feature-flags/use-get-flag.ts`: register `Flag.SETTINGS_V2` + default `false` - `src/app/(platform)/settings/layout.tsx`: flag gate + responsive shell + route-keyed content animation - `src/app/(platform)/settings/page.tsx`: client-side redirect to `/settings/profile` - `src/app/(platform)/settings/components/SettingsSidebar/`: - `SettingsSidebar.tsx` — aside with staggered entrance - `SettingsNavItem.tsx` — per-item Link + icon + label + loader (extracted) - `useSettingsSidebar.ts` — hook mapping nav items with `isActive` from `usePathname` - `helpers.ts` — typed nav item config (label / href / Phosphor icon) × 7 - `src/app/(platform)/settings/components/SettingsMobileNav/SettingsMobileNav.tsx`: mobile Popover trigger - 7 placeholder pages: `profile`, `creator-dashboard`, `billing`, `integrations`, `preferences`, `api-keys`, `oauth-apps` **Follow-up PRs will migrate real content into each tab.** LaunchDarkly flag key `settings-v2` must be created in the LD dashboard before enabling for users. 
### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] `NEXT_PUBLIC_FORCE_FLAG_SETTINGS_V2=true` → `/settings` redirects to `/settings/profile`, sidebar renders 7 items with "Profile" active - [x] Click each nav item → URL changes, active item highlights, content pane re-animates, per-link spinner shows during navigation - [x] Viewport < 768px → sidebar hides, mobile pill trigger opens Popover with all 7 items; selecting one navigates and closes - [x] Without the flag env override, `/settings` redirects to `/profile/settings` (v1 unchanged) - [x] `pnpm types` clean; prettier clean on touched files - [x] Manual a11y pass with `prefers-reduced-motion` enabled — fade remains, translations disabled #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes *(no new env vars required; existing `NEXT_PUBLIC_FORCE_FLAG_*` pattern covers local override)* - [x] `docker-compose.yml` is updated or already compatible with my changes *(no docker changes)* - [x] I have included a list of my configuration changes in the PR description *(LaunchDarkly dashboard must have `settings-v2` flag created before enabling; no other config changes)* |
||
|
|
ab88d03b13 |
refactor(backend/integrations): clearer naming + docs for managed-cred sweep (#12908)
## Why Review comments on #12883 (thanks @Pwuts) surfaced a few spots where the managed-credential plumbing's names and docstrings didn't match what the code actually does: - `_read_or_create_profile_key` suggests "read from any source or create new", but only migrates the legacy `managed_credentials.ayrshare_profile_key` side-channel — it doesn't read an existing managed credential. (That check lives in the outer `_provision_under_lock`.) - Docstrings refer to "the startup sweep" in several places — there's no startup hook; the sweep runs on `/credentials` fetches. - `is_available` / `auto_provision` relationship wasn't explicit; readers couldn't tell whether `is_available` was a config check or a liveness check, or which of the two gates the sweep checks first. ## What Naming + docstring cleanup. **Zero behavior changes.** - Rename `_read_or_create_profile_key` → `_migrate_legacy_or_create_profile_key` with docstring explaining why it doesn't re-check the managed cred. - Replace "startup sweep" → "credentials sweep" everywhere. - `ManagedCredentialProvider` class docstring now names the two gates: 1. `auto_provision` — does this provider participate in the sweep at all? 2. `is_available` — are the required env vars / secrets set? - `is_available` docstring now spells out: what it checks (env vars), what it does NOT check (upstream health), and that it's only consulted when `auto_provision=True`. - `ensure_managed_credentials` docstring defines "credentials sweep", when it fires, how the per-user in-memory cache works. - Module-level docstring drops the stale "non-blocking background task" wording (#12883 made the sweep bounded-await). ## How 4 files, all backend: - `backend/integrations/managed_credentials.py` - `backend/integrations/managed_providers/ayrshare.py` - `backend/integrations/managed_providers/ayrshare_test.py` - `backend/api/features/integrations/router.py` Tests: 13/13 Ayrshare tests pass against the rename. ## Checklist - [x] Follows style guide - [x] Existing tests still pass (no functional change) - [x] No new tests needed — pure rename + docstring change |
||
|
|
3aa72b4245 |
feat(backend/copilot): inline picker-backed inputs via run_block + accept AgentInputBlock subclasses (#12880)
### Why / What / How **Why:** Resolves #12875. CoPilot's agent-builder was hardcoding Google Drive file IDs into consuming blocks' `input_default` instead of wiring an `AgentGoogleDriveFileInputBlock`. A beta user hit this across **13 saved versions** of one agent. Root causes: 1. `validate_io_blocks` only accepted the literal base `AgentInputBlock` / `AgentOutputBlock` IDs, so even when CoPilot used a specialized subclass like `AgentGoogleDriveFileInputBlock` as the only input, the validator forced it to keep a throwaway base alongside — entrenching the anti-pattern. 2. Running a Drive consumer directly via CoPilot's `run_block` silently failed because the auto-credentials flow (picker attaches `_credentials_id`) existed only in the graph executor, never in CoPilot's direct-execution path. 3. Drive picker guidance lived in `agent_generation_guide.md` instead of on the blocks themselves, so it duplicated and drifted from the code. 4. Observed in a live session: when asked to read a private sheet, CoPilot refused with "share publicly or use the builder" instead of calling `run_block` and letting the picker render — the prompt rule was buried and the fallback path (omitted required picker field) returned a generic schema preview. **What:** Four coordinated platform + CoPilot improvements. No block-specific validator rules, no Drive-specific code in UI or prompt. **How:** #### 1. `validate_io_blocks` subclass support Accepts any block with `uiType == "Input"` / `"Output"` (populated from `Block.block_type` at registration). `AgentGoogleDriveFileInputBlock`, `AgentDropdownInputBlock`, `AgentTableInputBlock`, etc. stand alone. Base-ID fallback preserved for call sites that pass a minimal blocks list. #### 2. Inline picker via `run_block` - Extracted `_acquire_auto_credentials` from `backend/executor/manager.py` into shared `backend/executor/auto_credentials.py` (exports `acquire_auto_credentials` + `MissingAutoCredentialsError`). - Wired it into `backend/copilot/tools/helpers.py::execute_block`. When `_credentials_id` is present, the block executes with creds injected (chained flows work). When missing/null, `execute_block` returns the existing `SetupRequirementsResponse` — frontend's `FormRenderer` renders the picker inline via the existing `GoogleDrivePickerField`/`GoogleDrivePickerInput`. On pick, the LLM re-invokes `run_block` with the populated input — same continuation pattern as OAuth-missing-credentials. No new response types, no new continuation tool, no new frontend component. - `run_block` now short-circuits to `SetupRequirementsResponse` when missing required fields include a picker-backed field, skipping the schema-preview round trip the LLM would otherwise take. - `get_inputs_from_schema` spreads the full property schema (`**schema`) instead of whitelisting — any `format` / `json_schema_extra` / custom widget config flows through to the generic custom-field dispatch on the frontend. Future picker formats (date pickers, file pickers, etc.) work without backend changes. - Frontend `SetupRequirementsCard/helpers.ts` uses index-signature passthrough for arbitrary schema keys — no widget-specific code in that layer. #### 3. `validate_only` parameter on `run_block` `run_block(id, {})` is not always a safe probe — for blocks with zero required inputs, it executes. New `validate_only: true` parameter returns `BlockDetailsResponse` (schema + missing-input list) without executing, rendering picker cards, or charging credits. 
Same response shape as the existing schema preview — no new branch, just an extra condition on the existing one. LLM uses this for pre-flight when it's unsure whether a block has required inputs. #### 4. Block-local picker guidance Agent-generation picker guidance relocated from the guide onto the blocks themselves — surfaced at `find_block` time, exactly when the LLM decides to wire a picker-backed consumer: - `GoogleDriveFileField` (shared factory for every Drive field on Sheets/Docs/etc.) appends a standard hint to the caller's description covering: feed from the specialized input block, never hardcode (even one parsed from a URL), picker is the only credential source. - `AgentGoogleDriveFileInputBlock`'s block description now covers when it's required, the `allowed_views` mapping, wiring direction, and a concrete link-shape example. - `agent_generation_guide.md` loses the dedicated 71-line Drive section. The IO-blocks section now tells the LLM specialized subclasses satisfy the requirement and carry their own usage guidance in block/field descriptions — read them when `find_block` surfaces a match. - New "Picker-backed inputs via `run_block`" section in the CoPilot prompt, written generically (picker fields detected via `format` / `auto_credentials` schema hints, no provider names hardcoded) — covers: don't ask the user for URLs/IDs, don't refuse private-resource asks, chained picker objects pass through as-is. - Sharpened `MissingAutoCredentialsError` message so when a bare ID reaches execution, the error explicitly tells the LLM the picker renders inline (not "ask the user for something"). ### Changes 🏗️ - `backend/copilot/tools/agent_generator/validator.py` — `_collect_io_block_ids` + subclass-aware `validate_io_blocks`. - `backend/executor/auto_credentials.py` (new) — shared `acquire_auto_credentials` + `MissingAutoCredentialsError`. - `backend/executor/manager.py` — imports from the shared module, drops the local copy. - `backend/copilot/tools/helpers.py` — `execute_block` calls `acquire_auto_credentials`, merges kwargs, releases locks in `finally`, returns `SetupRequirementsResponse` on missing creds. `get_inputs_from_schema` spreads the full property schema. - `backend/copilot/tools/run_block.py` — picker-field short-circuit + `validate_only` parameter. - `backend/copilot/prompting.py` — "Picker-backed inputs via `run_block`" + "Pre-flight with `validate_only`" sections. - `backend/blocks/google/_drive.py` — `GoogleDriveFileField` appends the agent-builder hint to every Drive consumer's description. - `backend/blocks/io.py` — `AgentGoogleDriveFileInputBlock` description expanded. - `backend/copilot/sdk/agent_generation_guide.md` — Drive section removed, IO-blocks subclass note expanded. - `frontend/.../SetupRequirementsCard/helpers.ts` — index-signature passthrough for arbitrary schema keys; schema fields propagate into the generated RJSF schema. - Tests: new `TestExecuteBlockAutoCredentials` (4 cases) + `validate_only` + picker-short-circuit cases in `run_block_test.py`; `manager_auto_credentials_test.py` moved to new import path; 6 new frontend cases in `SetupRequirementsCard/__tests__/helpers.test.ts` covering schema passthrough. - Also: one-line hoist of `import secrets` in `backend/integrations/managed_providers/ayrshare.py` — ruff E402 introduced by #12883 was blocking our lint post-merge. 
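The control flow below is a hedged sketch of the `execute_block` wiring: `acquire_auto_credentials`, `MissingAutoCredentialsError`, and `SetupRequirementsResponse` are named in this PR, but their signatures, the lock handling, and the response fields are assumptions.

```python
# Sketch only — not the real backend/copilot/tools/helpers.py.
async def execute_block(block, input_data: dict, user_id: str):
    if not input_data.get("_credentials_id"):
        # Missing picker value: return the existing setup-requirements shape so the
        # frontend renders the picker inline; on pick, the LLM re-invokes run_block
        # with the populated input.
        return SetupRequirementsResponse(block_id=block.id, missing_inputs=["_credentials_id"])
    creds_kwargs, locks = {}, []
    try:
        # Shared helper extracted from the graph executor into
        # backend/executor/auto_credentials.py; raises MissingAutoCredentialsError
        # when a bare ID reaches execution without picker-attached credentials.
        creds_kwargs, locks = await acquire_auto_credentials(user_id, block, input_data)
        return await block.run(**input_data, **creds_kwargs)
    finally:
        for lock in locks:
            await lock.release()
```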
### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Backend unit suites: validator_test (48), helpers_test (40), run_block_test (19), manager_auto_credentials_test (15) — **all green** - [x] Frontend `SetupRequirementsCard` helpers — **75/75 pass** (including 6 new passthrough cases) - [x] `poetry run format` (ruff + isort + black) clean on touched files (pre-existing pyright errors in unrelated `graphiti_core` / `StreamEvent` / etc. files not introduced by this PR) - [x] Live CoPilot chat on dev-builder confirmed the setup card renders `custom/google_drive_picker_field` for a Drive consumer block called via `run_block` - [x] Live agent-generation confirmed CoPilot creates a subclass-only agent (`AgentGoogleDriveFileInputBlock` → `GoogleSheetsReadBlock` → `AgentOutputBlock`) with no throwaway base `AgentInputBlock` #### For configuration changes: - [x] N/A — no config changes --------- Co-authored-by: majdyz <zamil.majdy@agpt.co> |
||
|
|
cc1f692fec |
feat(platform): add MAX tier + LD-configurable pricing + hide unconfigured tiers (#12903)
## What Introduces a new `MAX` tier slot between `PRO` and `BUSINESS` (self-service $320/mo at 20× capacity), routes every self-service tier's Stripe price ID through LaunchDarkly, and hides tiers from the UI when their price isn't configured. `BUSINESS` stays in the enum at 60× as a reserved/future self-service slot (hidden by default until its LD price flag is set). ENTERPRISE stays admin-managed. ## Tier shape after this PR | Enum | UI label | Multiplier | LD price flag | Surfaced in UI by default | |---|---|---|---|---| | `FREE` | Basic | 1× | `stripe-price-id-basic` | no (flag unset) | | `PRO` | Pro | 5× | `stripe-price-id-pro` | yes (already live) | | `MAX` **(new)** | Max | 20× | `stripe-price-id-max` | no (flag unset until $320 price ready) | | `BUSINESS` | Business | 60× | `stripe-price-id-business` | no (reserved / future) | | `ENTERPRISE` | — | 60× | — (admin-managed) | no (Contact-Us only) | ## Prisma - Added `MAX` between `PRO` and `BUSINESS` in `SubscriptionTier`. - Migration `add_subscription_tier_max/migration.sql` uses `ALTER TYPE ... ADD VALUE IF NOT EXISTS 'MAX' BEFORE 'BUSINESS'` (transactional since PG 12). No data migration — no rows currently on BUSINESS via self-service flows. ## Backend - `get_subscription_price_id` flag map covers `FREE`/`PRO`/`MAX`/`BUSINESS`. ENTERPRISE returns `None`. - `GET /credits/subscription.tier_costs` only includes tiers whose LD price ID is set. Current tier always present as a safety net. - `POST /credits/subscription` routes by LD-resolved prices instead of hard-coding `tier == FREE`: - Target `FREE` + `stripe-price-id-basic` unset → legacy cancel-at-period-end (unchanged behaviour). - Target has LD price → modify in-place when user has an active sub, else Checkout Session. - Priced-FREE users with no sub fall through to Checkout (admin-granted DB-flip shortcut gated on `current_tier != FREE`). - `sync_subscription_from_stripe` + `get_pending_subscription_change` cover FREE/PRO/MAX/BUSINESS in the price-to-tier map so every tier's Stripe webhook reconciles cleanly. - Pending-tier mapping collapsed into a single membership check. - `TIER_MULTIPLIERS`: `FREE=1, PRO=5, MAX=20, BUSINESS=60, ENTERPRISE=60`. ## Frontend - UI labels: FREE→"Basic", MAX→"Max", BUSINESS→"Business" (PRO unchanged). `TIER_ORDER` now `[FREE, PRO, MAX, BUSINESS, ENTERPRISE]`. - `SubscriptionTierSection` filters by `tier_costs` — any tier without a backend-provided price is hidden (current tier always visible). - `formatCost` surfaces "Free" only when `FREE` is actually `$0`; non-zero `stripe-price-id-basic` renders `$X.XX/mo`. - Admin rate-limit display lists all five tiers with multiplier badges. ## LaunchDarkly flag actions (operator) - **New:** `stripe-price-id-basic` → FREE tier. Set to `""` or a `$0` Stripe price. - **New:** `stripe-price-id-max` → MAX tier. Point at the `$320` Stripe price when you launch the Max tier. - **Unchanged:** `stripe-price-id-pro` (PRO), `stripe-price-id-business` (BUSINESS — leave unset until you're ready for the 60× Business tier). - Base rate limits stay on `copilot-daily-cost-limit-microdollars` / `copilot-weekly-cost-limit-microdollars` (Basic's limit; everything else = × tier multiplier). ## Out of scope - Subscription-required onboarding screen / middleware gating (separate PR). - "Pricing available soon" vs Stripe-failure disambiguation in the UI (follow-up). 
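A sketch of the LD price routing and tier hiding described above; the flag keys are from this PR, while `resolve_ld_flag` is a hypothetical stand-in for the real LaunchDarkly lookup and the helper shapes are assumptions.

```python
PRICE_FLAG_BY_TIER = {
    "FREE": "stripe-price-id-basic",
    "PRO": "stripe-price-id-pro",
    "MAX": "stripe-price-id-max",
    "BUSINESS": "stripe-price-id-business",
    # ENTERPRISE is admin-managed: no flag, never offered via self-service checkout.
}

def get_subscription_price_id(tier: str) -> str | None:
    flag_key = PRICE_FLAG_BY_TIER.get(tier)
    if flag_key is None:
        return None
    price_id = resolve_ld_flag(flag_key, default="")  # hypothetical LD accessor
    return price_id or None

def visible_tier_costs(current_tier: str) -> dict[str, str | None]:
    # Only tiers with a configured Stripe price are surfaced; the user's current
    # tier is always included so their own plan never disappears from the UI.
    costs = {
        tier: price
        for tier in PRICE_FLAG_BY_TIER
        if (price := get_subscription_price_id(tier)) is not None
    }
    costs.setdefault(current_tier, None)
    return costs
```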
## Testing - Backend: 213 tests across `subscription_routes_test.py`, `credit_subscription_test.py`, `rate_limit_test.py`, `admin/rate_limit_admin_routes_test.py` — all passing. - Frontend: 91 tests across `credits/` + `admin/rate-limits/` — all passing. - Fresh-backend manual E2E on the pre-MAX commit confirmed tier-hiding works (`tier_costs` returns only the current tier when LD flags are unset). ## Checklist - [x] I have read the project's contributing guide. - [x] I have clearly described what this PR changes and why. - [x] My code follows the style guidelines of this project. - [x] I have added tests that prove my fix is effective or that my feature works. - [ ] New and existing unit tests pass locally with my changes (CI will confirm). |
||
|
|
be61dc4304 |
fix(backend): use {schema_prefix} in raw SQL migrations instead of hardcoded 'platform.' (#12905)
### Why / What / How
**Why.** Backend CI was failing at startup with `relation
"platform.AgentNode" does not exist`. Prisma's `migrate deploy` uses the
`schema.prisma` datasource, which doesn't declare a schema, so when
`DATABASE_URL` has no `?schema=platform` query param (as in CI / raw
Supabase), Prisma creates tables in `public` — but the lifespan
migration `backend.data.graph.migrate_llm_models` hardcoded
`platform."AgentNode"` in its raw SQL and crashed the boot.
**What.** Switched `migrate_llm_models` to use the
`execute_raw_with_schema` helper and the `{schema_prefix}` placeholder —
the same pattern already used by the sibling
`fix_llm_provider_credentials` migration in the same file. The helper in
`backend/data/db.py` reads the schema from `DATABASE_URL` at runtime and
substitutes `"platform".` or an empty prefix, so the query works in both
dev (schema=platform) and CI / raw Supabase (public).
**How.**
- Template change: `UPDATE platform."AgentNode"` → `UPDATE
{{schema_prefix}}"AgentNode"` (f-string double-brace escape so
`{schema_prefix}` survives to `.format()` inside
`execute_raw_with_schema`; a toy example of the escape appears after this list).
- Replace `db.execute_raw(...)` with `execute_raw_with_schema(...)`;
drop the now-unused `prisma as db` import.
- Regression test: mocks `execute_raw_with_schema` and asserts every
emitted query contains `{schema_prefix}` and no longer contains
`platform."AgentNode"`.
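A runnable toy version of the escape mechanics from the first bullet above (the SQL here is a stand-in, not the actual `migrate_llm_models` query):

```python
table = "AgentNode"

# Double braces in the f-string leave a literal {schema_prefix} placeholder behind.
query_template = f'SELECT COUNT(*) FROM {{schema_prefix}}"{table}"'
print(query_template)
# SELECT COUNT(*) FROM {schema_prefix}"AgentNode"

# execute_raw_with_schema fills the placeholder at runtime based on DATABASE_URL:
print(query_template.format(schema_prefix='"platform".'))  # dev: ?schema=platform
print(query_template.format(schema_prefix=""))             # CI / raw Supabase: public schema
```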
### Audit
Audited the other three lifespan migrations in
`backend/api/rest_api.py::lifespan_context`:
- `backend.data.user.migrate_and_encrypt_user_integrations` — uses
Prisma ORM, no raw SQL. OK.
- `backend.data.graph.fix_llm_provider_credentials` — already uses
`query_raw_with_schema` + `{schema_prefix}`. OK.
- `backend.integrations.webhooks.utils.migrate_legacy_triggered_graphs`
— uses Prisma ORM, no raw SQL. OK.
Also grepped the whole backend for `platform."` in Python files —
`migrate_llm_models` was the only offender; the other hits were
unrelated string content (docstrings, error messages, test data).
### Changes
- `autogpt_platform/backend/backend/data/graph.py`: `migrate_llm_models`
now uses `execute_raw_with_schema` with the `{schema_prefix}`
placeholder; unused `prisma as db` import dropped.
- `autogpt_platform/backend/backend/data/graph_test.py`: added
`test_migrate_llm_models_uses_schema_prefix_placeholder` regression
test.
### Checklist
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Ran `migrate_llm_models` under mocked `execute_raw_with_schema` —
all 7 emitted UPDATE queries contain `{schema_prefix}` and none hardcode
`platform."AgentNode"`.
- [x] Verified the f-string double-brace escape by evaluating the
template and running `.format(schema_prefix=...)` — substitution is
correct for both `"platform".` and empty-prefix (public-schema) cases.
- [x] `poetry run pyright backend/data/graph.py` clean (pre-existing
pyright error on `backend/api/features/v1.py:834` on `origin/dev` is
unrelated).
- [x] Grepped the whole backend for other hardcoded `platform."..."`
raw-SQL occurrences — none found.
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
(N/A — no config changes)
- [x] `docker-compose.yml` is updated or already compatible with my
changes (N/A — no config changes)
|
||
|
|
575f75edf4 |
refactor(platform): migrate Ayrshare to standard managed-credential flow (#12883)
## Why Beta user report: AutoPilot told them to sign up for Ayrshare themselves — which AutoGPT actually manages — because AutoPilot inferred the requirement from the block description string rather than any structured schema. Root cause: Ayrshare was the only block family whose "credential" lived in a bespoke `UserIntegrations.managed_credentials.ayrshare_profile_key` side channel and whose blocks declared **no** `credentials` field. `find_block` / `resolve_block_credentials` had nothing to show the LLM, so the LLM guessed. (An initial commit added a runtime `gh` CLI bootstrap for a separate "gh isn't installed in the sandbox" report — that work was empirically verified unnecessary and reverted; see the commit history for the bench results.) ## What **Ayrshare now goes through the standard managed-credential flow:** - New `AyrshareManagedProvider` alongside the existing `AgentMailManagedProvider`. Provisions the per-user profile as `APIKeyCredentials(provider="ayrshare", is_managed=True)` via the shared `add_managed_credential` path. Reuses any legacy `managed_credentials.ayrshare_profile_key` value on first provision so existing users keep their linked social accounts. - `AyrshareManagedProvider.is_available()` returns `False` so the `ensure_managed_credentials` startup sweep **never** auto-provisions Ayrshare (profile quota is a real per-user subscription cost). New public `ensure_managed_credential(user_id, store, provider)` helper lets the `/api/integrations/ayrshare/sso_url` route provision on demand, reusing the same distributed Redis lock + upsert path as AgentMail. - New `ProviderBuilder.with_managed_api_key()` method registers `api_key` as a supported auth type without the env-var-backed default credential that `with_api_key()` creates — so the org-level Ayrshare admin key cannot leak to blocks as a "profile key". - `BaseAyrshareInput` gains a shared `credentials` field; all 13 social blocks inherit it. Each `run()` now takes `credentials: APIKeyCredentials`; the inline `get_profile_key` guard + "please link a social account" error is gone. Standard `resolve_block_credentials` pre-run check owns the "not connected" path, returning a normal `SetupRequirementsResponse`. - **Migration-ordering safety:** `post_provision` hook on `ManagedCredentialProvider` clears the legacy `ayrshare_profile_key` field **only after** `add_managed_credential` has durably stored the managed credential. If persistence fails, the legacy key stays intact so a retry can reuse it — covered by `TestMigrationOrderingSafety`. - New public `IntegrationCredentialsStore.get_user_integrations()` — reads no longer have to reach past the `_get_user_integrations` privacy fence or abuse `edit_user_integrations` as a pseudo-read. - `/api/integrations/ayrshare/sso_url` collapses from a 60-line provision-then-sign dance to: pre-flight `settings_available()`, `ensure_managed_credential`, fetch the credential, sign a JWT. - `IntegrationCredentialsStore.set_ayrshare_profile_key` removed — the managed credential is now the only write path. - Legacy `UserIntegrations.ManagedCredentials.ayrshare_profile_key` field is retained so the managed provider can migrate existing users on first provision; removing the field is a follow-up once rollout has propagated. ## How After this PR, `find_block` returns Ayrshare blocks with a structured `credentials_provider: ['ayrshare']`. 
AutoPilot sees the credential requirement the same way it sees GitHub's or AgentMail's, calls `run_block`, and gets a plain `SetupRequirementsResponse` when the managed credential has not been provisioned yet. No more description-string speculation; the whole Ayrshare flow is the normal flow. The Builder's `AyrshareConnectButton` (`BlockType.AYRSHARE`) still works — it hits the same endpoint, now a thin wrapper over the managed provider — so users still get the "Connect Social Accounts" popup for OAuth'ing individual social networks. ## Test plan - [x] `poetry run pytest backend/blocks/test/test_block.py -k "ayrshare or PostTo"` — 26/26 pass. - [x] `poetry run pytest backend/integrations/managed_providers/ayrshare_test.py` — 10/10 pass. - [x] `poetry run pytest backend/api/features/integrations/router_test.py` — 21/21 pass. - [x] `poetry run pyright` on all touched backend files — 0 errors. - [x] Runtime sanity: `find_block` on `PostToXBlock` lists `credentials_provider: ['ayrshare']` in the JSON schema. - [ ] Manual QA in preview: connect social account via Builder's "Connect Social Accounts" button → post to X via CoPilot end-to-end. - [ ] Verify existing users with `managed_credentials.ayrshare_profile_key` continue to work without re-linking. |
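A hedged sketch of the migration-ordering safety: `is_available`, `post_provision`, `add_managed_credential`, and the legacy-key reuse are described in this PR, but the method signatures and the private helpers below are invented for illustration.

```python
class AyrshareManagedProvider:
    def is_available(self) -> bool:
        # Keep the credentials sweep from auto-provisioning Ayrshare: each profile
        # is a real per-user subscription cost. /sso_url provisions on demand instead.
        return False

    async def provision(self, user_id: str, store) -> APIKeyCredentials:
        legacy_key = await self._read_legacy_profile_key(user_id, store)  # hypothetical helper
        profile_key = legacy_key or await self._create_ayrshare_profile(user_id)
        cred = APIKeyCredentials(provider="ayrshare", api_key=profile_key, is_managed=True)
        await store.add_managed_credential(user_id, cred)  # durable write happens first
        return cred

    async def post_provision(self, user_id: str, store) -> None:
        # Runs only after add_managed_credential succeeded, so a failed provision
        # leaves the legacy ayrshare_profile_key intact for a retry to reuse.
        await self._clear_legacy_profile_key(user_id, store)  # hypothetical helper
```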
||
|
|
0f6eea06c4 |
feat(platform/backend): dynamic BlockCostType (SECOND/ITEMS/COST_USD/TOKENS) + E2B/FAL migration (#12894)
## Why PR #12893 shipped flat-floor credit charges so no provider sits wallet-free. This PR is the next step: make dynamic pricing actually dynamic. Blocks that scale with walltime, item count, provider-reported USD, or token volume now get billed based on captured execution stats instead of a fixed floor. Before this PR `BlockCostType` only had `RUN` / `BYTE` / `SECOND`, and `SECOND` was dead code — no caller ever passed `run_time > 0`, so every per-second entry evaluated to 0. This PR wires the stats plumbing through, adds the cost-type variants that cover the real billing models our providers charge on, and migrates blocks across the codebase to use them. ## What ### Machinery - `BlockCostType` gains `ITEMS`, `COST_USD`, `TOKENS`. `BlockCost` gains `cost_divisor: int = 1` so SECOND/ITEMS/TOKENS can express "1 credit per N units" without fractional amounts. - `block_usage_cost(..., stats: NodeExecutionStats | None = None)` — pre-flight (no stats) dynamic types return 0 so the balance check isn't blocked on unknown-future cost; post-flight (stats populated) they consume captured execution stats. - `TokenRate` model + `TOKEN_COST` table (~60 models: Claude family, GPT-5 family, Gemini 2.5, Groq/Llama, Mistral, Cohere, DeepSeek, Grok, Kimi, Perplexity Sonar). Rates are credits per 1M tokens with input / output / cache-read / cache-creation split. - `compute_token_credits(input_data, stats)` — reads `stats.input_token_count / output_token_count / cache_read_token_count / cache_creation_token_count`, multiplies by `TOKEN_COST[model]`, ceils to integer credits. Falls back to flat `MODEL_COST[model]` for unmapped models (no silent under-billing). - `billing.charge_reconciled_usage(node_exec, stats)` — runs post-flight, charges positive delta / refunds negative delta. RUN-only blocks produce zero delta (no-op). Swallows `InsufficientBalanceError` + unexpected errors so reconciliation never poisons the success path. - Pre-flight balance guard — dynamic-cost blocks (0 pre-flight charge) are blocked when the wallet is non-positive. Closes Sentry `r3132206798` (HIGH). - Reconciliation fires `handle_low_balance` on positive delta so users still get alerted after post-flight reconciliation. 
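A sketch of the TOKENS path: `compute_token_credits`, the stats field names, and the flat `MODEL_COST` fallback are from this PR; the rate numbers and table shape below are illustrative only.

```python
import math

# Credits per 1M tokens, split by token class (numbers illustrative).
TOKEN_COST = {
    "claude-sonnet-4": {"input": 450, "output": 2250, "cache_read": 45, "cache_creation": 563},
}
MODEL_COST = {"legacy-model": 5}  # flat per-run fallback for unmapped models

def compute_token_credits(model: str, stats) -> int:
    rate = TOKEN_COST.get(model)
    if rate is None:
        # No silent under-billing: unmapped models fall back to the flat tier.
        return MODEL_COST.get(model, 0)
    weighted = (
        stats.input_token_count * rate["input"]
        + stats.output_token_count * rate["output"]
        + stats.cache_read_token_count * rate["cache_read"]
        + stats.cache_creation_token_count * rate["cache_creation"]
    )
    return math.ceil(weighted / 1_000_000)  # ceil to whole credits
```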
### Block migrations — cost-type changes | Provider / block family | Old | New | Cost type | |---|---|---|---| | All LLM blocks (Anthropic / OpenAI / Groq / Open Router / Llama API / v0 / AIML, via `LLM_COST` list) | RUN, flat per-model from `MODEL_COST` | `TOKEN_COST` per-token rate table (input / output / cache-read / cache-creation) | **TOKENS** | | Jina `SearchTheWebBlock` | RUN, 1 cr | 100 cr / $ (≈ 1 cr per $0.01 call) | **COST_USD** | | ZeroBounce `ValidateEmailsBlock` | RUN, 2 cr | 250 cr / $ (≈ 2 cr per $0.008 validation) | **COST_USD** | | Apollo `SearchOrganizationsBlock` | RUN, 2 cr flat | 1 cr / 2 orgs (divisor=2) | **ITEMS** | | Apollo `SearchPeopleBlock` (no enrich) | RUN, 10 cr flat | 1 cr / person | **ITEMS** | | Apollo `SearchPeopleBlock` (enrich_info=true) | RUN, 20 cr flat | 2 cr / person | **ITEMS** | | Firecrawl (all blocks — Crawl, MapWebsite, Search, Extract, Scrape, via `ProviderBuilder.with_base_cost`) | RUN, 1 cr | 1000 cr / $ (1 cr per Firecrawl credit ≈ $0.001) | **COST_USD** | | DataForSEO (KeywordSuggestions, RelatedKeywords, via `with_base_cost`) | RUN, 1 cr | 1000 cr / $ | **COST_USD** | | Exa (~45 blocks, via `with_base_cost`) | RUN, 1 cr | 100 cr / $ (Deep Research $0.20 → 20 cr) | **COST_USD** | | E2B `ExecuteCodeBlock` / `InstantiateCodeSandboxBlock` / `ExecuteCodeStepBlock` | RUN, 2 cr flat | 1 cr / 10 s walltime (divisor=10) | **SECOND** | | FAL `AIVideoGeneratorBlock` | RUN, 10 cr flat | 3 cr / walltime s | **SECOND** | ### Cost-leak fixes — interim values (flagged 🔴 CONSERVATIVE INTERIM in Notion) Separate from the type migrations above, these 3 providers had real API costs but were under-billed (or wallet-free): | Provider / block | Old | New | Cost type | Plan for proper fix | |---|---|---|---|---| | Stagehand (`StagehandObserve` / `Act` / `Extract`, via `with_base_cost`) | RUN, 1 cr | 1 cr / 3 walltime s (divisor=3) | **SECOND** | Have blocks emit `provider_cost` USD (session_seconds × $0.00028 + real LLM USD) → migrate to `COST_USD 100 cr/$`. | | Meeting BaaS `BaasBotJoinMeetingBlock` (via `@cost` decorator override) | RUN, 5 cr | RUN, 30 cr | RUN | Surface meeting duration on `FetchMeetingData` response → migrate Join to `SECOND` or `COST_USD` post-flight. | | AgentMail (~37 blocks, via `with_base_cost`) | **0 cr (unbilled)** | RUN, 1 cr | RUN | Revisit when AgentMail publishes paid-tier pricing (currently beta). | ### UI - `NodeCost.tsx` dynamic labels: RUN → `N /run`, SECOND → `~N /sec` (or `~N / Xs` with divisor), ITEMS → `~N /item` (or `/ X items`), COST_USD → `~N · by USD`, TOKENS → `~N · by tokens` (tooltip explains cache discount). - Floor amounts prefixed with `~` for dynamic types so users see an estimate, not a hard guarantee. ## How The resolver split is the key design decision. Instead of charging the "true" cost entirely post-flight (which would let a user burn credits they don't have), pre-flight returns a safe estimate: - RUN: full `cost_amount` (same as before — backwards compatible). - SECOND/ITEMS/COST_USD: `0` when stats aren't populated yet. - TOKENS: `MODEL_COST[model]` as a flat floor from the existing rate table. Post-flight, the executor calls `charge_reconciled_usage`, which evaluates the same resolver with stats and charges the positive delta (or refunds the negative delta). RUN blocks get a 0-delta no-op; dynamic blocks get their actual charge. Failure modes are bounded: insufficient balance is logged (not raised; reconciliation must never poison a success), unexpected errors are swallowed and alerted via Discord. 
TOKENS routes through a dedicated `compute_token_credits` helper so the rate table (`TOKEN_COST`) can grow organically without touching resolver logic. Models not yet in `TOKEN_COST` fall back to the flat `MODEL_COST` tier. Migration for providers with a real USD spend (Exa, Firecrawl, DataForSEO, Jina Search, ZeroBounce) is a one-line `_config.py` change via the extended `ProviderBuilder.with_base_cost`. Each block's `run()` populates `provider_cost` from the response (Exa's `cost_dollars.total`, Firecrawl's `credits_used`, etc.) via `merge_stats`, and the post-flight resolver multiplies by `cost_amount` credits/$. ## Test plan - [x] 92/92 cost-pipeline tests pass — `block_usage_cost_test.py`, `billing_reconciliation_test.py`, `manager_cost_tracking_test.py`, `block_cost_config_test.py`. - [x] Deep E2E against live stack (real DB, `database_manager` RPC): 8/8 scenarios pass — RUN pre-flight, dry-run no-charge, TOKENS refund, ITEMS scaling, ITEMS zero-items short-circuit, COST_USD exact + ceil semantics, pre-flight balance guard. Report: https://github.com/Significant-Gravitas/AutoGPT/pull/12894#issuecomment-4307672357 - [x] `poetry run ruff check` / `ruff format` / `pnpm format` / `pnpm lint` / `pnpm types` — clean. - [x] Manual UI: `NodeCost.tsx` renders `~N · by tokens` for AITextGeneratorBlock, `~N · by USD` for Jina/Exa/Firecrawl. ## Follow-ups (not in this PR) - Stagehand / Meeting BaaS / Ayrshare: expose provider-side unit cost (session-seconds, meeting duration, platform analytics credits) to migrate from interim flat/walltime to fully dynamic `COST_USD`. - Replicate / Revid: walltime-based billing once response cost is piped through. - AgentMail: final rate once paid tier is published. |
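And a sketch of the post-flight reconciliation delta: `charge_reconciled_usage`, `handle_low_balance`, and the delta semantics are from this PR, while the resolver signature and the charge/refund/alert helpers are stand-ins.

```python
import logging

logger = logging.getLogger(__name__)

async def charge_reconciled_usage(node_exec, stats) -> None:
    pre_flight = block_usage_cost(node_exec.block, node_exec.input_data)          # stats=None → safe estimate
    post_flight = block_usage_cost(node_exec.block, node_exec.input_data, stats)  # actual usage
    delta = post_flight - pre_flight  # RUN-only blocks: pre == post → no-op
    try:
        if delta > 0:
            await charge_credits(node_exec.user_id, delta)    # stand-in helper
            await handle_low_balance(node_exec.user_id)
        elif delta < 0:
            await refund_credits(node_exec.user_id, -delta)   # stand-in helper
    except InsufficientBalanceError as exc:
        logger.warning("Reconciliation shortfall for %s: %s", node_exec.user_id, exc)
    except Exception:
        alert_discord("charge_reconciled_usage failed")  # never poison the success path
```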
||
|
|
43b38f6989 |
fix(backend/copilot): surface non-zero E2B exits as real results, not sandbox errors (#12904)
## Why
`gh auth status` looked flaky in the E2B sandbox. Not actually flaky: it
fails deterministically when the user has not connected GitHub (or the
token is missing/expired), and our wrapper disguises that legitimate
exit-1 as a sandbox infrastructure failure.
Root cause: E2B's `sandbox.commands.run()` raises `CommandExitException`
for **any** non-zero exit. We caught it as a generic `Exception` and
returned an `ErrorResponse` with message:
```
E2B execution failed: Command exited with code 1 and error:
{stderr}
```
When the model runs `gh auth status 2>&1`, stderr is redirected to
stdout — so `exc.stderr` is empty **and** `exc.stdout` (which carries
the real info, e.g. "You are not logged into any GitHub hosts") is
discarded. The model sees a generic infra failure, can't tell it's an
auth-check signal, and prompts the user with broken-looking errors
instead of calling `connect_integration(provider="github")`.
Compare: the local bubblewrap path already handles non-zero exits
correctly by returning a `BashExecResponse` with `exit_code` set. The
E2B path was asymmetric.
## What
- Import `CommandExitException` and catch it explicitly in
`_execute_on_e2b` before the generic handler.
- Return a `BashExecResponse` with the real `exit_code`, `stdout`, and
`stderr` from the exception (scrubbed of injected secret values, same as
the success path).
- Extract shared scrub/build logic into `_build_response` to avoid
duplicating it across the success and exit-exception branches.
- Keep `TimeoutException` and the catch-all `except Exception` for real
infra failures.
## How
Result shape now matches bubblewrap: non-zero exit is a valid result,
not an error. The model sees:
```
message: "Command executed with status code 1"
exit_code: 1
stdout: "You are not logged into any GitHub hosts. ..."
stderr: ""
```
instead of the prior cryptic "E2B execution failed" message.
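A minimal sketch of the new exception branch: `CommandExitException`, `_build_response`, and the preserved `TimeoutException` / catch-all handlers are named in this PR, while the import path and the surrounding signatures here are assumptions.

```python
from e2b import CommandExitException, TimeoutException  # import path assumed

async def _execute_on_e2b(sandbox, command: str, secrets: dict[str, str]):
    try:
        result = await sandbox.commands.run(command)
        return _build_response(result.exit_code, result.stdout, result.stderr, secrets)
    except CommandExitException as exc:
        # Non-zero exit is a legitimate result (mirrors the bubblewrap path):
        # keep the real exit code and scrub both streams via the shared builder.
        return _build_response(exc.exit_code, exc.stdout, exc.stderr, secrets)
    except TimeoutException as exc:
        return ErrorResponse(message=f"E2B execution timed out: {exc}")
    except Exception as exc:
        return ErrorResponse(message=f"E2B execution failed: {exc}")
```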
## Test plan
- [x] New unit test `test_nonzero_exit_returned_as_bash_exec_response`
in `bash_exec_test.py` — mocks `sandbox.commands.run` to raise
`CommandExitException`, asserts `BashExecResponse` with correct
`exit_code`, and verifies secret scrubbing on both `stdout` and
`stderr`.
- [x] `poetry run pytest backend/copilot/tools/bash_exec_test.py` — 5
passed.
- [x] `poetry run pyright` on changed files — 0 errors.
- [x] `poetry run ruff` — clean.
|