AutoGPT

mirror of https://github.com/Significant-Gravitas/AutoGPT.git synced 2026-04-30 03:00:41 -04:00

Author	SHA1	Message	Date
majdyz	cd8079dba2	test: add CHAT_CLAUDE_AGENT_CROSS_USER_PROMPT_CACHE to _CONFIG_ENV_VARS test_default_config_is_enabled uses _clean_config_env to ensure env vars don't pollute the ChatConfig constructor test. The new claude_agent_cross_user_prompt_cache field reads from CHAT_CLAUDE_AGENT_CROSS_USER_PROMPT_CACHE, but that var was missing from the list — leaving the test non-deterministic if that env var is set in CI.	2026-04-13 01:23:08 +00:00
majdyz	0ab7c9852c	fix: remove accidentally committed worktree, add to gitignore	2026-04-13 00:54:11 +00:00
majdyz	fa6cc99a8a	fix(backend): format service.py and test files	2026-04-13 00:54:01 +00:00
majdyz	54f507b54b	fix(backend): address PR review (extract testable helper, add TypedDict, rename config field) - Extract _build_system_prompt_value() helper so tests exercise production code instead of reconstructing the dict locally. - Add _SystemPromptPreset TypedDict for proper type annotation (replaces str | dict[str, Any]). - Rename claude_agent_exclude_dynamic_sections → claude_agent_cross_user_prompt_cache for clarity.	2026-04-13 00:48:37 +00:00
majdyz	c4e48b5c71	perf(backend): enable cross-user prompt caching via SystemPromptPreset Use SystemPromptPreset with exclude_dynamic_sections=True in the SDK path so the Claude Code default prompt serves as a cacheable prefix shared across all users. Our custom prompt is appended after it, and dynamic sections (working dir, git status, auto-memory) are excluded from the prefix -- giving cross-user cache hits that reduce input token cost by ~90%. Add claude_agent_exclude_dynamic_sections config field (default True) to make this configurable, with fallback to raw string when disabled.	2026-04-13 00:39:30 +00:00
Zamil Majdy	b319c26cab	feat(platform/admin): per-model cost breakdown, cache token tracking, OrchestratorBlock cost fix (#12726) ## Why The platform cost tracking system had several gaps that made the admin dashboard less accurate and harder to reason about: Q: Do we have per-model granularity on the provider page? The `model` column was stored in `PlatformCostLog` but the SQL aggregation grouped only by `(provider, tracking_type)`, so all models for a given provider collapsed into one row. Now grouped by `(provider, tracking_type, model)`, so each model gets its own row. Q: Why does Anthropic show `per_run` for OrchestratorBlock? Bug: `OrchestratorBlock._call_llm()` was building `NodeExecutionStats` with only `input_token_count` and `output_token_count`; it dropped `resp.provider_cost` entirely. For OpenRouter calls this silently discarded the `cost_usd`. For the SDK (autopilot) path, `ResultMessage.total_cost_usd` was never read. When `provider_cost` is None and token counts are 0 (e.g. SDK error path), `resolve_tracking` falls through to `per_run`. Fixed by propagating all cost/cache fields. Q: Why can't we get `cost_usd` for Anthropic direct API calls? The Anthropic Messages API does not return a dollar amount, only token counts. OpenRouter returns cost via response headers, so it uses `cost_usd` directly. The Claude Agent SDK does compute `total_cost_usd` internally, so SDK-mode OrchestratorBlock runs now get `cost_usd` tracking. For direct Anthropic LLM blocks the estimate uses per-token rates (see cache section below). Q: What about labeling by source (autopilot vs block)? Already tracked: `block_name` stores `copilot:SDK`, `copilot:Baseline`, or the actual block name. Visible in the raw logs table. Not added to the provider group-by (would explode row count); use the logs table filter instead. Q: Is there double-counting between `tokens`, `per_run`, and `cost_usd`? No. `resolve_tracking()` uses a strict preference hierarchy with exactly one tracking type per execution: `cost_usd` > `tokens` > provider heuristics > `per_run`. A single execution produces exactly one `PlatformCostLog` row. Q: Should we track Anthropic prompt cache tokens (PR #12725)? Yes. PR #12725 adds `cache_control` markers to Anthropic API calls, which causes the API to return `cache_read_input_tokens` and `cache_creation_input_tokens` alongside regular `input_tokens`. These have different billing rates: - Cache reads: 10% of base input rate (much cheaper) - Cache writes: 125% of base input rate (slightly more expensive, one-time) - Uncached input: 100% of base rate Without tracking them separately, a flat-rate estimate on `total_input_tokens` would be wrong in both directions. ## What - Per-model provider table: SQL now groups by `(provider, tracking_type, model)`. `ProviderCostSummary` and the frontend `ProviderTable` show a model column. - Cache token columns: New `cacheReadTokens` and `cacheCreationTokens` columns in `PlatformCostLog` with matching migration. - LLM block cache tracking: `LLMResponse` captures `cache_read_input_tokens` / `cache_creation_input_tokens` from Anthropic responses. `NodeExecutionStats` gains `cache_read_token_count` / `cache_creation_token_count`. Both propagate to `PlatformCostEntry` and the DB. - Copilot path: `token_tracking.persist_and_record_usage` now writes cache tokens as dedicated `PlatformCostEntry` fields (was metadata-only). - OrchestratorBlock bug fix: `_call_llm()` now includes `resp.provider_cost`, `resp.cache_read_tokens`, `resp.cache_creation_tokens` in the stats merge. SDK path captures `ResultMessage.total_cost_usd` as `provider_cost`. - Accurate cost estimation: `estimateCostForRow` uses token-type-specific rates for `tokens` rows (uncached=100%, reads=10%, writes=125% of configured base rate). ## How `resolve_tracking` priority is unchanged. For Anthropic LLM blocks the tracking type remains `tokens` (Anthropic API returns no dollar amount). For OrchestratorBlock in SDK/autopilot mode it now correctly uses `cost_usd` because the Claude Agent SDK computes and returns `total_cost_usd`. For OpenRouter through OrchestratorBlock it now correctly uses `cost_usd` (was silently dropped before). ## Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] `ProviderCostSummary` SQL updated - [x] Cache token fields present in `PlatformCostEntry` and `PlatformCostLogCreateInput` - [x] Prisma client regenerated, all type checks pass - [x] Frontend `helpers.test.ts` updated for new `rateKey` format - [x] Pre-commit hooks pass (Black, Ruff, isort, tsc, Prisma generate)	2026-04-10 23:14:43 +07:00
Zamil Majdy	85921f227a	Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into preview/all-active-prs	2026-04-10 22:59:30 +07:00
Zamil Majdy	5844b13fb1	feat(backend/copilot): support multiple questions in ask_question tool (#12732) ### Why / What / How Why: The `ask_question` copilot tool previously only accepted a single question per invocation. When the LLM needs to ask multiple clarifying questions simultaneously, it either crams them into one text field (requiring users to format numbered answers manually) or makes multiple sequential tool calls (slow and disruptive UX). What: Replace the single `question`/`options`/`keyword` parameters with a `questions` array parameter so the LLM can ask multiple questions in one tool call, each rendered as its own input box. How: Simplified the tool to accept only `questions` (array of question objects). Each item has `question` (required), `options`, and `keyword`. The frontend `ClarificationQuestionsCard` already supports rendering multiple questions; no frontend changes needed. ### Changes 🏗️ - `backend/copilot/tools/ask_question.py`: Replaced dual question/questions schema with single `questions` array. Extracted parsing into module-level `_parse_questions` and `_parse_one` helpers. Follows backend code style: early returns, list comprehensions, top-down ordering, functions under 40 lines. - `backend/copilot/tools/ask_question_test.py`: Rewritten with 18 focused tests covering happy paths, keyword handling, options filtering, and invalid input handling. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [ ] I have tested my changes according to the test plan: - [ ] Run `poetry run pytest backend/copilot/tools/ask_question_test.py` (all tests pass) 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 21:54:53 +07:00
Zamil Majdy	c014e1aa35	merge(preview): merge all active PRs into preview/all-active-prs from fresh dev	2026-04-10 08:40:23 +07:00
Zamil Majdy	e59f576622	Merge remote-tracking branch 'origin/spare/13' into preview/all-active-prs	2026-04-10 08:39:34 +07:00
Zamil Majdy	c99fa32ae3	Merge remote-tracking branch 'origin/spare/3' into preview/all-active-prs	2026-04-10 08:39:34 +07:00
Zamil Majdy	b71789da50	Merge remote-tracking branch 'origin/feat/subscription-tier-billing' into preview/all-active-prs	2026-04-10 08:39:34 +07:00
Zamil Majdy	5661326e7e	fix(platform): fetch real Stripe prices in subscription status endpoint - Import get_subscription_price_id in v1.py - get_subscription_status now calls stripe.Price.retrieve for PRO/BUSINESS tiers to return actual unit_amount instead of hardcoded zeros - UI will now show correct monthly costs when LD price IDs are configured - Fix Button import from __legacy__ to design system in SubscriptionTierSection - Update subscription status tests to mock the new Stripe price lookup	2026-04-10 08:37:40 +07:00
Zamil Majdy	df3fe926f2	style(backend/copilot): apply Black formatting to ask_question Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 23:56:42 +00:00
Zamil Majdy	505af7e673	refactor(backend/copilot): simplify ask_question to questions-only API Drop the dual question/questions schema in favor of a single `questions` array parameter. This removes ~175 lines of complexity (the _execute_single path, duplicate params, precedence logic). Restructured per backend code style rules: - Top-down ordering: public _execute first, helpers below - Early return with guard clauses, no deep nesting - List comprehensions via walrus operator in _parse_questions - Helpers extracted as module-level functions (not methods) - Functions under 40 lines each The frontend ClarificationQuestionsCard already renders arrays of any length — no UI changes needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 23:54:11 +00:00
Zamil Majdy	d896a1f9fa	fix(backend/copilot): add missing isinstance assertion in test Add isinstance narrowing in test_execute_multiple_questions_ignores_single_params to fix Pyright type-check CI failure (reportAttributeAccessIssue). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 23:48:02 +00:00
Zamil Majdy	6aa5a808e0	fix(backend/copilot): add isinstance assertions to fix type-check CI Tests that access `result.questions` without first narrowing the type from `ToolResponseBase` to `ClarificationNeededResponse` cause Pyright type-check failures. Added `assert isinstance(result, ClarificationNeededResponse)` before accessing `.questions` in 4 tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 23:40:08 +00:00
Zamil Majdy	18c88b4da0	fix(frontend/builder): always clear messages on flowID change to keep action state consistent When navigating back to a cached session, appliedActionKeys was reset to empty but messages were preserved. This caused previously applied actions to reappear as unapplied in the UI, allowing them to be re-applied and creating duplicate undo entries. Clearing messages unconditionally on navigation ensures the displayed action buttons always reflect the actual applied state.	2026-04-10 02:03:56 +07:00
Zamil Majdy	3a5ce570e0	fix(backend/copilot): address PR review round 4 - Restore top-level `required: ["question"]` in schema for LLM tool-calling compatibility; validation handles the questions-only path - Fix keyword null bug: `item.get("keyword")` returning None now correctly falls back to `question-{idx}` instead of producing "None" - Filter empty-string options in _build_question (`str(o).strip()`) to avoid artifacts like "Email, , Slack" - Revert session type hint to `ChatSession` to match base class contract - Add tests for null keyword and empty-string options filtering Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 18:56:37 +00:00
Zamil Majdy	5a3739e54d	fix(backend/copilot): address PR review round 2 - Remove top-level `required: ["question"]` from schema so the `questions`-only calling convention is valid for schema-compliant LLMs - Move logger assignment below all imports (PEP 8 / isort) - Remove duplicated option filtering in `_execute_single`; let `_build_question` own that responsibility - Fix `session` type hint to `ChatSession | None` to match the guard - Add test for `questions` as non-list type (falls back to single path) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 18:43:11 +00:00
Zamil Majdy	72bc8a92df	fix(frontend/builder): guard msg.parts with nullish coalescing to prevent runtime error	2026-04-10 01:41:15 +07:00
Zamil Majdy	cc29cf5e20	fix(backend/copilot): address PR review round 1 - Fix falsy option filtering: use `if o is not None` instead of `if o` so valid values like "0" are preserved - Improve multi-question `message` field: join all questions with ";" instead of only using the first question's text - Add logging warnings for skipped invalid items in multi-question path instead of silently dropping them - Simplify schema: use `"required": ["question"]` instead of empty required + anyOf (more LLM-friendly) - Add missing test cases: session=None, single-item questions array, duplicate keywords, falsy option values Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 18:39:55 +00:00
Zamil Majdy	a0efbbba90	feat(backend/copilot): support multiple questions in ask_question tool The ask_question tool previously only accepted a single question per invocation, forcing the LLM to cram multiple queries into one text box or make multiple sequential tool calls. This adds a `questions` parameter (list of question objects) so multiple input fields render at once. Backward-compatible: the existing `question`/`options`/`keyword` params still work. When `questions` (plural) is provided, they take precedence. The frontend ClarificationQuestionsCard already supports rendering multiple questions — no frontend changes needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 18:21:35 +00:00
Zamil Majdy	8ed959433a	fix(frontend/builder): clear stale messages in retrySession so new session starts clean	2026-04-10 00:56:31 +07:00
Zamil Majdy	98f3e09580	fix(frontend/builder): reset hasSentSeedMessageRef in retrySession so seed is sent to new session	2026-04-10 00:39:10 +07:00
Zamil Majdy	9ec44dd109	test(backend): add route-level tests for subscription API endpoints Tests for GET/POST /credits/subscription covering: - GET returns current tier (PRO, FREE default when None) - POST FREE skips Stripe when payment disabled - POST PRO sets tier directly for beta users (payment disabled) - POST paid tier rejects missing success_url/cancel_url with 422 - POST paid tier creates Stripe Checkout Session and returns URL - POST FREE with payment enabled cancels active Stripe subscription	2026-04-10 00:19:06 +07:00
Zamil Majdy	bfb82b6246	fix(platform): address reviewer feedback on subscription endpoint - Remove useCallback from changeTier (not needed per project guidelines) - Block self-service tier changes for ENTERPRISE users (admin-managed) - Preserve current tier on unrecognized Stripe price_id instead of defaulting to FREE (prevents accidental downgrades during price migration)	2026-04-10 00:08:54 +07:00
Zamil Majdy	63210770ce	test(backend): add tests for get_subscription_price_id to improve coverage	2026-04-09 23:54:02 +07:00
Zamil Majdy	f2b8f81bb1	test(backend/copilot): add unit tests for update_message_content_by_sequence Cover success, not-found (returns False + warning), and DB-error (returns False + error log) paths to push patch coverage above the 80% threshold.	2026-04-09 23:52:39 +07:00
Zamil Majdy	68b51ae2d3	test(backend): add coverage for sync_subscription_from_stripe edge cases Tests for: - Unknown/mismatched Stripe price_id defaults to FREE (not early return) - None from LaunchDarkly price flags defaults to FREE - BUSINESS tier mapping - StripeError during cancel_stripe_subscription is logged, not raised	2026-04-09 23:52:16 +07:00
Zamil Majdy	63ff214563	fix(backend): default to FREE tier on unknown Stripe price ID in webhook sync When sync_subscription_from_stripe encounters an unrecognized price_id (e.g. LD flags unconfigured or price changed), it no longer returns early leaving the user on a stale tier. Instead it defaults to FREE and logs a warning, keeping the DB state consistent with Stripe's subscription status. Also guard against None pro_price/biz_price from LaunchDarkly before comparison to avoid silent mismatches.	2026-04-09 23:41:51 +07:00
Zamil Majdy	9498daca31	fix(frontend/builder): wrap panel in CopilotChatActionsProvider to prevent crash EditAgentTool and RunAgentTool call useCopilotChatActions() which throws if no provider is in the tree. Wrap the panel content with CopilotChatActionsProvider wired to sendRawMessage so tool components can send retry prompts without crashing.	2026-04-09 23:41:06 +07:00
Zamil Majdy	ce0cb1e035	fix(backend/copilot): persist user-context prefix to DB in both SDK and baseline paths The user message was saved to DB before the <user_context> prefix was added to session.messages. Subsequent upsert_chat_session calls only append new messages (slicing by existing_message_count), so the prefixed content was never written to the DB. On page reload or --resume, the unprefixed version was loaded, losing personalisation. Fix: add update_message_content_by_sequence to db.py and call it after injecting the prefix in both sdk/service.py and baseline/service.py.	2026-04-09 23:40:14 +07:00
Zamil Majdy	0d89f7bb33	fix(backend): handle customer.subscription.created webhook event Add customer.subscription.created to the sync handler so user tier is upgraded immediately when the subscription is first created (not just on subsequent updates/deletions).	2026-04-09 23:39:16 +07:00
Zamil Majdy	aef9298be6	test(platform/admin): add cache token and retry cost accumulation tests Add unit tests for: - Anthropic cache_read_tokens/cache_creation_tokens in llm_call response - cache token accumulation in AIStructuredResponseGeneratorBlock stats - provider_cost persistence on exhausted retry path - usd_to_microdollars None-safe branch - explicit start param covering _build_where false branch - cache token columns in platform_cost integration test	2026-04-09 23:33:21 +07:00
Zamil Majdy	e5ea2e0d5b	fix(backend/copilot): fix stale docstring referencing anthropic.omit instead of NOT_GIVEN	2026-04-09 23:24:43 +07:00
Zamil Majdy	4eabc48053	fix(backend): fix migration conflict with dev's SubscriptionTier migration dev branch already creates SubscriptionTier enum and subscriptionTier column in 20260326200000_add_rate_limit_tier. Remove duplicate DDL from our migration and only add SUBSCRIPTION to CreditTransactionType using IF NOT EXISTS guard.	2026-04-09 23:24:12 +07:00
Zamil Majdy	101504ce0b	fix(platform): cancel Stripe subscription when downgrading to FREE tier Add cancel_stripe_subscription() which lists and cancels all active Stripe subscriptions for the customer, preventing continued billing after downgrade. Call it from update_subscription_tier() when tier == FREE and payment is enabled. Add two unit tests covering active and empty subscription scenarios.	2026-04-09 23:21:27 +07:00
Zamil Majdy	2f67249d5f	test(platform/admin): increase patch coverage for export endpoint and cache token tracking Add tests for the /logs/export endpoint (success, truncated, filters, auth) and fix missing import of get_platform_cost_logs_for_export in platform_cost_test.py.	2026-04-09 23:20:37 +07:00
Zamil Majdy	e73b5b3692	fix(backend): validate success_url/cancel_url for paid Stripe checkout Add upfront 422 validation when upgrading to a paid tier without providing redirect URLs. Also catch stripe.StripeError alongside ValueError to return a proper 422 instead of a 500 on Stripe API errors.	2026-04-09 23:18:16 +07:00
Zamil Majdy	57c0c86a10	fix(frontend/builder): skip Escape-to-close when focus is in textarea/input Pressing Escape while drafting a message was silently discarding the user's text. Guard the handler so it only closes the panel when focus is outside an editable element.	2026-04-09 23:15:56 +07:00
Zamil Majdy	77d8362983	docs(blocks): sync misc.md with memory_search/memory_store tools from dev merge	2026-04-09 23:15:02 +07:00
Zamil Majdy	201d88b846	Merge remote-tracking branch 'origin/dev' into spare/3	2026-04-09 23:14:33 +07:00
Zamil Majdy	611a00d930	fix(backend): resolve dev merge conflict and remove credit-based subscription cost Remove get_subscription_cost (referenced deleted flags SUBSCRIPTION_COST_PRO/BUSINESS). Subscription pricing is now handled by Stripe. Add GRAPHITI_MEMORY flag from dev.	2026-04-09 23:14:15 +07:00
Zamil Majdy	8d31bdb2dc	fix(platform): address remaining review comments on subscription billing - Remove `# type: ignore[attr-defined]` suppressors from `set_auto_top_up` and `set_subscription_tier` — pyright resolves `CachedFunction.cache_delete` through the import boundary without the suppressor - Add `max(0, ...)` guard to `get_subscription_cost` to prevent negative LaunchDarkly flag values from yielding negative costs - Change `SubscriptionTierRequest.tier` from `str` to `Literal["FREE", "PRO", "BUSINESS"]` so Pydantic rejects ENTERPRISE and any unknown tier with a 422 at the schema layer - Move `SubscriptionTier` and feature-flag imports from local function scope to module-level in v1.py (top-level imports policy) - Fix `test_sync_subscription_from_stripe_active` mock to use a proper async `side_effect` function instead of calling an `AsyncMock` inline	2026-04-09 23:06:40 +07:00
Zamil Majdy	2e64f3add7	feat(frontend): redirect to Stripe checkout when upgrading subscription POST /credits/subscription now returns {url} when Stripe checkout is needed. Redirect user to Stripe on non-empty URL, refresh tier on empty URL (beta/FREE). Remove credit-based tier validation; Stripe handles payment gating.	2026-04-09 22:58:58 +07:00
Zamil Majdy	b7f242f163	chore(backend/copilot): merge dev to pick up graphiti memory and update docs	2026-04-09 22:58:12 +07:00
Zamil Majdy	98c0920c04	fix(platform/admin): revert unrelated openapi.json changes to match backend schema - Restore CreditTransactionType to original enum without SUBSCRIPTION - Restore input/ctx fields in ValidationError schema These changes were accidentally included from workspace drift; they are not part of this PR and should come from their own respective PRs.	2026-04-09 22:54:02 +07:00
Zamil Majdy	4942249a60	fix(platform): resolve merge conflicts with dev branch Merges latest dev branch changes into feat/subscription-tier-billing. Updates credit_subscription_test.py to match new Stripe-based implementation.	2026-04-09 22:51:06 +07:00
Zamil Majdy	0c94d884d0	fix(backend): use monkeypatch.setattr in test and use typed sentry_sdk imports - Replace type: ignore suppressor with monkeypatch.setattr in AIConditionBlock test - Replace bare sentry_sdk module with typed API imports in metrics/service/manager	2026-04-09 22:50:58 +07:00
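
Note on b319c26cab: the cache-aware estimate and the tracking preference order described in that commit message can be illustrated with a short Python sketch. This is not the repository's code. `TokenUsage`, `estimate_input_cost`, `pick_tracking_type`, and the per-million base rate are hypothetical names chosen for illustration; only the three multipliers (100%, 10%, 125%) and the `cost_usd` > `tokens` > `per_run` ordering come from the message, and the "provider heuristics" step is omitted because the log does not describe it.

```python
from dataclasses import dataclass
from typing import Optional

# Multipliers relative to the configured base input rate, as stated in the
# commit message: uncached input 100%, cache reads 10%, cache writes 125%.
UNCACHED_MULTIPLIER = 1.00
CACHE_READ_MULTIPLIER = 0.10
CACHE_WRITE_MULTIPLIER = 1.25


@dataclass
class TokenUsage:
    """Hypothetical container; the field names are illustrative only."""

    uncached_input_tokens: int
    cache_read_tokens: int = 0
    cache_creation_tokens: int = 0


def estimate_input_cost(usage: TokenUsage, base_rate_per_million: float) -> float:
    """Estimate input cost by weighting each token type by its multiplier."""
    weighted_tokens = (
        usage.uncached_input_tokens * UNCACHED_MULTIPLIER
        + usage.cache_read_tokens * CACHE_READ_MULTIPLIER
        + usage.cache_creation_tokens * CACHE_WRITE_MULTIPLIER
    )
    return weighted_tokens * base_rate_per_million / 1_000_000


def pick_tracking_type(cost_usd: Optional[float], total_tokens: int) -> str:
    """Pick exactly one tracking type, preferring the most precise signal.

    Assumption: the provider-heuristics step between "tokens" and "per_run"
    is left out because its rules are not given in the commit log.
    """
    if cost_usd is not None:
        return "cost_usd"
    if total_tokens > 0:
        return "tokens"
    return "per_run"


if __name__ == "__main__":
    # Assumed base rate of 3.0 USD per million input tokens (illustrative).
    usage = TokenUsage(
        uncached_input_tokens=1_000,
        cache_read_tokens=8_000,
        cache_creation_tokens=1_000,
    )
    print(estimate_input_cost(usage, base_rate_per_million=3.0))
    print(pick_tracking_type(cost_usd=None, total_tokens=10_000))
```

With these assumed numbers the weighted total is 1,000 + 800 + 1,250 = 3,050 token-equivalents, against 10,000 for a flat-rate estimate, which is the kind of gap the commit message says separate tracking avoids.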


8351 Commits
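
Note on the ask_question commits (cc29cf5e20, 3a5ce570e0, 505af7e673): the parsing rules they describe are a null keyword falling back to `question-{idx}`, options kept when not None and non-empty after stripping, and invalid items skipped with a warning. A minimal Python sketch follows. It is an illustration under assumptions, not the code in `backend/copilot/tools/ask_question.py`: the function names, the returned dict shape, and the zero-based `idx` are all hypothetical.

```python
import logging
from typing import Any, Dict, List, Optional

logger = logging.getLogger(__name__)


def parse_one(item: Any, idx: int) -> Optional[Dict[str, Any]]:
    """Parse a single question object; return None if it is unusable."""
    if not isinstance(item, dict):
        logger.warning("Skipping non-dict question at index %d", idx)
        return None

    text = str(item.get("question") or "").strip()
    if not text:
        logger.warning("Skipping question with empty text at index %d", idx)
        return None

    raw_options = item.get("options")
    if not isinstance(raw_options, list):
        raw_options = []
    # Keep values such as 0 ("if o is not None"), but drop None and any
    # option that is empty after stripping, avoiding "Email, , Slack".
    options = [s for o in raw_options if o is not None and (s := str(o).strip())]

    # A null keyword must not be stringified to "None"; fall back instead.
    raw_keyword = item.get("keyword")
    keyword = str(raw_keyword).strip() if raw_keyword is not None else ""

    return {
        "question": text,
        "options": options,
        "keyword": keyword or f"question-{idx}",  # idx base is an assumption
    }


def parse_questions(raw: Any) -> List[Dict[str, Any]]:
    """Parse the `questions` array, silently returning [] for non-lists."""
    if not isinstance(raw, list):
        return []
    return [
        parsed
        for idx, item in enumerate(raw)
        if (parsed := parse_one(item, idx)) is not None
    ]


if __name__ == "__main__":
    sample = [
        {"question": "Which channel?", "options": ["Email", "", None, 0, " Slack "]},
        "not a dict",
    ]
    print(parse_questions(sample))
```

The walrus operator in both comprehensions lets each item be parsed once and filtered in the same expression, which matches the "list comprehensions via walrus operator" style that 505af7e673 mentions.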