AutoGPT

mirror of https://github.com/Significant-Gravitas/AutoGPT.git synced 2026-04-30 03:00:41 -04:00

Author	SHA1	Message	Date
majdyz	12601f3ab9	fix(copilot): cap sessionModes at 200 entries to prevent localStorage leak	2026-04-13 12:54:53 +00:00
majdyz	47be9c7024	fix(copilot): default to thinking mode for sessions without recorded mode Sessions created before the mode fix had no recorded mode. Previously restoreSessionMode would leave the global mode unchanged (whatever it was set to on another session). Now defaults to extended_thinking when no mode is recorded — no need to clear localStorage.	2026-04-13 12:52:01 +00:00
majdyz	c9fadf20e1	fix(copilot): record current session mode before switching away Old sessions (created before the mode fix) didn't have a recorded mode, so switching away and back would lose the mode. Now we record the current mode for the departing session before switching.	2026-04-13 12:48:03 +00:00
majdyz	7d16258a98	perf(copilot): reduce tool output truncation from 500K to 100K chars 500K chars (~125K tokens) per tool result was too generous — a few large tool outputs could push context past 200K+ tokens. 100K chars (~25K tokens) keeps individual results reasonable. The SDK writes oversized results to tool-results/ files and returns a reference.	2026-04-13 12:24:35 +00:00
majdyz	ac054c31f6	perf(copilot): trigger compaction at 100K tokens instead of 140K Set CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=50 to compact at 50% of 200K context window (100K) instead of the default 70% (140K). Context >200K accounts for 54% of cost despite being only 3% of calls. Earlier compaction keeps context smaller and reduces cache creation.	2026-04-13 12:15:52 +00:00
majdyz	1d3cce0ebf	fix(copilot): strip <internal_reasoning> tags from Sonnet response stream Models without extended thinking (e.g. Sonnet) sometimes emit <internal_reasoning>...</internal_reasoning> tags as visible text. Extract ThinkingStripper to a shared module and apply it to the SDK streaming path so these tags are stripped before reaching the SSE client and the persisted message.	2026-04-13 11:50:43 +00:00
majdyz	ea1d8485f5	fix: resolve openapi.json merge conflict — keep cost_bearing_request_count	2026-04-13 11:39:01 +00:00
majdyz	364d98aab6	fix(copilot): remove effort=low default to prevent internal_reasoning leak effort=low on Sonnet causes <internal_reasoning> tags to leak into visible output. Changed default to None (let model decide). Only passed to SDK when explicitly set via CHAT_CLAUDE_AGENT_THINKING_EFFORT.	2026-04-13 11:36:16 +00:00
majdyz	f121dcd5c8	Resolve merge conflicts in copilot baseline service files Keep HEAD's pre-drain count logic for transcript loading and drain error handling, and merge incoming cache token extraction tests from PR #12762.	2026-04-13 10:49:02 +00:00
majdyz	ea0b5f70ad	Fix merge conflict in platform_cost.py crashing all new pods Resolve conflicts between cost dashboard PR (#12757) and cache token columns PR (#12762). Keep all HEAD-side functionality (percentile queries, histogram buckets, cost-bearing request counts, unfiltered aggregate) while retaining cache token fields from the incoming side.	2026-04-13 10:37:49 +00:00
majdyz	dbaaa88e1b	perf(copilot): switch default model from Opus to Sonnet Opus at $15/$75 per M tokens is unsustainable for agentic sessions (1M+ context after 30+ turns = $7+/turn). Sonnet at $3/$15 per M is 5x cheaper with comparable quality for most tasks. Override via CHAT_MODEL=anthropic/claude-opus-4.6 for premium tier.	2026-04-13 10:25:49 +00:00
majdyz	ec2acfb9e3	fix(frontend): add cache token fields to UserCostSummary in openapi.json The backend added total_cache_read_tokens and total_cache_creation_tokens to UserCostSummary but the OpenAPI spec was not updated, causing frontend build failures.	2026-04-13 10:13:18 +00:00
majdyz	69e9a5bb22	fix(frontend): add cache token fields to UserCostSummary in openapi.json The backend added total_cache_read_tokens and total_cache_creation_tokens to UserCostSummary but the OpenAPI spec was not updated, causing frontend build failures.	2026-04-13 10:12:44 +00:00
majdyz	95087cd170	Merge branch 'fix/copilot-mode-per-session' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs	2026-04-13 09:58:49 +00:00
majdyz	1e7eadce26	fix(copilot): validate persisted session modes, add removeSessionMode, fix useEffect deps - Validate entries from localStorage before constructing the sessionModes map, filtering out corrupt/unknown mode strings (addresses CodeRabbit review) - Add removeSessionMode action and call it on session delete so the map does not grow unboundedly - Add recordSessionMode to the useEffect dependency array to avoid stale-closure risk - Add clarifying comment to restoreSessionMode no-op branch - Extend tests to cover removeSessionMode, no-op, and corrupt-localStorage behaviour	2026-04-13 09:57:14 +00:00
majdyz	1485d1910c	Merge branch 'fix/sse-replay-deduplication' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs	2026-04-13 09:56:12 +00:00
majdyz	89c9c649d8	fix: resolve merge conflicts in UserTable.tsx — keep all columns (avg cost + cache read/write)	2026-04-13 09:55:56 +00:00
majdyz	a17f05f2b1	fix(copilot): scope dedup fingerprint by user message ID instead of text Using user message text as the context key caused the deduplicator to drop the second assistant reply when a user asked the same question twice in one session. Switching to user message ID (which is unique per turn) fixes the false positive while still preventing SSE-replayed duplicates. Adds a regression test covering the same-question-twice scenario.	2026-04-13 09:55:54 +00:00
majdyz	62e4a8d3a4	Merge branch 'fix/copilot-mode-per-session' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs	2026-04-13 09:54:21 +00:00
majdyz	c6af52033d	fix(copilot): fix multi-turn cost over-estimation and add cache_creation_tokens extraction Bug 1: Fallback cost estimation was using accumulated turn_prompt_tokens / turn_completion_tokens across all tool-call rounds, causing compounding over-estimation on the 2nd+ turn. Snapshot token counts before each call and pass only the per-call delta to _estimate_cost_from_tokens. Bug 2: turn_cache_creation_tokens was defined but never populated. Extract cache_creation_input_tokens from prompt_tokens_details (available from some providers such as Anthropic via OpenRouter). Add regression tests for both fixes.	2026-04-13 09:53:05 +00:00
majdyz	1df9369dc3	perf(copilot): add effort=low thinking control + raise budget to $15 - Add claude_agent_thinking_effort config (default: 'low') to control thinking depth. 'low' minimizes thinking token usage — the #1 cost driver at 49% of total spend. - Raise max_budget_usd from $5 to $15 — $5 was below p50 ($5.37), causing half of all turns to get budget-killed mid-task. - Log raw SDK usage dict to discover thinking token fields.	2026-04-13 09:43:26 +00:00
majdyz	f6c7d1eaf7	fix(copilot): baseline cost tracking fallback and dashboard cache token display When OpenRouter's x-total-cost header is missing, estimate cost from token counts using a known model pricing table so cost is always logged. Also extract cache token details from streaming usage chunks (prompt_tokens_details.cached_tokens) and pass them through to PlatformCostLog. On the dashboard side, add cache read/write columns to the logs table and user table, and include cache tokens in the UserCostSummary backend model so they surface in the API response.	2026-04-13 09:39:44 +00:00
majdyz	85f76230a9	debug(copilot): log raw SDK usage dict to discover thinking token fields Temporary debug logging to see all fields in ResultMessage.usage — need to confirm if thinking_tokens or similar is available but not being captured.	2026-04-13 09:35:05 +00:00
majdyz	f63440e955	fix(copilot): store mode per session so indicator updates on switch The copilot mode (fast/extended_thinking) was stored as a single global value. When switching between sessions, the mode indicator stayed on whatever was last set globally rather than reflecting the mode each session was created with. Add a sessionModes map to the Zustand store that records the active copilotMode when a session is created and restores it when the user switches back to that session.	2026-04-13 09:32:45 +00:00
majdyz	f52c1e1f24	fix(copilot): raise max_budget_usd from $5 to $15 $5 was too aggressive — p50 cost is $5.37 so half of all turns were getting budget-killed mid-task with no value delivered. $15 covers p75 ($13.07) so ~75% of tasks complete. The thinking token cap is the better cost lever but needs verification first.	2026-04-13 08:47:16 +00:00
majdyz	b5216da2d8	fix(copilot): disable gzip on API responses to prevent ZlibError Add Accept-Encoding: identity to ANTHROPIC_CUSTOM_HEADERS in build_sdk_env() to prevent ZlibError decompression failures in the CLI subprocess. Appended after any existing custom headers (OpenRouter trace headers). See: oven-sh/bun#23149, anthropics/claude-code#18302	2026-04-13 08:26:01 +00:00
majdyz	ffa74177d0	fix: add ::timestamptz casts to raw SQL datetime comparisons in _build_raw_where The raw SQL WHERE clause builder was passing datetime parameters without explicit type casts, causing PostgreSQL to fail with "operator does not exist: timestamp without time zone >= text".	2026-04-13 08:23:43 +00:00
majdyz	b6b94a2244	Merge branch 'fix/sse-replay-deduplication' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs	2026-04-13 08:06:29 +00:00
majdyz	7cadce4c7b	fix(copilot): deduplicate SSE-replayed messages by content fingerprint When the SSE connection reconnects, resume_session_stream replays from "0-0" and the replayed UIMessage objects get new IDs from useChat, bypassing the adjacent-only content dedup. Switch deduplicateMessages to track all seen role+context+content fingerprints globally, scoped by the preceding user message to avoid false positives when the assistant legitimately gives identical answers to different prompts.	2026-04-13 08:04:04 +00:00
majdyz	00a20bdfe6	fix(copilot): deduplicate SSE-replayed messages by content fingerprint When the SSE connection reconnects, resume_session_stream replays from "0-0" and the replayed UIMessage objects get new IDs from useChat, bypassing the adjacent-only content dedup. Switch deduplicateMessages to track all seen role+context+content fingerprints globally, scoped by the preceding user message to avoid false positives when the assistant legitimately gives identical answers to different prompts.	2026-04-13 08:03:51 +00:00
majdyz	e0ddb7d4d4	Merge branch 'feat/enhanced-cost-dashboard' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs # Conflicts: # autogpt_platform/backend/backend/data/platform_cost_test.py	2026-04-13 08:03:15 +00:00
majdyz	d8d0f752b5	Merge branch 'feat/builder-chat-panel' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs # Conflicts: # autogpt_platform/backend/backend/data/platform_cost_test.py	2026-04-13 08:02:58 +00:00
majdyz	c64d5a9c92	Merge branch 'perf/copilot-prompt-caching' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs	2026-04-13 08:02:37 +00:00
majdyz	f8bca6f4bc	Merge commit '2cf737dc0508a7753d067ed8425cfc0ef657b29f' into preview/all-prs # Conflicts: # autogpt_platform/backend/backend/copilot/config.py	2026-04-13 08:02:31 +00:00
majdyz	6c21e58d31	Merge branch 'fix/orchestrator-per-iteration-cost' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs	2026-04-13 08:01:50 +00:00
majdyz	895c9a0d29	Merge branch 'feat/copilot-pending-messages' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs	2026-04-13 08:01:45 +00:00
majdyz	84e877e36d	Merge branch 'fix/schedule-agent-cred-setup-ux' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs	2026-04-13 08:01:40 +00:00
majdyz	a504ad6e1e	Merge branch 'chore/sdk-dev-preview-0.1.58-with-proxy' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs	2026-04-13 08:01:33 +00:00
majdyz	ca0c95b593	fix(frontend): add SUBSCRIPTION to CreditTransactionType enum in openapi.json Syncs the OpenAPI spec with the Prisma schema which already includes the SUBSCRIPTION enum value in CreditTransactionType.	2026-04-13 07:13:21 +00:00
majdyz	cbf309c9e4	Merge branch 'dev' of https://github.com/Significant-Gravitas/AutoGPT into feat/copilot-pending-messages	2026-04-13 07:12:49 +00:00
majdyz	6ccb44e0d5	fix(copilot): add 404/429 to route decorator, reformat routes.py, regenerate openapi.json Add responses={404, 429} to the pending endpoint's @router.post decorator so FastAPI auto-generates them in the OpenAPI spec. Previously these were only manually added to openapi.json and the CI schema-check (export + diff) stripped them. Also apply black formatting to the long warning line that was failing the backend lint check.	2026-04-13 07:04:07 +00:00
majdyz	e558c60104	fix(orchestrator): don't propagate non-billing charge errors as tool failures Non-IBE exceptions from charge_node_usage (e.g. DB timeout) were re-raised and caught by the outer generic handler, incorrectly marking a successful tool execution as failed. This could cause the LLM to retry side-effectful operations. Now logs the error and continues to the success path since the tool itself completed successfully.	2026-04-13 07:02:10 +00:00
majdyz	fbad856538	fix(backend/copilot): relax schedule race test assertion for setup_test_data fixture The setup_test_data fixture creates a graph with credentials already embedded in node defaults. The DB-stored credential schema may not surface these as "missing" in build_missing_credentials_from_graph, so assert the key exists rather than asserting non-empty count.	2026-04-13 06:59:18 +00:00
majdyz	3ebfa3d68b	fix(backend/copilot): address round-6 review — DRY validation handler, improve tests - Extract duplicated GraphValidationError handler from _run_agent and _schedule_agent into _handle_graph_validation_race helper method - Use generator expressions instead of list comprehension for short-circuit evaluation in _build_setup_requirements_from_validation_error - Improve mixed-error fallback message to be more user-friendly - Add test for empty node_errors={} edge case - Pin expected credential count in firecrawl fixture tests - Add missing_credentials assertion to schedule race E2E test - Add test for extras present with node_errors=None in service_test	2026-04-13 06:45:28 +00:00
majdyz	5ff46ff207	fix(backend): address review feedback on orchestrator billing - Extract post-execution billing into _handle_post_execution_billing() - Deduplicate IBE notification into _try_send_insufficient_funds_notif() - Combine _charge_usage + _handle_low_balance into single thread dispatch - Sanitize error messages to LLM (no internal details leaked) - Default _is_error to True (fail-closed) for tool responses - Add IBE propagation contract to OrchestratorBlock class docstring - Reduce per-site IBE comments to one-liners referencing class docstring - Fix _resolve_block_cost return type annotation (Block | None) - Move test imports to module level, fix test_default_block_returns_zero - Add tests for non-IBE billing failure and _charge_usage(count=0) - Fix Black formatting (CI lint blocker)	2026-04-13 06:44:20 +00:00
majdyz	2cf737dc05	fix(backend): address review comments on cross-user prompt caching PR - Add TODO(#12747) to _SystemPromptPreset for cleanup tracking - Update docstring to note SDK version and migration path - Add debug logging in _build_system_prompt_value for observability - Document empty-string edge case in docstring - Trim redundant block comment at call site to single line - Add test for empty-string system_prompt with cache enabled - Add test for CHAT_CLAUDE_AGENT_CROSS_USER_PROMPT_CACHE=false env var	2026-04-13 06:43:57 +00:00
majdyz	040637dd68	fix: force cost_usd for percentile/histogram queries, fix test + prettier - Backend: always pass tracking_type=None to _build_raw_where for percentile and histogram queries so they compute stats on cost_usd rows regardless of the caller's tracking_type filter. - Frontend test: use getAllByText for "5" which appears in both the Active Users card and the $1-2 bucket count. - Frontend: fix prettier formatting in PlatformCostContent.tsx.	2026-04-13 06:36:59 +00:00
majdyz	90d8ae0ae2	fix(copilot): map non-E2B file tools in permissions and fix lint formatting In non-E2B mode, to_sdk_names() failed to map whitelisted SDK built-in file tool names (Write, Edit, Read) to their MCP-prefixed equivalents (mcp__copilot__Write, etc.), causing them to be incorrectly filtered out when users configured tool whitelists. Add _SDK_TO_MCP mapping for non-E2B mode that maps Read->read_file, Write->Write, Edit->Edit. Add test coverage for this case. Also fix black formatting in permissions_test.py that was causing CI lint failure.	2026-04-13 06:34:55 +00:00
majdyz	967f0c97c4	fix(copilot): fix black formatting for single-line ValueError raise	2026-04-13 06:29:25 +00:00
majdyz	7dc4319125	fix: correct group_by count in test_passes_filters_to_queries The 6th group_by (total agg no-tracking-type) only runs when tracking_type is set. This test doesn't pass tracking_type, so the expected count is 5, not 6.	2026-04-13 05:28:12 +00:00
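
Illustration for commits 7cadce4c7b and a17f05f2b1: a minimal Python sketch of deduplication by content fingerprint, scoped by the ID of the preceding user message. It is not the repository's code. The actual deduplicateMessages lives in the frontend, and the ChatMessage type, its field names, and the function below are hypothetical. The sketch also assumes that user message IDs stay stable across an SSE replay and that only assistant messages come back with fresh IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatMessage:
    """Hypothetical message shape, used for illustration only."""
    id: str
    role: str  # "user" or "assistant"
    content: str


def deduplicate_messages(messages: list[ChatMessage]) -> list[ChatMessage]:
    """Drop replayed duplicates while keeping legitimate repeated answers."""
    seen: set[tuple[str, str, str]] = set()
    kept: list[ChatMessage] = []
    context_id = ""  # ID of the most recent user message seen so far

    for msg in messages:
        if msg.role == "user":
            # A user message opens a new turn; it is keyed by its own ID.
            context_id = msg.id
            key = (msg.role, msg.id, "")
        else:
            # Other messages are keyed by role, turn, and content, so a
            # replayed copy with a fresh ID still matches the original.
            key = (msg.role, context_id, msg.content)

        if key in seen:
            continue
        seen.add(key)
        kept.append(msg)

    return kept


history = [
    ChatMessage("u1", "user", "What is 2 + 2?"),
    ChatMessage("a1", "assistant", "4"),
    ChatMessage("a1-replayed", "assistant", "4"),  # replay with a new ID
    ChatMessage("u2", "user", "What is 2 + 2?"),   # same question asked again
    ChatMessage("a2", "assistant", "4"),           # legitimate second answer
]
result_ids = [m.id for m in deduplicate_messages(history)]
```

Keying on the user message text instead of its ID would give "a1" and "a2" the same fingerprint and drop the second reply, which is the false positive described in a17f05f2b1.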

8634 Commits