AutoGPT

mirror of https://github.com/Significant-Gravitas/AutoGPT.git synced 2026-04-30 03:00:41 -04:00

Author	SHA1	Message	Date
majdyz	ec2acfb9e3	fix(frontend): add cache token fields to UserCostSummary in openapi.json The backend added total_cache_read_tokens and total_cache_creation_tokens to UserCostSummary but the OpenAPI spec was not updated, causing frontend build failures.	2026-04-13 10:13:18 +00:00
majdyz	95087cd170	Merge branch 'fix/copilot-mode-per-session' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs	2026-04-13 09:58:49 +00:00
majdyz	1e7eadce26	fix(copilot): validate persisted session modes, add removeSessionMode, fix useEffect deps - Validate entries from localStorage before constructing the sessionModes map, filtering out corrupt/unknown mode strings (addresses CodeRabbit review) - Add removeSessionMode action and call it on session delete so the map does not grow unboundedly - Add recordSessionMode to the useEffect dependency array to avoid stale-closure risk - Add clarifying comment to restoreSessionMode no-op branch - Extend tests to cover removeSessionMode, no-op, and corrupt-localStorage behaviour	2026-04-13 09:57:14 +00:00
majdyz	1485d1910c	Merge branch 'fix/sse-replay-deduplication' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs	2026-04-13 09:56:12 +00:00
majdyz	89c9c649d8	fix: resolve merge conflicts in UserTable.tsx — keep all columns (avg cost + cache read/write)	2026-04-13 09:55:56 +00:00
majdyz	a17f05f2b1	fix(copilot): scope dedup fingerprint by user message ID instead of text Using user message text as the context key caused the deduplicator to drop the second assistant reply when a user asked the same question twice in one session. Switching to user message ID (which is unique per turn) fixes the false positive while still preventing SSE-replayed duplicates. Adds a regression test covering the same-question-twice scenario.	2026-04-13 09:55:54 +00:00
majdyz	62e4a8d3a4	Merge branch 'fix/copilot-mode-per-session' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs	2026-04-13 09:54:21 +00:00
majdyz	c6af52033d	fix(copilot): fix multi-turn cost over-estimation and add cache_creation_tokens extraction Bug 1: Fallback cost estimation was using accumulated turn_prompt_tokens / turn_completion_tokens across all tool-call rounds, causing compounding over-estimation on the 2nd+ turn. Snapshot token counts before each call and pass only the per-call delta to _estimate_cost_from_tokens. Bug 2: turn_cache_creation_tokens was defined but never populated. Extract cache_creation_input_tokens from prompt_tokens_details (available from some providers such as Anthropic via OpenRouter). Add regression tests for both fixes.	2026-04-13 09:53:05 +00:00
majdyz	1df9369dc3	perf(copilot): add effort=low thinking control + raise budget to $15 - Add claude_agent_thinking_effort config (default: 'low') to control thinking depth. 'low' minimizes thinking token usage — the #1 cost driver at 49% of total spend. - Raise max_budget_usd from $5 to $15 — $5 was below p50 ($5.37), causing half of all turns to get budget-killed mid-task. - Log raw SDK usage dict to discover thinking token fields.	2026-04-13 09:43:26 +00:00
majdyz	f6c7d1eaf7	fix(copilot): baseline cost tracking fallback and dashboard cache token display When OpenRouter's x-total-cost header is missing, estimate cost from token counts using a known model pricing table so cost is always logged. Also extract cache token details from streaming usage chunks (prompt_tokens_details.cached_tokens) and pass them through to PlatformCostLog. On the dashboard side, add cache read/write columns to the logs table and user table, and include cache tokens in the UserCostSummary backend model so they surface in the API response.	2026-04-13 09:39:44 +00:00
majdyz	85f76230a9	debug(copilot): log raw SDK usage dict to discover thinking token fields Temporary debug logging to see all fields in ResultMessage.usage — need to confirm if thinking_tokens or similar is available but not being captured.	2026-04-13 09:35:05 +00:00
majdyz	f63440e955	fix(copilot): store mode per session so indicator updates on switch The copilot mode (fast/extended_thinking) was stored as a single global value. When switching between sessions, the mode indicator stayed on whatever was last set globally rather than reflecting the mode each session was created with. Add a sessionModes map to the Zustand store that records the active copilotMode when a session is created and restores it when the user switches back to that session.	2026-04-13 09:32:45 +00:00
majdyz	f52c1e1f24	fix(copilot): raise max_budget_usd from $5 to $15 $5 was too aggressive — p50 cost is $5.37 so half of all turns were getting budget-killed mid-task with no value delivered. $15 covers p75 ($13.07) so ~75% of tasks complete. The thinking token cap is the better cost lever but needs verification first.	2026-04-13 08:47:16 +00:00
majdyz	b5216da2d8	fix(copilot): disable gzip on API responses to prevent ZlibError Add Accept-Encoding: identity to ANTHROPIC_CUSTOM_HEADERS in build_sdk_env() to prevent ZlibError decompression failures in the CLI subprocess. Appended after any existing custom headers (OpenRouter trace headers). See: oven-sh/bun#23149, anthropics/claude-code#18302	2026-04-13 08:26:01 +00:00
majdyz	ffa74177d0	fix: add ::timestamptz casts to raw SQL datetime comparisons in _build_raw_where The raw SQL WHERE clause builder was passing datetime parameters without explicit type casts, causing PostgreSQL to fail with "operator does not exist: timestamp without time zone >= text".	2026-04-13 08:23:43 +00:00
majdyz	b6b94a2244	Merge branch 'fix/sse-replay-deduplication' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs	2026-04-13 08:06:29 +00:00
majdyz	7cadce4c7b	fix(copilot): deduplicate SSE-replayed messages by content fingerprint When the SSE connection reconnects, resume_session_stream replays from "0-0" and the replayed UIMessage objects get new IDs from useChat, bypassing the adjacent-only content dedup. Switch deduplicateMessages to track all seen role+context+content fingerprints globally, scoped by the preceding user message to avoid false positives when the assistant legitimately gives identical answers to different prompts.	2026-04-13 08:04:04 +00:00
majdyz	00a20bdfe6	fix(copilot): deduplicate SSE-replayed messages by content fingerprint When the SSE connection reconnects, resume_session_stream replays from "0-0" and the replayed UIMessage objects get new IDs from useChat, bypassing the adjacent-only content dedup. Switch deduplicateMessages to track all seen role+context+content fingerprints globally, scoped by the preceding user message to avoid false positives when the assistant legitimately gives identical answers to different prompts.	2026-04-13 08:03:51 +00:00
majdyz	e0ddb7d4d4	Merge branch 'feat/enhanced-cost-dashboard' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs # Conflicts: # autogpt_platform/backend/backend/data/platform_cost_test.py	2026-04-13 08:03:15 +00:00
majdyz	d8d0f752b5	Merge branch 'feat/builder-chat-panel' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs # Conflicts: # autogpt_platform/backend/backend/data/platform_cost_test.py	2026-04-13 08:02:58 +00:00
majdyz	c64d5a9c92	Merge branch 'perf/copilot-prompt-caching' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs	2026-04-13 08:02:37 +00:00
majdyz	f8bca6f4bc	Merge commit '2cf737dc0508a7753d067ed8425cfc0ef657b29f' into preview/all-prs # Conflicts: # autogpt_platform/backend/backend/copilot/config.py	2026-04-13 08:02:31 +00:00
majdyz	6c21e58d31	Merge branch 'fix/orchestrator-per-iteration-cost' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs	2026-04-13 08:01:50 +00:00
majdyz	895c9a0d29	Merge branch 'feat/copilot-pending-messages' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs	2026-04-13 08:01:45 +00:00
majdyz	84e877e36d	Merge branch 'fix/schedule-agent-cred-setup-ux' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs	2026-04-13 08:01:40 +00:00
majdyz	a504ad6e1e	Merge branch 'chore/sdk-dev-preview-0.1.58-with-proxy' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs	2026-04-13 08:01:33 +00:00
majdyz	ca0c95b593	fix(frontend): add SUBSCRIPTION to CreditTransactionType enum in openapi.json Syncs the OpenAPI spec with the Prisma schema which already includes the SUBSCRIPTION enum value in CreditTransactionType.	2026-04-13 07:13:21 +00:00
majdyz	cbf309c9e4	Merge branch 'dev' of https://github.com/Significant-Gravitas/AutoGPT into feat/copilot-pending-messages	2026-04-13 07:12:49 +00:00
majdyz	6ccb44e0d5	fix(copilot): add 404/429 to route decorator, reformat routes.py, regenerate openapi.json Add responses={404, 429} to the pending endpoint's @router.post decorator so FastAPI auto-generates them in the OpenAPI spec. Previously these were only manually added to openapi.json and the CI schema-check (export + diff) stripped them. Also apply black formatting to the long warning line that was failing the backend lint check.	2026-04-13 07:04:07 +00:00
majdyz	e558c60104	fix(orchestrator): don't propagate non-billing charge errors as tool failures Non-IBE exceptions from charge_node_usage (e.g. DB timeout) were re-raised and caught by the outer generic handler, incorrectly marking a successful tool execution as failed. This could cause the LLM to retry side-effectful operations. Now logs the error and continues to the success path since the tool itself completed successfully.	2026-04-13 07:02:10 +00:00
majdyz	fbad856538	fix(backend/copilot): relax schedule race test assertion for setup_test_data fixture The setup_test_data fixture creates a graph with credentials already embedded in node defaults. The DB-stored credential schema may not surface these as "missing" in build_missing_credentials_from_graph, so assert the key exists rather than asserting non-empty count.	2026-04-13 06:59:18 +00:00
majdyz	3ebfa3d68b	fix(backend/copilot): address round-6 review — DRY validation handler, improve tests - Extract duplicated GraphValidationError handler from _run_agent and _schedule_agent into _handle_graph_validation_race helper method - Use generator expressions instead of list comprehension for short-circuit evaluation in _build_setup_requirements_from_validation_error - Improve mixed-error fallback message to be more user-friendly - Add test for empty node_errors={} edge case - Pin expected credential count in firecrawl fixture tests - Add missing_credentials assertion to schedule race E2E test - Add test for extras present with node_errors=None in service_test	2026-04-13 06:45:28 +00:00
majdyz	5ff46ff207	fix(backend): address review feedback on orchestrator billing - Extract post-execution billing into _handle_post_execution_billing() - Deduplicate IBE notification into _try_send_insufficient_funds_notif() - Combine _charge_usage + _handle_low_balance into single thread dispatch - Sanitize error messages to LLM (no internal details leaked) - Default _is_error to True (fail-closed) for tool responses - Add IBE propagation contract to OrchestratorBlock class docstring - Reduce per-site IBE comments to one-liners referencing class docstring - Fix _resolve_block_cost return type annotation (Block | None) - Move test imports to module level, fix test_default_block_returns_zero - Add tests for non-IBE billing failure and _charge_usage(count=0) - Fix Black formatting (CI lint blocker)	2026-04-13 06:44:20 +00:00
majdyz	2cf737dc05	fix(backend): address review comments on cross-user prompt caching PR - Add TODO(#12747) to _SystemPromptPreset for cleanup tracking - Update docstring to note SDK version and migration path - Add debug logging in _build_system_prompt_value for observability - Document empty-string edge case in docstring - Trim redundant block comment at call site to single line - Add test for empty-string system_prompt with cache enabled - Add test for CHAT_CLAUDE_AGENT_CROSS_USER_PROMPT_CACHE=false env var	2026-04-13 06:43:57 +00:00
majdyz	040637dd68	fix: force cost_usd for percentile/histogram queries, fix test + prettier - Backend: always pass tracking_type=None to _build_raw_where for percentile and histogram queries so they compute stats on cost_usd rows regardless of the caller's tracking_type filter. - Frontend test: use getAllByText for "5" which appears in both the Active Users card and the $1-2 bucket count. - Frontend: fix prettier formatting in PlatformCostContent.tsx.	2026-04-13 06:36:59 +00:00
majdyz	90d8ae0ae2	fix(copilot): map non-E2B file tools in permissions and fix lint formatting In non-E2B mode, to_sdk_names() failed to map whitelisted SDK built-in file tool names (Write, Edit, Read) to their MCP-prefixed equivalents (mcp__copilot__Write, etc.), causing them to be incorrectly filtered out when users configured tool whitelists. Add _SDK_TO_MCP mapping for non-E2B mode that maps Read->read_file, Write->Write, Edit->Edit. Add test coverage for this case. Also fix black formatting in permissions_test.py that was causing CI lint failure.	2026-04-13 06:34:55 +00:00
majdyz	967f0c97c4	fix(copilot): fix black formatting for single-line ValueError raise	2026-04-13 06:29:25 +00:00
majdyz	7dc4319125	fix: correct group_by count in test_passes_filters_to_queries The 6th group_by (total agg no-tracking-type) only runs when tracking_type is set. This test doesn't pass tracking_type, so the expected count is 5, not 6.	2026-04-13 05:28:12 +00:00
majdyz	a8cfe27f6b	fix: use real temp files in CLI path env var tests The path validator rejects non-existent paths, so tests must create real executable temp files via tmp_path instead of hardcoded paths.	2026-04-13 05:28:08 +00:00
majdyz	4cc8ef4409	fix(platform-cost): address PR review — deduplicate filter logic, skip redundant query, improve frontend Backend: - Extract _build_raw_where() helper so raw SQL and Prisma WHERE share filter logic (review item #4 — duplicated filter logic) - Skip redundant total_agg_no_tracking_type_groups query when tracking_type is None since it duplicates total_agg_groups (item #3) - Convert CostBucket from TypedDict to BaseModel for consistency (nit #1) - Replace fragile 8-way positional tuple unpack with indexed list access Frontend: - Make 12 SummaryCards data-driven via a cards config array (item #5) - Use friendlier percentile labels: Typical/Upper/High/Peak Cost (P50/P75/P95/P99) - Update test fixtures with all new dashboard fields (item #1) - Add test assertions for new summary card labels, cost buckets, token values, and user table columns	2026-04-13 05:16:55 +00:00
majdyz	359b7f1b81	fix(copilot): address PR reviewer feedback on CLI path validation and defaults - Reject non-existent and non-file CLI paths at config validation time instead of letting them fail with opaque OS errors at runtime - Add negative test coverage for CLI path validator (non-existent, non-executable, directory paths) - Document breaking default changes (max_turns 1000->50, max_budget $100->$5) in field descriptions with env var override instructions - Narrow broad `except Exception` to `except (ImportError, AttributeError)` in cli_openrouter_compat_test.py	2026-04-13 05:13:56 +00:00
Zamil Majdy	a3b0cea942	fix(frontend/builder): route text parts through MessagePartRenderer Text parts in assistant messages were being rendered as plain <span> elements, bypassing MessagePartRenderer's case "text" handler and parseSpecialMarkers(). This broke styled error/system messages ([ERROR:], [RETRYABLE_ERROR:], [SYSTEM:] markers) and markdown rendering in the builder chat panel. Route all assistant message parts (text and tool) through MessagePartRenderer so parseSpecialMarkers() runs on text content.	2026-04-13 04:42:18 +00:00
majdyz	ae1600a99d	fix(copilot): rename SDK read_tool_result tool and fix path leak in error message - Rename `_READ_TOOL_NAME` from `"Read"` to `"read_tool_result"` so the LLM can distinguish it from `read_file` (working-directory tool). The new name plus an updated description make its narrow scope (tool-results/ paths and workspace:// URIs) unambiguous. - Fix path leak in `_read_file_handler`: use `os.path.basename(file_path)` in the "Path not allowed" error, consistent with write/edit handlers. - Update `permissions.py` comment and all `permissions_test.py` assertions to use the new `mcp__copilot__read_tool_result` name.	2026-04-13 04:27:17 +00:00
majdyz	45f96d5769	fix(copilot): wrap baseline turn-start drain in try/except; add 404/429 to OpenAPI spec Baseline turn-start drain_pending_messages was unprotected — a transient Redis error would propagate up and kill the entire turn stream, unlike the already-protected mid-loop and SDK paths. Wrap with try/except + fallback to [] so a Redis hiccup degrades gracefully. Also adds 404 (session not found) and 429 (rate-limit exceeded) response codes to the pending endpoint's OpenAPI spec so TypeScript clients can handle these error paths correctly.	2026-04-13 04:24:29 +00:00
majdyz	5dbbdf9b27	fix(copilot): address round-6 review nits - Remove redundant inner `ChatConfig` import in `_prewarm_cli` — it was already imported at module scope on line 16 (style guide: inner imports only for heavy optional deps) - Correct stale comment in `sdk_compat_test.py`: 2.1.63/2.1.70 pre-date the context-management regression and are OpenRouter-safe without any env var; only 2.1.97+ requires CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1 - Update `_assert_no_forbidden_patterns` error message in `cli_openrouter_compat_test.py`: remove the stale "above 0.1.45" ceiling (we've already upgraded to 0.1.58) and point at the correct remediation steps (add to _KNOWN_GOOD_BUNDLED_CLI_VERSIONS after bisect verification) - Plug test coverage gap in `env_test.py`: add `CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS == "1"` assertions to three OpenRouter test methods that were missing it (test_strips_trailing_v1, test_strips_trailing_v1_and_slash, test_no_v1_suffix_left_alone) — guards against the env var being accidentally dropped from a code path that the main test didn't exercise	2026-04-13 04:23:54 +00:00
majdyz	e901b64bed	fix(test): fix _handle_low_balance mock signature to accept positional args The gated_processor fixture's fake_low_balance mock used **kwargs, but production code calls _handle_low_balance with positional args via asyncio.to_thread. This caused a silent TypeError caught by the broad except handler, making the handle_low_balance assertion fail (0 calls instead of 1). Updated mock to match the actual method signature.	2026-04-13 04:22:03 +00:00
majdyz	64c3ef45df	chore: apply Prettier formatting to BuilderChatPanel files Three files were flagged by the CI lint/format check — apply prettier --write to bring them into compliance.	2026-04-13 04:15:37 +00:00
majdyz	77ed619613	fix(frontend/builder): add flowID to tool-call effect deps for correct navigation guard	2026-04-13 04:09:05 +00:00
majdyz	626fe17aac	fix(orchestrator): resolve None future on swallowed errors; add missing tests - Move tool_node_stats None guard before node_exec_future.set_result so that when on_node_execution returns None (swallowed by @async_error_logged), the future carries set_exception(RuntimeError) rather than set_result(None), giving the tracking system an accurate error state - Remove redundant `tool_node_stats is not None` check that was dead code after the early-return guard was added - Add explanatory comment in _charge_extra_iterations_sync docstring explaining why the block lookup is intentionally repeated rather than cached from _charge_usage (two separate thread-pool workers, no shared mutable state) - Add assertion to test_on_node_execution_charges_extra_iterations_when_gate_passes verifying _handle_low_balance is called when extra_cost > 0 - Add test_on_node_execution_failed_ibe_sends_notification covering the FAILED + InsufficientBalanceError path in on_node_execution (lines 822-836) that was previously untested	2026-04-13 04:03:08 +00:00
majdyz	3b7e678b97	fix(frontend/builder): address round-5 review comments on BuilderChatPanel - Add type="button" and focus-visible ring to Stop/Send buttons in PanelInput - Add type="button" to Retry button in MessageList and Apply button in ActionList - Fix MessageList to render plain text directly and only pass dynamic-tool parts to MessagePartRenderer (text parts were being misrouted through a tool renderer) - Replace clearGraphSessionCacheForTesting export with _graphSessionCache for tests — avoids leaking test scaffolding into the production bundle - Add toast notification in undo restore when target node was deleted between apply and undo (prevents silent no-op) - Fix misleading test: remove red-herring mockNodes.push from 'no auto-send' test since the guard is isGraphLoaded===false, not the node array - Add truncation-path coverage to helpers.test.ts (MAX_NODES/MAX_EDGES branches) - Add deleted-node undo test to actionApplicators.test.ts	2026-04-13 04:01:42 +00:00
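
Commit c6af52033d above describes fixing multi-turn cost over-estimation by snapshotting token counts before each call and pricing only the per-call delta. The following is a minimal sketch of that idea, not the repository's code: `TurnUsage`, `record_call`, `estimate_cost_from_tokens`, the model name, and the prices are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; illustrative values only.
PRICE_PER_MTOK = {"example-model": {"prompt": 3.0, "completion": 15.0}}


def estimate_cost_from_tokens(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of a single call from its own token counts."""
    price = PRICE_PER_MTOK[model]
    return (
        prompt_tokens * price["prompt"] + completion_tokens * price["completion"]
    ) / 1_000_000


@dataclass
class TurnUsage:
    """Running totals for one turn, accumulated across tool-call rounds."""

    prompt_tokens: int = 0
    completion_tokens: int = 0
    cost_usd: float = 0.0

    def record_call(self, model: str, call_prompt: int, call_completion: int) -> float:
        # Snapshot the running totals before this call is added.
        before_prompt = self.prompt_tokens
        before_completion = self.completion_tokens

        self.prompt_tokens += call_prompt
        self.completion_tokens += call_completion

        # Price only the delta since the snapshot. Pricing the accumulated
        # totals on every round would re-bill earlier rounds each time.
        delta_cost = estimate_cost_from_tokens(
            model,
            self.prompt_tokens - before_prompt,
            self.completion_tokens - before_completion,
        )
        self.cost_usd += delta_cost
        return delta_cost


usage = TurnUsage()
first = usage.record_call("example-model", 1000, 100)
second = usage.record_call("example-model", 2000, 200)
```

With these made-up prices, two rounds of 1000/100 and 2000/200 tokens total about $0.0135. Pricing the accumulated totals on the second round instead would give about $0.018, which is the compounding effect the commit message describes.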

8622 Commits