AutoGPT

mirror of https://github.com/Significant-Gravitas/AutoGPT.git synced 2026-04-30 03:00:41 -04:00

Author	SHA1	Message	Date
majdyz	00a20bdfe6	fix(copilot): deduplicate SSE-replayed messages by content fingerprint When the SSE connection reconnects, resume_session_stream replays from "0-0" and the replayed UIMessage objects get new IDs from useChat, bypassing the adjacent-only content dedup. Switch deduplicateMessages to track all seen role+context+content fingerprints globally, scoped by the preceding user message to avoid false positives when the assistant legitimately gives identical answers to different prompts.	2026-04-13 08:03:51 +00:00
majdyz	e0ddb7d4d4	Merge branch 'feat/enhanced-cost-dashboard' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs # Conflicts: # autogpt_platform/backend/backend/data/platform_cost_test.py	2026-04-13 08:03:15 +00:00
majdyz	d8d0f752b5	Merge branch 'feat/builder-chat-panel' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs # Conflicts: # autogpt_platform/backend/backend/data/platform_cost_test.py	2026-04-13 08:02:58 +00:00
majdyz	c64d5a9c92	Merge branch 'perf/copilot-prompt-caching' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs	2026-04-13 08:02:37 +00:00
majdyz	f8bca6f4bc	Merge commit '2cf737dc0508a7753d067ed8425cfc0ef657b29f' into preview/all-prs # Conflicts: # autogpt_platform/backend/backend/copilot/config.py	2026-04-13 08:02:31 +00:00
majdyz	6c21e58d31	Merge branch 'fix/orchestrator-per-iteration-cost' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs	2026-04-13 08:01:50 +00:00
majdyz	895c9a0d29	Merge branch 'feat/copilot-pending-messages' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs	2026-04-13 08:01:45 +00:00
majdyz	84e877e36d	Merge branch 'fix/schedule-agent-cred-setup-ux' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs	2026-04-13 08:01:40 +00:00
majdyz	a504ad6e1e	Merge branch 'chore/sdk-dev-preview-0.1.58-with-proxy' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs	2026-04-13 08:01:33 +00:00
majdyz	ca0c95b593	fix(frontend): add SUBSCRIPTION to CreditTransactionType enum in openapi.json Syncs the OpenAPI spec with the Prisma schema which already includes the SUBSCRIPTION enum value in CreditTransactionType.	2026-04-13 07:13:21 +00:00
majdyz	cbf309c9e4	Merge branch 'dev' of https://github.com/Significant-Gravitas/AutoGPT into feat/copilot-pending-messages	2026-04-13 07:12:49 +00:00
majdyz	6ccb44e0d5	fix(copilot): add 404/429 to route decorator, reformat routes.py, regenerate openapi.json Add responses={404, 429} to the pending endpoint's @router.post decorator so FastAPI auto-generates them in the OpenAPI spec. Previously these were only manually added to openapi.json and the CI schema-check (export + diff) stripped them. Also apply black formatting to the long warning line that was failing the backend lint check.	2026-04-13 07:04:07 +00:00
majdyz	e558c60104	fix(orchestrator): don't propagate non-billing charge errors as tool failures Non-IBE exceptions from charge_node_usage (e.g. DB timeout) were re-raised and caught by the outer generic handler, incorrectly marking a successful tool execution as failed. This could cause the LLM to retry side-effectful operations. Now logs the error and continues to the success path since the tool itself completed successfully.	2026-04-13 07:02:10 +00:00
majdyz	fbad856538	fix(backend/copilot): relax schedule race test assertion for setup_test_data fixture The setup_test_data fixture creates a graph with credentials already embedded in node defaults. The DB-stored credential schema may not surface these as "missing" in build_missing_credentials_from_graph, so assert the key exists rather than asserting non-empty count.	2026-04-13 06:59:18 +00:00
majdyz	3ebfa3d68b	fix(backend/copilot): address round-6 review — DRY validation handler, improve tests - Extract duplicated GraphValidationError handler from _run_agent and _schedule_agent into _handle_graph_validation_race helper method - Use generator expressions instead of list comprehension for short-circuit evaluation in _build_setup_requirements_from_validation_error - Improve mixed-error fallback message to be more user-friendly - Add test for empty node_errors={} edge case - Pin expected credential count in firecrawl fixture tests - Add missing_credentials assertion to schedule race E2E test - Add test for extras present with node_errors=None in service_test	2026-04-13 06:45:28 +00:00
majdyz	5ff46ff207	fix(backend): address review feedback on orchestrator billing - Extract post-execution billing into _handle_post_execution_billing() - Deduplicate IBE notification into _try_send_insufficient_funds_notif() - Combine _charge_usage + _handle_low_balance into single thread dispatch - Sanitize error messages to LLM (no internal details leaked) - Default _is_error to True (fail-closed) for tool responses - Add IBE propagation contract to OrchestratorBlock class docstring - Reduce per-site IBE comments to one-liners referencing class docstring - Fix _resolve_block_cost return type annotation (Block \| None) - Move test imports to module level, fix test_default_block_returns_zero - Add tests for non-IBE billing failure and _charge_usage(count=0) - Fix Black formatting (CI lint blocker)	2026-04-13 06:44:20 +00:00
majdyz	2cf737dc05	fix(backend): address review comments on cross-user prompt caching PR - Add TODO(#12747) to _SystemPromptPreset for cleanup tracking - Update docstring to note SDK version and migration path - Add debug logging in _build_system_prompt_value for observability - Document empty-string edge case in docstring - Trim redundant block comment at call site to single line - Add test for empty-string system_prompt with cache enabled - Add test for CHAT_CLAUDE_AGENT_CROSS_USER_PROMPT_CACHE=false env var	2026-04-13 06:43:57 +00:00
majdyz	040637dd68	fix: force cost_usd for percentile/histogram queries, fix test + prettier - Backend: always pass tracking_type=None to _build_raw_where for percentile and histogram queries so they compute stats on cost_usd rows regardless of the caller's tracking_type filter. - Frontend test: use getAllByText for "5" which appears in both the Active Users card and the $1-2 bucket count. - Frontend: fix prettier formatting in PlatformCostContent.tsx.	2026-04-13 06:36:59 +00:00
majdyz	90d8ae0ae2	fix(copilot): map non-E2B file tools in permissions and fix lint formatting In non-E2B mode, to_sdk_names() failed to map whitelisted SDK built-in file tool names (Write, Edit, Read) to their MCP-prefixed equivalents (mcp__copilot__Write, etc.), causing them to be incorrectly filtered out when users configured tool whitelists. Add _SDK_TO_MCP mapping for non-E2B mode that maps Read->read_file, Write->Write, Edit->Edit. Add test coverage for this case. Also fix black formatting in permissions_test.py that was causing CI lint failure.	2026-04-13 06:34:55 +00:00
majdyz	967f0c97c4	fix(copilot): fix black formatting for single-line ValueError raise	2026-04-13 06:29:25 +00:00
majdyz	7dc4319125	fix: correct group_by count in test_passes_filters_to_queries The 6th group_by (total agg no-tracking-type) only runs when tracking_type is set. This test doesn't pass tracking_type, so the expected count is 5, not 6.	2026-04-13 05:28:12 +00:00
majdyz	a8cfe27f6b	fix: use real temp files in CLI path env var tests The path validator rejects non-existent paths, so tests must create real executable temp files via tmp_path instead of hardcoded paths.	2026-04-13 05:28:08 +00:00
majdyz	4cc8ef4409	fix(platform-cost): address PR review — deduplicate filter logic, skip redundant query, improve frontend Backend: - Extract _build_raw_where() helper so raw SQL and Prisma WHERE share filter logic (review item #4 — duplicated filter logic) - Skip redundant total_agg_no_tracking_type_groups query when tracking_type is None since it duplicates total_agg_groups (item #3) - Convert CostBucket from TypedDict to BaseModel for consistency (nit #1) - Replace fragile 8-way positional tuple unpack with indexed list access Frontend: - Make 12 SummaryCards data-driven via a cards config array (item #5) - Use friendlier percentile labels: Typical/Upper/High/Peak Cost (P50/P75/P95/P99) - Update test fixtures with all new dashboard fields (item #1) - Add test assertions for new summary card labels, cost buckets, token values, and user table columns	2026-04-13 05:16:55 +00:00
majdyz	359b7f1b81	fix(copilot): address PR reviewer feedback on CLI path validation and defaults - Reject non-existent and non-file CLI paths at config validation time instead of letting them fail with opaque OS errors at runtime - Add negative test coverage for CLI path validator (non-existent, non-executable, directory paths) - Document breaking default changes (max_turns 1000->50, max_budget $100->$5) in field descriptions with env var override instructions - Narrow broad `except Exception` to `except (ImportError, AttributeError)` in cli_openrouter_compat_test.py	2026-04-13 05:13:56 +00:00
Zamil Majdy	a3b0cea942	fix(frontend/builder): route text parts through MessagePartRenderer Text parts in assistant messages were being rendered as plain <span> elements, bypassing MessagePartRenderer's case "text" handler and parseSpecialMarkers(). This broke styled error/system messages ([ERROR:], [RETRYABLE_ERROR:], [SYSTEM:] markers) and markdown rendering in the builder chat panel. Route all assistant message parts (text and tool) through MessagePartRenderer so parseSpecialMarkers() runs on text content.	2026-04-13 04:42:18 +00:00
majdyz	ae1600a99d	fix(copilot): rename SDK read_tool_result tool and fix path leak in error message - Rename `_READ_TOOL_NAME` from `"Read"` to `"read_tool_result"` so the LLM can distinguish it from `read_file` (working-directory tool). The new name plus an updated description make its narrow scope (tool-results/ paths and workspace:// URIs) unambiguous. - Fix path leak in `_read_file_handler`: use `os.path.basename(file_path)` in the "Path not allowed" error, consistent with write/edit handlers. - Update `permissions.py` comment and all `permissions_test.py` assertions to use the new `mcp__copilot__read_tool_result` name.	2026-04-13 04:27:17 +00:00
majdyz	45f96d5769	fix(copilot): wrap baseline turn-start drain in try/except; add 404/429 to OpenAPI spec Baseline turn-start drain_pending_messages was unprotected — a transient Redis error would propagate up and kill the entire turn stream, unlike the already-protected mid-loop and SDK paths. Wrap with try/except + fallback to [] so a Redis hiccup degrades gracefully. Also adds 404 (session not found) and 429 (rate-limit exceeded) response codes to the pending endpoint's OpenAPI spec so TypeScript clients can handle these error paths correctly.	2026-04-13 04:24:29 +00:00
majdyz	5dbbdf9b27	fix(copilot): address round-6 review nits - Remove redundant inner `ChatConfig` import in `_prewarm_cli` — it was already imported at module scope on line 16 (style guide: inner imports only for heavy optional deps) - Correct stale comment in `sdk_compat_test.py`: 2.1.63/2.1.70 pre-date the context-management regression and are OpenRouter-safe without any env var; only 2.1.97+ requires CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1 - Update `_assert_no_forbidden_patterns` error message in `cli_openrouter_compat_test.py`: remove the stale "above 0.1.45" ceiling (we've already upgraded to 0.1.58) and point at the correct remediation steps (add to _KNOWN_GOOD_BUNDLED_CLI_VERSIONS after bisect verification) - Plug test coverage gap in `env_test.py`: add `CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS == "1"` assertions to three OpenRouter test methods that were missing it (test_strips_trailing_v1, test_strips_trailing_v1_and_slash, test_no_v1_suffix_left_alone) — guards against the env var being accidentally dropped from a code path that the main test didn't exercise	2026-04-13 04:23:54 +00:00
majdyz	e901b64bed	fix(test): fix _handle_low_balance mock signature to accept positional args The gated_processor fixture's fake_low_balance mock used **kwargs, but production code calls _handle_low_balance with positional args via asyncio.to_thread. This caused a silent TypeError caught by the broad except handler, making the handle_low_balance assertion fail (0 calls instead of 1). Updated mock to match the actual method signature.	2026-04-13 04:22:03 +00:00
majdyz	64c3ef45df	chore: apply Prettier formatting to BuilderChatPanel files Three files were flagged by the CI lint/format check — apply prettier --write to bring them into compliance.	2026-04-13 04:15:37 +00:00
majdyz	77ed619613	fix(frontend/builder): add flowID to tool-call effect deps for correct navigation guard	2026-04-13 04:09:05 +00:00
majdyz	626fe17aac	fix(orchestrator): resolve None future on swallowed errors; add missing tests - Move tool_node_stats None guard before node_exec_future.set_result so that when on_node_execution returns None (swallowed by @async_error_logged), the future carries set_exception(RuntimeError) rather than set_result(None), giving the tracking system an accurate error state - Remove redundant `tool_node_stats is not None` check that was dead code after the early-return guard was added - Add explanatory comment in _charge_extra_iterations_sync docstring explaining why the block lookup is intentionally repeated rather than cached from _charge_usage (two separate thread-pool workers, no shared mutable state) - Add assertion to test_on_node_execution_charges_extra_iterations_when_gate_passes verifying _handle_low_balance is called when extra_cost > 0 - Add test_on_node_execution_failed_ibe_sends_notification covering the FAILED + InsufficientBalanceError path in on_node_execution (lines 822-836) that was previously untested	2026-04-13 04:03:08 +00:00
majdyz	3b7e678b97	fix(frontend/builder): address round-5 review comments on BuilderChatPanel - Add type="button" and focus-visible ring to Stop/Send buttons in PanelInput - Add type="button" to Retry button in MessageList and Apply button in ActionList - Fix MessageList to render plain text directly and only pass dynamic-tool parts to MessagePartRenderer (text parts were being misrouted through a tool renderer) - Replace clearGraphSessionCacheForTesting export with _graphSessionCache for tests — avoids leaking test scaffolding into the production bundle - Add toast notification in undo restore when target node was deleted between apply and undo (prevents silent no-op) - Fix misleading test: remove red-herring mockNodes.push from 'no auto-send' test since the guard is isGraphLoaded===false, not the node array - Add truncation-path coverage to helpers.test.ts (MAX_NODES/MAX_EDGES branches) - Add deleted-node undo test to actionApplicators.test.ts	2026-04-13 04:01:42 +00:00
majdyz	c51471a9df	fix(platform-cost): replace non-null assertion with nullish coalesce, add token total test assertions, add bucket skeleton - UserTable: replace `cost_bearing_request_count!` non-null assertion with `?? 1` nullish coalesce — eliminates the TypeScript anti-pattern and guards against a theoretical divide-by-zero if the guard is refactored - platform_cost_test: add assertions for `total_input_tokens` and `total_output_tokens` in test_returns_dashboard_with_data to cover the "Total Tokens" summary card computation path - PlatformCostContent: add a h-32 skeleton placeholder for the cost-bucket histogram section so the loading state reflects the loaded layout more closely and reduces CLS	2026-04-13 04:00:25 +00:00
majdyz	10980f3799	fix(copilot): wrap SDK turn-start drain in try/except, deduplicate format calls, elevate context length constants - sdk/service.py: wrap drain_pending_messages at turn start in try/except; a transient Redis error no longer kills the entire turn (baseline mid-loop drain was already protected, SDK was missed in round 5) - baseline/service.py: pre-compute format_pending_as_user_message content once per drained message and reuse it for both session.messages and transcript_builder — eliminates the redundant second call per message - routes.py: move _URL_LIMIT/_CONTENT_LIMIT out of the validator body into module-level _CONTEXT_URL_MAX_LENGTH/_CONTEXT_CONTENT_MAX_LENGTH so the contract limits are visible to tooling without reading the implementation	2026-04-13 03:57:54 +00:00
majdyz	4ea5cd5f7f	fix(backend/copilot): address round-5 review comments - Add "do NOT redirect to the Builder for credential setup" guardrail to run_agent description, making it symmetric with create_agent/edit_agent - Scrub error message text from race-path warning logs; log only node IDs and field names to avoid leaking credential IDs/provider details - Add code comment explaining the None-vs-filtering trade-off in _build_setup_requirements_from_validation_error - Add E2E tests for structural-error fallback on both run and schedule paths (verifies ErrorResponse returned, not setup_requirements card)	2026-04-13 03:54:31 +00:00
majdyz	e0d5047974	test(copilot): plug two test coverage gaps found in round-5 review - env_test: add missing CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS assertion to test_no_anthropic_key_overrides_when_openrouter_flag_true_but_no_key (the other three build_sdk_env test cases already assert it; this case was the only one that didn't, leaving the env-var injection unverified for the openrouter_active=False / no-key path) - sdk_compat_test: add test_sdk_exposes_max_thinking_tokens_option parallel to the existing test_sdk_exposes_cli_path_option — guards against a future SDK rename/removal of max_thinking_tokens silently disabling the Opus thinking-token cost cap	2026-04-13 03:53:41 +00:00
majdyz	ac0d939dd2	fix(copilot): address round-5 review — path leaks, Read partial truncation, concurrent edit test - Use `file_path` (caller-supplied) instead of `resolved` in Write/Edit success messages to avoid leaking `/tmp/copilot-<session>/...` to the LLM - Add partial-truncation guard to `_read_file_handler` (MCP `Read` tool): when `offset`/`limit` are present but `file_path` is missing, return a specific truncation message instead of the generic `file_path is required` - Add `TestConcurrentEditLocking` test that uses `asyncio.gather` to verify two parallel Edit calls on the same file are serialised by `_edit_locks` - Add `autouse` fixture `_clear_edit_locks` to prevent module-level dict from bleeding between test runs	2026-04-13 03:53:17 +00:00
majdyz	929718768a	fix(platform-cost): avg stats use unfiltered agg to stay nonzero when tracking_type filtered When a caller filters the dashboard by tracking_type='tokens', total_agg_groups only contains tokens rows so cost_bearing_requests=0 and avg_cost_microdollars_per_request silently returned 0.0. Symmetrically, filtering by cost_usd gave zero token averages. Add a parallel total_agg_no_tracking_type_groups query (using where_no_tracking_type, mirroring the fix already applied to by_user_tracking_groups) and derive avg_cost_total, avg_input_total, avg_output_total, cost_bearing_requests, and token_bearing_requests from that unfiltered aggregate. The displayed grand totals (total_cost, total_requests, total_input_tokens) remain scoped to the active filter. Also adds test_global_avg_cost_nonzero_when_filtering_by_tokens to cover this case.	2026-04-13 03:42:18 +00:00
majdyz	34832ca70c	test(backend): compat-test the exact preset dict sent to ClaudeAgentOptions The existing compat test for SystemPromptPreset omitted exclude_dynamic_sections, diverging from the actual dict _build_system_prompt_value produces. The new test calls the production helper directly and passes its output through ClaudeAgentOptions, so any SDK version that rejects the extra key is caught at test time.	2026-04-13 03:34:19 +00:00
majdyz	4cd955c758	test: add tests for cost_bearing_request_count fix and tracking_type filter isolation Two new tests in TestGetPlatformCostDashboard: 1. test_cost_bearing_request_count_nonzero_when_filtering_by_tokens: verifies that cost_bearing_request_count per user is correct even when the main tracking_type filter is 'tokens' (regression guard for the bug where by_user_tracking_groups used the filtered where-clause). 2. test_user_tracking_groups_excludes_tracking_type_filter: verifies that the 3rd group_by call (by_user_tracking_groups) does NOT receive a trackingType constraint while the 1st call (by_provider) does.	2026-04-13 03:14:27 +00:00
majdyz	b7f1173cc4	fix: cost_bearing_request_count always 0 when filtering by non-cost_usd tracking type When the caller filters the main view by e.g. tracking_type=tokens, the by_user_tracking_groups query was also filtered, excluding all cost_usd rows and making cost_bearing_request_count zero for every user. Use a separate where_no_tracking_type filter (omitting tracking_type) for this sub-query so cost_usd rows are always present for correct per-user avg cost denominators.	2026-04-13 02:58:08 +00:00
majdyz	88994a62ab	fix(openapi): restore original formatting, insert CostBucket in alphabetical position	2026-04-13 02:44:44 +00:00
majdyz	bd7db8ff03	fix(openapi): move CostBucket schema to alphabetical position, fix cost_buckets field order	2026-04-13 02:37:35 +00:00
majdyz	8babdfe12f	fix(frontend-test): use getAllByText for Known Cost which appears in both card and table header Co-Authored-By:	2026-04-13 02:26:18 +00:00
majdyz	91882be590	fix(type-check): construct CostBucket TypedDict instances to satisfy Pyright Pyright rejects `list[dict[str, Unknown]]` being passed as `list[CostBucket]` because list is invariant. Constructing CostBucket instances explicitly satisfies the type checker across Python 3.11/3.12/3.13.	2026-04-13 02:18:56 +00:00
majdyz	639b69b9d9	fix(api-types): add CostBucket as named schema; fix generated TS model path - Add CostBucket to openapi.json components/schemas so orval generates a costBucket.ts file instead of an inline anonymous type - Use \$ref in cost_buckets items array for proper orval code generation - Create costBucket.ts generated model; update platformCostDashboard.ts to import from it instead of defining CostBucket inline - Update PlatformCostContent.tsx import to use costBucket directly	2026-04-13 02:17:57 +00:00
majdyz	33ff46e96a	style(frontend): apply prettier formatting to PlatformCostContent and openapi.json	2026-04-13 02:15:58 +00:00
majdyz	fbb93e2ddf	fix(frontend-test): update renders empty dashboard assertion for 7 zero-cost cards The PR added 5 new cost summary cards (Avg Cost, P50, P75, P95, P99) that also display \$0.0000 when empty, so the test assertion needed to change from 2 to 7 matching elements.	2026-04-13 02:15:28 +00:00
majdyz	187b4596e0	fix(platform-cost): fix per-user avg cost denominator, NULL bucket, tracking_type filter gap - Add `cost_bearing_request_count` to `UserCostSummary` via a new group-by-(userId,trackingType) query; `UserTable` now divides by this count instead of the mixed `request_count`, eliminating denominator dilution for users with both tokens and cost_usd rows - Guard histogram CASE against NULL costMicrodollars (NULL < N → unknown falls to ELSE '$10+'); add `AND "costMicrodollars" IS NOT NULL` to the histogram WHERE so NULL rows are excluded instead of bucketed - Respect the `tracking_type` dashboard filter in raw SQL percentile and bucket queries; previously the filter was hardcoded to 'cost_usd' even when the caller passed tracking_type='tokens', making those queries return inconsistent data relative to the ORM queries - Add p75 and p99 assertions to test_returns_dashboard_with_data - Update openapi.json and generated TS model for new field	2026-04-13 02:10:40 +00:00

1 2 3 4 5 ...

8605 Commits