Commit Graph

8605 Commits

Author SHA1 Message Date
majdyz
00a20bdfe6 fix(copilot): deduplicate SSE-replayed messages by content fingerprint
When the SSE connection reconnects, resume_session_stream replays from
"0-0" and the replayed UIMessage objects get new IDs from useChat,
bypassing the adjacent-only content dedup. Switch deduplicateMessages
to track all seen role+context+content fingerprints globally, scoped
by the preceding user message to avoid false positives when the
assistant legitimately gives identical answers to different prompts.
2026-04-13 08:03:51 +00:00
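The global fingerprint approach described in this commit can be sketched as follows. This is a minimal illustration in Python rather than the frontend's own code; the `Message` type, the `deduplicate_messages` signature, and the choice to fingerprint user messages with an empty context are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    id: str
    role: str
    content: str


def deduplicate_messages(messages):
    """Drop messages whose (role, context, content) fingerprint was already seen.

    The context of an assistant message is the content of the most recent
    user message, so identical answers to different prompts are kept.
    """
    seen = set()
    kept = []
    context = ""
    for msg in messages:
        if msg.role == "user":
            # Simplification for this sketch: user messages use an empty context.
            fingerprint = ("user", "", msg.content)
            # Update the context even if this user message is dropped, so the
            # replayed assistant reply that follows is matched correctly.
            context = msg.content
        else:
            fingerprint = (msg.role, context, msg.content)
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        kept.append(msg)
    return kept


conversation = [
    Message("1", "user", "hi"),
    Message("2", "assistant", "hello"),
    Message("3", "user", "bye"),
    Message("4", "assistant", "hello"),  # same text, different prompt: kept
    Message("5", "user", "hi"),          # replayed with a new ID: dropped
    Message("6", "assistant", "hello"),  # replayed with a new ID: dropped
]
kept_ids = [m.id for m in deduplicate_messages(conversation)]
```

Because IDs are ignored, the replayed copies are removed even though they carry fresh IDs, while the second "hello" survives because its preceding prompt differs.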
majdyz
e0ddb7d4d4 Merge branch 'feat/enhanced-cost-dashboard' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs
# Conflicts:
#	autogpt_platform/backend/backend/data/platform_cost_test.py
2026-04-13 08:03:15 +00:00
majdyz
d8d0f752b5 Merge branch 'feat/builder-chat-panel' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs
# Conflicts:
#	autogpt_platform/backend/backend/data/platform_cost_test.py
2026-04-13 08:02:58 +00:00
majdyz
c64d5a9c92 Merge branch 'perf/copilot-prompt-caching' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs
2026-04-13 08:02:37 +00:00
majdyz
f8bca6f4bc Merge commit '2cf737dc0508a7753d067ed8425cfc0ef657b29f' into preview/all-prs
# Conflicts:
#	autogpt_platform/backend/backend/copilot/config.py
2026-04-13 08:02:31 +00:00
majdyz
6c21e58d31 Merge branch 'fix/orchestrator-per-iteration-cost' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs
2026-04-13 08:01:50 +00:00
majdyz
895c9a0d29 Merge branch 'feat/copilot-pending-messages' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs
2026-04-13 08:01:45 +00:00
majdyz
84e877e36d Merge branch 'fix/schedule-agent-cred-setup-ux' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs
2026-04-13 08:01:40 +00:00
majdyz
a504ad6e1e Merge branch 'chore/sdk-dev-preview-0.1.58-with-proxy' of https://github.com/Significant-Gravitas/AutoGPT into preview/all-prs
2026-04-13 08:01:33 +00:00
majdyz
ca0c95b593 fix(frontend): add SUBSCRIPTION to CreditTransactionType enum in openapi.json
Syncs the OpenAPI spec with the Prisma schema which already includes the
SUBSCRIPTION enum value in CreditTransactionType.
2026-04-13 07:13:21 +00:00
majdyz
cbf309c9e4 Merge branch 'dev' of https://github.com/Significant-Gravitas/AutoGPT into feat/copilot-pending-messages
2026-04-13 07:12:49 +00:00
majdyz
6ccb44e0d5 fix(copilot): add 404/429 to route decorator, reformat routes.py, regenerate openapi.json
Add responses={404, 429} to the pending endpoint's @router.post decorator
so FastAPI auto-generates them in the OpenAPI spec. Previously these were
only manually added to openapi.json and the CI schema-check (export +
diff) stripped them. Also apply black formatting to the long warning
line that was failing the backend lint check.
2026-04-13 07:04:07 +00:00
majdyz
e558c60104 fix(orchestrator): don't propagate non-billing charge errors as tool failures
Non-IBE exceptions from charge_node_usage (e.g. DB timeout) were
re-raised and caught by the outer generic handler, incorrectly marking
a successful tool execution as failed. This could cause the LLM to
retry side-effectful operations. Now logs the error and continues to
the success path since the tool itself completed successfully.
2026-04-13 07:02:10 +00:00
majdyz
fbad856538 fix(backend/copilot): relax schedule race test assertion for setup_test_data fixture
The setup_test_data fixture creates a graph with credentials already
embedded in node defaults. The DB-stored credential schema may not
surface these as "missing" in build_missing_credentials_from_graph,
so assert the key exists rather than asserting non-empty count.
2026-04-13 06:59:18 +00:00
majdyz
3ebfa3d68b fix(backend/copilot): address round-6 review — DRY validation handler, improve tests
- Extract duplicated GraphValidationError handler from _run_agent and
  _schedule_agent into _handle_graph_validation_race helper method
- Use generator expressions instead of list comprehension for
  short-circuit evaluation in _build_setup_requirements_from_validation_error
- Improve mixed-error fallback message to be more user-friendly
- Add test for empty node_errors={} edge case
- Pin expected credential count in firecrawl fixture tests
- Add missing_credentials assertion to schedule race E2E test
- Add test for extras present with node_errors=None in service_test
2026-04-13 06:45:28 +00:00
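The short-circuit point in the second bullet can be illustrated with a small, self-contained sketch. The helper names and sample messages are invented for the example; only the behaviour of `any()` with a list comprehension versus a generator expression is being shown.

```python
def looks_like_credential_error(message, seen):
    """Record every message that is actually inspected."""
    seen.append(message)
    return "credential" in message


def count_checks(messages, lazy):
    """Return how many messages were inspected before any() produced a result."""
    seen = []
    if lazy:
        # Generator expression: any() stops at the first True.
        any(looks_like_credential_error(m, seen) for m in messages)
    else:
        # List comprehension: the whole list is built before any() runs.
        any([looks_like_credential_error(m, seen) for m in messages])
    return len(seen)


MESSAGES = ["missing credential for node A", "invalid input", "unknown block"]
lazy_checks = count_checks(MESSAGES, lazy=True)
eager_checks = count_checks(MESSAGES, lazy=False)
```

With the first message matching, the generator form inspects one item and the list form inspects all three.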
majdyz
5ff46ff207 fix(backend): address review feedback on orchestrator billing
- Extract post-execution billing into _handle_post_execution_billing()
- Deduplicate IBE notification into _try_send_insufficient_funds_notif()
- Combine _charge_usage + _handle_low_balance into single thread dispatch
- Sanitize error messages to LLM (no internal details leaked)
- Default _is_error to True (fail-closed) for tool responses
- Add IBE propagation contract to OrchestratorBlock class docstring
- Reduce per-site IBE comments to one-liners referencing class docstring
- Fix _resolve_block_cost return type annotation (Block | None)
- Move test imports to module level, fix test_default_block_returns_zero
- Add tests for non-IBE billing failure and _charge_usage(count=0)
- Fix Black formatting (CI lint blocker)
2026-04-13 06:44:20 +00:00
majdyz
2cf737dc05 fix(backend): address review comments on cross-user prompt caching PR
- Add TODO(#12747) to _SystemPromptPreset for cleanup tracking
- Update docstring to note SDK version and migration path
- Add debug logging in _build_system_prompt_value for observability
- Document empty-string edge case in docstring
- Trim redundant block comment at call site to single line
- Add test for empty-string system_prompt with cache enabled
- Add test for CHAT_CLAUDE_AGENT_CROSS_USER_PROMPT_CACHE=false env var
2026-04-13 06:43:57 +00:00
majdyz
040637dd68 fix: force cost_usd for percentile/histogram queries, fix test + prettier
- Backend: always pass tracking_type=None to _build_raw_where for
  percentile and histogram queries so they compute stats on cost_usd
  rows regardless of the caller's tracking_type filter.
- Frontend test: use getAllByText for "5" which appears in both the
  Active Users card and the $1-2 bucket count.
- Frontend: fix prettier formatting in PlatformCostContent.tsx.
2026-04-13 06:36:59 +00:00
majdyz
90d8ae0ae2 fix(copilot): map non-E2B file tools in permissions and fix lint formatting
In non-E2B mode, to_sdk_names() failed to map whitelisted SDK built-in
file tool names (Write, Edit, Read) to their MCP-prefixed equivalents
(mcp__copilot__Write, etc.), causing them to be incorrectly filtered out
when users configured tool whitelists.

Add _SDK_TO_MCP mapping for non-E2B mode that maps Read->read_file,
Write->Write, Edit->Edit. Add test coverage for this case.

Also fix black formatting in permissions_test.py that was causing CI lint
failure.
2026-04-13 06:34:55 +00:00
majdyz
967f0c97c4 fix(copilot): fix black formatting for single-line ValueError raise
2026-04-13 06:29:25 +00:00
majdyz
7dc4319125 fix: correct group_by count in test_passes_filters_to_queries
The 6th group_by (total agg no-tracking-type) only runs when
tracking_type is set. This test doesn't pass tracking_type, so the
expected count is 5, not 6.
2026-04-13 05:28:12 +00:00
majdyz
a8cfe27f6b fix: use real temp files in CLI path env var tests
The path validator rejects non-existent paths, so tests must create
real executable temp files via tmp_path instead of hardcoded paths.
2026-04-13 05:28:08 +00:00
majdyz
4cc8ef4409 fix(platform-cost): address PR review — deduplicate filter logic, skip redundant query, improve frontend
Backend:
- Extract _build_raw_where() helper so raw SQL and Prisma WHERE share
  filter logic (review item #4 — duplicated filter logic)
- Skip redundant total_agg_no_tracking_type_groups query when
  tracking_type is None since it duplicates total_agg_groups (item #3)
- Convert CostBucket from TypedDict to BaseModel for consistency (nit #1)
- Replace fragile 8-way positional tuple unpack with indexed list access

Frontend:
- Make 12 SummaryCards data-driven via a cards config array (item #5)
- Use friendlier percentile labels: Typical/Upper/High/Peak Cost (P50/P75/P95/P99)
- Update test fixtures with all new dashboard fields (item #1)
- Add test assertions for new summary card labels, cost buckets, token
  values, and user table columns
2026-04-13 05:16:55 +00:00
majdyz
359b7f1b81 fix(copilot): address PR reviewer feedback on CLI path validation and defaults
- Reject non-existent and non-file CLI paths at config validation time
  instead of letting them fail with opaque OS errors at runtime
- Add negative test coverage for CLI path validator (non-existent,
  non-executable, directory paths)
- Document breaking default changes (max_turns 1000->50, max_budget
  $100->$5) in field descriptions with env var override instructions
- Narrow broad `except Exception` to `except (ImportError, AttributeError)`
  in cli_openrouter_compat_test.py
2026-04-13 05:13:56 +00:00
Zamil Majdy
a3b0cea942 fix(frontend/builder): route text parts through MessagePartRenderer
Text parts in assistant messages were being rendered as plain <span>
elements, bypassing MessagePartRenderer's case "text" handler and
parseSpecialMarkers(). This broke styled error/system messages
([ERROR:], [RETRYABLE_ERROR:], [SYSTEM:] markers) and markdown
rendering in the builder chat panel.

Route all assistant message parts (text and tool) through
MessagePartRenderer so parseSpecialMarkers() runs on text content.
2026-04-13 04:42:18 +00:00
majdyz
ae1600a99d fix(copilot): rename SDK read_tool_result tool and fix path leak in error message
- Rename `_READ_TOOL_NAME` from `"Read"` to `"read_tool_result"` so the LLM
  can distinguish it from `read_file` (working-directory tool).  The new name
  plus an updated description make its narrow scope (tool-results/ paths and
  workspace:// URIs) unambiguous.
- Fix path leak in `_read_file_handler`: use `os.path.basename(file_path)` in
  the "Path not allowed" error, consistent with write/edit handlers.
- Update `permissions.py` comment and all `permissions_test.py` assertions to
  use the new `mcp__copilot__read_tool_result` name.
2026-04-13 04:27:17 +00:00
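The path-leak fix in the second bullet amounts to reporting only the final path component. A minimal sketch, assuming a hypothetical helper name and message wording (only the use of `os.path.basename` comes from the commit); the example path assumes a POSIX-style separator:

```python
import os


def path_not_allowed_message(file_path):
    """Build an error message that exposes the file name but not its directory."""
    return f"Path not allowed: {os.path.basename(file_path)}"


message = path_not_allowed_message("/tmp/copilot-session/private/notes.txt")
```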
majdyz
45f96d5769 fix(copilot): wrap baseline turn-start drain in try/except; add 404/429 to OpenAPI spec
Baseline turn-start drain_pending_messages was unprotected — a transient
Redis error would propagate up and kill the entire turn stream, unlike the
already-protected mid-loop and SDK paths. Wrap with try/except + fallback
to [] so a Redis hiccup degrades gracefully.

Also adds 404 (session not found) and 429 (rate-limit exceeded) response
codes to the pending endpoint's OpenAPI spec so TypeScript clients can
handle these error paths correctly.
2026-04-13 04:24:29 +00:00
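The graceful-degradation pattern described here might look roughly like the following. The wrapper name `drain_pending_safely` and the injected `drain` coroutine are hypothetical; the real `drain_pending_messages` signature is not shown in this log.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def drain_pending_safely(drain, session_id):
    """Call an async drain function, falling back to [] on any error."""
    try:
        return await drain(session_id)
    except Exception:
        # A transient store error should not kill the whole turn stream.
        logger.warning("pending-message drain failed", exc_info=True)
        return []


async def _failing_drain(session_id):
    raise ConnectionError("redis unavailable")


async def _working_drain(session_id):
    return [f"queued message for {session_id}"]


fallback = asyncio.run(drain_pending_safely(_failing_drain, "s1"))
drained = asyncio.run(drain_pending_safely(_working_drain, "s1"))
```

The turn proceeds with an empty pending list when the drain fails and with the drained messages otherwise.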
majdyz
5dbbdf9b27 fix(copilot): address round-6 review nits
- Remove redundant inner `ChatConfig` import in `_prewarm_cli` — it was
  already imported at module scope on line 16 (style guide: inner imports
  only for heavy optional deps)
- Correct stale comment in `sdk_compat_test.py`: 2.1.63/2.1.70 pre-date
  the context-management regression and are OpenRouter-safe without any
  env var; only 2.1.97+ requires CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1
- Update `_assert_no_forbidden_patterns` error message in
  `cli_openrouter_compat_test.py`: remove the stale "above 0.1.45" ceiling
  (we've already upgraded to 0.1.58) and point at the correct remediation
  steps (add to _KNOWN_GOOD_BUNDLED_CLI_VERSIONS after bisect verification)
- Plug test coverage gap in `env_test.py`: add
  `CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS == "1"` assertions to three
  OpenRouter test methods that were missing it
  (test_strips_trailing_v1, test_strips_trailing_v1_and_slash,
  test_no_v1_suffix_left_alone) — guards against the env var being
  accidentally dropped from a code path that the main test didn't exercise
2026-04-13 04:23:54 +00:00
majdyz
e901b64bed fix(test): fix _handle_low_balance mock signature to accept positional args
The gated_processor fixture's fake_low_balance mock used **kwargs, but
production code calls _handle_low_balance with positional args via
asyncio.to_thread. This caused a silent TypeError caught by the broad
except handler, making the handle_low_balance assertion fail (0 calls
instead of 1). Updated mock to match the actual method signature.
2026-04-13 04:22:03 +00:00
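The failure mode is easy to reproduce in isolation. In this sketch the two fake functions and their parameter names are hypothetical; the point is only that `asyncio.to_thread` forwards positional arguments, so a `**kwargs`-only mock raises `TypeError` when called that way.

```python
import asyncio


def fake_low_balance_kwargs_only(**kwargs):
    """Mock that accepts keyword arguments only."""
    return "called"


def fake_low_balance_fixed(user_id, balance, **kwargs):
    """Mock whose signature accepts the positional call (names are assumed)."""
    return "called"


async def call_positionally(handler):
    """Invoke the handler the way production code is described as doing."""
    try:
        return await asyncio.to_thread(handler, "user-1", 100)
    except TypeError as exc:
        return f"TypeError: {exc}"


broken = asyncio.run(call_positionally(fake_low_balance_kwargs_only))
fixed = asyncio.run(call_positionally(fake_low_balance_fixed))
```

If a broad `except` swallows the `TypeError`, the mock records zero calls, which matches the symptom reported in the commit.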
majdyz
64c3ef45df chore: apply Prettier formatting to BuilderChatPanel files
Three files were flagged by the CI lint/format check — apply prettier
--write to bring them into compliance.
2026-04-13 04:15:37 +00:00
majdyz
77ed619613 fix(frontend/builder): add flowID to tool-call effect deps for correct navigation guard
2026-04-13 04:09:05 +00:00
majdyz
626fe17aac fix(orchestrator): resolve None future on swallowed errors; add missing tests
- Move tool_node_stats None guard before node_exec_future.set_result so
  that when on_node_execution returns None (swallowed by @async_error_logged),
  the future carries set_exception(RuntimeError) rather than set_result(None),
  giving the tracking system an accurate error state
- Remove redundant `tool_node_stats is not None` check that was dead code
  after the early-return guard was added
- Add explanatory comment in _charge_extra_iterations_sync docstring explaining
  why the block lookup is intentionally repeated rather than cached from
  _charge_usage (two separate thread-pool workers, no shared mutable state)
- Add assertion to test_on_node_execution_charges_extra_iterations_when_gate_passes
  verifying _handle_low_balance is called when extra_cost > 0
- Add test_on_node_execution_failed_ibe_sends_notification covering the
  FAILED + InsufficientBalanceError path in on_node_execution (lines 822-836)
  that was previously untested
2026-04-13 04:03:08 +00:00
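The first bullet's guard ordering can be sketched with a standard-library future. The helper name `resolve_future` and the error text are assumptions; the example uses `concurrent.futures.Future`, which may differ from the future type used in the orchestrator.

```python
from concurrent.futures import Future


def resolve_future(future, stats):
    """Resolve the future with an error when stats are missing, else with stats."""
    if stats is None:
        # Guard runs before set_result, so waiters see a failure, not None.
        future.set_exception(RuntimeError("node execution returned no stats"))
        return
    future.set_result(stats)


failed = Future()
resolve_future(failed, None)

succeeded = Future()
resolve_future(succeeded, {"cost": 3})
```

Anything awaiting `failed` now observes a `RuntimeError` instead of a silent `None` result.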
majdyz
3b7e678b97 fix(frontend/builder): address round-5 review comments on BuilderChatPanel
- Add type="button" and focus-visible ring to Stop/Send buttons in PanelInput
- Add type="button" to Retry button in MessageList and Apply button in ActionList
- Fix MessageList to render plain text directly and only pass dynamic-tool parts
  to MessagePartRenderer (text parts were being misrouted through a tool renderer)
- Replace clearGraphSessionCacheForTesting export with _graphSessionCache for
  tests — avoids leaking test scaffolding into the production bundle
- Add toast notification in undo restore when target node was deleted between
  apply and undo (prevents silent no-op)
- Fix misleading test: remove red-herring mockNodes.push from 'no auto-send' test
  since the guard is isGraphLoaded===false, not the node array
- Add truncation-path coverage to helpers.test.ts (MAX_NODES/MAX_EDGES branches)
- Add deleted-node undo test to actionApplicators.test.ts
2026-04-13 04:01:42 +00:00
majdyz
c51471a9df fix(platform-cost): replace non-null assertion with nullish coalesce, add token total test assertions, add bucket skeleton
- UserTable: replace `cost_bearing_request_count!` non-null assertion with
  `?? 1` nullish coalesce — eliminates the TypeScript anti-pattern and
  guards against a theoretical divide-by-zero if the guard is refactored
- platform_cost_test: add assertions for `total_input_tokens` and
  `total_output_tokens` in test_returns_dashboard_with_data to cover the
  "Total Tokens" summary card computation path
- PlatformCostContent: add a h-32 skeleton placeholder for the cost-bucket
  histogram section so the loading state reflects the loaded layout more
  closely and reduces CLS
2026-04-13 04:00:25 +00:00
majdyz
10980f3799 fix(copilot): wrap SDK turn-start drain in try/except, deduplicate format calls, elevate context length constants
- sdk/service.py: wrap drain_pending_messages at turn start in try/except;
  a transient Redis error no longer kills the entire turn (baseline mid-loop
  drain was already protected, SDK was missed in round 5)
- baseline/service.py: pre-compute format_pending_as_user_message content
  once per drained message and reuse it for both session.messages and
  transcript_builder — eliminates the redundant second call per message
- routes.py: move _URL_LIMIT/_CONTENT_LIMIT out of the validator body into
  module-level _CONTEXT_URL_MAX_LENGTH/_CONTEXT_CONTENT_MAX_LENGTH so the
  contract limits are visible to tooling without reading the implementation
2026-04-13 03:57:54 +00:00
majdyz
4ea5cd5f7f fix(backend/copilot): address round-5 review comments
- Add "do NOT redirect to the Builder for credential setup" guardrail to
  run_agent description, making it symmetric with create_agent/edit_agent
- Scrub error message text from race-path warning logs; log only node IDs
  and field names to avoid leaking credential IDs/provider details
- Add code comment explaining the None-vs-filtering trade-off in
  _build_setup_requirements_from_validation_error
- Add E2E tests for structural-error fallback on both run and schedule
  paths (verifies ErrorResponse returned, not setup_requirements card)
2026-04-13 03:54:31 +00:00
majdyz
e0d5047974 test(copilot): plug two test coverage gaps found in round-5 review
- env_test: add missing CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS assertion
  to test_no_anthropic_key_overrides_when_openrouter_flag_true_but_no_key
  (the other three build_sdk_env test cases already assert it; this case
  was the only one that didn't, leaving the env-var injection unverified
  for the openrouter_active=False / no-key path)

- sdk_compat_test: add test_sdk_exposes_max_thinking_tokens_option
  parallel to the existing test_sdk_exposes_cli_path_option — guards
  against a future SDK rename/removal of max_thinking_tokens silently
  disabling the Opus thinking-token cost cap
2026-04-13 03:53:41 +00:00
majdyz
ac0d939dd2 fix(copilot): address round-5 review — path leaks, Read partial truncation, concurrent edit test
- Use `file_path` (caller-supplied) instead of `resolved` in Write/Edit
  success messages to avoid leaking `/tmp/copilot-<session>/...` to the LLM
- Add partial-truncation guard to `_read_file_handler` (MCP `Read` tool):
  when `offset`/`limit` are present but `file_path` is missing, return a
  specific truncation message instead of the generic `file_path is required`
- Add `TestConcurrentEditLocking` test that uses `asyncio.gather` to verify
  two parallel Edit calls on the same file are serialised by `_edit_locks`
- Add `autouse` fixture `_clear_edit_locks` to prevent module-level dict
  from bleeding between test runs
2026-04-13 03:53:17 +00:00
majdyz
929718768a fix(platform-cost): avg stats use unfiltered agg to stay nonzero when tracking_type filtered
When a caller filters the dashboard by tracking_type='tokens', total_agg_groups
only contains tokens rows so cost_bearing_requests=0 and avg_cost_microdollars_per_request
silently returned 0.0. Symmetrically, filtering by cost_usd gave zero token averages.

Add a parallel total_agg_no_tracking_type_groups query (using where_no_tracking_type,
mirroring the fix already applied to by_user_tracking_groups) and derive avg_cost_total,
avg_input_total, avg_output_total, cost_bearing_requests, and token_bearing_requests
from that unfiltered aggregate. The displayed grand totals (total_cost, total_requests,
total_input_tokens) remain scoped to the active filter.

Also adds test_global_avg_cost_nonzero_when_filtering_by_tokens to cover this case.
2026-04-13 03:42:18 +00:00
majdyz
34832ca70c test(backend): compat-test the exact preset dict sent to ClaudeAgentOptions
The existing compat test for SystemPromptPreset omitted exclude_dynamic_sections,
diverging from the actual dict _build_system_prompt_value produces. The new test
calls the production helper directly and passes its output through ClaudeAgentOptions,
so any SDK version that rejects the extra key is caught at test time.
2026-04-13 03:34:19 +00:00
majdyz
4cd955c758 test: add tests for cost_bearing_request_count fix and tracking_type filter isolation
Two new tests in TestGetPlatformCostDashboard:
1. test_cost_bearing_request_count_nonzero_when_filtering_by_tokens: verifies
   that cost_bearing_request_count per user is correct even when the main
   tracking_type filter is 'tokens' (regression guard for the bug where
   by_user_tracking_groups used the filtered where-clause).
2. test_user_tracking_groups_excludes_tracking_type_filter: verifies that the
   3rd group_by call (by_user_tracking_groups) does NOT receive a trackingType
   constraint while the 1st call (by_provider) does.
2026-04-13 03:14:27 +00:00
majdyz
b7f1173cc4 fix: cost_bearing_request_count always 0 when filtering by non-cost_usd tracking type
When the caller filters the main view by e.g. tracking_type=tokens, the
by_user_tracking_groups query was also filtered, excluding all cost_usd rows
and making cost_bearing_request_count zero for every user. Use a separate
where_no_tracking_type filter (omitting tracking_type) for this sub-query so
cost_usd rows are always present for correct per-user avg cost denominators.
2026-04-13 02:58:08 +00:00
majdyz
88994a62ab fix(openapi): restore original formatting, insert CostBucket in alphabetical position
2026-04-13 02:44:44 +00:00
majdyz
bd7db8ff03 fix(openapi): move CostBucket schema to alphabetical position, fix cost_buckets field order
2026-04-13 02:37:35 +00:00
majdyz
8babdfe12f fix(frontend-test): use getAllByText for Known Cost which appears in both card and table header
2026-04-13 02:26:18 +00:00
majdyz
91882be590 fix(type-check): construct CostBucket TypedDict instances to satisfy Pyright
Pyright rejects `list[dict[str, Unknown]]` being passed as `list[CostBucket]`
because list is invariant. Constructing CostBucket instances explicitly
satisfies the type checker across Python 3.11/3.12/3.13.
2026-04-13 02:18:56 +00:00
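The invariance issue can be illustrated with a small sketch. The `CostBucket` fields shown here (`bucket`, `count`) and the `total_count` helper are assumptions for the example, not the project's actual schema.

```python
from typing import TypedDict


class CostBucket(TypedDict):
    bucket: str
    count: int


def total_count(buckets: list[CostBucket]) -> int:
    """Sum the counts across all buckets."""
    return sum(b["count"] for b in buckets)


rows = [("$0-1", 3), ("$1-2", 5)]

# Constructing CostBucket instances explicitly gives the list the declared
# element type; a list built from dicts of unknown value type is not
# accepted where list[CostBucket] is expected, because list is invariant.
buckets: list[CostBucket] = [CostBucket(bucket=b, count=c) for b, c in rows]
total = total_count(buckets)
```

At runtime a `TypedDict` instance is an ordinary `dict`, so the change affects only static checking.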
majdyz
639b69b9d9 fix(api-types): add CostBucket as named schema; fix generated TS model path
- Add CostBucket to openapi.json components/schemas so orval generates
  a costBucket.ts file instead of an inline anonymous type
- Use $ref in cost_buckets items array for proper orval code generation
- Create costBucket.ts generated model; update platformCostDashboard.ts
  to import from it instead of defining CostBucket inline
- Update PlatformCostContent.tsx import to use costBucket directly
2026-04-13 02:17:57 +00:00
majdyz
33ff46e96a style(frontend): apply prettier formatting to PlatformCostContent and openapi.json
2026-04-13 02:15:58 +00:00
majdyz
fbb93e2ddf fix(frontend-test): update renders empty dashboard assertion for 7 zero-cost cards
The PR added 5 new cost summary cards (Avg Cost, P50, P75, P95, P99)
that also display $0.0000 when empty, so the test assertion needed to
change from 2 to 7 matching elements.
2026-04-13 02:15:28 +00:00
majdyz
187b4596e0 fix(platform-cost): fix per-user avg cost denominator, NULL bucket, tracking_type filter gap
- Add `cost_bearing_request_count` to `UserCostSummary` via a new
  group-by-(userId,trackingType) query; `UserTable` now divides by
  this count instead of the mixed `request_count`, eliminating
  denominator dilution for users with both tokens and cost_usd rows
- Guard histogram CASE against NULL costMicrodollars (NULL < N → unknown
  falls to ELSE '$10+'); add `AND "costMicrodollars" IS NOT NULL` to
  the histogram WHERE so NULL rows are excluded instead of bucketed
- Respect the `tracking_type` dashboard filter in raw SQL percentile
  and bucket queries; previously the filter was hardcoded to 'cost_usd'
  even when the caller passed tracking_type='tokens', making those
  queries return inconsistent data relative to the ORM queries
- Add p75 and p99 assertions to test_returns_dashboard_with_data
- Update openapi.json and generated TS model for new field
2026-04-13 02:10:40 +00:00
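The NULL-bucket problem in the second bullet follows from SQL's three-valued logic and can be reproduced with an in-memory SQLite database. The table name, column name, and bucket boundaries below are hypothetical, and the production queries run against a different database; only the `ELSE '$10+'` fall-through and the `IS NOT NULL` guard are taken from the commit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cost_log (cost_microdollars INTEGER)")
conn.executemany(
    "INSERT INTO cost_log VALUES (?)",
    [(500_000,), (None,), (20_000_000,)],
)

# NULL < N evaluates to unknown, so a NULL row matches no WHEN branch
# and falls through to ELSE.
BUCKET_SQL = """
SELECT CASE
         WHEN cost_microdollars < 1000000 THEN '$0-1'
         WHEN cost_microdollars < 10000000 THEN '$1-10'
         ELSE '$10+'
       END AS bucket,
       COUNT(*) AS n
FROM cost_log
{where}
GROUP BY bucket
"""

unguarded = dict(conn.execute(BUCKET_SQL.format(where="")).fetchall())
guarded = dict(
    conn.execute(
        BUCKET_SQL.format(where="WHERE cost_microdollars IS NOT NULL")
    ).fetchall()
)
```

Without the guard the NULL row inflates the top bucket; with it the row is excluded from the histogram entirely.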