Compare commits


30 Commits

Author SHA1 Message Date
Zamil Majdy
79bc0aed91 fix(backend): guard intermediate DB flush with is_final_attempt flag
Add is_final_attempt field to _RetryState so intermediate DB flushes
only run on attempt 0 (optimistic — most turns succeed on the first try)
and the final retry attempt. Middle retry attempts may be rolled back, and
messages already flushed to DB would persist as orphans since the
in-memory rollback (session.messages truncation) has no corresponding
DB delete.
2026-04-01 06:44:02 +02:00
Zamil Majdy
2fa5c37413 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-search-cap-and-persistence 2026-04-01 06:15:30 +02:00
Zamil Majdy
d7f324bc9f Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-search-cap-and-persistence 2026-03-31 19:07:29 +02:00
Zamil Majdy
ffcb88251a style: move StreamHeartbeat import to module top-level 2026-03-31 16:34:45 +02:00
Zamil Majdy
f3aef1ecbc revert(copilot): remove delete_messages_from_sequence
The orphan message scenario (intermediate flush persists messages, then
stream attempt is retried and rolls back) is practically unreachable:
retries only fire on context-too-long with events_yielded==0, meaning
the stream barely started and the flush threshold (30s/10 messages)
could not have been reached. Removing the delete operation eliminates
the risk of accidentally deleting legitimate messages.
2026-03-31 16:14:28 +02:00
Zamil Majdy
053afde64d revert(copilot): remove circuit breakers and re-enable Task tool
The WebSearch/total tool call caps and Task tool disabling were band-aid
fixes that limited capability rather than addressing root causes. The
real bug fixes (Redis TTL refresh, intermediate DB persistence, orphan
message cleanup, StreamHeartbeat) remain in place. Task concurrency
limits (max_subtasks) and prompt-level search best practices provide
sufficient guardrails without artificially capping tool usage.
2026-03-31 15:59:07 +02:00
Zamil Majdy
687ee1f280 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-search-cap-and-persistence 2026-03-31 15:18:06 +02:00
Zamil Majdy
41a11d74b3 Merge branch 'fix/copilot-search-cap-and-persistence' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-search-cap-and-persistence 2026-03-31 15:12:40 +02:00
Zamil Majdy
4df0714e2a fix(copilot): clean up orphaned DB messages in _HandledStreamError rollback 2026-03-31 15:12:18 +02:00
Zamil Majdy
b365a3337b Reapply "fix(copilot): detect truncated write_workspace_file and guide LLM to source_path"
This reverts commit ac49e72745.
2026-03-31 12:49:10 +00:00
Zamil Majdy
935b59ce43 Merge branch 'fix/copilot-search-cap-and-persistence' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-search-cap-and-persistence 2026-03-31 14:46:03 +02:00
Zamil Majdy
77112e79a2 fix(copilot): clean up orphaned DB messages on stream attempt rollback
When intermediate flushes persist messages during a stream attempt that
later fails and is retried, the in-memory rollback now also deletes the
orphaned messages from the DB via delete_messages_from_sequence. This
prevents stale messages from resurfacing on page reload.
2026-03-31 14:45:21 +02:00
Zamil Majdy
ac49e72745 Revert "fix(copilot): detect truncated write_workspace_file and guide LLM to source_path"
This reverts commit 8ae344863e.
2026-03-31 12:45:04 +00:00
Zamil Majdy
8ae344863e fix(copilot): detect truncated write_workspace_file and guide LLM to source_path
When the LLM tries to inline a very large file as content in
write_workspace_file, the SDK silently truncates the tool call
arguments to {}. The tool then returns a generic 'filename required'
error, which the LLM doesn't understand and retries the same way
(wasting 500s+ per attempt, as seen in session c465eff9).

Now: when ALL parameters are missing (likely truncation), return an
actionable error explaining what happened and how to fix it — write
the file to disk with bash_exec first, then use source_path to copy
it to workspace. This gives the LLM a clear recovery path instead
of a retry loop.
2026-03-31 12:43:42 +00:00
Zamil Majdy
cbd3ebce00 feat(copilot): add proactive budget warnings before hitting tool call caps
When WebSearch or total tool call usage reaches 80% of the cap, the
PreToolUse hook now returns additionalContext warning the model about
remaining budget. This lets the model plan its remaining calls instead
of hitting a hard denial wall with no prior notice.
2026-03-31 13:36:22 +02:00
Zamil Majdy
2d00268516 fix(copilot): raise total tool call cap to 500 (10x web search limit)
100 total tool calls per turn was too tight for complex autopilot tasks
that involve many file reads/writes and sub-agent delegations. Bumping
to 500 keeps the circuit breaker effective against runaway loops while
giving legitimate long-running turns sufficient headroom.
2026-03-31 12:41:56 +02:00
Zamil Majdy
751382fcff fix(backend): clean up _meta_ttl_refresh_at on session completion
Remove session entries from the module-level _meta_ttl_refresh_at dict
when mark_session_completed is called, preventing unbounded memory
growth over the lifetime of the backend process.
2026-03-31 09:19:01 +02:00
Zamil Majdy
780c44c051 fix: update Task tests — Task is now in BLOCKED_TOOLS, always denied
Task was added to SDK_DISALLOWED_TOOLS so all Task tests now expect
denial. Removed concurrency slot tests since they're unreachable when
the tool is blocked at the access level.
2026-03-30 17:04:33 +00:00
Zamil Majdy
77eb07c458 style: black formatting for test files 2026-03-30 16:49:41 +00:00
Zamil Majdy
13a2e623a0 test: add tests for Task disallowed, StreamHeartbeat on task_progress, WebSearch denial budget
- tool_adapter_test: verify Task, Bash, WebFetch are in SDK_DISALLOWED_TOOLS
- response_adapter_test: verify task_progress emits StreamHeartbeat
- security_hooks_test: verify denied WebSearches don't consume total tool budget
2026-03-30 16:46:59 +00:00
Zamil Majdy
8d99660ba0 chore: bump WebSearch cap to 50 per turn 2026-03-30 16:38:26 +00:00
Zamil Majdy
bfbec703ce fix(copilot): disable SDK Task tool, bump search cap to 30
- Disable the SDK built-in Task (sub-agent) tool by adding it to
  SDK_DISALLOWED_TOOLS. The AutoPilotBlock via run_block is the
  preferred delegation mechanism — it has full Langfuse observability,
  unlike the SDK Task tool which runs opaquely.
  The Task tool was the root cause of the d2f7cba3 incident: it
  spawned 5 sub-agents with no shared context, each independently
  hammering WebSearch with overlapping queries.
- Bump WebSearch cap from 15 to 30 per turn — less restrictive
  while still preventing the worst-case runaway.
- Update prompt to reflect Task tool is disabled, point to
  AutoPilotBlock for sub-agent delegation.
2026-03-30 16:34:15 +00:00
Zamil Majdy
8763a94436 fix(copilot): keep Redis stream alive during sub-agent execution
During long sub-agent runs, the SDK sends task_progress SystemMessages
that were previously silent (no stream chunks produced). This meant
publish_chunk was never called during those gaps, causing BOTH the
meta key and stream key to expire in Redis.

Fix:
- response_adapter: emit StreamHeartbeat for task_progress events,
  so publish_chunk is called even during sub-agent gaps
- stream_registry: refresh stream key TTL alongside meta key in the
  periodic keepalive block (every 60s)

This ensures that as long as the SDK is producing any events (including
task_progress), both Redis keys stay alive. Confirmed via live
reproduction: session d2f7cba3 T13 ran for 1h45min+ with both keys
expired because only task_progress events were arriving.
2026-03-30 16:29:07 +00:00
Zamil Majdy
a504fe532a fix(copilot): refresh Redis session meta TTL during long-running turns
Root cause of empty session on reload: the session meta key in Redis has
a 1h TTL set once at create_session time. Turns exceeding 1h (like
session d2f7cba3 at 82min) cause the meta key to expire, making
get_active_session return False. The resume endpoint then returns 204
and the frontend shows an empty session.

Fix: publish_chunk now periodically refreshes the meta key TTL (every
60s) when session_id is provided. stream_and_publish already has
session_id and passes it through. This keeps the meta key alive for
as long as chunks are being published.

GCP logs confirmed the bug: at 09:49 (73 min into the turn),
GET_SESSION returned active_session=False, msg_count=1 — the meta
key had expired 13 minutes earlier.
2026-03-30 16:06:09 +00:00
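The periodic refresh above can be sketched as a per-session throttle around a Redis EXPIRE. The 60s interval and "refresh from publish_chunk" idea come from the commit; the function names and the injected `refresh` callable are assumptions so the sketch stays self-contained.

```python
import time

_REFRESH_INTERVAL = 60.0  # refresh the meta key TTL at most once per minute
_last_refresh: dict[str, float] = {}

def maybe_refresh_meta_ttl(session_id: str, refresh) -> bool:
    """Invoke refresh(session_id) if 60s have passed since the last refresh.

    Returns True when a refresh was issued. In the real code, `refresh`
    would wrap a Redis EXPIRE on the session meta key, and this would be
    called from publish_chunk whenever session_id is provided.
    """
    now = time.monotonic()
    if now - _last_refresh.get(session_id, 0.0) < _REFRESH_INTERVAL:
        return False  # throttled: a refresh fired within the last minute
    _last_refresh[session_id] = now
    refresh(session_id)
    return True
```

Because the throttle keys on `session_id`, concurrent sessions refresh independently, and the meta key stays alive exactly as long as chunks keep arriving.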
Zamil Majdy
b3f52ce3b3 revert: remove Perplexity guidance from code supplement
Perplexity guidance is managed in the Langfuse prompt (source of truth),
not in the code supplement. Reverts the redundant addition from fc13c30.
The Langfuse prompt has been updated with stronger, actionable guidance
including block ID, model names, and clear trigger (3+ searches).
2026-03-30 12:30:36 +00:00
Zamil Majdy
0ac208603c feat(copilot): nudge LLM to use Perplexity for deep research over WebSearch
For research-heavy tasks (5+ searches), the prompt now directs the LLM
to use run_block with PerplexityBlock (sonar-pro) instead of repeated
WebSearch calls. Perplexity returns synthesized, cited answers in a
single call — avoiding the 29-54s per-call latency of SDK WebSearch
and reducing total search count significantly.
2026-03-30 12:15:06 +00:00
Zamil Majdy
57401a9b13 fix: re-enable intermediate flush for all attempts, add rollback note
The is_final_attempt guard disabled intermediate flush for the common
case (first attempt succeeds, which is 99%+ of turns). Retries only
fire on context-too-long errors with events_yielded==0, meaning the
stream barely started and flush threshold was almost certainly not
reached. Keep flush always enabled and document the theoretical edge
case.
2026-03-30 11:53:33 +00:00
Zamil Majdy
df9ae41c25 fix: update prompt to remove 'per session' scope qualifier for web search cap 2026-03-30 11:50:16 +00:00
Zamil Majdy
bfd152dcc7 fix: guard intermediate flush against retry rollback, fix counter scope labels, reorder checks, top-level imports
- Guard intermediate DB flush with is_final_attempt flag to prevent
  persisting messages from attempts that may be rolled back on retry
- Fix 'per session' → 'per turn' in comments/docstrings/denial messages
  since hooks are recreated per stream invocation
- Reorder circuit breaker checks: WebSearch cap before total counter
  increment so denied searches don't consume total budget slots
- Move create_security_hooks import to module top-level in tests per
  CLAUDE.md coding guidelines
2026-03-30 11:48:21 +00:00
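The reordering in the third bullet above can be sketched as a gatekeeper that checks the WebSearch cap before touching the total counter. The counters dict, function name, and cap defaults are illustrative; only the ordering requirement (denied searches must not consume total budget) comes from the commit.

```python
def pre_tool_use(tool: str, counters: dict,
                 web_cap: int = 50, total_cap: int = 500) -> bool:
    """Return True to allow the tool call, False to deny it.

    The WebSearch cap is checked FIRST, before the total counter is
    incremented, so a denied search never consumes a total budget slot.
    """
    if tool == "WebSearch" and counters.get("web", 0) >= web_cap:
        return False  # denied before the total counter is touched
    if counters.get("total", 0) >= total_cap:
        return False
    counters["total"] = counters.get("total", 0) + 1
    if tool == "WebSearch":
        counters["web"] = counters.get("web", 0) + 1
    return True
```

With the old ordering (increment first, then check), a turn that hit the WebSearch cap could burn through the total budget on denials alone.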
Zamil Majdy
0f92e585ab fix(copilot): add tool call circuit breakers and intermediate persistence
- Add WebSearch call cap (15/session) to prevent runaway research loops
- Add total tool call cap (100/turn) as hard circuit breaker
- Add web search best practices guidance to system prompt
- Add intermediate session persistence (every 30s or 10 messages)
- Add tests for WebSearch cap and total tool call cap

Addresses findings from session d2f7cba3: 179 WebSearch calls,
$20.66 cost, 82 minutes for a single user message.
2026-03-30 11:31:17 +00:00


@@ -266,6 +266,7 @@ class _RetryState:
     adapter: SDKResponseAdapter
     transcript_builder: TranscriptBuilder
     usage: _TokenUsage
+    is_final_attempt: bool = True
 
 
 @dataclass
@@ -1493,9 +1494,14 @@ async def _run_stream_attempt(
         # --- Intermediate persistence ---
         # Flush session messages to DB periodically so page reloads
         # show progress during long-running turns.
+        # Guarded by is_final_attempt: earlier retry attempts may be
+        # rolled back in memory (session.messages truncated), but
+        # messages already flushed to DB would persist as orphans.
+        # is_final_attempt is True on attempt 0 (optimistic — most
+        # turns succeed on the first try) and on the last retry.
         _msgs_since_flush += 1
         now = time.monotonic()
-        if (
+        if state.is_final_attempt and (
             _msgs_since_flush >= _FLUSH_MESSAGE_THRESHOLD
             or (now - _last_flush_time) >= _FLUSH_INTERVAL_SECONDS
         ):
@@ -1986,6 +1992,11 @@ async def stream_chat_completion_sdk(
     )
 
     for attempt in range(_MAX_STREAM_ATTEMPTS):
+        # Enable intermediate DB flushes on attempt 0 (optimistic: most
+        # turns succeed on the first try) and the last attempt. Middle
+        # retry attempts may be rolled back, and flushed messages would
+        # persist as DB orphans — so flushes are disabled for those.
+        state.is_final_attempt = attempt == 0 or attempt == _MAX_STREAM_ATTEMPTS - 1
         # Clear any stale stash signal from the previous attempt so
         # wait_for_stash() doesn't fire prematurely on a leftover event.
         reset_stash_event()