Some LLM providers (notably Anthropic) don't support system messages
in the middle of a conversation. Changed ChatMessage.system() to
ChatMessage.user() for all mid-conversation context messages across
components (action history, context, skills, system clock, todo,
error reporting, LATS, and multi-agent debate strategies).
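A minimal sketch of the substitution (ChatMessage.system()/ChatMessage.user() as named above; the clock text and `now` variable are illustrative):
  # Before: a system message mid-conversation, rejected by some providers
  messages.append(ChatMessage.system(f"## Clock\nThe current time is {now}."))
  # After: the same content delivered as a user message
  messages.append(ChatMessage.user(f"## Clock\nThe current time is {now}."))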
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
do_not_execute() was not calling append_user_feedback(), so feedback
from denied commands only appeared as a tool result message which the
model often ignored. Now feedback is also surfaced as a prominent
[USER FEEDBACK] user message in the next prompt.
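Rough shape of the fix (append_user_feedback and the [USER FEEDBACK] tag come from the description above; the surrounding names are illustrative):
  def do_not_execute(self, denied_proposal, user_feedback: str):
      result = ActionErrorResult(reason="Command denied by user.")
      if user_feedback:
          # Previously only the tool result carried this; now it is also queued
          # so the next prompt contains a prominent [USER FEEDBACK] user message
          self.event_history.append_user_feedback(user_feedback)
      return result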
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement the open Agent Skills standard for Classic AutoGPT, enabling
modular, progressively-loaded capabilities via SKILL.md files. Skills
are discovered from workspace (.autogpt/skills) and global
(~/.autogpt/skills) directories with three-level progressive disclosure
to minimize token usage.
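A sketch of the discovery step, assuming each skill lives in its own subdirectory containing a SKILL.md:
  from pathlib import Path

  def discover_skill_files(workspace: Path) -> list[Path]:
      """Collect SKILL.md files from the workspace and global skill directories."""
      roots = [workspace / ".autogpt" / "skills", Path.home() / ".autogpt" / "skills"]
      return [p for root in roots if root.is_dir()
              for p in sorted(root.glob("*/SKILL.md"))]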
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a new `config` command that opens a tabbed TUI for browsing and
editing AutoGPT settings. The UI allows users to configure settings
interactively rather than manually editing .env files.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix line-too-long in test_permissions.py docstring
- Fix type annotation in validators.py (callable -> Callable)
- Add --fresh flag to benchmark tests to prevent state resumption
- Exclude direct_benchmark/adapters from pyright (optional deps)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add PlatformBlocksComponent to Agent as a default component
- Component automatically enables when PLATFORM_API_KEY env var is set
- Config now uses UserConfigurable for env var support:
  - PLATFORM_API_KEY (required to enable)
  - PLATFORM_URL (default: https://platform.agpt.co)
  - PLATFORM_BLOCKS_ENABLED (default: true)
  - PLATFORM_TIMEOUT (default: 60)
- API key stored as SecretStr for security
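Sketch of the configuration model (UserConfigurable usage as described above; the field names, import path, and from_env pattern are assumptions):
  from typing import Optional
  from pydantic import BaseModel, SecretStr
  from forge.models.config import UserConfigurable   # import path assumed

  class PlatformBlocksConfig(BaseModel):
      api_key: Optional[SecretStr] = UserConfigurable(None, from_env="PLATFORM_API_KEY")
      platform_url: str = UserConfigurable("https://platform.agpt.co", from_env="PLATFORM_URL")
      enabled: bool = UserConfigurable(True, from_env="PLATFORM_BLOCKS_ENABLED")
      timeout: int = UserConfigurable(60, from_env="PLATFORM_TIMEOUT")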
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
In CLI mode, agents now work directly in the current directory instead of
being sandboxed to .autogpt/agents/{id}/workspace/. Agent state files are
still stored in .autogpt/agents/{id}/state.json.
Server mode retains the original sandboxed behavior for isolation.
Changes:
- Add workspace_root parameter to FileManagerComponent to detect CLI mode
- Update Agent to pass workspace_root when file_storage is rooted at workspace
- Adjust save_state paths based on mode (CLI uses .autogpt/ prefix)
- Add use_tools field to ActionProposal for parallel tool execution
- Support parallel tool execution in Agent.execute()
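Illustrative summary of the CLI/server path split described above (helper name is hypothetical; paths are taken from the description):
  from pathlib import Path

  def workspace_and_state(agent_id: str, cli_mode: bool) -> tuple[Path, Path]:
      agent_dir = Path(".autogpt") / "agents" / agent_id
      if cli_mode:
          # CLI: work directly in the current directory; state stays under .autogpt/
          return Path.cwd(), agent_dir / "state.json"
      # Server: keep the original sandboxed workspace for isolation
      return agent_dir / "workspace", agent_dir / "state.json"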
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- SystemComponent: Keep both general constraints (e.g. no interaction with
physical objects) and code-specific constraints (don't modify tests, check
dependencies, no secrets)
- SystemComponent: Keep both general best practices (self-review, reflection)
and code-specific best practices (read before modify, mimic style, verify)
- LATS: Keep general phase instructions while adding coding task priorities
- one_shot: Remove redundant 'text' field from AssistantThoughts, use 'reasoning'
- one_shot: Fix intro to clarify when to use ask_user instead of contradicting it
- one_shot: Add efficiency guidelines and parallel execution support
- Update UI to display reasoning as main thoughts (remove redundant REASONING line)
- Update test fixtures to match new AssistantThoughts schema
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Simplify the platform_blocks component to fetch blocks from the
platform API (/api/v1/blocks) instead of loading them locally from
the monorepo. This removes the dependency on having the platform
backend code available.
- Remove loader.py (no longer needed)
- Update client.py with list_blocks() method
- Simplify component.py to use API for both search and execute
- Remove user_id from config (not needed by API)
- Update tests for API-based approach
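A sketch of the API-based client (endpoint from the description above; the auth header and response shape are assumptions):
  import httpx

  class PlatformClient:
      def __init__(self, base_url: str, api_key: str, timeout: float = 60.0):
          self._http = httpx.Client(
              base_url=base_url, timeout=timeout,
              headers={"Authorization": f"Bearer {api_key}"},
          )

      def list_blocks(self) -> list[dict]:
          # One catalogue call backs both block search and block execution
          response = self._http.get("/api/v1/blocks")
          response.raise_for_status()
          return response.json()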
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add search_blocks and execute_block commands that expose platform blocks
to classic agents:
- search_blocks: Local search by name, description, or category (fast, offline)
- execute_block: Execute via platform API with automatic credential handling
The loader automatically discovers the platform backend from the monorepo
structure without requiring manual PYTHONPATH configuration.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Integrate standard AI agent benchmarks into the direct_benchmark infrastructure
using a plugin-based adapter pattern:
- Add BenchmarkAdapter base class with setup(), load_challenges(), and evaluate()
- Implement GAIAAdapter for the GAIA benchmark (requires HF token)
- Implement SWEBenchAdapter for SWE-bench (requires Docker)
- Implement AgentBenchAdapter for AgentBench multi-environment benchmark
- Extend HarnessConfig with benchmark options (--benchmark, --benchmark-split, etc.)
- Modify ParallelExecutor to use adapter's evaluate() for external benchmarks
- Fix runner to record finish step (was being skipped, breaking answer extraction)
- Add optional benchmarks dependency group with datasets and huggingface-hub
- Increase default benchmark timeout to 900s
Usage:
poetry run direct-benchmark run \
--benchmark agent-bench \
--benchmark-subset dbbench \
--strategies one_shot \
--models claude
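The adapter surface, roughly (method names from the list above; the exact signatures are assumptions):
  from abc import ABC, abstractmethod
  from typing import Any, Optional

  class BenchmarkAdapter(ABC):
      """One subclass per external benchmark (GAIA, SWE-bench, AgentBench)."""

      @abstractmethod
      def setup(self) -> None:
          """Download datasets and verify prerequisites (HF token, Docker, ...)."""

      @abstractmethod
      def load_challenges(self, split: Optional[str] = None) -> list[dict[str, Any]]:
          """Return challenges in the harness's native format."""

      @abstractmethod
      def evaluate(self, challenge: dict[str, Any], workspace: str) -> bool:
          """Score the agent's output; used by ParallelExecutor for external benchmarks."""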
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Docker containers cannot have their mount bindings updated after creation.
When running benchmarks or multiple agent instances, the same container name
could be reused with a different workspace directory, causing the container
to still reference the OLD mount path. This resulted in "python: can't open
file '/workspace/temp*.py'" errors.
The fix: remove existing containers before creating new ones to ensure fresh
mount bindings to the current workspace directory.
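A sketch of the remove-then-create pattern with docker-py (the image name and mount paths are illustrative):
  import docker
  from docker.errors import NotFound

  def run_in_fresh_container(name: str, workspace: str, command: list[str]) -> bytes:
      client = docker.from_env()
      try:
          # Mount bindings are fixed at creation, so drop any stale container first
          client.containers.get(name).remove(force=True)
      except NotFound:
          pass
      return client.containers.run(
          "python:3.11-slim", command, name=name,
          volumes={workspace: {"bind": "/workspace", "mode": "rw"}},
          working_dir="/workspace", remove=True,
      )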
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When prompts encourage parallel tool execution and the LLM makes multiple
tool calls simultaneously, the Anthropic API requires a tool_result message
for EACH tool_use. Previously, we only created one tool result for the first
tool call, causing "tool_use ids were found without tool_result blocks" errors.
This fix:
- Adds _make_result_messages() to create results for ALL tool calls
- Maps tool names to their outputs from parallel execution results
- Handles errors per-tool from the _errors list
- Falls back gracefully when results are missing
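Simplified version of the new helper (the real one is _make_result_messages(); the types and attribute names here are illustrative):
  from dataclasses import dataclass

  @dataclass
  class ToolResultMessage:
      tool_call_id: str
      content: str

  def make_result_messages(tool_calls, outputs_by_name, errors):
      messages = []
      for call in tool_calls:
          if call.name in outputs_by_name:
              content = str(outputs_by_name[call.name])
          elif errors:
              content = f"Error: {errors.pop(0)}"          # per-tool error from _errors
          else:
              content = "No result was recorded for this tool call."  # graceful fallback
          # Every tool_use id gets a matching tool_result, as Anthropic requires
          messages.append(ToolResultMessage(tool_call_id=call.id, content=content))
      return messages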
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update black version to match pre-commit hook (24.10.0) and reformat
all files with the new version.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add missing strategies (lats, multi_agent_debate) to PromptStrategyName
- Fix method override signatures for reasoning_effort parameter
- Fix Pydantic Field() overload issues with helper function
- Fix BeautifulSoup Tag type narrowing in web_fetch.py
- Fix Optional member access in playwright_browser.py and rewoo.py
- Convert hasattr patterns to getattr for proper type narrowing
- Add proper type casts for Literal types
- Fix file storage path type conversions
- Exclude legacy challenges/ from pyright checking
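The hasattr-to-getattr conversion, illustrated (the names are hypothetical):
  # Before: pyright cannot narrow proposal.thoughts through hasattr()
  if hasattr(proposal, "thoughts") and proposal.thoughts is not None:
      summary = proposal.thoughts.summary()
  # After: getattr binds a local variable the checker can narrow
  thoughts = getattr(proposal, "thoughts", None)
  if thoughts is not None:
      summary = thoughts.summary()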
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update .flake8 config to exclude workspace directories and ignore E203
- Fix import sorting (isort) across multiple files
- Fix code formatting (black) across multiple files
- Remove unused imports and fix line length issues (flake8)
- Fix f-strings without placeholders and unused variables
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add comprehensive sub-agent spawning infrastructure that enables prompt
strategies to coordinate multiple agents for advanced reasoning patterns.
New files:
- forge/agent/execution_context.py: ExecutionContext, ResourceBudget,
SubAgentHandle, and AgentFactory protocol for sub-agent lifecycle
- agent_factory/default_factory.py: DefaultAgentFactory implementation
- prompt_strategies/lats.py: Language Agent Tree Search using MCTS
with sub-agents for action expansion and evaluation
- prompt_strategies/multi_agent_debate.py: Multi-agent debate with
proposal, critique, and consensus phases
Key changes:
- BaseMultiStepPromptStrategy gains spawn_sub_agent(), run_sub_agent(),
spawn_and_run(), and run_parallel() methods
- Agent class accepts optional ExecutionContext and injects it into strategies
- Sub-agents enabled by default (enable_sub_agents=True)
- Resource limits: max_depth=5, max_sub_agents=25, max_cycles=25
All 7 strategies now available in benchmark:
one_shot, rewoo, plan_execute, reflexion, tree_of_thoughts, lats, multi_agent_debate
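Illustrative use inside a strategy (spawn_and_run and run_parallel are the methods listed above; their exact signatures are assumptions):
  async def debate_round(self, question: str) -> list[str]:
      # Fan out three proposer sub-agents in parallel, within the resource budget
      proposals = await self.run_parallel([
          self.spawn_and_run(task=f"Propose a solution to: {question}")
          for _ in range(3)
      ])
      return [p.output for p in proposals]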
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enable agents to execute shell commands during benchmarks by setting
execute_local_commands=True and using denylist mode to block dangerous
commands (rm, sudo, chmod, kill, etc.) while allowing safe operations.
Also adds ExecutePython challenge to test code execution capability.
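Approximate shape of the benchmark-side settings (field names follow classic AutoGPT's shell-command options but should be treated as assumptions):
  config.execute_local_commands = True
  config.shell_command_control = "denylist"
  config.shell_denylist = ["rm", "sudo", "chmod", "kill"]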
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Merge forge/, original_autogpt/, and direct_benchmark/ into a single Poetry
project to eliminate cross-project path dependency issues.
Changes:
- Create classic/pyproject.toml with merged dependencies from all three projects
- Remove individual pyproject.toml and poetry.lock files from subdirectories
- Update all CLAUDE.md files to reflect commands run from classic/ root
- Update all README.md files with new installation and usage instructions
All packages are now included via the packages directive:
- forge/forge (core agent framework)
- original_autogpt/autogpt (AutoGPT agent)
- direct_benchmark/direct_benchmark (benchmark harness)
CLI entry points preserved: autogpt, serve, direct-benchmark
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When a challenge times out but the agent's solution would have passed
evaluation, this is now clearly indicated:
- Completion blocks show "TIMEOUT (would have passed)" in yellow
- Recent completions panel shows hourglass icon + "would pass" suffix
- Summary table has new "Would Pass" column
- Final summary shows "+N would pass" count
- Success rate includes "would pass" challenges
The evaluator still runs on timed-out challenges to calculate the score,
but success remains False. This gives visibility into near-misses that
just needed more time.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, the evaluator would run on all results including timed-out
challenges. If the agent happened to write a working solution before
timing out, the evaluation would pass and set success to True, resulting
in contradictory output that showed both PASS and "timed out".
Now we skip evaluation for timed-out challenges - they cannot pass.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The TicTacToe and other challenges use pytest-based test files for
evaluation. Without pytest installed in the benchmark virtualenv,
these evaluations were silently failing.
Root cause: test.py imports pytest but the package wasn't a dependency,
causing ModuleNotFoundError during evaluation subprocess.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Include config:challenge:attempt and timestamp in completion block
header for easier debugging and log correlation.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove selenium.py and test_selenium.py
- Add playwright_browser.py with WebPlaywrightComponent
- Update web component exports to use Playwright
- Update dependencies in pyproject.toml/poetry.lock
- Minor agent and reflexion strategy improvements
- Update CLAUDE.md documentation
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously cost was hardcoded to 0.0. Now extracts cumulative cost
from MultiProvider.get_incurred_cost() after each step execution.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Benchmarks now automatically save progress and resume from where they
left off. State is persisted to .benchmark_state.json in reports dir.
Features:
- Auto-resume: runs skip already-completed challenges
- --fresh: clear all state and start over
- --retry-failures: re-run only failed challenges
- --reset-strategy/model/challenge: selective resets
- `state show/clear/reset` subcommands for state management
- Config mismatch detection with auto-reset
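Example invocations (flag and subcommand names from the list above):
  poetry run direct-benchmark run --retry-failures   # re-run only failed challenges
  poetry run direct-benchmark run --fresh            # discard saved state, start over
  poetry run direct-benchmark state show             # inspect .benchmark_state.json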
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add --ci flag that disables Rich Live display while preserving
completion blocks. Auto-detects CI environment via CI env var or
non-TTY stdout. Prints progress every 10 completions for visibility.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Calculate max columns based on terminal width (up to 10)
- Reduce panel width from 35 to 30 chars to fit more panels
- Wider terminals can now show more parallel runs side-by-side
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fixes:
- Use run_key (config:challenge) instead of just config_name for tracking
active runs - allows multiple challenges from same config to run in parallel
- Add asyncio.sleep(0) yields to let multiple tasks acquire semaphore
and start before any proceed with work
- Always print completion blocks (not just failures) for visibility
This should properly show 8/8 active runs when running with --parallel 8.
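A minimal sketch of the scheduling change (challenge execution is stubbed out):
  import asyncio

  active_runs: set[str] = set()

  async def run_one(run_key: str, sem: asyncio.Semaphore) -> None:
      async with sem:
          # Yield once so every task holding the semaphore is scheduled and
          # registered as active before any of them starts real work
          await asyncio.sleep(0)
          active_runs.add(run_key)          # run_key = f"{config_name}:{challenge}"
          try:
              await asyncio.sleep(0.1)      # stand-in for the actual challenge run
          finally:
              active_runs.discard(run_key)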
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
UI improvements:
- Multi-column layout: each active config gets its own panel showing
challenge name and step history (last 6 steps with status)
- Copy-paste completion blocks: when a challenge finishes (especially
failures), prints a detailed block with all steps for easy debugging
- Configurable logging: suppresses noisy LLM provider warnings unless
--debug flag is set
- Pass debug flag through harness to UI
Example active runs panel:
┌─ one_shot/claude ─┬─ rewoo/claude ────┐
│ ReadFile │ WriteFile │
│ ✓ #1 read_file │ ✓ #1 think │
│ ✓ #2 write_file │ ✓ #2 plan │
│ ● step 3: ... │ ● step 3: ... │
└───────────────────┴───────────────────┘
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add step callback to AgentRunner for real-time step logging
- BenchmarkUI now shows:
- Active runs with current step info
- Recent steps panel with colored config prefixes
- Proper Live display refresh (implements __rich_console__)
- Each config gets a distinct color for easy identification
- Verbose mode prints step logs immediately with config prefix
- Fix Live display not updating (pass UI object, not rendered content)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove vcrpy and pytest-recording dependencies
- Remove tests/vcr/ directory and vcr_cassettes submodule
- Remove .gitmodules (only had cassette submodule)
- Simplify CI workflow - no more cassette checkout/push/PAT_REVIEW
- Tests requiring API keys now skip if not set (fork PRs)
- Update CLAUDE.md files to remove cassette references
- Fix broken agbenchmark path in pyproject.toml
Security improvement: removes need for PAT with cross-repo write access.
Fork PRs will have API-dependent tests skipped (GitHub protects secrets).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove old benchmark/ folder with agbenchmark framework
- Move challenges to direct_benchmark/challenges/
- Move analysis tools (analyze_reports.py, analyze_failures.py) to direct_benchmark/
- Move challenges_already_beaten.json to direct_benchmark/
- Update CI workflow to use direct_benchmark
- Update CLAUDE.md files with new benchmarking instructions
- Add benchmarking section to original_autogpt/CLAUDE.md
The direct_benchmark harness directly instantiates agents without HTTP
server overhead, enabling parallel execution with asyncio semaphore.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add NONINTERACTIVE_MODE env var support to AppConfig for disabling
user interaction during automated runs
- Benchmark harness now sets NONINTERACTIVE_MODE=True when starting agents
- Add agent configuration logging at server startup (model, strategy, etc.)
- Harness logs env vars being passed to agent for verification
- Add --agent-output flag to show full agent server output for debugging
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add model comparison support to test harness (claude, openai, gpt5, opus presets)
- Add --models, --smart-llm, --fast-llm, --list-models CLI args
- Add real-time logging with timestamps and progress indicators
- Fix success parsing bug: read results[0].success instead of non-existent metrics.success
- Fix agbenchmark TestResult validation: use exception typename when value is empty
- Fix WebArena challenge validation: use strings instead of integers in instantiation_dict
- Fix Agent type annotations: create AnyActionProposal union for all prompt strategies
- Add pytest integration tests for the strategy benchmark harness
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When tool calls fail validation, the error messages now include:
- What arguments were actually provided
- The expected parameter schema with types and required/optional indicators
This helps LLMs understand and fix their mistakes when retrying,
rather than just being told a parameter is missing.
Example improved error:
Invalid function call for write_file: 'contents' is a required property
You provided: {"filename": 'story.txt'}
Expected parameters: {"filename": string (required), "contents": string (required)}
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add WebFetchComponent for fast HTTP-based page fetching without browser
overhead. Uses trafilatura for intelligent content extraction.
Commands:
- fetch_webpage: Extract main content as text/markdown/xml
  - Removes navigation, ads, boilerplate automatically
  - Extracts page metadata (title, description, author, date)
  - Extracts and lists page links
  - Much faster than Selenium-based read_webpage
- fetch_raw_html: Get raw HTML for structure inspection
  - Optional truncation for large pages
Features:
- Trafilatura-powered content extraction (best-in-class accuracy)
- Automatic link extraction with relative URL resolution
- Page metadata extraction (OG tags, meta tags)
- Configurable timeout, max content length, max links
- Proper error handling for timeouts and HTTP errors
- 19 comprehensive tests
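Core of the extraction path, roughly (trafilatura calls as commonly used; the component wiring and defaults are omitted):
  import trafilatura

  def fetch_webpage(url: str, max_length: int | None = None) -> str | None:
      downloaded = trafilatura.fetch_url(url)
      if downloaded is None:
          return None            # timeout or HTTP error, surfaced to the agent
      text = trafilatura.extract(
          downloaded, output_format="markdown", include_links=True
      )
      if text and max_length:
          text = text[:max_length]
      return text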
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Extract _get_tool_error_message helper method
- Replace 20+ levels of nesting with simple for loop
- Improve readability of tool_result construction
- Update benchmark poetry.lock
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace basic DuckDuckGo-only search with a modern tiered system:
1. Tavily (primary) - AI-optimized results with content extraction
   - AI-generated answer summaries
   - Relevance scoring
   - Full page content extraction via search_and_extract command
2. Serper (secondary) - Fast, cheap Google SERP results
   - $0.30-1.00 per 1K queries
   - Real Google results without scraping
3. DDGS multi-engine (fallback) - Free, no API key required
   - Automatic fallback chain: DuckDuckGo → Bing → Brave → Google → etc.
   - 8 search backends supported
Key changes:
- Upgrade duckduckgo-search to ddgs v9.10 (renamed successor package)
- Add Tavily and Serper API integrations
- Implement automatic provider selection and fallback chain
- Add search_and_extract command for research with content extraction
- Add TAVILY_API_KEY and SERPER_API_KEY to env templates
- Update benchmark httpx constraint for ddgs compatibility
- 23 comprehensive tests for all providers and fallback scenarios
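The fallback chain reduces to something like this (the provider callables stand in for the Tavily/Serper/DDGS clients):
  from typing import Callable

  Provider = Callable[[str, int], list[dict]]

  def tiered_search(query: str, providers: list[Provider], num_results: int = 8) -> list[dict]:
      # Try Tavily first, then Serper, then the free DDGS multi-engine fallback;
      # skip a tier on errors, missing API keys, or empty results
      for provider in providers:
          try:
              results = provider(query, num_results)
          except Exception:
              continue
          if results:
              return results
      return []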
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change Agent class to use ActionProposal instead of OneShotAgentActionProposal
to support multiple prompt strategy types
- Widen display_thoughts parameter type from AssistantThoughts to ModelWithSummary
- Fix speak attribute access in agent_protocol_server with hasattr check
- Add type: ignore comments for intentional thoughts field overrides in strategies
- Remove unused OneShotAgentActionProposal import
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix 32+ flake8 E501 (line too long) errors by shortening descriptions
- Remove unused import in todo.py
- Fix test_todo.py argument order (config= keyword)
- Add type annotations to fix pyright errors where straightforward
- Add noqa comments for flake8 false positives in __init__.py
- Remove unused nonlocal declarations in main.py
- Run black and isort to fix formatting
- Update CLAUDE.md with improved linting commands
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, when a user selected "Once" or "Always" with feedback (via Tab),
the command was NOT executed because UserFeedbackProvided was raised before
checking the approval scope. This fix changes the architecture from
exception-based to return-value-based.
Changes:
- Add PermissionCheckResult class with allowed, scope, and feedback fields
- Change check_command() to return PermissionCheckResult instead of bool
- Update prompt_fn signature to return (ApprovalScope, feedback) tuple
- Add pending_user_feedback mechanism to EpisodicActionHistory
- Update execute() to handle feedback after successful command execution
- Feedback message explicitly states "Command executed successfully"
- Add on_auto_approve callback for displaying auto-approved commands
- Add comprehensive tests for approval/denial with feedback scenarios
Behavior:
- Once + feedback → Execute command, then send feedback to agent
- Always + feedback → Execute command, save permission, send feedback
- Deny + feedback → Don't execute, send feedback to agent
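Sketch of the new return type (field names from the list above; the ApprovalScope members mirror the Once/Always/Deny behavior and are assumptions):
  from dataclasses import dataclass
  from enum import Enum
  from typing import Optional

  class ApprovalScope(Enum):
      ONCE = "once"
      ALWAYS = "always"
      DENY = "deny"

  @dataclass
  class PermissionCheckResult:
      allowed: bool
      scope: Optional[ApprovalScope] = None
      feedback: Optional[str] = None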
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>