Compare commits

3 Commits

Author SHA1 Message Date
openhands
6310c070b3 Fix frontend tests for GitHub token documentation changes 2025-03-17 19:22:35 +00:00
dependabot[bot]
41c8c9230b chore(deps): bump the version-all group with 4 updates (#7308)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: openhands <openhands@all-hands.dev>
2025-03-17 18:57:56 +00:00
Xingyao Wang
9b9e728cf6 Iterative evaluation with rule-based critic (#7293) 2025-03-17 18:37:35 +00:00
34 changed files with 310 additions and 753 deletions

@@ -11,7 +11,6 @@ on:
paths:
- 'docs/**'
- '.github/workflows/deploy-docs.yml'
- 'pydoc-markdown.yml'
branches:
- main
@@ -40,10 +39,7 @@ jobs:
with:
python-version: '3.12'
- name: Generate Python Docs
run: |
rm -rf docs/modules/python
pip install pydoc-markdown
pydoc-markdown
run: rm -rf docs/modules/python && pip install pydoc-markdown && pydoc-markdown
- name: Install dependencies
run: cd docs && npm ci
- name: Build website

@@ -308,11 +308,6 @@ The agent configuration options are defined in the `[agent]` and `[agent.<agent_
- Default: `false`
- Description: Whether Jupyter is enabled in the action space
- `enable_search_engine`
- Type: `bool`
- Default: `false`
- Description: Whether the search engine tool is enabled in the action space. See [Search Configuration](./search/search-configuration.md) for details.
- `enable_history_truncation`
- Type: `bool`
- Default: `true`

@@ -1,113 +0,0 @@
# Search Configuration
OpenHands provides a search engine capability that allows agents to perform web searches using the Brave Search API. This guide explains how to configure and use the search feature.
## Overview
The search engine feature enables agents to:
- Execute web search queries programmatically
- Get structured results including web pages, news, videos, and FAQs
- Avoid CAPTCHA challenges that often occur when using browser-based search
## Configuration
### Enabling Search
To enable the search engine feature, set the following in your `config.toml`:
```toml
[agent]
enable_search_engine = true
```
Or when using Docker, set the environment variable:
```bash
-e AGENT_ENABLE_SEARCH_ENGINE=true
```
### API Key Setup
The search feature requires a Brave Search API key. You can obtain one from the [Brave Search API Dashboard](https://api.search.brave.com/app/keys).
Set the API key in your `config.toml`:
```toml
[search]
enabled = true
api_key = "your-api-key-here"
```
Or when using Docker:
```bash
-e SEARCH_ENABLED=true
-e SEARCH_API_KEY="your-api-key-here"
```
## Search Results
When a search is performed, the results are returned in a structured format that includes:
- Web search results
- News articles
- Video content
- FAQ entries
- Discussion threads
- Infoboxes (when available)
- Location information (when relevant)
Each result type includes:
- Title
- URL (when applicable)
- Description or snippet
- Additional metadata specific to the result type
## Usage Example
When the search feature is enabled, agents can use the `search_engine` tool to perform searches. For example:
```python
# The agent can make a tool call like this:
{
"name": "search_engine",
"arguments": {
"query": "latest developments in AI"
}
}
```
The search results will be returned in a markdown-formatted structure that's easy for the agent to parse and understand.
## Best Practices
1. **Query Formulation**
- Keep queries focused and specific
- Include relevant keywords
- Avoid overly complex or compound queries
2. **Rate Limiting**
- Be mindful of API rate limits
- Cache results when appropriate
- Implement retries with exponential backoff for failed requests
3. **Error Handling**
- Handle API errors gracefully
- Provide meaningful feedback when searches fail
- Have fallback strategies when search is unavailable
## Troubleshooting
Common issues and solutions:
1. **Search Not Working**
- Verify `enable_search_engine` is set to `true`
- Confirm the Brave API key is correctly set
- Check API key permissions and quotas
2. **No Results**
- Verify the query is not empty
- Try reformulating the search query
- Check for any API response errors
3. **Rate Limiting**
- Monitor API usage
- Implement caching if needed
- Consider upgrading API tier if limits are consistently hit
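The rate-limiting advice above (retry failed requests with exponential backoff) can be sketched as follows. This is a minimal illustration using only the standard library; `retry_with_backoff`, its parameters, and the `fetch` callable are hypothetical names chosen for the example, not part of OpenHands or the Brave Search API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar('T')


def retry_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch` until it succeeds, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError('max_attempts must be at least 1')
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... capped at max_delay.
            sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
    raise AssertionError('unreachable')
```

Passing `sleep` as a parameter is a design choice for the sketch: it lets callers (and tests) replace the real delay with a no-op or a recorder.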

@@ -18,6 +18,20 @@ Please follow instruction [here](../../README.md#setup) to setup your local deve
## Run Inference (Rollout) on SWE-Bench Instances: Generate Patch from Problem Statement
> [!NOTE]
> **Iterative Evaluation Protocol**
>
> We use an iterative protocol to get more stable and reproducible results:
> - For each instance, we attempt to generate a solution up to 3 times
> - Each attempt continues until either:
> 1. The agent successfully produces a patch with `AgentFinishAction`, or
> 2. The attempt reaches the maximum iteration limit
> - If an attempt fails, we retry with a fresh attempt (up to the 3-attempt maximum)
> - If your LLM config has temperature=0, we will automatically use temperature=0.1 for the 2nd and 3rd attempts
>
> To enable this iterative protocol, set `export ITERATIVE_EVAL_MODE=true`
### Running Locally with Docker
Make sure your Docker daemon is running and that you have ample disk space (at least 200-500GB, depending on the SWE-Bench set you are running on) for the instance-level Docker images.
@@ -45,7 +59,7 @@ to `CodeActAgent`.
default, the script evaluates the entire SWE-bench_Lite test set (300 issues). Note:
in order to use `eval_limit`, you must also set `agent`.
- `max_iter`, e.g. `20`, is the maximum number of iterations for the agent to run. By
default, it is set to 30.
default, it is set to 60.
- `num_workers`, e.g. `3`, is the number of parallel workers to run the evaluation. By
default, it is set to 1.
- `dataset`, a huggingface dataset name. e.g. `princeton-nlp/SWE-bench`, `princeton-nlp/SWE-bench_Lite`, or `princeton-nlp/SWE-bench_Verified`, specifies which dataset to evaluate on.

@@ -37,9 +37,10 @@ from openhands.core.config import (
)
from openhands.core.logger import openhands_logger as logger
from openhands.core.main import create_runtime, run_controller
from openhands.critic import AgentFinishedCritic
from openhands.events.action import CmdRunAction, MessageAction
from openhands.events.observation import CmdOutputObservation, ErrorObservation
from openhands.events.serialization.event import event_to_dict
from openhands.events.serialization.event import event_from_dict, event_to_dict
from openhands.runtime.base import Runtime
from openhands.utils.async_utils import call_async_from_sync
from openhands.utils.shutdown_listener import sleep_if_should_continue
@@ -122,7 +123,9 @@ You SHOULD NEVER attempt to browse the web.
# TODO: migrate all swe-bench docker to ghcr.io/openhands
DEFAULT_DOCKER_IMAGE_PREFIX = os.environ.get('EVAL_DOCKER_IMAGE_PREFIX', 'docker.io/xingyaoww/')
DEFAULT_DOCKER_IMAGE_PREFIX = os.environ.get(
'EVAL_DOCKER_IMAGE_PREFIX', 'docker.io/xingyaoww/'
)
logger.info(f'Default docker image prefix: {DEFAULT_DOCKER_IMAGE_PREFIX}')
@@ -637,20 +640,132 @@ if __name__ == '__main__':
output_file = os.path.join(metadata.eval_output_dir, 'output.jsonl')
print(f'### OUTPUT FILE: {output_file} ###')
instances = prepare_dataset(swe_bench_tests, output_file, args.eval_n_limit)
if len(instances) > 0 and not isinstance(
instances['PASS_TO_PASS'][instances['PASS_TO_PASS'].index[0]], str
):
for col in ['PASS_TO_PASS', 'FAIL_TO_PASS']:
instances[col] = instances[col].apply(lambda x: str(x))
run_evaluation(
instances,
metadata,
output_file,
args.eval_num_workers,
process_instance,
timeout_seconds=8 * 60 * 60, # 8 hour PER instance should be more than enough
max_retries=5,
# Run evaluation in iterative mode:
# If a rollout fails to output AgentFinishAction, we retry until it succeeds or ITERATIVE_EVAL_MODE_MAX_ATTEMPTS (default 3) attempts have been made.
ITERATIVE_EVAL_MODE = (
os.environ.get('ITERATIVE_EVAL_MODE', 'false').lower() == 'true'
)
ITERATIVE_EVAL_MODE_MAX_ATTEMPTS = int(
os.environ.get('ITERATIVE_EVAL_MODE_MAX_ATTEMPTS', '3')
)
if not ITERATIVE_EVAL_MODE:
# load the dataset
instances = prepare_dataset(swe_bench_tests, output_file, args.eval_n_limit)
if len(instances) > 0 and not isinstance(
instances['PASS_TO_PASS'][instances['PASS_TO_PASS'].index[0]], str
):
for col in ['PASS_TO_PASS', 'FAIL_TO_PASS']:
instances[col] = instances[col].apply(lambda x: str(x))
run_evaluation(
instances,
metadata,
output_file,
args.eval_num_workers,
process_instance,
timeout_seconds=8
* 60
* 60, # 8 hour PER instance should be more than enough
max_retries=5,
)
else:
critic = AgentFinishedCritic()
def get_cur_output_file_path(attempt: int) -> str:
return (
f'{output_file.removesuffix(".jsonl")}.critic_attempt_{attempt}.jsonl'
)
eval_ids = None
for attempt in range(1, ITERATIVE_EVAL_MODE_MAX_ATTEMPTS + 1):
cur_output_file = get_cur_output_file_path(attempt)
logger.info(
f'Running evaluation with critic {critic.__class__.__name__} for attempt {attempt} of {ITERATIVE_EVAL_MODE_MAX_ATTEMPTS}.'
)
# When temperature is 0 (deterministic decoding), raise it to 0.1 for retries (attempt > 1)
# so that a retry can produce a different result
if attempt > 1 and metadata.llm_config.temperature == 0:
logger.info(
f'Detected temperature is 0 for (>1) attempt {attempt}. Setting temperature to 0.1...'
)
metadata.llm_config.temperature = 0.1
# Load instances - at first attempt, we evaluate all instances
# On subsequent attempts, we only evaluate the instances that failed the previous attempt, as determined by the critic
instances = prepare_dataset(
swe_bench_tests, cur_output_file, args.eval_n_limit, eval_ids=eval_ids
)
if len(instances) > 0 and not isinstance(
instances['PASS_TO_PASS'][instances['PASS_TO_PASS'].index[0]], str
):
for col in ['PASS_TO_PASS', 'FAIL_TO_PASS']:
instances[col] = instances[col].apply(lambda x: str(x))
# Run evaluation - but save them to cur_output_file
logger.info(
f'Evaluating {len(instances)} instances for attempt {attempt}...'
)
run_evaluation(
instances,
metadata,
cur_output_file,
args.eval_num_workers,
process_instance,
timeout_seconds=8
* 60
* 60, # 8 hour PER instance should be more than enough
max_retries=5,
)
# When eval is done, we update eval_ids to the instances that failed the current attempt
instances_failed = []
logger.info(
f'Use critic {critic.__class__.__name__} to check {len(instances)} instances for attempt {attempt}...'
)
with open(cur_output_file, 'r') as f:
for line in f:
instance = json.loads(line)
history = [event_from_dict(event) for event in instance['history']]
critic_result = critic.evaluate(history)
if not critic_result.success:
instances_failed.append(instance['instance_id'])
logger.info(
f'{len(instances_failed)} instances failed the current attempt {attempt}: {instances_failed}'
)
eval_ids = instances_failed
# If no instances failed, we break
if len(instances_failed) == 0:
break
# Then we should aggregate the results from all attempts into the original output file
# and remove the intermediate files
logger.info(
'Aggregating results from all attempts into the original output file...'
)
fout = open(output_file, 'w')
added_instance_ids = set()
for attempt in reversed(range(1, ITERATIVE_EVAL_MODE_MAX_ATTEMPTS + 1)):
cur_output_file = get_cur_output_file_path(attempt)
if not os.path.exists(cur_output_file):
logger.warning(
f'Intermediate output file {cur_output_file} does not exist. Skipping...'
)
continue
with open(cur_output_file, 'r') as f:
for line in f:
instance = json.loads(line)
if instance['instance_id'] not in added_instance_ids:
fout.write(line)
added_instance_ids.add(instance['instance_id'])
logger.info(
f'Aggregated instances from {cur_output_file}. Total instances added so far: {len(added_instance_ids)}'
)
fout.close()
logger.info(
f'Done! Total {len(added_instance_ids)} instances added to {output_file}'
)
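The iterative mode above boils down to two steps: rerun only the instances the critic rejected, then merge the per-attempt outputs so that the latest attempt for each instance wins. The sketch below restates that logic on in-memory data. `iterative_eval`, `run_attempt`, `is_success`, and the record layout are simplified stand-ins for illustration, not the actual `run_evaluation` or critic interfaces.

```python
from typing import Callable

Record = dict  # one output row; must contain an 'instance_id' key


def iterative_eval(
    instance_ids: list[str],
    run_attempt: Callable[[list[str], int], list[Record]],
    is_success: Callable[[Record], bool],
    max_attempts: int = 3,
) -> list[Record]:
    """Retry failed instances up to `max_attempts` times and merge the results."""
    attempts: list[list[Record]] = []
    pending = list(instance_ids)
    for attempt in range(1, max_attempts + 1):
        records = run_attempt(pending, attempt)
        attempts.append(records)
        # Only instances rejected by the critic are retried in the next round.
        pending = [r['instance_id'] for r in records if not is_success(r)]
        if not pending:
            break

    # Merge from the newest attempt to the oldest, keeping the first record
    # seen for each instance, i.e. the result of its latest attempt.
    merged: list[Record] = []
    seen: set[str] = set()
    for records in reversed(attempts):
        for record in records:
            if record['instance_id'] not in seen:
                merged.append(record)
                seen.add(record['instance_id'])
    return merged
```

An instance that fails every attempt still appears in the merged output, carrying the record from its final attempt.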

@@ -25,8 +25,8 @@ if [ -z "$AGENT" ]; then
fi
if [ -z "$MAX_ITER" ]; then
echo "MAX_ITER not specified, use default 100"
MAX_ITER=100
echo "MAX_ITER not specified, use default 60"
MAX_ITER=60
fi
if [ -z "$RUN_WITH_BROWSING" ]; then

@@ -95,7 +95,10 @@ describe("Settings Screen", () => {
await waitFor(() => {
screen.getByTestId("github-token-input");
screen.getByTestId("github-token-help-anchor");
// Check for GitHub link instead of the help anchor
screen.getByRole("link", { name: "GitHub" });
// Check for documentation link
screen.getByRole("link", { name: "documentation" });
screen.getByTestId("language-input");
screen.getByTestId("enable-analytics-switch");
});
@@ -237,10 +240,12 @@ describe("Settings Screen", () => {
await waitFor(() => {
const input = screen.queryByTestId("github-token-input");
const helpAnchor = screen.queryByTestId("github-token-help-anchor");
const githubLink = screen.queryByText("GitHub");
const documentationLink = screen.queryByText("documentation");
expect(input).not.toBeInTheDocument();
expect(helpAnchor).not.toBeInTheDocument();
expect(githubLink).not.toBeInTheDocument();
expect(documentationLink).not.toBeInTheDocument();
});
});

@@ -410,12 +410,31 @@ function AccountSettings() {
placeholder={isGitHubTokenSet ? "**********" : ""}
/>
<HelpLink
testId="github-token-help-anchor"
text="Get your token"
linkText="here"
href="https://github.com/settings/tokens/new?description=openhands-app&scopes=repo,user,workflow"
/>
<p className="text-xs">
Generate a token on{" "}
<b>
<a
href="https://github.com/settings/tokens/new?description=openhands-app&scopes=repo,user,workflow"
target="_blank"
className="underline underline-offset-2"
rel="noopener noreferrer"
>
GitHub
</a>{" "}
</b>
or see the{" "}
<b>
<a
href="https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token"
target="_blank"
className="underline underline-offset-2"
rel="noopener noreferrer"
>
documentation
</a>
</b>
.
</p>
</>
)}

@@ -70,7 +70,6 @@ class CodeActAgent(Agent):
codeact_enable_browsing=self.config.codeact_enable_browsing,
codeact_enable_jupyter=self.config.codeact_enable_jupyter,
codeact_enable_llm_editor=self.config.codeact_enable_llm_editor,
codeact_enable_search_engine=self.config.enable_search_engine,
llm=self.llm,
)
logger.debug(

@@ -15,7 +15,6 @@ from openhands.agenthub.codeact_agent.tools import (
FinishTool,
IPythonTool,
LLMBasedFileEditTool,
SearchEngineTool,
ThinkTool,
WebReadTool,
create_cmd_run_tool,
@@ -37,7 +36,6 @@ from openhands.events.action import (
FileReadAction,
IPythonRunCellAction,
MessageAction,
SearchAction,
)
from openhands.events.event import FileEditSource, FileReadSource
from openhands.events.tool import ToolCallMetadata
@@ -193,15 +191,6 @@ def response_to_actions(response: ModelResponse) -> list[Action]:
f'Missing required argument "url" in tool call {tool_call.function.name}'
)
action = BrowseURLAction(url=arguments['url'])
# ================================================
# SearchEngineTool (search the web using text queries)
# ================================================
elif tool_call.function.name == SearchEngineTool['function']['name']:
if 'query' not in arguments:
raise FunctionCallNotExistsError(
f'Missing required argument "query" in tool call {tool_call.function.name}'
)
action = SearchAction(query=arguments['query'])
else:
raise FunctionCallNotExistsError(
f'Tool {tool_call.function.name} is not registered. (arguments: {arguments}). Please check the tool name and retry with an existing tool.'
@@ -234,7 +223,6 @@ def get_tools(
codeact_enable_browsing: bool = False,
codeact_enable_llm_editor: bool = False,
codeact_enable_jupyter: bool = False,
codeact_enable_search_engine: bool = False,
llm: LLM | None = None,
) -> list[ChatCompletionToolParam]:
SIMPLIFIED_TOOL_DESCRIPTION_LLM_SUBSTRS = ['gpt-', 'o3', 'o1']
@@ -251,8 +239,6 @@ def get_tools(
ThinkTool,
FinishTool,
]
if codeact_enable_search_engine:
tools.append(SearchEngineTool)
if codeact_enable_browsing:
tools.append(WebReadTool)
tools.append(BrowserTool)

@@ -3,7 +3,6 @@ from .browser import BrowserTool
from .finish import FinishTool
from .ipython import IPythonTool
from .llm_based_edit import LLMBasedFileEditTool
from .search_engine import SearchEngineTool
from .str_replace_editor import create_str_replace_editor_tool
from .think import ThinkTool
from .web_read import WebReadTool
@@ -14,7 +13,6 @@ __all__ = [
'FinishTool',
'IPythonTool',
'LLMBasedFileEditTool',
'SearchEngineTool',
'create_str_replace_editor_tool',
'WebReadTool',
'ThinkTool',

@@ -1,24 +0,0 @@
from litellm import ChatCompletionToolParam, ChatCompletionToolParamFunctionChunk
_SEARCH_ENGINE_DESCRIPTION = """Execute a web search query (similar to Google search).
NOTE: When you need to search for information online, please use the `search_engine` tool rather than the `browser` or `web_read` tools. The `search_engine` tool connects directly to a search engine, which will help avoid CAPTCHA challenges that would otherwise block your access.
"""
SearchEngineTool = ChatCompletionToolParam(
type='function',
function=ChatCompletionToolParamFunctionChunk(
name='search_engine',
description=_SEARCH_ENGINE_DESCRIPTION,
parameters={
'type': 'object',
'properties': {
'query': {
'type': 'string',
'description': 'The web search query (must be a non-empty string).',
},
},
'required': ['query'],
},
),
)

@@ -8,7 +8,6 @@ from openhands.core.config.config_utils import (
from openhands.core.config.extended_config import ExtendedConfig
from openhands.core.config.llm_config import LLMConfig
from openhands.core.config.sandbox_config import SandboxConfig
from openhands.core.config.search_config import SearchConfig
from openhands.core.config.security_config import SecurityConfig
from openhands.core.config.utils import (
finalize_config,
@@ -29,7 +28,6 @@ __all__ = [
'AppConfig',
'LLMConfig',
'SandboxConfig',
'SearchConfig',
'SecurityConfig',
'ExtendedConfig',
'load_app_config',

@@ -2,10 +2,7 @@ from __future__ import annotations
from pydantic import BaseModel, Field, ValidationError
from openhands.core.config.condenser_config import (
CondenserConfig,
NoOpCondenserConfig,
)
from openhands.core.config.condenser_config import CondenserConfig, NoOpCondenserConfig
from openhands.core.logger import openhands_logger as logger
@@ -33,7 +30,6 @@ class AgentConfig(BaseModel):
disabled_microagents: list[str] = Field(default_factory=list)
enable_history_truncation: bool = Field(default=True)
enable_som_visual_browsing: bool = Field(default=False)
enable_search_engine: bool = Field(default=False)
condenser: CondenserConfig = Field(default_factory=NoOpCondenserConfig)
model_config = {'extra': 'forbid'}

@@ -12,7 +12,6 @@ from openhands.core.config.config_utils import (
from openhands.core.config.extended_config import ExtendedConfig
from openhands.core.config.llm_config import LLMConfig
from openhands.core.config.sandbox_config import SandboxConfig
from openhands.core.config.search_config import SearchConfig
from openhands.core.config.security_config import SecurityConfig
@@ -54,7 +53,6 @@ class AppConfig(BaseModel):
default_agent: str = Field(default=OH_DEFAULT_AGENT)
sandbox: SandboxConfig = Field(default_factory=SandboxConfig)
security: SecurityConfig = Field(default_factory=SecurityConfig)
search: SearchConfig = Field(default_factory=SearchConfig)
extended: ExtendedConfig = Field(default_factory=lambda: ExtendedConfig({}))
runtime: str = Field(default='docker')
file_store: str = Field(default='local')

@@ -1,35 +0,0 @@
"""Configuration for search engine functionality."""
import os
from typing import Any
from pydantic import BaseModel, Field, SecretStr
class SearchConfig(BaseModel):
"""Configuration for search engine functionality.
Attributes:
enabled: Whether search engine functionality is enabled.
api_key: The API key for the search engine.
api_url: The base URL for the search API.
"""
enabled: bool = Field(default=False)
api_key: SecretStr | None = Field(default=None)
api_url: str = Field(default="https://api.search.brave.com/res/v1/web/search")
model_config = {"extra": "forbid"}
def model_post_init(self, __context: Any) -> None:
"""Post-initialization hook to assign search-related variables to environment variables.
This ensures that these values are accessible to the search engine at runtime.
"""
super().model_post_init(__context)
# Set environment variables for search engine
if self.api_key:
os.environ["BRAVE_API_KEY"] = self.api_key.get_secret_value()
if self.api_url:
os.environ["BRAVE_API_URL"] = self.api_url

@@ -82,9 +82,6 @@ class ActionTypeSchema(BaseModel):
SEND_PR: str = Field(default='send_pr')
"""Send a PR to github."""
SEARCH: str = Field(default='search')
"""Queries a search engine."""
RECALL: str = Field(default='recall')
"""Retrieves content from a user workspace, microagent, or other source."""

@@ -49,9 +49,6 @@ class ObservationTypeSchema(BaseModel):
CONDENSE: str = Field(default='condense')
"""Result of a condensation operation."""
SEARCH: str = Field(default='search')
"""Result of querying a search engine."""
RECALL: str = Field(default='recall')
"""Result of a recall operation. This can be the workspace context, a microagent, or other types of information."""

@@ -0,0 +1,4 @@
from .base import BaseCritic, CriticResult
from .finish_critic import AgentFinishedCritic
__all__ = ['CriticResult', 'BaseCritic', 'AgentFinishedCritic']

31
openhands/critic/base.py Normal file
@@ -0,0 +1,31 @@
import abc

from pydantic import BaseModel

from openhands.events import Event


class CriticResult(BaseModel):
    """
    A critic result is a score and a message.
    """

    score: float
    message: str

    @property
    def success(self) -> bool:
        """
        Whether the agent is successful.
        """
        return self.score >= 0.5


class BaseCritic(abc.ABC):
    """
    A critic is a function that takes in a list of events and returns a score about the quality of those events.
    """

    @abc.abstractmethod
    def evaluate(self, events: list[Event]) -> CriticResult:
        pass
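
To illustrate how the `success` threshold and the `evaluate` contract fit together, here is a self-contained sketch of a custom critic. It uses a plain dataclass as a stand-in for the pydantic `CriticResult` and arbitrary objects in place of OpenHands events; `KeywordCritic` is a hypothetical example, not something in the codebase.

```python
import abc
from dataclasses import dataclass


@dataclass
class CriticResult:
    """Stand-in for the pydantic model: a score plus an explanation."""

    score: float
    message: str

    @property
    def success(self) -> bool:
        # Same threshold as above: a score of 0.5 or more counts as success.
        return self.score >= 0.5


class BaseCritic(abc.ABC):
    @abc.abstractmethod
    def evaluate(self, events: list) -> CriticResult: ...


class KeywordCritic(BaseCritic):
    """Hypothetical critic: scores the fraction of events mentioning a keyword."""

    def __init__(self, keyword: str):
        self.keyword = keyword

    def evaluate(self, events: list) -> CriticResult:
        if not events:
            return CriticResult(score=0.0, message='No events to evaluate.')
        hits = sum(self.keyword in str(event) for event in events)
        return CriticResult(
            score=hits / len(events),
            message=f'{hits}/{len(events)} events matched.',
        )
```

Because `success` is derived from `score`, a graded critic like this one plugs into the same retry logic as a binary one.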

@@ -0,0 +1,21 @@
from openhands.critic.base import BaseCritic, CriticResult
from openhands.events import Event
from openhands.events.action import Action, AgentFinishAction


class AgentFinishedCritic(BaseCritic):
    """This is a simple rule-based critic that checks if the last action is an AgentFinishAction.

    If not, it will return a score of 0 and a message indicating that the agent did not finish.
    """

    def __init__(self):
        pass

    def evaluate(self, events: list[Event]) -> CriticResult:
        # Find the most recent Action, skipping any trailing non-action events.
        last_action = next((h for h in reversed(events) if isinstance(h, Action)), None)
        if isinstance(last_action, AgentFinishAction):
            return CriticResult(score=1, message='Agent finished.')
        else:
            return CriticResult(score=0, message='Agent did not finish.')
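
Note that the check looks at the most recent *action* rather than the most recent event, so any observations recorded after the finish action do not change the verdict. A self-contained sketch of that selection rule follows; the `Event`, `Action`, `Observation`, `AgentFinishAction`, and `CmdRunAction` classes here are simplified placeholders for the OpenHands ones, and `agent_finished` is a hypothetical helper.

```python
class Event: ...
class Action(Event): ...
class Observation(Event): ...
class AgentFinishAction(Action): ...
class CmdRunAction(Action): ...


def agent_finished(events: list[Event]) -> bool:
    """Return True if the most recent Action in `events` is an AgentFinishAction."""
    # Walk the history backwards and stop at the first Action, skipping observations.
    last_action = next((e for e in reversed(events) if isinstance(e, Action)), None)
    return isinstance(last_action, AgentFinishAction)
```

An empty history, or one containing no actions at all, yields `False` because `next` falls back to `None`.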

@@ -17,7 +17,6 @@ from openhands.events.action.files import (
FileWriteAction,
)
from openhands.events.action.message import MessageAction
from openhands.events.action.search_engine import SearchAction
__all__ = [
'Action',
@@ -37,6 +36,5 @@ __all__ = [
'MessageAction',
'ActionConfirmationStatus',
'AgentThinkAction',
'SearchAction',
'RecallAction',
]

@@ -1,24 +0,0 @@
from dataclasses import dataclass
from typing import ClassVar
from openhands.core.schema import ActionType
from openhands.events.action.action import Action
@dataclass
class SearchAction(Action):
query: str
thought: str = ''
action: str = ActionType.SEARCH
runnable: ClassVar[bool] = True
@property
def message(self) -> str:
return f'I am querying the search engine to search for {self.query}'
def __str__(self) -> str:
ret = '**SearchAction**\n'
if self.thought:
ret += f'THOUGHT: {self.thought}\n'
ret += f'QUERY: {self.query}'
return ret

@@ -5,7 +5,6 @@ from openhands.events.observation.agent import (
AgentThinkObservation,
RecallObservation,
)
from openhands.events.observation.search_engine import SearchEngineObservation
from openhands.events.observation.browse import BrowserOutputObservation
from openhands.events.observation.commands import (
CmdOutputMetadata,
@@ -43,7 +42,6 @@ __all__ = [
'SuccessObservation',
'UserRejectObservation',
'AgentCondensationObservation',
'SearchEngineObservation',
'RecallObservation',
'RecallType',
]

@@ -1,22 +0,0 @@
from dataclasses import dataclass
from openhands.core.schema import ObservationType
from openhands.events.observation.observation import Observation
@dataclass
class SearchEngineObservation(Observation):
query: str
observation: str = ObservationType.SEARCH
@property
def message(self) -> str:
return f'Searched for: {self.query}'
def __str__(self) -> str:
ret = (
'**SearchEngineObservation**\n'
f'Query: {self.query}\n'
f'Search Results: {self.content}\n'
)
return ret

@@ -22,7 +22,6 @@ from openhands.events.action.files import (
FileWriteAction,
)
from openhands.events.action.message import MessageAction
from openhands.events.action.search_engine import SearchAction
actions = (
NullAction,
@@ -40,7 +39,6 @@ actions = (
RecallAction,
ChangeAgentStateAction,
MessageAction,
SearchAction,
)
ACTION_TYPE_TO_CLASS = {action_class.action: action_class for action_class in actions} # type: ignore[attr-defined]

@@ -27,7 +27,6 @@ from openhands.events.observation import (
FileEditObservation,
FileReadObservation,
IPythonRunCellObservation,
SearchEngineObservation,
UserRejectObservation,
)
from openhands.events.observation.agent import (
@@ -386,9 +385,6 @@ class ConversationMemory:
elif isinstance(obs, AgentCondensationObservation):
text = truncate_content(obs.content, max_message_chars)
message = Message(role='user', content=[TextContent(text=text)])
elif isinstance(obs, SearchEngineObservation):
text = truncate_content(obs.content, max_message_chars)
message = Message(role='user', content=[TextContent(text=text)])
elif (
isinstance(obs, RecallObservation)
and self.agent_config.enable_prompt_extensions

@@ -41,7 +41,6 @@ from openhands.events.action import (
FileReadAction,
FileWriteAction,
IPythonRunCellAction,
SearchAction,
)
from openhands.events.event import FileEditSource, FileReadSource
from openhands.events.observation import (
@@ -57,7 +56,6 @@ from openhands.events.serialization import event_from_dict, event_to_dict
from openhands.runtime.browser import browse
from openhands.runtime.browser.browser_env import BrowserEnv
from openhands.runtime.plugins import ALL_PLUGINS, JupyterPlugin, Plugin, VSCodePlugin
from openhands.runtime.search_engine.brave_search import search
from openhands.runtime.utils.bash import BashSession
from openhands.runtime.utils.files import insert_lines, read_lines
from openhands.runtime.utils.memory_monitor import MemoryMonitor
@@ -165,6 +163,7 @@ class ActionExecutor:
self.start_time = time.time()
self.last_execution_time = self.start_time
self._initialized = False
self.max_memory_gb: int | None = None
if _override_max_memory_gb := os.environ.get('RUNTIME_MAX_MEMORY_GB', None):
self.max_memory_gb = int(_override_max_memory_gb)
@@ -465,10 +464,6 @@ class ActionExecutor:
async def browse_interactive(self, action: BrowseInteractiveAction) -> Observation:
return await browse(action, self.browser)
async def search(self, action: SearchAction) -> Observation:
obs = await call_sync_from_async(search, action)
return obs
def close(self):
self.memory_monitor.stop_monitoring()
if self.bash_session is not None:

@@ -24,7 +24,6 @@ from openhands.events.action import (
FileReadAction,
FileWriteAction,
IPythonRunCellAction,
SearchAction,
)
from openhands.events.action.action import Action
from openhands.events.action.files import FileEditSource
@@ -298,9 +297,6 @@ class ActionExecutionClient(Runtime):
def browse_interactive(self, action: BrowseInteractiveAction) -> Observation:
return self.send_action_for_execution(action)
def search(self, action: SearchAction) -> Observation:
return self.send_action_for_execution(action)
def close(self) -> None:
# Make sure we don't close the session multiple times
# Can happen in evaluation

@@ -1,3 +0,0 @@
from openhands.runtime.search_engine.brave_search import search
__all__ = ['search']

@@ -1,239 +0,0 @@
import os
import re
import requests
import tenacity
from openhands.core.config import AppConfig
from openhands.events.action import SearchAction
from openhands.events.observation.error import ErrorObservation
from openhands.events.observation.search_engine import SearchEngineObservation
from openhands.utils.tenacity_stop import stop_if_should_exit
def get_title(result):
return f"### Title: {result['title']}\n" if 'title' in result else ''
def get_url(result):
return f"### URL: {result['url']}\n" if 'url' in result else ''
def get_description(result):
return (
f"### Description: {result['description']}\n" if 'description' in result else ''
)
def get_question(result):
return f"### Question: {result['question']}\n" if 'question' in result else ''
def get_answer(result):
return f"### Answer: {result['answer']}\n" if 'answer' in result else ''
def get_cluster(result):
if 'cluster' in result:
output = ''
for i, result_obj in enumerate(result['cluster']):
title = get_title(result_obj)
url = get_url(result_obj)
description = get_description(result_obj)
discussion_output = (
f'### Related webpage\n#{title}#{url}#{description}\n'
if url != ''
else ''
)
output += discussion_output
return output
else:
return ''
def response_to_markdown(results, query):
all_results = {}
# discussions
discussion_results = []
if 'discussions' in results and 'results' in results['discussions']:
for result in results['discussions']['results']:
title = get_title(result)
url = get_url(result)
description = get_description(result)
cluster = get_cluster(result)
discussion_output = f'## Discussion\n{title}{url}{description}{cluster}\n'
discussion_results.append(discussion_output)
all_results['discussions'] = discussion_results
# FAQs
faq_results = []
if 'faq' in results and 'results' in results['faq']:
for result in results['faq']['results']:
title = get_title(result)
url = get_url(result)
question = get_question(result)
answer = get_answer(result)
faq_output = f'## FAQ\n{title}{url}{question}{answer}\n'
faq_results.append(faq_output)
all_results['faq'] = faq_results
# News
news_results = []
if 'news' in results and 'results' in results['news']:
for result in results['news']['results']:
title = get_title(result)
url = get_url(result)
description = get_description(result)
news_output = f'## News\n{title}{url}{description}\n'
news_results.append(news_output)
all_results['news'] = news_results
# Videos
video_results = []
if 'videos' in results and 'results' in results['videos']:
for result in results['videos']['results']:
title = get_title(result)
url = get_url(result)
description = get_description(result)
video_output = f'## Video\n{title}{url}{description}\n'
video_results.append(video_output)
all_results['videos'] = video_results
# Web Search Results
websearch_results = []
if 'web' in results and 'results' in results['web']:
for result in results['web']['results']:
title = get_title(result)
url = get_url(result)
description = get_description(result)
cluster = get_cluster(result)
if cluster:
websearch_output = f'## Webpage\n{title}{url}{description}\n{cluster}\n'
else:
websearch_output = f'## Webpage\n{title}{url}{description}\n'
websearch_results.append(websearch_output)
all_results['web'] = websearch_results
# infobox
infobox_results = []
if 'infobox' in results and 'results' in results['infobox']:
for result in results['infobox']['results']:
title = get_title(result)
url = get_url(result)
description = get_description(result)
infobox_output = f'## Infobox\n{title}{url}{description}\n'
infobox_results.append(infobox_output)
all_results['infobox'] = infobox_results
# locations
location_results = []
if 'locations' in results and 'results' in results['locations']:
for result in results['locations']['results']:
title = get_title(result)
url = get_url(result)
description = get_description(result)
location_output = f'## Location\n{title}{url}{description}\n'
location_results.append(location_output)
all_results['locations'] = location_results
    markdown = '# Search Results\n\n'
    markdown += f'**Searched query**: {query}\n\n'

    # Ranked results, if available
    if 'mixed' in results:
        for rank_type in ['main', 'top', 'side']:
            if rank_type not in results['mixed']:
                continue
            for ranked_result in results['mixed'][rank_type]:
                result_type = ranked_result['type']
                if result_type in all_results:
                    include_all = ranked_result['all']
                    idx = ranked_result.get('index', None)
                    if include_all:
                        markdown += ''.join(all_results[result_type])
                    elif idx is not None and idx < len(all_results[result_type]):
                        markdown += all_results[result_type][idx]
        # Append any results the ranking did not reference, skipping duplicates
        for result_list in all_results.values():
            for result in result_list:
                if result not in markdown:
                    markdown += result
    else:
        markdown += ''.join(
            websearch_results
            + video_results
            + news_results
            + infobox_results
            + faq_results
            + discussion_results
            + location_results
        )
    return markdown
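
Review note: the ranked-merge step above can be exercised in isolation. The sketch below is a minimal stand-in, assuming the same `mixed` layout (`main`/`top`/`side` lists of entries with `type`, `index`, `all`); `merge_ranked` is a hypothetical helper name, not part of the removed module, and it reads `all` with `.get` where the original indexes it directly.

```python
def merge_ranked(all_results: dict, mixed: dict) -> str:
    """Concatenate formatted blocks in ranked order, then append leftovers."""
    merged = ''
    for rank_type in ('main', 'top', 'side'):
        for entry in mixed.get(rank_type, []):
            blocks = all_results.get(entry['type'])
            if blocks is None:
                continue  # result type we did not format: skip it
            idx = entry.get('index')
            if entry.get('all'):
                merged += ''.join(blocks)
            elif idx is not None and idx < len(blocks):
                merged += blocks[idx]
    # Append blocks the ranking never mentioned, skipping exact repeats.
    for blocks in all_results.values():
        for block in blocks:
            if block not in merged:
                merged += block
    return merged


all_results = {
    'web': ['## Webpage\nA\n', '## Webpage\nB\n'],
    'news': ['## News\nN\n'],
}
mixed = {
    'main': [
        {'type': 'web', 'index': 1, 'all': False},
        {'type': 'news', 'all': True},
    ]
}
merged = merge_ranked(all_results, mixed)
print(merged)
```

Note that the duplicate check is a substring test on the accumulated string, so a block whose text happens to be contained in an earlier block would be dropped as well; tracking already-emitted indices would avoid that, at the cost of a little more bookkeeping.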


def return_error(retry_state: tenacity.RetryCallState):
    return ErrorObservation('Failed to query Brave Search API.')


@tenacity.retry(
    wait=tenacity.wait_exponential(min=2, max=10),
    stop=tenacity.stop_after_attempt(5) | stop_if_should_exit(),
    retry_error_callback=return_error,
)
def query_api(query: str, API_KEY, BRAVE_SEARCH_URL):
    headers = {'Accept': 'application/json', 'X-Subscription-Token': API_KEY}
    params: list[tuple[str, str | int | bool]] = [
        ('q', query),
        ('count', 20),  # Number of results to return, max allowed = 20
        ('extra_snippets', False),  # TODO: Should we keep it as true?
    ]
    response = requests.get(
        BRAVE_SEARCH_URL,
        headers=headers,
        params=params,  # type: ignore
        timeout=10,
    )
    response.raise_for_status()  # Raise exception for 4XX/5XX responses
    results = response.json()
    markdown_content = response_to_markdown(results, query)
    # TODO: Handle other types of HTML tags? I couldn't find any other tags in brave search responses for the queries I tried.
    markdown_content = re.sub(r'</?strong>', '', markdown_content)
    return SearchEngineObservation(query=query, content=markdown_content)
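
Review note on the TODO above: the sketch below contrasts the `<strong>`-only pattern used in `query_api` with a broader one. `ANY_TAG`, `strip_strong`, and `strip_all_tags` are hypothetical names introduced here for illustration, and the broader pattern is an assumption about what "other HTML tags" might look like, not something observed in Brave Search responses.

```python
import re

STRONG_TAG = re.compile(r'</?strong>')      # same pattern as in query_api
ANY_TAG = re.compile(r'</?[A-Za-z][^>]*>')  # hypothetical: any simple HTML tag


def strip_strong(text: str) -> str:
    """Remove only <strong> and </strong>, leaving other markup alone."""
    return STRONG_TAG.sub('', text)


def strip_all_tags(text: str) -> str:
    """Remove anything shaped like an opening or closing HTML tag."""
    return ANY_TAG.sub('', text)


snippet = 'Install <strong>OpenHands</strong> with <em>pip</em>; note that 1 < 2.'
only_strong = strip_strong(snippet)
no_tags = strip_all_tags(snippet)
print(only_strong)
print(no_tags)
```

Requiring a letter right after `<` keeps a bare comparison such as `1 < 2` intact. A regex like this is still only a heuristic; if responses ever carried nested or malformed markup, a real HTML parser would be the safer choice.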


def search(action: SearchAction, config: AppConfig):
    """Execute a search query using the Brave Search API.

    Args:
        action: The search action containing the query.
        config: The application configuration.

    Returns:
        SearchEngineObservation: The search results in markdown format.
        ErrorObservation: If search is not enabled, the query is empty,
            or no API key is configured.
    """
    if not config.search.enabled:
        return ErrorObservation(
            content='Search engine functionality is not enabled. Enable it by setting search.enabled=true in config.'
        )

    query = action.query
    if query is None or len(query.strip()) == 0:
        return ErrorObservation(
            content='The query string for search_engine tool must be a non-empty string.'
        )

    if config.search.api_key is None:
        return ErrorObservation(
            content='Search API key not configured. Set search.api_key in config.'
        )

    return query_api(
        query=query,
        API_KEY=config.search.api_key.get_secret_value(),
        BRAVE_SEARCH_URL=config.search.api_url,
    )
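
Review note: the order of the guard clauses in `search` decides which error a caller sees when several things are wrong at once. The sketch below mirrors that order with stand-in types; `FakeSearchConfig` and `validate` are hypothetical names used only for illustration and are not part of OpenHands.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeSearchConfig:
    """Hypothetical stand-in for the search section of the app config."""

    enabled: bool = False
    api_key: Optional[str] = None


def validate(query: Optional[str], cfg: FakeSearchConfig) -> Optional[str]:
    """Return a short error label, or None when the request may proceed."""
    if not cfg.enabled:
        return 'not enabled'
    if query is None or not query.strip():
        return 'empty query'
    if cfg.api_key is None:
        return 'no API key'
    return None


# Search disabled and no key: the "not enabled" guard fires first.
print(validate('test query', FakeSearchConfig()))
# Search enabled, key missing, whitespace-only query: the query guard fires first.
print(validate('   ', FakeSearchConfig(enabled=True)))
```

With this ordering, a disabled search reports "not enabled" even when the key is also missing, which is the behavior the unit tests for the disabled case rely on.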

poetry.lock (generated)

@@ -496,18 +496,18 @@ files = [
[[package]]
name = "boto3"
version = "1.37.12"
version = "1.37.13"
description = "The AWS SDK for Python"
optional = false
python-versions = ">=3.8"
groups = ["main"]
files = [
{file = "boto3-1.37.12-py3-none-any.whl", hash = "sha256:516feaa0d2afaeda1515216fd09291368a1215754bbccb0f28414c0a91a830a2"},
{file = "boto3-1.37.12.tar.gz", hash = "sha256:9412d404f103ad6d14f033eb29cd5e0cdca2b9b08cbfa9d4dabd1d7be2de2625"},
{file = "boto3-1.37.13-py3-none-any.whl", hash = "sha256:90fa5a91d7d7456219f0b7c4a93b38335dc5cf4613d885da4d4c1d099e04c6b7"},
{file = "boto3-1.37.13.tar.gz", hash = "sha256:295648f887464ab74c5c301a44982df76f9ba39ebfc16be5b8f071ad1a81fe95"},
]
[package.dependencies]
botocore = ">=1.37.12,<1.38.0"
botocore = ">=1.37.13,<1.38.0"
jmespath = ">=0.7.1,<2.0.0"
s3transfer = ">=0.11.0,<0.12.0"
@@ -516,14 +516,14 @@ crt = ["botocore[crt] (>=1.21.0,<2.0a0)"]
[[package]]
name = "botocore"
version = "1.37.12"
version = "1.37.13"
description = "Low-level, data-driven core of boto 3."
optional = false
python-versions = ">=3.8"
groups = ["main"]
files = [
{file = "botocore-1.37.12-py3-none-any.whl", hash = "sha256:ba1948c883bbabe20d95ff62c3e36954c9269686f7db9361857835677ca3e676"},
{file = "botocore-1.37.12.tar.gz", hash = "sha256:ae2d5328ce6ad02eb615270507235a6e90fd3eeed615a6c0732b5a68b12f2017"},
{file = "botocore-1.37.13-py3-none-any.whl", hash = "sha256:aa417bac0f4d79533080e6e17c0509e149353aec83cfe7879597a7942f7f08d0"},
{file = "botocore-1.37.13.tar.gz", hash = "sha256:60dfb831c54eb466db9b91891a6c8a0c223626caa049969d5d42858ad1e7f8c7"},
]
[package.dependencies]
@@ -3808,14 +3808,14 @@ types-tqdm = "*"
[[package]]
name = "litellm"
version = "1.63.8"
version = "1.63.11"
description = "Library to easily interface with LLM API providers"
optional = false
python-versions = "!=2.7.*,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,!=3.5.*,!=3.6.*,!=3.7.*,>=3.8"
groups = ["main"]
files = [
{file = "litellm-1.63.8-py3-none-any.whl", hash = "sha256:12615acf16d34b444e13cb9faab89466f63a22330e72e30c7d35e12ebd526188"},
{file = "litellm-1.63.8.tar.gz", hash = "sha256:ae7324fb93a0da2dfd05f8fa301c3ac20dfce05d4651bdb005aeb64c88a76672"},
{file = "litellm-1.63.11-py3-none-any.whl", hash = "sha256:f3915dc35309b164ef2419ad05e5241ddd97f3f47aa036df28365bf889d8ea23"},
{file = "litellm-1.63.11.tar.gz", hash = "sha256:89930895121d0cbf5553e560ed886c45be480ceec0eca3c53ae441473d5d46a4"},
]
[package.dependencies]
@@ -4251,14 +4251,14 @@ files = [
[[package]]
name = "modal"
version = "0.73.102"
version = "0.73.110"
description = "Python client library for Modal"
optional = false
python-versions = ">=3.9"
groups = ["main", "evaluation"]
files = [
{file = "modal-0.73.102-py3-none-any.whl", hash = "sha256:26151ef6164e0b93b0d1961f73d5a715deb72f23e2641215f5410cf58bf403d3"},
{file = "modal-0.73.102.tar.gz", hash = "sha256:198876cf94ff13633283e251d8b37cc1f1bb5e27a7aa547e02072def1f29b66e"},
{file = "modal-0.73.110-py3-none-any.whl", hash = "sha256:5ccdf9ce6e5fbf953738670819a63f02059b65333e270a3fd19a9230b8a6d505"},
{file = "modal-0.73.110.tar.gz", hash = "sha256:d4110c223c975ddd4adbe9e2b9040c4cdbf6dd20625343d1e839b3f1881b33a8"},
]
[package.dependencies]
@@ -4712,67 +4712,67 @@ test = ["pytest", "pytest-console-scripts", "pytest-jupyter", "pytest-tornasync"
[[package]]
name = "numpy"
version = "2.2.3"
version = "2.2.4"
description = "Fundamental package for array computing in Python"
optional = false
python-versions = ">=3.10"
groups = ["main", "evaluation", "test"]
files = [
{file = "numpy-2.2.3-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:cbc6472e01952d3d1b2772b720428f8b90e2deea8344e854df22b0618e9cce71"},
{file = "numpy-2.2.3-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:cdfe0c22692a30cd830c0755746473ae66c4a8f2e7bd508b35fb3b6a0813d787"},
{file = "numpy-2.2.3-cp310-cp310-macosx_14_0_arm64.whl", hash = "sha256:e37242f5324ffd9f7ba5acf96d774f9276aa62a966c0bad8dae692deebec7716"},
{file = "numpy-2.2.3-cp310-cp310-macosx_14_0_x86_64.whl", hash = "sha256:95172a21038c9b423e68be78fd0be6e1b97674cde269b76fe269a5dfa6fadf0b"},
{file = "numpy-2.2.3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d5b47c440210c5d1d67e1cf434124e0b5c395eee1f5806fdd89b553ed1acd0a3"},
{file = "numpy-2.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:0391ea3622f5c51a2e29708877d56e3d276827ac5447d7f45e9bc4ade8923c52"},
{file = "numpy-2.2.3-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:f6b3dfc7661f8842babd8ea07e9897fe3d9b69a1d7e5fbb743e4160f9387833b"},
{file = "numpy-2.2.3-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:1ad78ce7f18ce4e7df1b2ea4019b5817a2f6a8a16e34ff2775f646adce0a5027"},
{file = "numpy-2.2.3-cp310-cp310-win32.whl", hash = "sha256:5ebeb7ef54a7be11044c33a17b2624abe4307a75893c001a4800857956b41094"},
{file = "numpy-2.2.3-cp310-cp310-win_amd64.whl", hash = "sha256:596140185c7fa113563c67c2e894eabe0daea18cf8e33851738c19f70ce86aeb"},
{file = "numpy-2.2.3-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:16372619ee728ed67a2a606a614f56d3eabc5b86f8b615c79d01957062826ca8"},
{file = "numpy-2.2.3-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:5521a06a3148686d9269c53b09f7d399a5725c47bbb5b35747e1cb76326b714b"},
{file = "numpy-2.2.3-cp311-cp311-macosx_14_0_arm64.whl", hash = "sha256:7c8dde0ca2f77828815fd1aedfdf52e59071a5bae30dac3b4da2a335c672149a"},
{file = "numpy-2.2.3-cp311-cp311-macosx_14_0_x86_64.whl", hash = "sha256:77974aba6c1bc26e3c205c2214f0d5b4305bdc719268b93e768ddb17e3fdd636"},
{file = "numpy-2.2.3-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d42f9c36d06440e34226e8bd65ff065ca0963aeecada587b937011efa02cdc9d"},
{file = "numpy-2.2.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f2712c5179f40af9ddc8f6727f2bd910ea0eb50206daea75f58ddd9fa3f715bb"},
{file = "numpy-2.2.3-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:c8b0451d2ec95010d1db8ca733afc41f659f425b7f608af569711097fd6014e2"},
{file = "numpy-2.2.3-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:d9b4a8148c57ecac25a16b0e11798cbe88edf5237b0df99973687dd866f05e1b"},
{file = "numpy-2.2.3-cp311-cp311-win32.whl", hash = "sha256:1f45315b2dc58d8a3e7754fe4e38b6fce132dab284a92851e41b2b344f6441c5"},
{file = "numpy-2.2.3-cp311-cp311-win_amd64.whl", hash = "sha256:9f48ba6f6c13e5e49f3d3efb1b51c8193215c42ac82610a04624906a9270be6f"},
{file = "numpy-2.2.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:12c045f43b1d2915eca6b880a7f4a256f59d62df4f044788c8ba67709412128d"},
{file = "numpy-2.2.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:87eed225fd415bbae787f93a457af7f5990b92a334e346f72070bf569b9c9c95"},
{file = "numpy-2.2.3-cp312-cp312-macosx_14_0_arm64.whl", hash = "sha256:712a64103d97c404e87d4d7c47fb0c7ff9acccc625ca2002848e0d53288b90ea"},
{file = "numpy-2.2.3-cp312-cp312-macosx_14_0_x86_64.whl", hash = "sha256:a5ae282abe60a2db0fd407072aff4599c279bcd6e9a2475500fc35b00a57c532"},
{file = "numpy-2.2.3-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:5266de33d4c3420973cf9ae3b98b54a2a6d53a559310e3236c4b2b06b9c07d4e"},
{file = "numpy-2.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3b787adbf04b0db1967798dba8da1af07e387908ed1553a0d6e74c084d1ceafe"},
{file = "numpy-2.2.3-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:34c1b7e83f94f3b564b35f480f5652a47007dd91f7c839f404d03279cc8dd021"},
{file = "numpy-2.2.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:4d8335b5f1b6e2bce120d55fb17064b0262ff29b459e8493d1785c18ae2553b8"},
{file = "numpy-2.2.3-cp312-cp312-win32.whl", hash = "sha256:4d9828d25fb246bedd31e04c9e75714a4087211ac348cb39c8c5f99dbb6683fe"},
{file = "numpy-2.2.3-cp312-cp312-win_amd64.whl", hash = "sha256:83807d445817326b4bcdaaaf8e8e9f1753da04341eceec705c001ff342002e5d"},
{file = "numpy-2.2.3-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:7bfdb06b395385ea9b91bf55c1adf1b297c9fdb531552845ff1d3ea6e40d5aba"},
{file = "numpy-2.2.3-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:23c9f4edbf4c065fddb10a4f6e8b6a244342d95966a48820c614891e5059bb50"},
{file = "numpy-2.2.3-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:a0c03b6be48aaf92525cccf393265e02773be8fd9551a2f9adbe7db1fa2b60f1"},
{file = "numpy-2.2.3-cp313-cp313-macosx_14_0_x86_64.whl", hash = "sha256:2376e317111daa0a6739e50f7ee2a6353f768489102308b0d98fcf4a04f7f3b5"},
{file = "numpy-2.2.3-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:8fb62fe3d206d72fe1cfe31c4a1106ad2b136fcc1606093aeab314f02930fdf2"},
{file = "numpy-2.2.3-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:52659ad2534427dffcc36aac76bebdd02b67e3b7a619ac67543bc9bfe6b7cdb1"},
{file = "numpy-2.2.3-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:1b416af7d0ed3271cad0f0a0d0bee0911ed7eba23e66f8424d9f3dfcdcae1304"},
{file = "numpy-2.2.3-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:1402da8e0f435991983d0a9708b779f95a8c98c6b18a171b9f1be09005e64d9d"},
{file = "numpy-2.2.3-cp313-cp313-win32.whl", hash = "sha256:136553f123ee2951bfcfbc264acd34a2fc2f29d7cdf610ce7daf672b6fbaa693"},
{file = "numpy-2.2.3-cp313-cp313-win_amd64.whl", hash = "sha256:5b732c8beef1d7bc2d9e476dbba20aaff6167bf205ad9aa8d30913859e82884b"},
{file = "numpy-2.2.3-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:435e7a933b9fda8126130b046975a968cc2d833b505475e588339e09f7672890"},
{file = "numpy-2.2.3-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:7678556eeb0152cbd1522b684dcd215250885993dd00adb93679ec3c0e6e091c"},
{file = "numpy-2.2.3-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:2e8da03bd561504d9b20e7a12340870dfc206c64ea59b4cfee9fceb95070ee94"},
{file = "numpy-2.2.3-cp313-cp313t-macosx_14_0_x86_64.whl", hash = "sha256:c9aa4496fd0e17e3843399f533d62857cef5900facf93e735ef65aa4bbc90ef0"},
{file = "numpy-2.2.3-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f4ca91d61a4bf61b0f2228f24bbfa6a9facd5f8af03759fe2a655c50ae2c6610"},
{file = "numpy-2.2.3-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:deaa09cd492e24fd9b15296844c0ad1b3c976da7907e1c1ed3a0ad21dded6f76"},
{file = "numpy-2.2.3-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:246535e2f7496b7ac85deffe932896a3577be7af8fb7eebe7146444680297e9a"},
{file = "numpy-2.2.3-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:daf43a3d1ea699402c5a850e5313680ac355b4adc9770cd5cfc2940e7861f1bf"},
{file = "numpy-2.2.3-cp313-cp313t-win32.whl", hash = "sha256:cf802eef1f0134afb81fef94020351be4fe1d6681aadf9c5e862af6602af64ef"},
{file = "numpy-2.2.3-cp313-cp313t-win_amd64.whl", hash = "sha256:aee2512827ceb6d7f517c8b85aa5d3923afe8fc7a57d028cffcd522f1c6fd082"},
{file = "numpy-2.2.3-pp310-pypy310_pp73-macosx_10_15_x86_64.whl", hash = "sha256:3c2ec8a0f51d60f1e9c0c5ab116b7fc104b165ada3f6c58abf881cb2eb16044d"},
{file = "numpy-2.2.3-pp310-pypy310_pp73-macosx_14_0_x86_64.whl", hash = "sha256:ed2cf9ed4e8ebc3b754d398cba12f24359f018b416c380f577bbae112ca52fc9"},
{file = "numpy-2.2.3-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:39261798d208c3095ae4f7bc8eaeb3481ea8c6e03dc48028057d3cbdbdb8937e"},
{file = "numpy-2.2.3-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:783145835458e60fa97afac25d511d00a1eca94d4a8f3ace9fe2043003c678e4"},
{file = "numpy-2.2.3.tar.gz", hash = "sha256:dbdc15f0c81611925f382dfa97b3bd0bc2c1ce19d4fe50482cb0ddc12ba30020"},
{file = "numpy-2.2.4-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:8146f3550d627252269ac42ae660281d673eb6f8b32f113538e0cc2a9aed42b9"},
{file = "numpy-2.2.4-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:e642d86b8f956098b564a45e6f6ce68a22c2c97a04f5acd3f221f57b8cb850ae"},
{file = "numpy-2.2.4-cp310-cp310-macosx_14_0_arm64.whl", hash = "sha256:a84eda42bd12edc36eb5b53bbcc9b406820d3353f1994b6cfe453a33ff101775"},
{file = "numpy-2.2.4-cp310-cp310-macosx_14_0_x86_64.whl", hash = "sha256:4ba5054787e89c59c593a4169830ab362ac2bee8a969249dc56e5d7d20ff8df9"},
{file = "numpy-2.2.4-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:7716e4a9b7af82c06a2543c53ca476fa0b57e4d760481273e09da04b74ee6ee2"},
{file = "numpy-2.2.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:adf8c1d66f432ce577d0197dceaac2ac00c0759f573f28516246351c58a85020"},
{file = "numpy-2.2.4-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:218f061d2faa73621fa23d6359442b0fc658d5b9a70801373625d958259eaca3"},
{file = "numpy-2.2.4-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:df2f57871a96bbc1b69733cd4c51dc33bea66146b8c63cacbfed73eec0883017"},
{file = "numpy-2.2.4-cp310-cp310-win32.whl", hash = "sha256:a0258ad1f44f138b791327961caedffbf9612bfa504ab9597157806faa95194a"},
{file = "numpy-2.2.4-cp310-cp310-win_amd64.whl", hash = "sha256:0d54974f9cf14acf49c60f0f7f4084b6579d24d439453d5fc5805d46a165b542"},
{file = "numpy-2.2.4-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:e9e0a277bb2eb5d8a7407e14688b85fd8ad628ee4e0c7930415687b6564207a4"},
{file = "numpy-2.2.4-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:9eeea959168ea555e556b8188da5fa7831e21d91ce031e95ce23747b7609f8a4"},
{file = "numpy-2.2.4-cp311-cp311-macosx_14_0_arm64.whl", hash = "sha256:bd3ad3b0a40e713fc68f99ecfd07124195333f1e689387c180813f0e94309d6f"},
{file = "numpy-2.2.4-cp311-cp311-macosx_14_0_x86_64.whl", hash = "sha256:cf28633d64294969c019c6df4ff37f5698e8326db68cc2b66576a51fad634880"},
{file = "numpy-2.2.4-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:2fa8fa7697ad1646b5c93de1719965844e004fcad23c91228aca1cf0800044a1"},
{file = "numpy-2.2.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f4162988a360a29af158aeb4a2f4f09ffed6a969c9776f8f3bdee9b06a8ab7e5"},
{file = "numpy-2.2.4-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:892c10d6a73e0f14935c31229e03325a7b3093fafd6ce0af704be7f894d95687"},
{file = "numpy-2.2.4-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:db1f1c22173ac1c58db249ae48aa7ead29f534b9a948bc56828337aa84a32ed6"},
{file = "numpy-2.2.4-cp311-cp311-win32.whl", hash = "sha256:ea2bb7e2ae9e37d96835b3576a4fa4b3a97592fbea8ef7c3587078b0068b8f09"},
{file = "numpy-2.2.4-cp311-cp311-win_amd64.whl", hash = "sha256:f7de08cbe5551911886d1ab60de58448c6df0f67d9feb7d1fb21e9875ef95e91"},
{file = "numpy-2.2.4-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:a7b9084668aa0f64e64bd00d27ba5146ef1c3a8835f3bd912e7a9e01326804c4"},
{file = "numpy-2.2.4-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:dbe512c511956b893d2dacd007d955a3f03d555ae05cfa3ff1c1ff6df8851854"},
{file = "numpy-2.2.4-cp312-cp312-macosx_14_0_arm64.whl", hash = "sha256:bb649f8b207ab07caebba230d851b579a3c8711a851d29efe15008e31bb4de24"},
{file = "numpy-2.2.4-cp312-cp312-macosx_14_0_x86_64.whl", hash = "sha256:f34dc300df798742b3d06515aa2a0aee20941c13579d7a2f2e10af01ae4901ee"},
{file = "numpy-2.2.4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c3f7ac96b16955634e223b579a3e5798df59007ca43e8d451a0e6a50f6bfdfba"},
{file = "numpy-2.2.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:4f92084defa704deadd4e0a5ab1dc52d8ac9e8a8ef617f3fbb853e79b0ea3592"},
{file = "numpy-2.2.4-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:7a4e84a6283b36632e2a5b56e121961f6542ab886bc9e12f8f9818b3c266bfbb"},
{file = "numpy-2.2.4-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:11c43995255eb4127115956495f43e9343736edb7fcdb0d973defd9de14cd84f"},
{file = "numpy-2.2.4-cp312-cp312-win32.whl", hash = "sha256:65ef3468b53269eb5fdb3a5c09508c032b793da03251d5f8722b1194f1790c00"},
{file = "numpy-2.2.4-cp312-cp312-win_amd64.whl", hash = "sha256:2aad3c17ed2ff455b8eaafe06bcdae0062a1db77cb99f4b9cbb5f4ecb13c5146"},
{file = "numpy-2.2.4-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:1cf4e5c6a278d620dee9ddeb487dc6a860f9b199eadeecc567f777daace1e9e7"},
{file = "numpy-2.2.4-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:1974afec0b479e50438fc3648974268f972e2d908ddb6d7fb634598cdb8260a0"},
{file = "numpy-2.2.4-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:79bd5f0a02aa16808fcbc79a9a376a147cc1045f7dfe44c6e7d53fa8b8a79392"},
{file = "numpy-2.2.4-cp313-cp313-macosx_14_0_x86_64.whl", hash = "sha256:3387dd7232804b341165cedcb90694565a6015433ee076c6754775e85d86f1fc"},
{file = "numpy-2.2.4-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:6f527d8fdb0286fd2fd97a2a96c6be17ba4232da346931d967a0630050dfd298"},
{file = "numpy-2.2.4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:bce43e386c16898b91e162e5baaad90c4b06f9dcbe36282490032cec98dc8ae7"},
{file = "numpy-2.2.4-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:31504f970f563d99f71a3512d0c01a645b692b12a63630d6aafa0939e52361e6"},
{file = "numpy-2.2.4-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:81413336ef121a6ba746892fad881a83351ee3e1e4011f52e97fba79233611fd"},
{file = "numpy-2.2.4-cp313-cp313-win32.whl", hash = "sha256:f486038e44caa08dbd97275a9a35a283a8f1d2f0ee60ac260a1790e76660833c"},
{file = "numpy-2.2.4-cp313-cp313-win_amd64.whl", hash = "sha256:207a2b8441cc8b6a2a78c9ddc64d00d20c303d79fba08c577752f080c4007ee3"},
{file = "numpy-2.2.4-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:8120575cb4882318c791f839a4fd66161a6fa46f3f0a5e613071aae35b5dd8f8"},
{file = "numpy-2.2.4-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:a761ba0fa886a7bb33c6c8f6f20213735cb19642c580a931c625ee377ee8bd39"},
{file = "numpy-2.2.4-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:ac0280f1ba4a4bfff363a99a6aceed4f8e123f8a9b234c89140f5e894e452ecd"},
{file = "numpy-2.2.4-cp313-cp313t-macosx_14_0_x86_64.whl", hash = "sha256:879cf3a9a2b53a4672a168c21375166171bc3932b7e21f622201811c43cdd3b0"},
{file = "numpy-2.2.4-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f05d4198c1bacc9124018109c5fba2f3201dbe7ab6e92ff100494f236209c960"},
{file = "numpy-2.2.4-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e2f085ce2e813a50dfd0e01fbfc0c12bbe5d2063d99f8b29da30e544fb6483b8"},
{file = "numpy-2.2.4-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:92bda934a791c01d6d9d8e038363c50918ef7c40601552a58ac84c9613a665bc"},
{file = "numpy-2.2.4-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:ee4d528022f4c5ff67332469e10efe06a267e32f4067dc76bb7e2cddf3cd25ff"},
{file = "numpy-2.2.4-cp313-cp313t-win32.whl", hash = "sha256:05c076d531e9998e7e694c36e8b349969c56eadd2cdcd07242958489d79a7286"},
{file = "numpy-2.2.4-cp313-cp313t-win_amd64.whl", hash = "sha256:188dcbca89834cc2e14eb2f106c96d6d46f200fe0200310fc29089657379c58d"},
{file = "numpy-2.2.4-pp310-pypy310_pp73-macosx_10_15_x86_64.whl", hash = "sha256:7051ee569db5fbac144335e0f3b9c2337e0c8d5c9fee015f259a5bd70772b7e8"},
{file = "numpy-2.2.4-pp310-pypy310_pp73-macosx_14_0_x86_64.whl", hash = "sha256:ab2939cd5bec30a7430cbdb2287b63151b77cf9624de0532d629c9a1c59b1d5c"},
{file = "numpy-2.2.4-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d0f35b19894a9e08639fd60a1ec1978cb7f5f7f1eace62f38dd36be8aecdef4d"},
{file = "numpy-2.2.4-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:b4adfbbc64014976d2f91084915ca4e626fbf2057fb81af209c1a6d776d23e3d"},
{file = "numpy-2.2.4.tar.gz", hash = "sha256:9ba03692a45d3eef66559efe1d1096c4b9b75c0986b5dff5530c378fb8331d4f"},
]
[[package]]

@@ -1,83 +0,0 @@
"""Tests for the Brave Search functionality."""

from unittest.mock import Mock, patch

import pytest

from openhands.core.config import AppConfig, SearchConfig
from openhands.events.action import SearchAction
from openhands.events.observation.error import ErrorObservation
from openhands.events.observation.search_engine import SearchEngineObservation
from openhands.runtime.search_engine.brave_search import search


@pytest.fixture
def mock_config():
    """Create a mock config with search enabled."""
    config = AppConfig()
    config.search = SearchConfig(
        enabled=True,
        api_key="test_key",
        api_url="https://test.url"
    )
    return config


@pytest.fixture
def mock_query_api():
    """Create a mock query_api function."""
    with patch("openhands.runtime.search_engine.brave_search.query_api") as mock:
        mock.return_value = SearchEngineObservation(
            query="test query",
            content="test content"
        )
        yield mock


def test_search_disabled(mock_query_api):
    """Test that search returns error when disabled."""
    config = AppConfig()
    config.search = SearchConfig(enabled=False)
    action = SearchAction(query="test query")
    result = search(action, config)
    assert isinstance(result, ErrorObservation)
    assert "not enabled" in result.content
    mock_query_api.assert_not_called()


def test_search_no_api_key(mock_query_api):
    """Test that search returns error when API key is not set."""
    config = AppConfig()
    config.search = SearchConfig(enabled=True)
    action = SearchAction(query="test query")
    result = search(action, config)
    assert isinstance(result, ErrorObservation)
    assert "API key not configured" in result.content
    mock_query_api.assert_not_called()


def test_search_empty_query(mock_query_api, mock_config):
    """Test that search returns error when query is empty."""
    action = SearchAction(query="")
    result = search(action, mock_config)
    assert isinstance(result, ErrorObservation)
    assert "must be a non-empty string" in result.content
    mock_query_api.assert_not_called()


def test_search_success(mock_query_api, mock_config):
    """Test that search returns results when everything is configured correctly."""
    action = SearchAction(query="test query")
    result = search(action, mock_config)
    assert isinstance(result, SearchEngineObservation)
    assert result.query == "test query"
    assert result.content == "test content"
    mock_query_api.assert_called_once_with(
        query="test query",
        API_KEY="test_key",
        BRAVE_SEARCH_URL="https://test.url"
    )

@@ -25,7 +25,6 @@ from openhands.core.message import ImageContent, Message, TextContent
from openhands.events.action import (
CmdRunAction,
MessageAction,
SearchAction,
)
from openhands.events.event import EventSource
from openhands.events.observation.commands import (
@@ -101,26 +100,22 @@ def test_get_tools_with_options():
codeact_enable_browsing=True,
codeact_enable_jupyter=True,
codeact_enable_llm_editor=True,
codeact_enable_search_engine=True,
)
tool_names = [tool['function']['name'] for tool in tools]
assert 'browser' in tool_names
assert 'execute_ipython_cell' in tool_names
assert 'edit_file' in tool_names
assert 'search_engine' in tool_names
# Test with all options disabled
tools = get_tools(
codeact_enable_browsing=False,
codeact_enable_jupyter=False,
codeact_enable_llm_editor=False,
codeact_enable_search_engine=False,
)
tool_names = [tool['function']['name'] for tool in tools]
assert 'browser' not in tool_names
assert 'execute_ipython_cell' not in tool_names
assert 'edit_file' not in tool_names
assert 'search_engine' not in tool_names
def test_cmd_run_tool():
@@ -181,15 +176,6 @@ def test_web_read_tool():
assert WebReadTool['function']['parameters']['required'] == ['url']
def test_search_engine_tool():
from openhands.agenthub.codeact_agent.tools import SearchEngineTool
assert SearchEngineTool['type'] == 'function'
assert SearchEngineTool['function']['name'] == 'search_engine'
assert 'query' in SearchEngineTool['function']['parameters']['properties']
assert SearchEngineTool['function']['parameters']['required'] == ['query']
def test_browser_tool():
assert BrowserTool['type'] == 'function'
assert BrowserTool['function']['name'] == 'browser'
@@ -226,42 +212,6 @@ def test_browser_tool():
assert 'description' in BrowserTool['function']['parameters']['properties']['code']
def test_response_to_actions_search_engine():
# Test response with search engine tool call
from litellm import ChatCompletionMessageToolCall, Choices, Message, ModelResponse
mock_response = ModelResponse(
id='mock_id',
choices=[
Choices(
message=Message(
content='Let me search for that',
tool_calls=[
ChatCompletionMessageToolCall(
id='tool_call_10',
function={
'name': 'search_engine',
'arguments': '{"query": "test query"}',
},
type='function',
)
],
role='assistant',
),
index=0,
finish_reason='tool_calls',
)
],
model='mock_model',
usage={'total_tokens': 100},
)
actions = response_to_actions(mock_response)
assert len(actions) == 1
assert isinstance(actions[0], SearchAction)
assert actions[0].query == 'test query'
def test_response_to_actions_invalid_tool():
# Test response with invalid tool call
mock_response = Mock()