Compare commits

8 Commits

Author SHA1 Message Date
Robert Brennan 155b806bff update nikolaik 2025-03-31 13:24:09 -04:00
tofarr 6ae2984580 Fix for circular import on ConversationValidator (#7583) 2025-03-31 11:09:10 -06:00
dependabot[bot] f12bf985ce chore(deps): bump the version-all group with 6 updates (#7600) 2025-03-31 15:44:35 +00:00
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Xingyao Wang b6321488bc Update pre-commit instructions in repository memory (#7595) 2025-03-31 21:15:45 +08:00
  Co-authored-by: openhands <openhands@all-hands.dev>
Xingyao Wang 54236f9617 [eval] Support SWE-Bench Multimodal (#7122) 2025-03-31 07:42:44 -04:00
  Co-authored-by: openhands <openhands@all-hands.dev>
Xingyao Wang 2c4496b129 feat: Use LLM-generated natural-language descriptions as conversation title (#7049) 2025-03-30 21:34:07 +00:00
  Co-authored-by: openhands <openhands@all-hands.dev>
Peter Dave Hello 4b177992f8 Clean up apt temporary files in app Dockerfile (#7590) 2025-03-30 16:37:54 +00:00
mkusaka fa61e862e0 Fix broken markdown link for Anthropic billing settings (#7589) 2025-03-30 14:23:46 +00:00
  Co-authored-by: openhands <openhands@all-hands.dev>
16 changed files with 310 additions and 79 deletions
+10 -2
@@ -10,13 +10,21 @@ This repository contains the code for OpenHands, an automated AI software engine
To set up the entire repo, including frontend and backend, run `make build`.
You don't need to do this unless the user asks you to, or if you're trying to run the entire application.
Before pushing any changes, you should ensure that any lint errors or simple test errors have been fixed.
IMPORTANT: Before making any changes to the codebase, ALWAYS run `make install-pre-commit-hooks` to ensure pre-commit hooks are properly installed.
Before pushing any changes, you MUST ensure that any lint errors or simple test errors have been fixed.
* If you've made changes to the backend, you should run `pre-commit run --all-files --config ./dev_config/python/.pre-commit-config.yaml`
* If you've made changes to the frontend, you should run `cd frontend && npm run lint:fix && npm run build ; cd ..`
The pre-commit hooks MUST pass successfully before pushing any changes to the repository. This is a mandatory requirement to maintain code quality and consistency.
If either command fails, it may have automatically fixed some issues. You should fix any issues that weren't automatically fixed,
then re-run the command to ensure it passes.
then re-run the command to ensure it passes. Common issues include:
- Mypy type errors
- Ruff formatting issues
- Trailing whitespace
- Missing newlines at end of files
## Repository Structure
Backend:
+2 -1
@@ -48,7 +48,8 @@ RUN mkdir -p $FILE_STORE_PATH
RUN mkdir -p $WORKSPACE_BASE
RUN apt-get update -y \
&& apt-get install -y curl ssh sudo
&& apt-get install -y curl ssh sudo \
&& rm -rf /var/lib/apt/lists/*
# Default is 1000, but OSX is often 501
RUN sed -i 's/^UID_MIN.*/UID_MIN 499/' /etc/login.defs
+1 -1
@@ -95,7 +95,7 @@ OpenHands requires an API key to access most language models. Here's how to get
1. [Create an Anthropic account](https://console.anthropic.com/)
2. [Generate an API key](https://console.anthropic.com/settings/keys)
3. [Set up billing(https://console.anthropic.com/settings/billing)
3. [Set up billing](https://console.anthropic.com/settings/billing)
Consider setting usage limits to control costs.
+11 -2
@@ -2,7 +2,9 @@
This folder contains the evaluation harness that we built on top of the original [SWE-Bench benchmark](https://www.swebench.com/) ([paper](https://arxiv.org/abs/2310.06770)).
**UPDATE (2/18/2025): We now support running SWE-Gym using the same evaluation harness here. For more details, checkout [this README](./SWE-Gym.md).
**UPDATE (03/27/2025): We now support SWE-Bench multimodal evaluation! Simply use "princeton-nlp/SWE-bench_Multimodal" as the dataset name in the `run_infer.sh` script to evaluate on multimodal instances.**
**UPDATE (2/18/2025): We now support running SWE-Gym using the same evaluation harness here. For more details, checkout [this README](./SWE-Gym.md).**
**UPDATE (7/1/2024): We now support the official SWE-Bench dockerized evaluation as announced [here](https://github.com/princeton-nlp/SWE-bench/blob/main/docs/20240627_docker/README.md).**
@@ -62,7 +64,7 @@ in order to use `eval_limit`, you must also set `agent`.
default, it is set to 60.
- `num_workers`, e.g. `3`, is the number of parallel workers to run the evaluation. By
default, it is set to 1.
- `dataset`, a huggingface dataset name. e.g. `princeton-nlp/SWE-bench`, `princeton-nlp/SWE-bench_Lite`, or `princeton-nlp/SWE-bench_Verified`, specifies which dataset to evaluate on.
- `dataset`, a huggingface dataset name. e.g. `princeton-nlp/SWE-bench`, `princeton-nlp/SWE-bench_Lite`, `princeton-nlp/SWE-bench_Verified`, or `princeton-nlp/SWE-bench_Multimodal`, specifies which dataset to evaluate on.
- `dataset_split`, split for the huggingface dataset. e.g., `test`, `dev`. Default to `test`.
> [!CAUTION]
@@ -82,6 +84,13 @@ then your command would be:
./evaluation/benchmarks/swe_bench/scripts/run_infer.sh llm.eval_gpt4_1106_preview HEAD CodeActAgent 10
```
For multimodal evaluation, you can use:
```bash
# Example for running multimodal SWE-Bench evaluation
./evaluation/benchmarks/swe_bench/scripts/run_infer.sh llm.eval_gpt4_vision HEAD CodeActAgent 10 100 1 princeton-nlp/SWE-bench_Multimodal test
```
### Running in parallel with RemoteRuntime
OpenHands Remote Runtime is currently in beta (read [here](https://runtime.all-hands.dev/) for more details), it allows you to run rollout in parallel in the cloud, so you don't need a powerful machine to run evaluation.
+34 -16
@@ -58,7 +58,7 @@ def _get_swebench_workspace_dir_name(instance: pd.Series) -> str:
return f'{instance.repo}__{instance.version}'.replace('/', '__')
def get_instruction(instance: pd.Series, metadata: EvalMetadata):
def get_instruction(instance: pd.Series, metadata: EvalMetadata) -> MessageAction:
workspace_dir_name = _get_swebench_workspace_dir_name(instance)
instruction = f"""
<uploaded_files>
@@ -114,12 +114,20 @@ Be thorough in your exploration, testing, and reasoning. It's fine if your think
"""
if RUN_WITH_BROWSING:
instruction += """
<IMPORTANT!>
You SHOULD NEVER attempt to browse the web.
</IMPORTANT!>
"""
return instruction
instruction += (
'<IMPORTANT!>\n'
'You SHOULD NEVER attempt to browse the web. '
'</IMPORTANT!>\n'
)
if 'image_assets' in instance:
assets = instance['image_assets']
assert (
'problem_statement' in assets
), 'problem_statement is required in image_assets'
image_urls = assets['problem_statement']
return MessageAction(content=instruction, image_urls=image_urls)
return MessageAction(content=instruction)
# TODO: migrate all swe-bench docker to ghcr.io/openhands
@@ -129,14 +137,18 @@ DEFAULT_DOCKER_IMAGE_PREFIX = os.environ.get(
logger.info(f'Default docker image prefix: {DEFAULT_DOCKER_IMAGE_PREFIX}')
def get_instance_docker_image(instance_id: str, official_image: bool = False) -> str:
if official_image:
def get_instance_docker_image(
instance_id: str,
swebench_official_image: bool = False,
) -> str:
if swebench_official_image:
# Official SWE-Bench image
# swebench/sweb.eval.x86_64.django_1776_django-11333:v1
docker_image_prefix = 'docker.io/swebench/'
repo, name = instance_id.split('__')
image_name = f'sweb.eval.x86_64.{repo}_1776_{name}:latest'
logger.warning(f'Using official SWE-Bench image: {image_name}')
image_name = f'swebench/sweb.eval.x86_64.{repo}_1776_{name}:latest'
logger.info(f'Using official SWE-Bench image: {image_name}')
return image_name
else:
# OpenHands version of the image
docker_image_prefix = DEFAULT_DOCKER_IMAGE_PREFIX
@@ -144,7 +156,7 @@ def get_instance_docker_image(instance_id: str, official_image: bool = False) ->
image_name = image_name.replace(
'__', '_s_'
) # to comply with docker image naming convention
return (docker_image_prefix.rstrip('/') + '/' + image_name).lower()
return (docker_image_prefix.rstrip('/') + '/' + image_name).lower()
def get_config(
@@ -152,12 +164,13 @@ def get_config(
metadata: EvalMetadata,
) -> AppConfig:
# We use a different instance image for the each instance of swe-bench eval
use_official_image = bool(
use_swebench_official_image = bool(
('verified' in metadata.dataset.lower() or 'lite' in metadata.dataset.lower())
and 'swe-gym' not in metadata.dataset.lower()
)
base_container_image = get_instance_docker_image(
instance['instance_id'], use_official_image
instance['instance_id'],
swebench_official_image=use_swebench_official_image,
)
logger.info(
f'Using instance container image: {base_container_image}. '
@@ -493,13 +506,13 @@ def process_instance(
try:
initialize_runtime(runtime, instance)
instruction = get_instruction(instance, metadata)
message_action = get_instruction(instance, metadata)
# Here's how you can run the agent (similar to the `main` function) and get the final task state
state: State | None = asyncio.run(
run_controller(
config=config,
initial_user_action=MessageAction(content=instruction),
initial_user_action=message_action,
runtime=runtime,
fake_user_response_fn=AGENT_CLS_TO_FAKE_USER_RESPONSE_FN[
metadata.agent_class
@@ -539,6 +552,11 @@ def process_instance(
metrics = get_metrics(state)
# Save the output
instruction = message_action.content
if message_action.image_urls:
instruction += (
'\n\n<image_urls>' + '\n'.join(message_action.image_urls) + '</image_urls>'
)
output = EvalOutput(
instance_id=instance.instance_id,
instruction=instruction,
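The hunks above make `get_instruction` return a `MessageAction` that may carry image URLs, and later flatten that message back into a single string for the saved output. A minimal sketch of that flow follows. It assumes a stand-in `MessageAction` dataclass instead of the real OpenHands class, takes a plain `dict` instead of a `pd.Series`, and raises `ValueError` where the diff uses `assert`; the helper names `build_message` and `flatten_for_output` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MessageAction:
    """Stand-in for the OpenHands MessageAction (assumption: only these two fields matter here)."""

    content: str
    image_urls: Optional[List[str]] = None


def build_message(instance: dict, instruction: str) -> MessageAction:
    """Attach the problem-statement images when the instance provides them."""
    if 'image_assets' in instance:
        assets = instance['image_assets']
        if 'problem_statement' not in assets:
            raise ValueError('problem_statement is required in image_assets')
        return MessageAction(content=instruction, image_urls=assets['problem_statement'])
    return MessageAction(content=instruction)


def flatten_for_output(message: MessageAction) -> str:
    """Append the image URLs in an <image_urls> block, as the diff does before saving."""
    text = message.content
    if message.image_urls:
        text += '\n\n<image_urls>' + '\n'.join(message.image_urls) + '</image_urls>'
    return text
```

Text-only instances pass through unchanged, so the saved `instruction` field keeps its previous shape for the non-multimodal datasets.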
@@ -22,11 +22,11 @@ vi.mock("#/context/auth-context", () => ({
describe("ActionSuggestions", () => {
// Setup mocks for each test
vi.clearAllMocks();
(useAuth as any).mockReturnValue({
githubTokenIsSet: true,
});
(useSelector as any).mockReturnValue({
selectedRepository: "test-repo",
});
@@ -66,16 +66,16 @@ describe("ActionSuggestions", () => {
it("should have different prompts for 'Push to Branch' and 'Push & Create PR' buttons", () => {
// This test verifies that the prompts are different in the component
const component = render(<ActionSuggestions onSuggestionsClick={() => {}} />);
// Get the component instance to access the internal values
const pushBranchPrompt = "Please push the changes to a remote branch on GitHub, but do NOT create a pull request. Please use the exact SAME branch name as the one you are currently on.";
const createPRPrompt = "Please push the changes to GitHub and open a pull request. Please create a meaningful branch name that describes the changes.";
// Verify the prompts are different
expect(pushBranchPrompt).not.toEqual(createPRPrompt);
// Verify the PR prompt mentions creating a meaningful branch name
expect(createPRPrompt).toContain("meaningful branch name");
expect(createPRPrompt).not.toContain("SAME branch name");
});
});
});
+1 -1
@@ -47,7 +47,7 @@ class SandboxConfig(BaseModel):
rm_all_containers: bool = Field(default=False)
api_key: str | None = Field(default=None)
base_container_image: str = Field(
default='nikolaik/python-nodejs:python3.12-nodejs22'
default='nikolaik/python-nodejs:python3.13-nodejs23-bullseye'
)
runtime_container_image: str | None = Field(default=None)
user_id: int = Field(default=os.getuid() if hasattr(os, 'getuid') else 1000)
+1 -1
@@ -113,7 +113,7 @@ class RecallObservation(Observation):
f'repo_instructions={self.repo_instructions[:20]}...',
f'runtime_hosts={self.runtime_hosts}',
f'additional_agent_instructions={self.additional_agent_instructions[:20]}...',
f'date={self.date}'
f'date={self.date}',
]
)
else:
@@ -1,4 +1,4 @@
{
"workbench.colorTheme": "Default Dark Modern",
"workbench.startupEditor": "none"
}
}
+2 -2
@@ -23,7 +23,7 @@ from openhands.server.shared import (
sio,
)
from openhands.storage.conversation.conversation_validator import (
ConversationValidatorImpl,
create_conversation_validator,
)
@@ -38,7 +38,7 @@ async def connect(connection_id: str, environ):
raise ConnectionRefusedError('No conversation_id in query params')
cookies_str = environ.get('HTTP_COOKIE', '')
conversation_validator = ConversationValidatorImpl()
conversation_validator = create_conversation_validator()
user_id, github_user_id = await conversation_validator.validate(
conversation_id, cookies_str
)
@@ -243,6 +243,24 @@ async def get_conversation(
try:
metadata = await conversation_store.get_metadata(conversation_id)
is_running = await conversation_manager.is_agent_loop_running(conversation_id)
# Check if we need to update the title
if is_running and metadata:
# Check if the title is a default title (contains the conversation ID)
if metadata.title and conversation_id[:5] in metadata.title:
# Generate a new title
new_title = await auto_generate_title(
conversation_id, get_user_id(request)
)
if new_title:
# Update the metadata
metadata.title = new_title
await conversation_store.save_metadata(metadata)
# Refresh metadata after update
metadata = await conversation_store.get_metadata(conversation_id)
conversation_info = await _get_conversation_info(metadata, is_running)
return conversation_info
except FileNotFoundError:
@@ -265,6 +283,7 @@ def get_default_conversation_title(conversation_id: str) -> str:
async def auto_generate_title(conversation_id: str, user_id: str | None) -> str:
"""
Auto-generate a title for a conversation based on the first user message.
Uses LLM-based title generation if available, otherwise falls back to a simple truncation.
Args:
conversation_id: The ID of the conversation
@@ -292,11 +311,39 @@ async def auto_generate_title(conversation_id: str, user_id: str | None) -> str:
break
if first_user_message:
# Try LLM-based title generation first
from openhands.core.config.llm_config import LLMConfig
from openhands.utils.conversation_summary import generate_conversation_title
# Get LLM config from user settings
try:
settings_store = await SettingsStoreImpl.get_instance(config, user_id)
settings = await settings_store.load()
if settings and settings.llm_model:
# Create LLM config from settings
llm_config = LLMConfig(
model=settings.llm_model,
api_key=settings.llm_api_key,
base_url=settings.llm_base_url,
)
# Try to generate title using LLM
llm_title = await generate_conversation_title(
first_user_message, llm_config
)
if llm_title:
logger.info(f'Generated title using LLM: {llm_title}')
return llm_title
except Exception as e:
logger.error(f'Error using LLM for title generation: {e}')
# Fall back to simple truncation if LLM generation fails or is unavailable
first_user_message = first_user_message.strip()
title = first_user_message[:30]
if len(first_user_message) > 30:
title += '...'
logger.info(f'Generated title: {title}')
logger.info(f'Generated title using truncation: {title}')
return title
except Exception as e:
logger.error(f'Error generating title: {str(e)}')
@@ -315,10 +362,12 @@ async def update_conversation(
if not metadata:
return False
# If title is empty or unspecified, auto-generate it from the first user message
# If title is empty or unspecified, auto-generate it
if not title or title.isspace():
title = await auto_generate_title(conversation_id, user_id)
if not title:
# If we still don't have a title, use the default
if not title or title.isspace():
title = get_default_conversation_title(conversation_id)
metadata.title = title
@@ -361,9 +410,9 @@ async def _get_conversation_info(
last_updated_at=conversation.last_updated_at,
created_at=conversation.created_at,
selected_repository=conversation.selected_repository,
status=ConversationStatus.RUNNING
if is_running
else ConversationStatus.STOPPED,
status=(
ConversationStatus.RUNNING if is_running else ConversationStatus.STOPPED
),
)
except Exception as e:
logger.error(
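The `auto_generate_title` hunks above try an LLM-generated title first and fall back to truncating the first user message to 30 characters. The sketch below shows only that ordering. It is synchronous for brevity, the LLM call is replaced by an injected callable, and the names `truncate_title` and `choose_title` are hypothetical.

```python
from typing import Callable, Optional


def truncate_title(message: str, limit: int = 30) -> str:
    """Fallback: the first `limit` characters, with '...' appended when the message was cut."""
    message = message.strip()
    return message[:limit] + ('...' if len(message) > limit else '')


def choose_title(
    first_message: str,
    llm_title_fn: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Prefer the LLM title; fall back to truncation if it is missing, empty, or raises."""
    if llm_title_fn is not None:
        try:
            llm_title = llm_title_fn(first_message)
            if llm_title:
                return llm_title
        except Exception:
            # The real code logs the error here; this sketch just falls through.
            pass
    return truncate_title(first_message)
```

Because any failure in the LLM path falls through to truncation, a non-empty first message always yields a title.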
@@ -10,8 +10,12 @@ class ConversationValidator:
return None, None
conversation_validator_cls = os.environ.get(
'OPENHANDS_CONVERSATION_VALIDATOR_CLS',
'openhands.storage.conversation.conversation_validator.ConversationValidator',
)
ConversationValidatorImpl = get_impl(ConversationValidator, conversation_validator_cls)
def create_conversation_validator():
conversation_validator_cls = os.environ.get(
'OPENHANDS_CONVERSATION_VALIDATOR_CLS',
'openhands.storage.conversation.conversation_validator.ConversationValidator',
)
ConversationValidatorImpl = get_impl(
ConversationValidator, conversation_validator_cls
)
return ConversationValidatorImpl()
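The change above moves the implementation lookup from module import time into `create_conversation_validator()`, so the class is resolved only when a validator is actually requested. A minimal sketch of that pattern follows; the `Validator` class, `resolve_impl` helper, and `EXAMPLE_VALIDATOR_CLS` variable are hypothetical stand-ins, not the OpenHands names.

```python
import importlib
import os
from typing import Optional


class Validator:
    """Hypothetical base validator; the default implementation accepts everything."""

    def validate(self, conversation_id: str) -> bool:
        return True


def resolve_impl(base: type, dotted_name: Optional[str]) -> type:
    """Import `package.module.ClassName` and check that it subclasses `base`."""
    if dotted_name is None:
        return base
    module_name, _, class_name = dotted_name.rpartition('.')
    impl = getattr(importlib.import_module(module_name), class_name)
    if not issubclass(impl, base):
        raise TypeError(f'{dotted_name} is not a subclass of {base.__name__}')
    return impl


def create_validator() -> Validator:
    """Resolve the class at call time, so importing this module triggers no other imports."""
    impl_cls = resolve_impl(Validator, os.environ.get('EXAMPLE_VALIDATOR_CLS'))
    return impl_cls()
```

Deferring the lookup means a configured implementation module can itself import this module without forming an import cycle, which is the kind of problem the commit message describes.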
+57
@@ -0,0 +1,57 @@
"""Utility functions for generating conversation summaries."""
from typing import Optional
from openhands.core.config import LLMConfig
from openhands.core.logger import openhands_logger as logger
from openhands.llm.llm import LLM
async def generate_conversation_title(
message: str, llm_config: LLMConfig, max_length: int = 50
) -> Optional[str]:
"""Generate a concise title for a conversation based on the first user message.
Args:
message: The first user message in the conversation.
llm_config: The LLM configuration to use for generating the title.
max_length: The maximum length of the generated title.
Returns:
A concise title for the conversation, or None if generation fails.
"""
if not message or message.strip() == '':
return None
# Truncate very long messages to avoid excessive token usage
if len(message) > 1000:
truncated_message = message[:1000] + '...(truncated)'
else:
truncated_message = message
try:
llm = LLM(llm_config)
# Create a simple prompt for the LLM to generate a title
messages = [
{
'role': 'system',
'content': 'You are a helpful assistant that generates concise, descriptive titles for conversations with OpenHands. OpenHands is a helpful AI agent that can interact with a computer to solve tasks using bash terminal, file editor, and browser. Given a user message (which may be truncated), generate a concise, descriptive title for the conversation. Return only the title, with no additional text, quotes, or explanations.',
},
{
'role': 'user',
'content': f'Generate a title (maximum {max_length} characters) for a conversation that starts with this message:\n\n{truncated_message}',
},
]
response = llm.completion(messages=messages)
title = response.choices[0].message.content.strip()
# Ensure the title isn't too long
if len(title) > max_length:
title = title[: max_length - 3] + '...'
return title
except Exception as e:
logger.error(f'Error generating conversation title: {e}')
return None
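The length handling in `generate_conversation_title` can be looked at on its own: the title is stripped, and if it exceeds `max_length` it is cut so that the result, including the trailing `...`, is exactly `max_length` characters. A small sketch, with the hypothetical helper name `cap_title`:

```python
from typing import Optional


def cap_title(raw_title: Optional[str], max_length: int = 50) -> Optional[str]:
    """Strip the title and shorten it to at most `max_length` characters."""
    if raw_title is None or not raw_title.strip():
        return None
    title = raw_title.strip()
    if len(title) > max_length:
        # Reserve three characters for the ellipsis so the total stays within the limit.
        title = title[: max_length - 3] + '...'
    return title
```

This is the property the `test_generate_conversation_title_long_title` test in this diff checks: the result is no longer than the limit and ends with `...`.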
+2
@@ -1,4 +1,5 @@
import importlib
from functools import lru_cache
from typing import Type, TypeVar
T = TypeVar('T')
@@ -13,6 +14,7 @@ def import_from(qual_name: str):
return result
@lru_cache()
def get_impl(cls: Type[T], impl_name: str | None) -> Type[T]:
"""Import a named implementation of the specified class"""
if impl_name is None:
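Decorating `get_impl` with `lru_cache` means repeated lookups with the same arguments return the cached class instead of re-importing it, which matters now that the validator factory calls it on every connection. A minimal sketch of the effect, using a hypothetical `load_attr` helper and a standard-library class as the target:

```python
import importlib
from functools import lru_cache


@lru_cache()
def load_attr(qual_name: str):
    """Import `module.attr` once; later calls with the same string are served from the cache."""
    module_name, _, attr = qual_name.rpartition('.')
    return getattr(importlib.import_module(module_name), attr)
```

`lru_cache` keys on the call arguments, so every argument must be hashable; classes and strings, which are what `get_impl` receives, both are.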
Generated
+35 -35
@@ -496,18 +496,18 @@ files = [
[[package]]
name = "boto3"
version = "1.37.22"
version = "1.37.23"
description = "The AWS SDK for Python"
optional = false
python-versions = ">=3.8"
groups = ["main"]
files = [
{file = "boto3-1.37.22-py3-none-any.whl", hash = "sha256:a14324d5fa5f4fea00c0e3c69754cbd28100f7fe194693eeecf2dc07446cf4ef"},
{file = "boto3-1.37.22.tar.gz", hash = "sha256:78a0ec0aafbf6044104c98ad80b69e6d1c83d8233fda2c2d241029e6c705c510"},
{file = "boto3-1.37.23-py3-none-any.whl", hash = "sha256:fc462b9fd738bd8a1c121d94d237c6b6a05a2c1cc709d16f5223acb752f7310b"},
{file = "boto3-1.37.23.tar.gz", hash = "sha256:82f4599a34f5eb66e916b9ac8547394f6e5899c19580e74b60237db04cf66d1e"},
]
[package.dependencies]
botocore = ">=1.37.22,<1.38.0"
botocore = ">=1.37.23,<1.38.0"
jmespath = ">=0.7.1,<2.0.0"
s3transfer = ">=0.11.0,<0.12.0"
@@ -516,14 +516,14 @@ crt = ["botocore[crt] (>=1.21.0,<2.0a0)"]
[[package]]
name = "boto3-stubs"
version = "1.37.22"
description = "Type annotations for boto3 1.37.22 generated with mypy-boto3-builder 8.10.1"
version = "1.37.23"
description = "Type annotations for boto3 1.37.23 generated with mypy-boto3-builder 8.10.1"
optional = false
python-versions = ">=3.8"
groups = ["evaluation"]
files = [
{file = "boto3_stubs-1.37.22-py3-none-any.whl", hash = "sha256:7d41213bef29af9bca6cbf481b00ec1a2535c111ee979ed152249d2c1ec02208"},
{file = "boto3_stubs-1.37.22.tar.gz", hash = "sha256:ad6c1471bd503da253420294ca5060a4a24d53cdc2672503a579d9b779d0e5ce"},
{file = "boto3_stubs-1.37.23-py3-none-any.whl", hash = "sha256:a00884a3df819bdc6b040c857e57a87b4f33df963ee88f8f406b13bf2cd983ca"},
{file = "boto3_stubs-1.37.23.tar.gz", hash = "sha256:011f06dadcd5ef3c627ec9808b9afa4e1837b0f009d82b8209f12a84ffbb3867"},
]
[package.dependencies]
@@ -579,7 +579,7 @@ bedrock-data-automation-runtime = ["mypy-boto3-bedrock-data-automation-runtime (
bedrock-runtime = ["mypy-boto3-bedrock-runtime (>=1.37.0,<1.38.0)"]
billing = ["mypy-boto3-billing (>=1.37.0,<1.38.0)"]
billingconductor = ["mypy-boto3-billingconductor (>=1.37.0,<1.38.0)"]
boto3 = ["boto3 (==1.37.22)"]
boto3 = ["boto3 (==1.37.23)"]
braket = ["mypy-boto3-braket (>=1.37.0,<1.38.0)"]
budgets = ["mypy-boto3-budgets (>=1.37.0,<1.38.0)"]
ce = ["mypy-boto3-ce (>=1.37.0,<1.38.0)"]
@@ -943,14 +943,14 @@ xray = ["mypy-boto3-xray (>=1.37.0,<1.38.0)"]
[[package]]
name = "botocore"
version = "1.37.22"
version = "1.37.23"
description = "Low-level, data-driven core of boto 3."
optional = false
python-versions = ">=3.8"
groups = ["main"]
files = [
{file = "botocore-1.37.22-py3-none-any.whl", hash = "sha256:184db7c9314d13002bc827f511a5140574b5da1acda342d51e093dad6317de98"},
{file = "botocore-1.37.22.tar.gz", hash = "sha256:b3b26f1a90236bcd17d4092f8c85a256b44e9955a16b633319a2f5678d605e9f"},
{file = "botocore-1.37.23-py3-none-any.whl", hash = "sha256:ffbe1f5958adb1c50d72d3ad1018cb265fe349248c08782d334601c0814f0e38"},
{file = "botocore-1.37.23.tar.gz", hash = "sha256:3a249c950cef9ee9ed7b2278500ad83a4ad6456bc433a43abd1864d1b61b2acb"},
]
[package.dependencies]
@@ -2179,20 +2179,20 @@ typing = ["typing-extensions (>=4.12.2) ; python_version < \"3.11\""]
[[package]]
name = "flake8"
version = "7.1.2"
version = "7.2.0"
description = "the modular source code checker: pep8 pyflakes and co"
optional = false
python-versions = ">=3.8.1"
python-versions = ">=3.9"
groups = ["main", "runtime"]
files = [
{file = "flake8-7.1.2-py2.py3-none-any.whl", hash = "sha256:1cbc62e65536f65e6d754dfe6f1bada7f5cf392d6f5db3c2b85892466c3e7c1a"},
{file = "flake8-7.1.2.tar.gz", hash = "sha256:c586ffd0b41540951ae41af572e6790dbd49fc12b3aa2541685d253d9bd504bd"},
{file = "flake8-7.2.0-py2.py3-none-any.whl", hash = "sha256:93b92ba5bdb60754a6da14fa3b93a9361fd00a59632ada61fd7b130436c40343"},
{file = "flake8-7.2.0.tar.gz", hash = "sha256:fa558ae3f6f7dbf2b4f22663e5343b6b6023620461f8d4ff2019ef4b5ee70426"},
]
[package.dependencies]
mccabe = ">=0.7.0,<0.8.0"
pycodestyle = ">=2.12.0,<2.13.0"
pyflakes = ">=3.2.0,<3.3.0"
pycodestyle = ">=2.13.0,<2.14.0"
pyflakes = ">=3.3.0,<3.4.0"
[[package]]
name = "flask"
@@ -4393,14 +4393,14 @@ types-tqdm = "*"
[[package]]
name = "litellm"
version = "1.64.1"
version = "1.65.0"
description = "Library to easily interface with LLM API providers"
optional = false
python-versions = "!=2.7.*,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,!=3.5.*,!=3.6.*,!=3.7.*,>=3.8"
groups = ["main"]
files = [
{file = "litellm-1.64.1-py3-none-any.whl", hash = "sha256:bd7cb4977dee121551f0322d48b4e51e9a508fc2ac2273e7c5405ca69354e352"},
{file = "litellm-1.64.1.tar.gz", hash = "sha256:73bac891b1fbd77ada4d691e967657c53f48c207d9c3ba414ad0ffe3e7ec8f89"},
{file = "litellm-1.65.0-py3-none-any.whl", hash = "sha256:bbc211f3d03e1830ed7f4304b40f70fa1fa4a2f9109d006ede5f78e83a189aba"},
{file = "litellm-1.65.0.tar.gz", hash = "sha256:147a74d18601ccaaff3ca125eba914ab6e5b5854aff480dce5a52be5b9d52ff8"},
]
[package.dependencies]
@@ -4836,14 +4836,14 @@ files = [
[[package]]
name = "modal"
version = "0.73.131"
version = "0.73.136"
description = "Python client library for Modal"
optional = false
python-versions = ">=3.9"
groups = ["main", "evaluation"]
files = [
{file = "modal-0.73.131-py3-none-any.whl", hash = "sha256:cece493c196a5c932602fa84ef91c60737078c263ac8859acc8c7e13257a6215"},
{file = "modal-0.73.131.tar.gz", hash = "sha256:26809ad9c9bd66d912370454135599b093e7bc2f450d6257d82dd178d00dab62"},
{file = "modal-0.73.136-py3-none-any.whl", hash = "sha256:1f812712ea616cce949c06c5a4b45497d1157879775986de54db9ed2023b79e9"},
{file = "modal-0.73.136.tar.gz", hash = "sha256:e8a6d3961c11e6440b2ab9a7f344fb1beb9aae8b8511df871ce3b2399f194af0"},
]
[package.dependencies]
@@ -6210,14 +6210,14 @@ global = ["pybind11-global (==2.13.6)"]
[[package]]
name = "pycodestyle"
version = "2.12.1"
version = "2.13.0"
description = "Python style guide checker"
optional = false
python-versions = ">=3.8"
python-versions = ">=3.9"
groups = ["main", "runtime"]
files = [
{file = "pycodestyle-2.12.1-py2.py3-none-any.whl", hash = "sha256:46f0fb92069a7c28ab7bb558f05bfc0110dac69a0cd23c61ea0040283a9d78b3"},
{file = "pycodestyle-2.12.1.tar.gz", hash = "sha256:6838eae08bbce4f6accd5d5572075c63626a15ee3e6f842df996bf62f6d73521"},
{file = "pycodestyle-2.13.0-py2.py3-none-any.whl", hash = "sha256:35863c5974a271c7a726ed228a14a4f6daf49df369d8c50cd9a6f58a5e143ba9"},
{file = "pycodestyle-2.13.0.tar.gz", hash = "sha256:c8415bf09abe81d9c7f872502a6eee881fbe85d8763dd5b9924bb0a01d67efae"},
]
[[package]]
@@ -6448,14 +6448,14 @@ dev = ["black", "flake8", "flake8-black", "isort", "jupyter-console", "mkdocs",
[[package]]
name = "pyflakes"
version = "3.2.0"
version = "3.3.2"
description = "passive checker of Python programs"
optional = false
python-versions = ">=3.8"
python-versions = ">=3.9"
groups = ["main", "runtime"]
files = [
{file = "pyflakes-3.2.0-py2.py3-none-any.whl", hash = "sha256:84b5be138a2dfbb40689ca07e2152deb896a65c3a3e24c251c5c62489568074a"},
{file = "pyflakes-3.2.0.tar.gz", hash = "sha256:1c61603ff154621fb2a9172037d84dca3500def8c8b630657d1701f026f8af3f"},
{file = "pyflakes-3.3.2-py2.py3-none-any.whl", hash = "sha256:5039c8339cbb1944045f4ee5466908906180f13cc99cc9949348d10f82a5c32a"},
{file = "pyflakes-3.3.2.tar.gz", hash = "sha256:6dfd61d87b97fba5dcfaaf781171ac16be16453be6d816147989e7f6e6a9576b"},
]
[[package]]
@@ -8402,14 +8402,14 @@ test = ["pytest", "tornado (>=4.5)", "typeguard"]
[[package]]
name = "termcolor"
version = "2.5.0"
version = "3.0.0"
description = "ANSI color formatting for output in terminal"
optional = false
python-versions = ">=3.9"
groups = ["main"]
files = [
{file = "termcolor-2.5.0-py3-none-any.whl", hash = "sha256:37b17b5fc1e604945c2642c872a3764b5d547a48009871aea3edd3afa180afb8"},
{file = "termcolor-2.5.0.tar.gz", hash = "sha256:998d8d27da6d48442e8e1f016119076b690d962507531df4890fcd2db2ef8a6f"},
{file = "termcolor-3.0.0-py3-none-any.whl", hash = "sha256:fdfdc9f2bdb71c69fbbbaeb7ceae3afef0461076dd2ee265bf7b7c49ddb05ebb"},
{file = "termcolor-3.0.0.tar.gz", hash = "sha256:0cd855c8716383f152ad02bbb39841d6e4694538ff5d424088e56c8b81fde525"},
]
[package.extras]
+83
@@ -0,0 +1,83 @@
"""Tests for the conversation summary generator."""
from unittest.mock import MagicMock, patch
import pytest
from openhands.core.config import LLMConfig
from openhands.utils.conversation_summary import generate_conversation_title
@pytest.mark.asyncio
async def test_generate_conversation_title_empty_message():
"""Test that an empty message returns None."""
result = await generate_conversation_title('', MagicMock())
assert result is None
result = await generate_conversation_title(' ', MagicMock())
assert result is None
@pytest.mark.asyncio
async def test_generate_conversation_title_success():
"""Test successful title generation."""
# Create a proper mock response
mock_response = MagicMock()
mock_response.choices = [MagicMock()]
mock_response.choices[0].message.content = 'Generated Title'
# Create a mock LLM instance with a synchronous completion method
mock_llm = MagicMock()
mock_llm.completion = MagicMock(return_value=mock_response)
# Patch the LLM class to return our mock
with patch('openhands.utils.conversation_summary.LLM', return_value=mock_llm):
result = await generate_conversation_title(
'Can you help me with Python?', LLMConfig(model='test-model')
)
assert result == 'Generated Title'
# Verify the mock was called with the expected arguments
mock_llm.completion.assert_called_once()
@pytest.mark.asyncio
async def test_generate_conversation_title_long_title():
"""Test that long titles are truncated."""
# Create a proper mock response with a long title
mock_response = MagicMock()
mock_response.choices = [MagicMock()]
mock_response.choices[
0
].message.content = 'This is a very long title that should be truncated because it exceeds the maximum length'
# Create a mock LLM instance with a synchronous completion method
mock_llm = MagicMock()
mock_llm.completion = MagicMock(return_value=mock_response)
# Patch the LLM class to return our mock
with patch('openhands.utils.conversation_summary.LLM', return_value=mock_llm):
result = await generate_conversation_title(
'Can you help me with Python?', LLMConfig(model='test-model'), max_length=30
)
# Verify the title is truncated correctly
assert len(result) <= 30
assert result.endswith('...')
@pytest.mark.asyncio
async def test_generate_conversation_title_exception():
"""Test that exceptions are handled gracefully."""
# Create a mock LLM instance with a synchronous completion method that raises an exception
mock_llm = MagicMock()
mock_llm.completion = MagicMock(side_effect=Exception('Test error'))
# Patch the LLM class to return our mock
with patch('openhands.utils.conversation_summary.LLM', return_value=mock_llm):
result = await generate_conversation_title(
'Can you help me with Python?', LLMConfig(model='test-model')
)
# Verify that None is returned when an exception occurs
assert result is None