mirror of
https://github.com/All-Hands-AI/OpenHands.git
synced 2026-04-29 03:00:45 -04:00
Compare commits
8 Commits
| SHA1 |
|---|
| 155b806bff |
| 6ae2984580 |
| f12bf985ce |
| b6321488bc |
| 54236f9617 |
| 2c4496b129 |
| 4b177992f8 |
| fa61e862e0 |
@@ -10,13 +10,21 @@ This repository contains the code for OpenHands, an automated AI software engine
|
||||
To set up the entire repo, including frontend and backend, run `make build`.
|
||||
You don't need to do this unless the user asks you to, or if you're trying to run the entire application.
|
||||
|
||||
Before pushing any changes, you should ensure that any lint errors or simple test errors have been fixed.
|
||||
IMPORTANT: Before making any changes to the codebase, ALWAYS run `make install-pre-commit-hooks` to ensure pre-commit hooks are properly installed.
|
||||
|
||||
Before pushing any changes, you MUST ensure that any lint errors or simple test errors have been fixed.
|
||||
|
||||
* If you've made changes to the backend, you should run `pre-commit run --all-files --config ./dev_config/python/.pre-commit-config.yaml`
|
||||
* If you've made changes to the frontend, you should run `cd frontend && npm run lint:fix && npm run build ; cd ..`
|
||||
|
||||
The pre-commit hooks MUST pass successfully before pushing any changes to the repository. This is a mandatory requirement to maintain code quality and consistency.
|
||||
|
||||
If either command fails, it may have automatically fixed some issues. You should fix any issues that weren't automatically fixed,
|
||||
then re-run the command to ensure it passes.
|
||||
then re-run the command to ensure it passes. Common issues include:
|
||||
- Mypy type errors
|
||||
- Ruff formatting issues
|
||||
- Trailing whitespace
|
||||
- Missing newlines at end of files
|
||||
|
||||
## Repository Structure
|
||||
Backend:
|
||||
|
||||
@@ -48,7 +48,8 @@ RUN mkdir -p $FILE_STORE_PATH
|
||||
RUN mkdir -p $WORKSPACE_BASE
|
||||
|
||||
RUN apt-get update -y \
|
||||
&& apt-get install -y curl ssh sudo
|
||||
&& apt-get install -y curl ssh sudo \
|
||||
&& rm -rf /var/lib/apt/lists/*
|
||||
|
||||
# Default is 1000, but OSX is often 501
|
||||
RUN sed -i 's/^UID_MIN.*/UID_MIN 499/' /etc/login.defs
|
||||
|
||||
@@ -95,7 +95,7 @@ OpenHands requires an API key to access most language models. Here's how to get
|
||||
|
||||
1. [Create an Anthropic account](https://console.anthropic.com/)
|
||||
2. [Generate an API key](https://console.anthropic.com/settings/keys)
|
||||
3. [Set up billing(https://console.anthropic.com/settings/billing)
|
||||
3. [Set up billing](https://console.anthropic.com/settings/billing)
|
||||
|
||||
Consider setting usage limits to control costs.
|
||||
|
||||
|
||||
@@ -2,7 +2,9 @@
|
||||
|
||||
This folder contains the evaluation harness that we built on top of the original [SWE-Bench benchmark](https://www.swebench.com/) ([paper](https://arxiv.org/abs/2310.06770)).
|
||||
|
||||
**UPDATE (2/18/2025): We now support running SWE-Gym using the same evaluation harness here. For more details, checkout [this README](./SWE-Gym.md).
|
||||
**UPDATE (03/27/2025): We now support SWE-Bench multimodal evaluation! Simply use "princeton-nlp/SWE-bench_Multimodal" as the dataset name in the `run_infer.sh` script to evaluate on multimodal instances.**
|
||||
|
||||
**UPDATE (2/18/2025): We now support running SWE-Gym using the same evaluation harness here. For more details, checkout [this README](./SWE-Gym.md).**
|
||||
|
||||
**UPDATE (7/1/2024): We now support the official SWE-Bench dockerized evaluation as announced [here](https://github.com/princeton-nlp/SWE-bench/blob/main/docs/20240627_docker/README.md).**
|
||||
|
||||
@@ -62,7 +64,7 @@ in order to use `eval_limit`, you must also set `agent`.
|
||||
default, it is set to 60.
|
||||
- `num_workers`, e.g. `3`, is the number of parallel workers to run the evaluation. By
|
||||
default, it is set to 1.
|
||||
- `dataset`, a huggingface dataset name. e.g. `princeton-nlp/SWE-bench`, `princeton-nlp/SWE-bench_Lite`, or `princeton-nlp/SWE-bench_Verified`, specifies which dataset to evaluate on.
|
||||
- `dataset`, a huggingface dataset name. e.g. `princeton-nlp/SWE-bench`, `princeton-nlp/SWE-bench_Lite`, `princeton-nlp/SWE-bench_Verified`, or `princeton-nlp/SWE-bench_Multimodal`, specifies which dataset to evaluate on.
|
||||
- `dataset_split`, the split of the huggingface dataset, e.g. `test` or `dev`. Defaults to `test`.
|
||||
|
||||
> [!CAUTION]
|
||||
@@ -82,6 +84,13 @@ then your command would be:
|
||||
./evaluation/benchmarks/swe_bench/scripts/run_infer.sh llm.eval_gpt4_1106_preview HEAD CodeActAgent 10
|
||||
```
|
||||
|
||||
For multimodal evaluation, you can use:
|
||||
|
||||
```bash
|
||||
# Example for running multimodal SWE-Bench evaluation
|
||||
./evaluation/benchmarks/swe_bench/scripts/run_infer.sh llm.eval_gpt4_vision HEAD CodeActAgent 10 100 1 princeton-nlp/SWE-bench_Multimodal test
|
||||
```
|
||||
|
||||
### Running in parallel with RemoteRuntime
|
||||
|
||||
OpenHands Remote Runtime is currently in beta (read [here](https://runtime.all-hands.dev/) for more details). It lets you run rollouts in parallel in the cloud, so you don't need a powerful machine to run the evaluation.
|
||||
|
||||
@@ -58,7 +58,7 @@ def _get_swebench_workspace_dir_name(instance: pd.Series) -> str:
|
||||
return f'{instance.repo}__{instance.version}'.replace('/', '__')
|
||||
|
||||
|
||||
def get_instruction(instance: pd.Series, metadata: EvalMetadata):
|
||||
def get_instruction(instance: pd.Series, metadata: EvalMetadata) -> MessageAction:
|
||||
workspace_dir_name = _get_swebench_workspace_dir_name(instance)
|
||||
instruction = f"""
|
||||
<uploaded_files>
|
||||
@@ -114,12 +114,20 @@ Be thorough in your exploration, testing, and reasoning. It's fine if your think
|
||||
"""
|
||||
|
||||
if RUN_WITH_BROWSING:
|
||||
instruction += """
|
||||
<IMPORTANT!>
|
||||
You SHOULD NEVER attempt to browse the web.
|
||||
</IMPORTANT!>
|
||||
"""
|
||||
return instruction
|
||||
instruction += (
|
||||
'<IMPORTANT!>\n'
|
||||
'You SHOULD NEVER attempt to browse the web. '
|
||||
'</IMPORTANT!>\n'
|
||||
)
|
||||
|
||||
if 'image_assets' in instance:
|
||||
assets = instance['image_assets']
|
||||
assert (
|
||||
'problem_statement' in assets
|
||||
), 'problem_statement is required in image_assets'
|
||||
image_urls = assets['problem_statement']
|
||||
return MessageAction(content=instruction, image_urls=image_urls)
|
||||
return MessageAction(content=instruction)
|
||||
|
||||
|
||||
# TODO: migrate all swe-bench docker to ghcr.io/openhands
|
||||
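The hunk above changes `get_instruction` to return a `MessageAction` that may carry image URLs. A minimal sketch of that branching follows, assuming a plain `dict` in place of the `pd.Series` instance. `MessageActionSketch` and `build_message` are hypothetical stand-ins for illustration, not OpenHands APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MessageActionSketch:
    """Hypothetical stand-in for MessageAction: text plus optional image URLs."""

    content: str
    image_urls: Optional[list[str]] = None


def build_message(instance: dict, instruction: str) -> MessageActionSketch:
    # Multimodal instances carry an 'image_assets' mapping; text-only ones do not.
    if 'image_assets' in instance:
        assets = instance['image_assets']
        # Mirror the diff's assertion: the problem statement images must be present.
        assert 'problem_statement' in assets, 'problem_statement is required in image_assets'
        return MessageActionSketch(content=instruction, image_urls=assets['problem_statement'])
    # Text-only instances keep image_urls unset.
    return MessageActionSketch(content=instruction)
```

Returning one message type from both branches lets the caller pass the result straight through as the initial user action, which is what the later `process_instance` hunk does.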
@@ -129,14 +137,18 @@ DEFAULT_DOCKER_IMAGE_PREFIX = os.environ.get(
|
||||
logger.info(f'Default docker image prefix: {DEFAULT_DOCKER_IMAGE_PREFIX}')
|
||||
|
||||
|
||||
def get_instance_docker_image(instance_id: str, official_image: bool = False) -> str:
|
||||
if official_image:
|
||||
def get_instance_docker_image(
|
||||
instance_id: str,
|
||||
swebench_official_image: bool = False,
|
||||
) -> str:
|
||||
if swebench_official_image:
|
||||
# Official SWE-Bench image
|
||||
# swebench/sweb.eval.x86_64.django_1776_django-11333:v1
|
||||
docker_image_prefix = 'docker.io/swebench/'
|
||||
repo, name = instance_id.split('__')
|
||||
image_name = f'sweb.eval.x86_64.{repo}_1776_{name}:latest'
|
||||
logger.warning(f'Using official SWE-Bench image: {image_name}')
|
||||
image_name = f'swebench/sweb.eval.x86_64.{repo}_1776_{name}:latest'
|
||||
logger.info(f'Using official SWE-Bench image: {image_name}')
|
||||
return image_name
|
||||
else:
|
||||
# OpenHands version of the image
|
||||
docker_image_prefix = DEFAULT_DOCKER_IMAGE_PREFIX
|
||||
@@ -144,7 +156,7 @@ def get_instance_docker_image(instance_id: str, official_image: bool = False) ->
|
||||
image_name = image_name.replace(
|
||||
'__', '_s_'
|
||||
) # to comply with docker image naming convention
|
||||
return (docker_image_prefix.rstrip('/') + '/' + image_name).lower()
|
||||
return (docker_image_prefix.rstrip('/') + '/' + image_name).lower()
|
||||
|
||||
|
||||
def get_config(
|
||||
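The official-image branch shown above derives the image name from the instance ID by splitting on the double underscore. A small sketch of just that mapping, under the assumption that the ID contains exactly one `__` separator; the function name is illustrative only.

```python
def official_swebench_image(instance_id: str) -> str:
    # 'django__django-11333' splits into repo 'django' and name 'django-11333'.
    # An ID with zero or several '__' separators raises ValueError here.
    repo, name = instance_id.split('__')
    # '_1776_' replaces the '__' separator in the image name, as in the diff.
    return f'swebench/sweb.eval.x86_64.{repo}_1776_{name}:latest'


image = official_swebench_image('django__django-11333')
```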
@@ -152,12 +164,13 @@ def get_config(
|
||||
metadata: EvalMetadata,
|
||||
) -> AppConfig:
|
||||
# We use a different instance image for each instance of swe-bench eval
|
||||
use_official_image = bool(
|
||||
use_swebench_official_image = bool(
|
||||
('verified' in metadata.dataset.lower() or 'lite' in metadata.dataset.lower())
|
||||
and 'swe-gym' not in metadata.dataset.lower()
|
||||
)
|
||||
base_container_image = get_instance_docker_image(
|
||||
instance['instance_id'], use_official_image
|
||||
instance['instance_id'],
|
||||
swebench_official_image=use_swebench_official_image,
|
||||
)
|
||||
logger.info(
|
||||
f'Using instance container image: {base_container_image}. '
|
||||
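The `use_swebench_official_image` flag in this hunk is a substring test on the dataset name. The sketch below isolates that predicate so its effect on different datasets is easier to see. The helper name is hypothetical, and the SWE-Gym dataset string is used only as an illustrative input.

```python
def uses_official_image(dataset: str) -> bool:
    name = dataset.lower()
    # Official images are selected for Verified or Lite, unless this is a SWE-Gym dataset.
    return ('verified' in name or 'lite' in name) and 'swe-gym' not in name


verified = uses_official_image('princeton-nlp/SWE-bench_Verified')
multimodal = uses_official_image('princeton-nlp/SWE-bench_Multimodal')
gym_lite = uses_official_image('SWE-Gym/SWE-Gym-Lite')
```

Under this predicate the multimodal dataset name matches neither `verified` nor `lite`, so it takes the non-official image path.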
@@ -493,13 +506,13 @@ def process_instance(
|
||||
try:
|
||||
initialize_runtime(runtime, instance)
|
||||
|
||||
instruction = get_instruction(instance, metadata)
|
||||
message_action = get_instruction(instance, metadata)
|
||||
|
||||
# Here's how you can run the agent (similar to the `main` function) and get the final task state
|
||||
state: State | None = asyncio.run(
|
||||
run_controller(
|
||||
config=config,
|
||||
initial_user_action=MessageAction(content=instruction),
|
||||
initial_user_action=message_action,
|
||||
runtime=runtime,
|
||||
fake_user_response_fn=AGENT_CLS_TO_FAKE_USER_RESPONSE_FN[
|
||||
metadata.agent_class
|
||||
@@ -539,6 +552,11 @@ def process_instance(
|
||||
metrics = get_metrics(state)
|
||||
|
||||
# Save the output
|
||||
instruction = message_action.content
|
||||
if message_action.image_urls:
|
||||
instruction += (
|
||||
'\n\n<image_urls>' + '\n'.join(message_action.image_urls) + '</image_urls>'
|
||||
)
|
||||
output = EvalOutput(
|
||||
instance_id=instance.instance_id,
|
||||
instruction=instruction,
|
||||
|
||||
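When saving the output, the hunk above folds the image URLs back into the instruction string inside an `<image_urls>` tag. A self-contained sketch of that flattening step, with a hypothetical helper name:

```python
from typing import Optional


def flatten_instruction(content: str, image_urls: Optional[list[str]] = None) -> str:
    # Append the URLs, one per line, only when at least one is present.
    if image_urls:
        content += '\n\n<image_urls>' + '\n'.join(image_urls) + '</image_urls>'
    return content
```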
@@ -22,11 +22,11 @@ vi.mock("#/context/auth-context", () => ({
|
||||
describe("ActionSuggestions", () => {
|
||||
// Setup mocks for each test
|
||||
vi.clearAllMocks();
|
||||
|
||||
|
||||
(useAuth as any).mockReturnValue({
|
||||
githubTokenIsSet: true,
|
||||
});
|
||||
|
||||
|
||||
(useSelector as any).mockReturnValue({
|
||||
selectedRepository: "test-repo",
|
||||
});
|
||||
@@ -66,16 +66,16 @@ describe("ActionSuggestions", () => {
|
||||
it("should have different prompts for 'Push to Branch' and 'Push & Create PR' buttons", () => {
|
||||
// This test verifies that the prompts are different in the component
|
||||
const component = render(<ActionSuggestions onSuggestionsClick={() => {}} />);
|
||||
|
||||
|
||||
// Get the component instance to access the internal values
|
||||
const pushBranchPrompt = "Please push the changes to a remote branch on GitHub, but do NOT create a pull request. Please use the exact SAME branch name as the one you are currently on.";
|
||||
const createPRPrompt = "Please push the changes to GitHub and open a pull request. Please create a meaningful branch name that describes the changes.";
|
||||
|
||||
|
||||
// Verify the prompts are different
|
||||
expect(pushBranchPrompt).not.toEqual(createPRPrompt);
|
||||
|
||||
|
||||
// Verify the PR prompt mentions creating a meaningful branch name
|
||||
expect(createPRPrompt).toContain("meaningful branch name");
|
||||
expect(createPRPrompt).not.toContain("SAME branch name");
|
||||
});
|
||||
});
|
||||
});
|
||||
|
||||
@@ -47,7 +47,7 @@ class SandboxConfig(BaseModel):
|
||||
rm_all_containers: bool = Field(default=False)
|
||||
api_key: str | None = Field(default=None)
|
||||
base_container_image: str = Field(
|
||||
default='nikolaik/python-nodejs:python3.12-nodejs22'
|
||||
default='nikolaik/python-nodejs:python3.13-nodejs23-bullseye'
|
||||
)
|
||||
runtime_container_image: str | None = Field(default=None)
|
||||
user_id: int = Field(default=os.getuid() if hasattr(os, 'getuid') else 1000)
|
||||
|
||||
@@ -113,7 +113,7 @@ class RecallObservation(Observation):
|
||||
f'repo_instructions={self.repo_instructions[:20]}...',
|
||||
f'runtime_hosts={self.runtime_hosts}',
|
||||
f'additional_agent_instructions={self.additional_agent_instructions[:20]}...',
|
||||
f'date={self.date}'
|
||||
f'date={self.date}',
|
||||
]
|
||||
)
|
||||
else:
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
{
|
||||
"workbench.colorTheme": "Default Dark Modern",
|
||||
"workbench.startupEditor": "none"
|
||||
}
|
||||
}
|
||||
|
||||
@@ -23,7 +23,7 @@ from openhands.server.shared import (
|
||||
sio,
|
||||
)
|
||||
from openhands.storage.conversation.conversation_validator import (
|
||||
ConversationValidatorImpl,
|
||||
create_conversation_validator,
|
||||
)
|
||||
|
||||
|
||||
@@ -38,7 +38,7 @@ async def connect(connection_id: str, environ):
|
||||
raise ConnectionRefusedError('No conversation_id in query params')
|
||||
|
||||
cookies_str = environ.get('HTTP_COOKIE', '')
|
||||
conversation_validator = ConversationValidatorImpl()
|
||||
conversation_validator = create_conversation_validator()
|
||||
user_id, github_user_id = await conversation_validator.validate(
|
||||
conversation_id, cookies_str
|
||||
)
|
||||
|
||||
@@ -243,6 +243,24 @@ async def get_conversation(
|
||||
try:
|
||||
metadata = await conversation_store.get_metadata(conversation_id)
|
||||
is_running = await conversation_manager.is_agent_loop_running(conversation_id)
|
||||
|
||||
# Check if we need to update the title
|
||||
if is_running and metadata:
|
||||
# Check if the title is a default title (contains the conversation ID)
|
||||
if metadata.title and conversation_id[:5] in metadata.title:
|
||||
# Generate a new title
|
||||
new_title = await auto_generate_title(
|
||||
conversation_id, get_user_id(request)
|
||||
)
|
||||
|
||||
if new_title:
|
||||
# Update the metadata
|
||||
metadata.title = new_title
|
||||
await conversation_store.save_metadata(metadata)
|
||||
|
||||
# Refresh metadata after update
|
||||
metadata = await conversation_store.get_metadata(conversation_id)
|
||||
|
||||
conversation_info = await _get_conversation_info(metadata, is_running)
|
||||
return conversation_info
|
||||
except FileNotFoundError:
|
||||
@@ -265,6 +283,7 @@ def get_default_conversation_title(conversation_id: str) -> str:
|
||||
async def auto_generate_title(conversation_id: str, user_id: str | None) -> str:
|
||||
"""
|
||||
Auto-generate a title for a conversation based on the first user message.
|
||||
Uses LLM-based title generation if available, otherwise falls back to a simple truncation.
|
||||
|
||||
Args:
|
||||
conversation_id: The ID of the conversation
|
||||
@@ -292,11 +311,39 @@ async def auto_generate_title(conversation_id: str, user_id: str | None) -> str:
|
||||
break
|
||||
|
||||
if first_user_message:
|
||||
# Try LLM-based title generation first
|
||||
from openhands.core.config.llm_config import LLMConfig
|
||||
from openhands.utils.conversation_summary import generate_conversation_title
|
||||
|
||||
# Get LLM config from user settings
|
||||
try:
|
||||
settings_store = await SettingsStoreImpl.get_instance(config, user_id)
|
||||
settings = await settings_store.load()
|
||||
|
||||
if settings and settings.llm_model:
|
||||
# Create LLM config from settings
|
||||
llm_config = LLMConfig(
|
||||
model=settings.llm_model,
|
||||
api_key=settings.llm_api_key,
|
||||
base_url=settings.llm_base_url,
|
||||
)
|
||||
|
||||
# Try to generate title using LLM
|
||||
llm_title = await generate_conversation_title(
|
||||
first_user_message, llm_config
|
||||
)
|
||||
if llm_title:
|
||||
logger.info(f'Generated title using LLM: {llm_title}')
|
||||
return llm_title
|
||||
except Exception as e:
|
||||
logger.error(f'Error using LLM for title generation: {e}')
|
||||
|
||||
# Fall back to simple truncation if LLM generation fails or is unavailable
|
||||
first_user_message = first_user_message.strip()
|
||||
title = first_user_message[:30]
|
||||
if len(first_user_message) > 30:
|
||||
title += '...'
|
||||
logger.info(f'Generated title: {title}')
|
||||
logger.info(f'Generated title using truncation: {title}')
|
||||
return title
|
||||
except Exception as e:
|
||||
logger.error(f'Error generating title: {str(e)}')
|
||||
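The updated `auto_generate_title` tries an LLM-generated title first and falls back to truncating the first user message to 30 characters. The sketch below shows that control flow in isolation. `title_with_fallback`, `llm_title_fn`, and `failing_llm` are hypothetical names, and the real function also loads user settings and logs errors, which is omitted here.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def title_with_fallback(
    first_user_message: str,
    llm_title_fn: Callable[[str], Awaitable[Optional[str]]],
    limit: int = 30,
) -> str:
    try:
        llm_title = await llm_title_fn(first_user_message)
        if llm_title:
            return llm_title
    except Exception:
        # Any LLM failure falls through to truncation (the diff logs the error here).
        pass
    text = first_user_message.strip()
    return text[:limit] + ('...' if len(text) > limit else '')


async def failing_llm(message: str) -> Optional[str]:
    # Simulates a provider outage for the demonstration below.
    raise RuntimeError('LLM unavailable')


title = asyncio.run(
    title_with_fallback('Please refactor the authentication module', failing_llm)
)
```

Catching the exception inside the helper means a failing LLM call can never prevent a title from being produced.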
@@ -315,10 +362,12 @@ async def update_conversation(
|
||||
if not metadata:
|
||||
return False
|
||||
|
||||
# If title is empty or unspecified, auto-generate it from the first user message
|
||||
# If title is empty or unspecified, auto-generate it
|
||||
if not title or title.isspace():
|
||||
title = await auto_generate_title(conversation_id, user_id)
|
||||
if not title:
|
||||
|
||||
# If we still don't have a title, use the default
|
||||
if not title or title.isspace():
|
||||
title = get_default_conversation_title(conversation_id)
|
||||
|
||||
metadata.title = title
|
||||
@@ -361,9 +410,9 @@ async def _get_conversation_info(
|
||||
last_updated_at=conversation.last_updated_at,
|
||||
created_at=conversation.created_at,
|
||||
selected_repository=conversation.selected_repository,
|
||||
status=ConversationStatus.RUNNING
|
||||
if is_running
|
||||
else ConversationStatus.STOPPED,
|
||||
status=(
|
||||
ConversationStatus.RUNNING if is_running else ConversationStatus.STOPPED
|
||||
),
|
||||
)
|
||||
except Exception as e:
|
||||
logger.error(
|
||||
|
||||
@@ -10,8 +10,12 @@ class ConversationValidator:
|
||||
return None, None
|
||||
|
||||
|
||||
conversation_validator_cls = os.environ.get(
|
||||
'OPENHANDS_CONVERSATION_VALIDATOR_CLS',
|
||||
'openhands.storage.conversation.conversation_validator.ConversationValidator',
|
||||
)
|
||||
ConversationValidatorImpl = get_impl(ConversationValidator, conversation_validator_cls)
|
||||
def create_conversation_validator():
|
||||
conversation_validator_cls = os.environ.get(
|
||||
'OPENHANDS_CONVERSATION_VALIDATOR_CLS',
|
||||
'openhands.storage.conversation.conversation_validator.ConversationValidator',
|
||||
)
|
||||
ConversationValidatorImpl = get_impl(
|
||||
ConversationValidator, conversation_validator_cls
|
||||
)
|
||||
return ConversationValidatorImpl()
|
||||
|
||||
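The change above moves the environment lookup out of module scope and into a factory function, so the override is read when a validator is requested. A rough, self-contained sketch of that pattern follows. `BaseValidator`, `EXAMPLE_VALIDATOR_CLS`, and the subclass check are assumptions made for illustration and are not taken from the OpenHands source.

```python
import importlib
import os


class BaseValidator:
    """Hypothetical default implementation used when no override is configured."""


def import_from(qual_name: str):
    # Split 'package.module.ClassName' into module path and attribute name.
    module_name, _, attr = qual_name.rpartition('.')
    return getattr(importlib.import_module(module_name), attr)


def create_validator() -> BaseValidator:
    # Read the override at call time, so changes to the environment are picked up.
    qual_name = os.environ.get('EXAMPLE_VALIDATOR_CLS')  # hypothetical variable name
    impl = import_from(qual_name) if qual_name else BaseValidator
    # Assumed safety check: reject classes that do not extend the base type.
    if not issubclass(impl, BaseValidator):
        raise TypeError(f'{impl!r} is not a BaseValidator subclass')
    return impl()
```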
@@ -0,0 +1,57 @@
|
||||
"""Utility functions for generating conversation summaries."""
|
||||
|
||||
from typing import Optional
|
||||
|
||||
from openhands.core.config import LLMConfig
|
||||
from openhands.core.logger import openhands_logger as logger
|
||||
from openhands.llm.llm import LLM
|
||||
|
||||
|
||||
async def generate_conversation_title(
|
||||
message: str, llm_config: LLMConfig, max_length: int = 50
|
||||
) -> Optional[str]:
|
||||
"""Generate a concise title for a conversation based on the first user message.
|
||||
|
||||
Args:
|
||||
message: The first user message in the conversation.
|
||||
llm_config: The LLM configuration to use for generating the title.
|
||||
max_length: The maximum length of the generated title.
|
||||
|
||||
Returns:
|
||||
A concise title for the conversation, or None if generation fails.
|
||||
"""
|
||||
if not message or message.strip() == '':
|
||||
return None
|
||||
|
||||
# Truncate very long messages to avoid excessive token usage
|
||||
if len(message) > 1000:
|
||||
truncated_message = message[:1000] + '...(truncated)'
|
||||
else:
|
||||
truncated_message = message
|
||||
|
||||
try:
|
||||
llm = LLM(llm_config)
|
||||
|
||||
# Create a simple prompt for the LLM to generate a title
|
||||
messages = [
|
||||
{
|
||||
'role': 'system',
|
||||
'content': 'You are a helpful assistant that generates concise, descriptive titles for conversations with OpenHands. OpenHands is a helpful AI agent that can interact with a computer to solve tasks using bash terminal, file editor, and browser. Given a user message (which may be truncated), generate a concise, descriptive title for the conversation. Return only the title, with no additional text, quotes, or explanations.',
|
||||
},
|
||||
{
|
||||
'role': 'user',
|
||||
'content': f'Generate a title (maximum {max_length} characters) for a conversation that starts with this message:\n\n{truncated_message}',
|
||||
},
|
||||
]
|
||||
|
||||
response = llm.completion(messages=messages)
|
||||
title = response.choices[0].message.content.strip()
|
||||
|
||||
# Ensure the title isn't too long
|
||||
if len(title) > max_length:
|
||||
title = title[: max_length - 3] + '...'
|
||||
|
||||
return title
|
||||
except Exception as e:
|
||||
logger.error(f'Error generating conversation title: {e}')
|
||||
return None
|
||||
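Two length guards appear in `generate_conversation_title`: the input message is cut at 1000 characters before it is sent to the LLM, and the returned title is clamped to `max_length` with a trailing ellipsis. A minimal sketch of both guards as standalone helpers (the helper names are hypothetical):

```python
def truncate_message(message: str, limit: int = 1000) -> str:
    # Keep prompts small: cut long messages and mark the cut explicitly.
    return message[:limit] + '...(truncated)' if len(message) > limit else message


def clamp_title(title: str, max_length: int = 50) -> str:
    title = title.strip()
    if len(title) > max_length:
        # Reserve three characters for the ellipsis so the result fits max_length.
        title = title[: max_length - 3] + '...'
    return title
```

Slicing to `max_length - 3` before adding the ellipsis is what keeps the final title within the limit, which is the property the accompanying test for long titles checks.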
@@ -1,4 +1,5 @@
|
||||
import importlib
|
||||
from functools import lru_cache
|
||||
from typing import Type, TypeVar
|
||||
|
||||
T = TypeVar('T')
|
||||
@@ -13,6 +14,7 @@ def import_from(qual_name: str):
|
||||
return result
|
||||
|
||||
|
||||
@lru_cache()
|
||||
def get_impl(cls: Type[T], impl_name: str | None) -> Type[T]:
|
||||
"""Import a named implementation of the specified class"""
|
||||
if impl_name is None:
|
||||
|
||||
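Adding `@lru_cache()` to `get_impl` means repeated lookups with the same arguments reuse the first result. The sketch below demonstrates the effect on a simplified loader. `load_class` is a hypothetical function that takes only a qualified name, unlike the two-argument `get_impl` in the diff.

```python
import importlib
from functools import lru_cache


@lru_cache()
def load_class(qual_name: str) -> type:
    # Arguments to an lru_cache-wrapped function must be hashable; a string is.
    module_name, _, attr = qual_name.rpartition('.')
    return getattr(importlib.import_module(module_name), attr)


first = load_class('collections.OrderedDict')
second = load_class('collections.OrderedDict')  # served from the cache
stats = load_class.cache_info()
```

A cached loader suits a factory that is called on every connection, since the import and attribute lookup run only once per distinct name.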
Generated
+35
-35
@@ -496,18 +496,18 @@ files = [
|
||||
|
||||
[[package]]
|
||||
name = "boto3"
|
||||
version = "1.37.22"
|
||||
version = "1.37.23"
|
||||
description = "The AWS SDK for Python"
|
||||
optional = false
|
||||
python-versions = ">=3.8"
|
||||
groups = ["main"]
|
||||
files = [
|
||||
{file = "boto3-1.37.22-py3-none-any.whl", hash = "sha256:a14324d5fa5f4fea00c0e3c69754cbd28100f7fe194693eeecf2dc07446cf4ef"},
|
||||
{file = "boto3-1.37.22.tar.gz", hash = "sha256:78a0ec0aafbf6044104c98ad80b69e6d1c83d8233fda2c2d241029e6c705c510"},
|
||||
{file = "boto3-1.37.23-py3-none-any.whl", hash = "sha256:fc462b9fd738bd8a1c121d94d237c6b6a05a2c1cc709d16f5223acb752f7310b"},
|
||||
{file = "boto3-1.37.23.tar.gz", hash = "sha256:82f4599a34f5eb66e916b9ac8547394f6e5899c19580e74b60237db04cf66d1e"},
|
||||
]
|
||||
|
||||
[package.dependencies]
|
||||
botocore = ">=1.37.22,<1.38.0"
|
||||
botocore = ">=1.37.23,<1.38.0"
|
||||
jmespath = ">=0.7.1,<2.0.0"
|
||||
s3transfer = ">=0.11.0,<0.12.0"
|
||||
|
||||
@@ -516,14 +516,14 @@ crt = ["botocore[crt] (>=1.21.0,<2.0a0)"]
|
||||
|
||||
[[package]]
|
||||
name = "boto3-stubs"
|
||||
version = "1.37.22"
|
||||
description = "Type annotations for boto3 1.37.22 generated with mypy-boto3-builder 8.10.1"
|
||||
version = "1.37.23"
|
||||
description = "Type annotations for boto3 1.37.23 generated with mypy-boto3-builder 8.10.1"
|
||||
optional = false
|
||||
python-versions = ">=3.8"
|
||||
groups = ["evaluation"]
|
||||
files = [
|
||||
{file = "boto3_stubs-1.37.22-py3-none-any.whl", hash = "sha256:7d41213bef29af9bca6cbf481b00ec1a2535c111ee979ed152249d2c1ec02208"},
|
||||
{file = "boto3_stubs-1.37.22.tar.gz", hash = "sha256:ad6c1471bd503da253420294ca5060a4a24d53cdc2672503a579d9b779d0e5ce"},
|
||||
{file = "boto3_stubs-1.37.23-py3-none-any.whl", hash = "sha256:a00884a3df819bdc6b040c857e57a87b4f33df963ee88f8f406b13bf2cd983ca"},
|
||||
{file = "boto3_stubs-1.37.23.tar.gz", hash = "sha256:011f06dadcd5ef3c627ec9808b9afa4e1837b0f009d82b8209f12a84ffbb3867"},
|
||||
]
|
||||
|
||||
[package.dependencies]
|
||||
@@ -579,7 +579,7 @@ bedrock-data-automation-runtime = ["mypy-boto3-bedrock-data-automation-runtime (
|
||||
bedrock-runtime = ["mypy-boto3-bedrock-runtime (>=1.37.0,<1.38.0)"]
|
||||
billing = ["mypy-boto3-billing (>=1.37.0,<1.38.0)"]
|
||||
billingconductor = ["mypy-boto3-billingconductor (>=1.37.0,<1.38.0)"]
|
||||
boto3 = ["boto3 (==1.37.22)"]
|
||||
boto3 = ["boto3 (==1.37.23)"]
|
||||
braket = ["mypy-boto3-braket (>=1.37.0,<1.38.0)"]
|
||||
budgets = ["mypy-boto3-budgets (>=1.37.0,<1.38.0)"]
|
||||
ce = ["mypy-boto3-ce (>=1.37.0,<1.38.0)"]
|
||||
@@ -943,14 +943,14 @@ xray = ["mypy-boto3-xray (>=1.37.0,<1.38.0)"]
|
||||
|
||||
[[package]]
|
||||
name = "botocore"
|
||||
version = "1.37.22"
|
||||
version = "1.37.23"
|
||||
description = "Low-level, data-driven core of boto 3."
|
||||
optional = false
|
||||
python-versions = ">=3.8"
|
||||
groups = ["main"]
|
||||
files = [
|
||||
{file = "botocore-1.37.22-py3-none-any.whl", hash = "sha256:184db7c9314d13002bc827f511a5140574b5da1acda342d51e093dad6317de98"},
|
||||
{file = "botocore-1.37.22.tar.gz", hash = "sha256:b3b26f1a90236bcd17d4092f8c85a256b44e9955a16b633319a2f5678d605e9f"},
|
||||
{file = "botocore-1.37.23-py3-none-any.whl", hash = "sha256:ffbe1f5958adb1c50d72d3ad1018cb265fe349248c08782d334601c0814f0e38"},
|
||||
{file = "botocore-1.37.23.tar.gz", hash = "sha256:3a249c950cef9ee9ed7b2278500ad83a4ad6456bc433a43abd1864d1b61b2acb"},
|
||||
]
|
||||
|
||||
[package.dependencies]
|
||||
@@ -2179,20 +2179,20 @@ typing = ["typing-extensions (>=4.12.2) ; python_version < \"3.11\""]
|
||||
|
||||
[[package]]
|
||||
name = "flake8"
|
||||
version = "7.1.2"
|
||||
version = "7.2.0"
|
||||
description = "the modular source code checker: pep8 pyflakes and co"
|
||||
optional = false
|
||||
python-versions = ">=3.8.1"
|
||||
python-versions = ">=3.9"
|
||||
groups = ["main", "runtime"]
|
||||
files = [
|
||||
{file = "flake8-7.1.2-py2.py3-none-any.whl", hash = "sha256:1cbc62e65536f65e6d754dfe6f1bada7f5cf392d6f5db3c2b85892466c3e7c1a"},
|
||||
{file = "flake8-7.1.2.tar.gz", hash = "sha256:c586ffd0b41540951ae41af572e6790dbd49fc12b3aa2541685d253d9bd504bd"},
|
||||
{file = "flake8-7.2.0-py2.py3-none-any.whl", hash = "sha256:93b92ba5bdb60754a6da14fa3b93a9361fd00a59632ada61fd7b130436c40343"},
|
||||
{file = "flake8-7.2.0.tar.gz", hash = "sha256:fa558ae3f6f7dbf2b4f22663e5343b6b6023620461f8d4ff2019ef4b5ee70426"},
|
||||
]
|
||||
|
||||
[package.dependencies]
|
||||
mccabe = ">=0.7.0,<0.8.0"
|
||||
pycodestyle = ">=2.12.0,<2.13.0"
|
||||
pyflakes = ">=3.2.0,<3.3.0"
|
||||
pycodestyle = ">=2.13.0,<2.14.0"
|
||||
pyflakes = ">=3.3.0,<3.4.0"
|
||||
|
||||
[[package]]
|
||||
name = "flask"
|
||||
@@ -4393,14 +4393,14 @@ types-tqdm = "*"
|
||||
|
||||
[[package]]
|
||||
name = "litellm"
|
||||
version = "1.64.1"
|
||||
version = "1.65.0"
|
||||
description = "Library to easily interface with LLM API providers"
|
||||
optional = false
|
||||
python-versions = "!=2.7.*,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,!=3.5.*,!=3.6.*,!=3.7.*,>=3.8"
|
||||
groups = ["main"]
|
||||
files = [
|
||||
{file = "litellm-1.64.1-py3-none-any.whl", hash = "sha256:bd7cb4977dee121551f0322d48b4e51e9a508fc2ac2273e7c5405ca69354e352"},
|
||||
{file = "litellm-1.64.1.tar.gz", hash = "sha256:73bac891b1fbd77ada4d691e967657c53f48c207d9c3ba414ad0ffe3e7ec8f89"},
|
||||
{file = "litellm-1.65.0-py3-none-any.whl", hash = "sha256:bbc211f3d03e1830ed7f4304b40f70fa1fa4a2f9109d006ede5f78e83a189aba"},
|
||||
{file = "litellm-1.65.0.tar.gz", hash = "sha256:147a74d18601ccaaff3ca125eba914ab6e5b5854aff480dce5a52be5b9d52ff8"},
|
||||
]
|
||||
|
||||
[package.dependencies]
|
||||
@@ -4836,14 +4836,14 @@ files = [
|
||||
|
||||
[[package]]
|
||||
name = "modal"
|
||||
version = "0.73.131"
|
||||
version = "0.73.136"
|
||||
description = "Python client library for Modal"
|
||||
optional = false
|
||||
python-versions = ">=3.9"
|
||||
groups = ["main", "evaluation"]
|
||||
files = [
|
||||
{file = "modal-0.73.131-py3-none-any.whl", hash = "sha256:cece493c196a5c932602fa84ef91c60737078c263ac8859acc8c7e13257a6215"},
|
||||
{file = "modal-0.73.131.tar.gz", hash = "sha256:26809ad9c9bd66d912370454135599b093e7bc2f450d6257d82dd178d00dab62"},
|
||||
{file = "modal-0.73.136-py3-none-any.whl", hash = "sha256:1f812712ea616cce949c06c5a4b45497d1157879775986de54db9ed2023b79e9"},
|
||||
{file = "modal-0.73.136.tar.gz", hash = "sha256:e8a6d3961c11e6440b2ab9a7f344fb1beb9aae8b8511df871ce3b2399f194af0"},
|
||||
]
|
||||
|
||||
[package.dependencies]
|
||||
@@ -6210,14 +6210,14 @@ global = ["pybind11-global (==2.13.6)"]
|
||||
|
||||
[[package]]
|
||||
name = "pycodestyle"
|
||||
version = "2.12.1"
|
||||
version = "2.13.0"
|
||||
description = "Python style guide checker"
|
||||
optional = false
|
||||
python-versions = ">=3.8"
|
||||
python-versions = ">=3.9"
|
||||
groups = ["main", "runtime"]
|
||||
files = [
|
||||
{file = "pycodestyle-2.12.1-py2.py3-none-any.whl", hash = "sha256:46f0fb92069a7c28ab7bb558f05bfc0110dac69a0cd23c61ea0040283a9d78b3"},
|
||||
{file = "pycodestyle-2.12.1.tar.gz", hash = "sha256:6838eae08bbce4f6accd5d5572075c63626a15ee3e6f842df996bf62f6d73521"},
|
||||
{file = "pycodestyle-2.13.0-py2.py3-none-any.whl", hash = "sha256:35863c5974a271c7a726ed228a14a4f6daf49df369d8c50cd9a6f58a5e143ba9"},
|
||||
{file = "pycodestyle-2.13.0.tar.gz", hash = "sha256:c8415bf09abe81d9c7f872502a6eee881fbe85d8763dd5b9924bb0a01d67efae"},
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -6448,14 +6448,14 @@ dev = ["black", "flake8", "flake8-black", "isort", "jupyter-console", "mkdocs",
|
||||
|
||||
[[package]]
|
||||
name = "pyflakes"
|
||||
version = "3.2.0"
|
||||
version = "3.3.2"
|
||||
description = "passive checker of Python programs"
|
||||
optional = false
|
||||
python-versions = ">=3.8"
|
||||
python-versions = ">=3.9"
|
||||
groups = ["main", "runtime"]
|
||||
files = [
|
||||
{file = "pyflakes-3.2.0-py2.py3-none-any.whl", hash = "sha256:84b5be138a2dfbb40689ca07e2152deb896a65c3a3e24c251c5c62489568074a"},
|
||||
{file = "pyflakes-3.2.0.tar.gz", hash = "sha256:1c61603ff154621fb2a9172037d84dca3500def8c8b630657d1701f026f8af3f"},
|
||||
{file = "pyflakes-3.3.2-py2.py3-none-any.whl", hash = "sha256:5039c8339cbb1944045f4ee5466908906180f13cc99cc9949348d10f82a5c32a"},
|
||||
{file = "pyflakes-3.3.2.tar.gz", hash = "sha256:6dfd61d87b97fba5dcfaaf781171ac16be16453be6d816147989e7f6e6a9576b"},
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -8402,14 +8402,14 @@ test = ["pytest", "tornado (>=4.5)", "typeguard"]
|
||||
|
||||
[[package]]
|
||||
name = "termcolor"
|
||||
version = "2.5.0"
|
||||
version = "3.0.0"
|
||||
description = "ANSI color formatting for output in terminal"
|
||||
optional = false
|
||||
python-versions = ">=3.9"
|
||||
groups = ["main"]
|
||||
files = [
|
||||
{file = "termcolor-2.5.0-py3-none-any.whl", hash = "sha256:37b17b5fc1e604945c2642c872a3764b5d547a48009871aea3edd3afa180afb8"},
|
||||
{file = "termcolor-2.5.0.tar.gz", hash = "sha256:998d8d27da6d48442e8e1f016119076b690d962507531df4890fcd2db2ef8a6f"},
|
||||
{file = "termcolor-3.0.0-py3-none-any.whl", hash = "sha256:fdfdc9f2bdb71c69fbbbaeb7ceae3afef0461076dd2ee265bf7b7c49ddb05ebb"},
|
||||
{file = "termcolor-3.0.0.tar.gz", hash = "sha256:0cd855c8716383f152ad02bbb39841d6e4694538ff5d424088e56c8b81fde525"},
|
||||
]
|
||||
|
||||
[package.extras]
|
||||
|
||||
@@ -0,0 +1,83 @@
|
||||
"""Tests for the conversation summary generator."""
|
||||
|
||||
from unittest.mock import MagicMock, patch
|
||||
|
||||
import pytest
|
||||
|
||||
from openhands.core.config import LLMConfig
|
||||
from openhands.utils.conversation_summary import generate_conversation_title
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_generate_conversation_title_empty_message():
|
||||
"""Test that an empty message returns None."""
|
||||
result = await generate_conversation_title('', MagicMock())
|
||||
assert result is None
|
||||
|
||||
result = await generate_conversation_title(' ', MagicMock())
|
||||
assert result is None
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_generate_conversation_title_success():
|
||||
"""Test successful title generation."""
|
||||
# Create a proper mock response
|
||||
mock_response = MagicMock()
|
||||
mock_response.choices = [MagicMock()]
|
||||
mock_response.choices[0].message.content = 'Generated Title'
|
||||
|
||||
# Create a mock LLM instance with a synchronous completion method
|
||||
mock_llm = MagicMock()
|
||||
mock_llm.completion = MagicMock(return_value=mock_response)
|
||||
|
||||
# Patch the LLM class to return our mock
|
||||
with patch('openhands.utils.conversation_summary.LLM', return_value=mock_llm):
|
||||
result = await generate_conversation_title(
|
||||
'Can you help me with Python?', LLMConfig(model='test-model')
|
||||
)
|
||||
|
||||
assert result == 'Generated Title'
|
||||
# Verify the LLM completion mock was called exactly once
|
||||
mock_llm.completion.assert_called_once()
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_generate_conversation_title_long_title():
|
||||
"""Test that long titles are truncated."""
|
||||
# Create a proper mock response with a long title
|
||||
mock_response = MagicMock()
|
||||
mock_response.choices = [MagicMock()]
|
||||
mock_response.choices[
|
||||
0
|
||||
].message.content = 'This is a very long title that should be truncated because it exceeds the maximum length'
|
||||
|
||||
# Create a mock LLM instance with a synchronous completion method
|
||||
mock_llm = MagicMock()
|
||||
mock_llm.completion = MagicMock(return_value=mock_response)
|
||||
|
||||
# Patch the LLM class to return our mock
|
||||
with patch('openhands.utils.conversation_summary.LLM', return_value=mock_llm):
|
||||
result = await generate_conversation_title(
|
||||
'Can you help me with Python?', LLMConfig(model='test-model'), max_length=30
|
||||
)
|
||||
|
||||
# Verify the title is truncated correctly
|
||||
assert len(result) <= 30
|
||||
assert result.endswith('...')
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_generate_conversation_title_exception():
|
||||
"""Test that exceptions are handled gracefully."""
|
||||
# Create a mock LLM instance with a synchronous completion method that raises an exception
|
||||
mock_llm = MagicMock()
|
||||
mock_llm.completion = MagicMock(side_effect=Exception('Test error'))
|
||||
|
||||
# Patch the LLM class to return our mock
|
||||
with patch('openhands.utils.conversation_summary.LLM', return_value=mock_llm):
|
||||
result = await generate_conversation_title(
|
||||
'Can you help me with Python?', LLMConfig(model='test-model')
|
||||
)
|
||||
|
||||
# Verify that None is returned when an exception occurs
|
||||
assert result is None
|
||||