update nikolaik

Fix for circular import on ConversationValidator (#7583)
chore(deps): bump the version-all group with 6 updates (#7600)
2026-04-29 03:00:45 -04:00 · 2025-03-31 13:24:09 -04:00 · 2025-03-31 11:09:10 -06:00 · 2025-03-31 15:44:35 +00:00 · 2025-03-31 21:15:45 +08:00 · 2025-03-31 07:42:44 -04:00
16 changed files with 310 additions and 79 deletions
@@ -10,13 +10,21 @@ This repository contains the code for OpenHands, an automated AI software engine
 To set up the entire repo, including frontend and backend, run `make build`.
 You don't need to do this unless the user asks you to, or if you're trying to run the entire application.

-Before pushing any changes, you should ensure that any lint errors or simple test errors have been fixed.
+IMPORTANT: Before making any changes to the codebase, ALWAYS run `make install-pre-commit-hooks` to ensure pre-commit hooks are properly installed.
+
+Before pushing any changes, you MUST ensure that any lint errors or simple test errors have been fixed.

 * If you've made changes to the backend, you should run `pre-commit run --all-files --config ./dev_config/python/.pre-commit-config.yaml`
 * If you've made changes to the frontend, you should run `cd frontend && npm run lint:fix && npm run build ; cd ..`

+The pre-commit hooks MUST pass successfully before pushing any changes to the repository. This is a mandatory requirement to maintain code quality and consistency.
+
 If either command fails, it may have automatically fixed some issues. You should fix any issues that weren't automatically fixed,
-then re-run the command to ensure it passes.
+then re-run the command to ensure it passes. Common issues include:
+- Mypy type errors
+- Ruff formatting issues
+- Trailing whitespace
+- Missing newlines at end of files

 ## Repository Structure
 Backend:
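The README hunk above tells contributors to re-run the lint command after a failure, because the hooks may already have fixed some problems. Below is a minimal Python sketch of that loop. It assumes a two-attempt limit, and `run_until_clean` and `BACKEND_LINT` are hypothetical names that are not part of the repository.

```python
import subprocess
from typing import Callable, Sequence

# Backend command quoted in the README hunk above.
BACKEND_LINT = [
    'pre-commit', 'run', '--all-files',
    '--config', './dev_config/python/.pre-commit-config.yaml',
]

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def run_until_clean(
    cmd: Sequence[str],
    runner: Runner = subprocess.run,
    max_attempts: int = 2,
) -> bool:
    """Run `cmd` up to `max_attempts` times; return True once it exits with 0.

    pre-commit reports a hook as failed when the hook modifies files, so a
    run that only auto-fixed formatting fails once and passes on the rerun.
    """
    for _ in range(max_attempts):
        if runner(list(cmd)).returncode == 0:
            return True
    return False
```

Calling `run_until_clean(BACKEND_LINT)` from the repository root runs the hooks at most twice. If the second run still fails, the remaining problems (for example Mypy type errors) need a manual fix, which is why the sketch does not loop indefinitely.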
@@ -48,7 +48,8 @@ RUN mkdir -p $FILE_STORE_PATH
 RUN mkdir -p $WORKSPACE_BASE

 RUN apt-get update -y \
-    && apt-get install -y curl ssh sudo
+    && apt-get install -y curl ssh sudo \
+    && rm -rf /var/lib/apt/lists/*

 # Default is 1000, but OSX is often 501
 RUN sed -i 's/^UID_MIN.*/UID_MIN 499/' /etc/login.defs
@@ -95,7 +95,7 @@ OpenHands requires an API key to access most language models. Here's how to get

 1. [Create an Anthropic account](https://console.anthropic.com/)
 2. [Generate an API key](https://console.anthropic.com/settings/keys)
-3. [Set up billing(https://console.anthropic.com/settings/billing)
+3. [Set up billing](https://console.anthropic.com/settings/billing)

 Consider setting usage limits to control costs.
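The billing fix above restores a missing `]` in `[Set up billing(https://...)`. The rough check below looks for that exact shape. `find_unclosed_links` and its regular expression are assumptions for illustration. The check only flags a `[` followed by `(http` on the same line with no `]` in between, so it is not a general Markdown validator.

```python
import re

# An opening "[", link text containing no "]" or newline, then "(http://" or
# "(https://": the shape of a link whose "](" separator lost its "]".
_UNCLOSED_LINK = re.compile(r'\[[^\]\n]*\(https?://')


def find_unclosed_links(markdown: str) -> list[int]:
    """Return 1-based line numbers that look like links missing their ']'."""
    return [
        lineno
        for lineno, line in enumerate(markdown.splitlines(), start=1)
        if _UNCLOSED_LINK.search(line)
    ]
```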

@@ -2,7 +2,9 @@

 This folder contains the evaluation harness that we built on top of the original [SWE-Bench benchmark](https://www.swebench.com/) ([paper](https://arxiv.org/abs/2310.06770)).

-**UPDATE (2/18/2025): We now support running SWE-Gym using the same evaluation harness here. For more details, checkout [this README](./SWE-Gym.md).
+**UPDATE (03/27/2025): We now support SWE-Bench multimodal evaluation! Simply use "princeton-nlp/SWE-bench_Multimodal" as the dataset name in the `run_infer.sh` script to evaluate on multimodal instances.**
+
+**UPDATE (2/18/2025): We now support running SWE-Gym using the same evaluation harness here. For more details, check out [this README](./SWE-Gym.md).**

 **UPDATE (7/1/2024): We now support the official SWE-Bench dockerized evaluation as announced [here](https://github.com/princeton-nlp/SWE-bench/blob/main/docs/20240627_docker/README.md).**

@@ -62,7 +64,7 @@ in order to use `eval_limit`, you must also set `agent`.
 default, it is set to 60.
 - `num_workers`, e.g. `3`, is the number of parallel workers to run the evaluation. By
 default, it is set to 1.
- `dataset`, a huggingface dataset name. e.g. `princeton-nlp/SWE-bench`, `princeton-nlp/SWE-bench_Lite`, or `princeton-nlp/SWE-bench_Verified`, specifies which dataset to evaluate on.
+- `dataset`, a huggingface dataset name. e.g. `princeton-nlp/SWE-bench`, `princeton-nlp/SWE-bench_Lite`, `princeton-nlp/SWE-bench_Verified`, or `princeton-nlp/SWE-bench_Multimodal`, specifies which dataset to evaluate on.
 - `dataset_split`, split for the huggingface dataset. e.g., `test`, `dev`. Default to `test`.

 > [!CAUTION]
@@ -82,6 +84,13 @@ then your command would be:
 ./evaluation/benchmarks/swe_bench/scripts/run_infer.sh llm.eval_gpt4_1106_preview HEAD CodeActAgent 10
 ```

+For multimodal evaluation, you can use:
+
+```bash
+# Example for running multimodal SWE-Bench evaluation
+./evaluation/benchmarks/swe_bench/scripts/run_infer.sh llm.eval_gpt4_vision HEAD CodeActAgent 10 100 1 princeton-nlp/SWE-bench_Multimodal test
+```
+
 ### Running in parallel with RemoteRuntime

 OpenHands Remote Runtime is currently in beta (read [here](https://runtime.all-hands.dev/) for more details), it allows you to run rollout in parallel in the cloud, so you don't need a powerful machine to run evaluation.
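The multimodal example above passes eight positional arguments to `run_infer.sh`. The dataclass below names them in the order that command uses. `InferArgs` is hypothetical. The names `eval_limit` and `num_workers` come from the parameter list in this README section, but `max_iter` is an assumed name for the fifth value, so check it against the script before relying on it.

```python
import shlex
from dataclasses import dataclass

SCRIPT = './evaluation/benchmarks/swe_bench/scripts/run_infer.sh'


@dataclass(frozen=True)
class InferArgs:
    """Positional arguments for run_infer.sh, in the order of the README example."""

    model_config: str   # e.g. llm.eval_gpt4_vision
    git_version: str    # e.g. HEAD
    agent: str          # e.g. CodeActAgent
    eval_limit: int     # number of instances to evaluate
    max_iter: int       # assumed name for the per-instance iteration cap
    num_workers: int    # parallel workers
    dataset: str        # Hugging Face dataset name
    dataset_split: str  # e.g. test

    def argv(self) -> list[str]:
        return [
            SCRIPT,
            self.model_config,
            self.git_version,
            self.agent,
            str(self.eval_limit),
            str(self.max_iter),
            str(self.num_workers),
            self.dataset,
            self.dataset_split,
        ]

    def command(self) -> str:
        # shlex.join quotes any argument that would need quoting in a shell.
        return shlex.join(self.argv())
```

Switching `dataset` to `princeton-nlp/SWE-bench_Lite` or `princeton-nlp/SWE-bench_Verified` keeps the same argument order.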
@@ -58,7 +58,7 @@ def _get_swebench_workspace_dir_name(instance: pd.Series) -> str:
    return f'{instance.repo}__{instance.version}'.replace('/', '__')


-def get_instruction(instance: pd.Series, metadata: EvalMetadata):
+def get_instruction(instance: pd.Series, metadata: EvalMetadata) -> MessageAction:
    workspace_dir_name = _get_swebench_workspace_dir_name(instance)
    instruction = f"""
 <uploaded_files>
@@ -114,12 +114,20 @@ Be thorough in your exploration, testing, and reasoning. It's fine if your think
 """

    if RUN_WITH_BROWSING:
-        instruction += """
-<IMPORTANT!>
-You SHOULD NEVER attempt to browse the web.
-</IMPORTANT!>
-"""
-    return instruction
+        instruction += (
+            '<IMPORTANT!>\n'
+            'You SHOULD NEVER attempt to browse the web. '
+            '</IMPORTANT!>\n'
+        )
+
+    if 'image_assets' in instance:
+        assets = instance['image_assets']
+        assert (
+            'problem_statement' in assets
+        ), 'problem_statement is required in image_assets'
+        image_urls = assets['problem_statement']
+        return MessageAction(content=instruction, image_urls=image_urls)
+    return MessageAction(content=instruction)


 # TODO: migrate all swe-bench docker to ghcr.io/openhands
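The new `get_instruction` returns a `MessageAction` and attaches the `problem_statement` entries of `image_assets` as `image_urls`. The sketch below isolates that branch. So that the example runs without the repository installed, it assumes a plain dict in place of the `pd.Series` row and a local stand-in dataclass in place of OpenHands' `MessageAction`. It also raises `ValueError` instead of using `assert`, because Python skips assertions when run with `-O`.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class MessageAction:
    """Local stand-in for OpenHands' MessageAction (content plus optional images)."""

    content: str
    image_urls: Optional[list[str]] = None


def build_message(instance: Mapping[str, Any], instruction: str) -> MessageAction:
    """Attach problem-statement images when the dataset row provides them."""
    if 'image_assets' in instance:
        assets = instance['image_assets']
        if 'problem_statement' not in assets:
            # The diff uses `assert`; an explicit error survives `python -O`.
            raise ValueError('problem_statement is required in image_assets')
        return MessageAction(
            content=instruction, image_urls=list(assets['problem_statement'])
        )
    return MessageAction(content=instruction)
```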
@@ -129,14 +137,18 @@ DEFAULT_DOCKER_IMAGE_PREFIX = os.environ.get(
 logger.info(f'Default docker image prefix: {DEFAULT_DOCKER_IMAGE_PREFIX}')


-def get_instance_docker_image(instance_id: str, official_image: bool = False) -> str:
-    if official_image:
+def get_instance_docker_image(
+    instance_id: str,
+    swebench_official_image: bool = False,
+) -> str:
+    if swebench_official_image:
        # Official SWE-Bench image
        # swebench/sweb.eval.x86_64.django_1776_django-11333:v1
        docker_image_prefix = 'docker.io/swebench/'
        repo, name = instance_id.split('__')
-        image_name = f'sweb.eval.x86_64.{repo}_1776_{name}:latest'
-        logger.warning(f'Using official SWE-Bench image: {image_name}')
+        image_name = f'swebench/sweb.eval.x86_64.{repo}_1776_{name}:latest'
+        logger.info(f'Using official SWE-Bench image: {image_name}')
+        return image_name
    else:
        # OpenHands version of the image
        docker_image_prefix = DEFAULT_DOCKER_IMAGE_PREFIX
@@ -144,7 +156,7 @@ def get_instance_docker_image(instance_id: str, official_image: bool = False) ->
        image_name = image_name.replace(
            '__', '_s_'
        )  # to comply with docker image naming convention
-    return (docker_image_prefix.rstrip('/') + '/' + image_name).lower()
+        return (docker_image_prefix.rstrip('/') + '/' + image_name).lower()


 def get_config(
@@ -152,12 +164,13 @@ def get_config(
    metadata: EvalMetadata,
 ) -> AppConfig:
    # We use a different instance image for the each instance of swe-bench eval
-    use_official_image = bool(
+    use_swebench_official_image = bool(
        ('verified' in metadata.dataset.lower() or 'lite' in metadata.dataset.lower())
        and 'swe-gym' not in metadata.dataset.lower()
    )
    base_container_image = get_instance_docker_image(
-        instance['instance_id'], use_official_image
+        instance['instance_id'],
+        swebench_official_image=use_swebench_official_image,
    )
    logger.info(
        f'Using instance container image: {base_container_image}. '
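`get_config` picks the official SWE-Bench image only when the dataset name contains "verified" or "lite" and does not contain "swe-gym". The official image name is then built from the two halves of the instance ID. Both rules are restated below as standalone functions with hypothetical names so they can be checked in isolation. One consequence: `princeton-nlp/SWE-bench_Multimodal` matches neither keyword, so multimodal instances go through the OpenHands image branch.

```python
def uses_swebench_official_image(dataset: str) -> bool:
    """Mirror of the dataset check in get_config (case-insensitive substring match)."""
    name = dataset.lower()
    return ('verified' in name or 'lite' in name) and 'swe-gym' not in name


def official_image_name(instance_id: str) -> str:
    """Official image name for IDs of the form '<repo>__<name>'."""
    repo, name = instance_id.split('__')  # ValueError unless exactly one '__'
    return f'swebench/sweb.eval.x86_64.{repo}_1776_{name}:latest'
```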
@@ -493,13 +506,13 @@ def process_instance(
    try:
        initialize_runtime(runtime, instance)

-        instruction = get_instruction(instance, metadata)
+        message_action = get_instruction(instance, metadata)

        # Here's how you can run the agent (similar to the `main` function) and get the final task state
        state: State | None = asyncio.run(
            run_controller(
                config=config,
-                initial_user_action=MessageAction(content=instruction),
+                initial_user_action=message_action,
                runtime=runtime,
                fake_user_response_fn=AGENT_CLS_TO_FAKE_USER_RESPONSE_FN[
                    metadata.agent_class
@@ -539,6 +552,11 @@ def process_instance(
    metrics = get_metrics(state)

    # Save the output
+    instruction = message_action.content
+    if message_action.image_urls:
+        instruction += (
+            '\n\n<image_urls>' + '\n'.join(message_action.image_urls) + '</image_urls>'
+        )
    output = EvalOutput(
        instance_id=instance.instance_id,
        instruction=instruction,
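`EvalOutput.instruction` is still a string, so `process_instance` now appends any image URLs to the message text before saving. The same formatting is shown below as a pure function (the name `instruction_for_output` is hypothetical) to make the saved shape explicit.

```python
from typing import Optional


def instruction_for_output(content: str, image_urls: Optional[list[str]]) -> str:
    """Flatten a message and its image URLs into the string stored in EvalOutput."""
    if image_urls:
        # Same layout as the diff: blank line, then URLs one per line inside tags.
        return content + '\n\n<image_urls>' + '\n'.join(image_urls) + '</image_urls>'
    return content
```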
@@ -22,11 +22,11 @@ vi.mock("#/context/auth-context", () => ({
 describe("ActionSuggestions", () => {
  // Setup mocks for each test
  vi.clearAllMocks();
-  
+
  (useAuth as any).mockReturnValue({
    githubTokenIsSet: true,
  });
-  
+
  (useSelector as any).mockReturnValue({
    selectedRepository: "test-repo",
  });
@@ -66,16 +66,16 @@ describe("ActionSuggestions", () => {
  it("should have different prompts for 'Push to Branch' and 'Push & Create PR' buttons", () => {
    // This test verifies that the prompts are different in the component
    const component = render(<ActionSuggestions onSuggestionsClick={() => {}} />);
-    
+
    // Get the component instance to access the internal values
    const pushBranchPrompt = "Please push the changes to a remote branch on GitHub, but do NOT create a pull request. Please use the exact SAME branch name as the one you are currently on.";
    const createPRPrompt = "Please push the changes to GitHub and open a pull request. Please create a meaningful branch name that describes the changes.";
-    
+
    // Verify the prompts are different
    expect(pushBranchPrompt).not.toEqual(createPRPrompt);
-    
+
    // Verify the PR prompt mentions creating a meaningful branch name
    expect(createPRPrompt).toContain("meaningful branch name");
    expect(createPRPrompt).not.toContain("SAME branch name");
  });
-});
+});
@@ -47,7 +47,7 @@ class SandboxConfig(BaseModel):
    rm_all_containers: bool = Field(default=False)
    api_key: str | None = Field(default=None)
    base_container_image: str = Field(
-        default='nikolaik/python-nodejs:python3.12-nodejs22'
+        default='nikolaik/python-nodejs:python3.13-nodejs23-bullseye'
    )
    runtime_container_image: str | None = Field(default=None)
    user_id: int = Field(default=os.getuid() if hasattr(os, 'getuid') else 1000)
@@ -113,7 +113,7 @@ class RecallObservation(Observation):
                    f'repo_instructions={self.repo_instructions[:20]}...',
                    f'runtime_hosts={self.runtime_hosts}',
                    f'additional_agent_instructions={self.additional_agent_instructions[:20]}...',
-                    f'date={self.date}'
+                    f'date={self.date}',
                ]
            )
        else:
@@ -1,4 +1,4 @@
 {
    "workbench.colorTheme": "Default Dark Modern",
    "workbench.startupEditor": "none"
-}
+}
@@ -23,7 +23,7 @@ from openhands.server.shared import (
    sio,
 )
 from openhands.storage.conversation.conversation_validator import (
-    ConversationValidatorImpl,
+    create_conversation_validator,
 )


@@ -38,7 +38,7 @@ async def connect(connection_id: str, environ):
        raise ConnectionRefusedError('No conversation_id in query params')

    cookies_str = environ.get('HTTP_COOKIE', '')
-    conversation_validator = ConversationValidatorImpl()
+    conversation_validator = create_conversation_validator()
    user_id, github_user_id = await conversation_validator.validate(
        conversation_id, cookies_str
    )
@@ -243,6 +243,24 @@ async def get_conversation(
    try:
        metadata = await conversation_store.get_metadata(conversation_id)
        is_running = await conversation_manager.is_agent_loop_running(conversation_id)
+
+        # Check if we need to update the title
+        if is_running and metadata:
+            # Check if the title is a default title (contains the conversation ID)
+            if metadata.title and conversation_id[:5] in metadata.title:
+                # Generate a new title
+                new_title = await auto_generate_title(
+                    conversation_id, get_user_id(request)
+                )
+
+                if new_title:
+                    # Update the metadata
+                    metadata.title = new_title
+                    await conversation_store.save_metadata(metadata)
+
+                    # Refresh metadata after update
+                    metadata = await conversation_store.get_metadata(conversation_id)
+
        conversation_info = await _get_conversation_info(metadata, is_running)
        return conversation_info
    except FileNotFoundError:
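The title refresh added to `get_conversation` treats a title as a default one when it contains the first five characters of the conversation ID. It retries only while the agent loop is running. The function below restates that check under a hypothetical name. The `'Conversation abc12'` title in the test is an assumed format, because `get_default_conversation_title` is not shown in this diff. The heuristic can also match a user-chosen title that happens to contain those five characters.

```python
from typing import Optional


def needs_title_refresh(
    title: Optional[str], conversation_id: str, is_running: bool
) -> bool:
    """Return True when get_conversation would try to regenerate the title."""
    return bool(is_running and title and conversation_id[:5] in title)
```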
@@ -265,6 +283,7 @@ def get_default_conversation_title(conversation_id: str) -> str:
 async def auto_generate_title(conversation_id: str, user_id: str | None) -> str:
    """
    Auto-generate a title for a conversation based on the first user message.
+    Uses LLM-based title generation if available, otherwise falls back to a simple truncation.

    Args:
        conversation_id: The ID of the conversation
@@ -292,11 +311,39 @@ async def auto_generate_title(conversation_id: str, user_id: str | None) -> str:
                break

        if first_user_message:
+            # Try LLM-based title generation first
+            from openhands.core.config.llm_config import LLMConfig
+            from openhands.utils.conversation_summary import generate_conversation_title
+
+            # Get LLM config from user settings
+            try:
+                settings_store = await SettingsStoreImpl.get_instance(config, user_id)
+                settings = await settings_store.load()
+
+                if settings and settings.llm_model:
+                    # Create LLM config from settings
+                    llm_config = LLMConfig(
+                        model=settings.llm_model,
+                        api_key=settings.llm_api_key,
+                        base_url=settings.llm_base_url,
+                    )
+
+                    # Try to generate title using LLM
+                    llm_title = await generate_conversation_title(
+                        first_user_message, llm_config
+                    )
+                    if llm_title:
+                        logger.info(f'Generated title using LLM: {llm_title}')
+                        return llm_title
+            except Exception as e:
+                logger.error(f'Error using LLM for title generation: {e}')
+
+            # Fall back to simple truncation if LLM generation fails or is unavailable
            first_user_message = first_user_message.strip()
            title = first_user_message[:30]
            if len(first_user_message) > 30:
                title += '...'
-            logger.info(f'Generated title: {title}')
+            logger.info(f'Generated title using truncation: {title}')
            return title
    except Exception as e:
        logger.error(f'Error generating title: {str(e)}')
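`auto_generate_title` now tries the user's configured LLM first and falls back to the first 30 characters of the message. The synchronous sketch below keeps only that control flow. `title_with_fallback` and the `llm_title_fn` callable are hypothetical stand-ins for the settings lookup and `generate_conversation_title`, both of which are async in the real code.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def title_with_fallback(
    first_message: str,
    llm_title_fn: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    if llm_title_fn is not None:
        try:
            llm_title = llm_title_fn(first_message)
            if llm_title:
                return llm_title
        except Exception as e:  # like the diff: log, then fall through
            logger.error('Error using LLM for title generation: %s', e)
    # Fallback: first 30 characters of the stripped message, plus '...' if cut.
    message = first_message.strip()
    title = message[:30]
    if len(message) > 30:
        title += '...'
    return title
```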
@@ -315,10 +362,12 @@ async def update_conversation(
    if not metadata:
        return False

-    # If title is empty or unspecified, auto-generate it from the first user message
+    # If title is empty or unspecified, auto-generate it
    if not title or title.isspace():
        title = await auto_generate_title(conversation_id, user_id)
-        if not title:
+
+        # If we still don't have a title, use the default
+        if not title or title.isspace():
            title = get_default_conversation_title(conversation_id)

    metadata.title = title
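`update_conversation` now resolves the title in three steps: the requested title, then an auto-generated one, then the default. The added whitespace check on the auto-generated result is defensive and rejects a blank title before it is stored. The sketch below uses hypothetical names and takes callables, so the later steps run only when they are needed.

```python
from typing import Callable, Optional


def resolve_title(
    requested: Optional[str],
    auto_generate: Callable[[], Optional[str]],
    default: Callable[[], str],
) -> str:
    """Requested title, else a non-blank generated one, else the default."""
    if requested and not requested.isspace():
        return requested
    generated = auto_generate()
    if generated and not generated.isspace():
        return generated
    return default()
```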
@@ -361,9 +410,9 @@ async def _get_conversation_info(
            last_updated_at=conversation.last_updated_at,
            created_at=conversation.created_at,
            selected_repository=conversation.selected_repository,
-            status=ConversationStatus.RUNNING
-            if is_running
-            else ConversationStatus.STOPPED,
+            status=(
+                ConversationStatus.RUNNING if is_running else ConversationStatus.STOPPED
+            ),
        )
    except Exception as e:
        logger.error(
@@ -10,8 +10,12 @@ class ConversationValidator:
        return None, None


-conversation_validator_cls = os.environ.get(
-    'OPENHANDS_CONVERSATION_VALIDATOR_CLS',
-    'openhands.storage.conversation.conversation_validator.ConversationValidator',
-)
-ConversationValidatorImpl = get_impl(ConversationValidator, conversation_validator_cls)
+def create_conversation_validator():
+    conversation_validator_cls = os.environ.get(
+        'OPENHANDS_CONVERSATION_VALIDATOR_CLS',
+        'openhands.storage.conversation.conversation_validator.ConversationValidator',
+    )
+    ConversationValidatorImpl = get_impl(
+        ConversationValidator, conversation_validator_cls
+    )
+    return ConversationValidatorImpl()
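The circular-import fix (#7583) moves the `get_impl` call out of module import time and into `create_conversation_validator()`. As a result, the configured class is imported only when a validator is first requested, which typically happens after the modules involved have finished loading. The `@lru_cache()` added to `get_impl` means repeated calls still resolve each class name only once. The stdlib sketch below shows the same pattern. `load_class` and `create_from_env` are hypothetical helpers, not OpenHands' `import_from`/`get_impl`, and they do no validation of the loaded class.

```python
import importlib
import os
from functools import lru_cache


@lru_cache()
def load_class(qual_name: str) -> type:
    """Import 'package.module.ClassName' and return the class (cached per name)."""
    module_name, _, class_name = qual_name.rpartition('.')
    return getattr(importlib.import_module(module_name), class_name)


def create_from_env(env_var: str, default_qual_name: str) -> object:
    """Resolve the implementation at call time, not when this module is imported."""
    qual_name = os.environ.get(env_var, default_qual_name)
    return load_class(qual_name)()
```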
@@ -0,0 +1,57 @@
+"""Utility functions for generating conversation summaries."""
+
+from typing import Optional
+
+from openhands.core.config import LLMConfig
+from openhands.core.logger import openhands_logger as logger
+from openhands.llm.llm import LLM
+
+
+async def generate_conversation_title(
+    message: str, llm_config: LLMConfig, max_length: int = 50
+) -> Optional[str]:
+    """Generate a concise title for a conversation based on the first user message.
+
+    Args:
+        message: The first user message in the conversation.
+        llm_config: The LLM configuration to use for generating the title.
+        max_length: The maximum length of the generated title.
+
+    Returns:
+        A concise title for the conversation, or None if generation fails.
+    """
+    if not message or message.strip() == '':
+        return None
+
+    # Truncate very long messages to avoid excessive token usage
+    if len(message) > 1000:
+        truncated_message = message[:1000] + '...(truncated)'
+    else:
+        truncated_message = message
+
+    try:
+        llm = LLM(llm_config)
+
+        # Create a simple prompt for the LLM to generate a title
+        messages = [
+            {
+                'role': 'system',
+                'content': 'You are a helpful assistant that generates concise, descriptive titles for conversations with OpenHands. OpenHands is a helpful AI agent that can interact with a computer to solve tasks using bash terminal, file editor, and browser. Given a user message (which may be truncated), generate a concise, descriptive title for the conversation. Return only the title, with no additional text, quotes, or explanations.',
+            },
+            {
+                'role': 'user',
+                'content': f'Generate a title (maximum {max_length} characters) for a conversation that starts with this message:\n\n{truncated_message}',
+            },
+        ]
+
+        response = llm.completion(messages=messages)
+        title = response.choices[0].message.content.strip()
+
+        # Ensure the title isn't too long
+        if len(title) > max_length:
+            title = title[: max_length - 3] + '...'
+
+        return title
+    except Exception as e:
+        logger.error(f'Error generating conversation title: {e}')
+        return None
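`generate_conversation_title` wraps two string rules around the LLM call: it truncates the user message to 1,000 characters before prompting, and it trims the model's answer to `max_length`. The pure helpers below (hypothetical names) restate those rules. They also add two guards that the diff leaves to its surrounding `try`. The first handles a response whose content is `None`. The second handles a very small `max_length`: with a value below 3, `title[: max_length - 3] + '...'` returns a string longer than the limit.

```python
from typing import Optional

MESSAGE_LIMIT = 1000  # characters sent to the model, as in the diff


def prepare_prompt_message(message: str, limit: int = MESSAGE_LIMIT) -> Optional[str]:
    """Return the text to put in the prompt, or None for an empty message."""
    if not message or message.strip() == '':
        return None
    if len(message) > limit:
        return message[:limit] + '...(truncated)'
    return message


def clamp_title(raw: Optional[str], max_length: int = 50) -> Optional[str]:
    """Strip the model's answer and cut it to at most max_length characters."""
    if raw is None:
        return None  # guard for a None response content
    title = raw.strip()
    if not title:
        return None
    if len(title) > max_length:
        if max_length <= 3:
            return title[:max_length]  # no room for an ellipsis
        title = title[: max_length - 3] + '...'
    return title
```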
@@ -1,4 +1,5 @@
 import importlib
+from functools import lru_cache
 from typing import Type, TypeVar

 T = TypeVar('T')
@@ -13,6 +14,7 @@ def import_from(qual_name: str):
    return result


+@lru_cache()
 def get_impl(cls: Type[T], impl_name: str | None) -> Type[T]:
    """Import a named implementation of the specified class"""
    if impl_name is None:
@@ -496,18 +496,18 @@ files = [

 [[package]]
 name = "boto3"
-version = "1.37.22"
+version = "1.37.23"
 description = "The AWS SDK for Python"
 optional = false
 python-versions = ">=3.8"
 groups = ["main"]
 files = [
-    {file = "boto3-1.37.22-py3-none-any.whl", hash = "sha256:a14324d5fa5f4fea00c0e3c69754cbd28100f7fe194693eeecf2dc07446cf4ef"},
-    {file = "boto3-1.37.22.tar.gz", hash = "sha256:78a0ec0aafbf6044104c98ad80b69e6d1c83d8233fda2c2d241029e6c705c510"},
+    {file = "boto3-1.37.23-py3-none-any.whl", hash = "sha256:fc462b9fd738bd8a1c121d94d237c6b6a05a2c1cc709d16f5223acb752f7310b"},
+    {file = "boto3-1.37.23.tar.gz", hash = "sha256:82f4599a34f5eb66e916b9ac8547394f6e5899c19580e74b60237db04cf66d1e"},
 ]

 [package.dependencies]
-botocore = ">=1.37.22,<1.38.0"
+botocore = ">=1.37.23,<1.38.0"
 jmespath = ">=0.7.1,<2.0.0"
 s3transfer = ">=0.11.0,<0.12.0"

@@ -516,14 +516,14 @@ crt = ["botocore[crt] (>=1.21.0,<2.0a0)"]

 [[package]]
 name = "boto3-stubs"
-version = "1.37.22"
-description = "Type annotations for boto3 1.37.22 generated with mypy-boto3-builder 8.10.1"
+version = "1.37.23"
+description = "Type annotations for boto3 1.37.23 generated with mypy-boto3-builder 8.10.1"
 optional = false
 python-versions = ">=3.8"
 groups = ["evaluation"]
 files = [
-    {file = "boto3_stubs-1.37.22-py3-none-any.whl", hash = "sha256:7d41213bef29af9bca6cbf481b00ec1a2535c111ee979ed152249d2c1ec02208"},
-    {file = "boto3_stubs-1.37.22.tar.gz", hash = "sha256:ad6c1471bd503da253420294ca5060a4a24d53cdc2672503a579d9b779d0e5ce"},
+    {file = "boto3_stubs-1.37.23-py3-none-any.whl", hash = "sha256:a00884a3df819bdc6b040c857e57a87b4f33df963ee88f8f406b13bf2cd983ca"},
+    {file = "boto3_stubs-1.37.23.tar.gz", hash = "sha256:011f06dadcd5ef3c627ec9808b9afa4e1837b0f009d82b8209f12a84ffbb3867"},
 ]

 [package.dependencies]
@@ -579,7 +579,7 @@ bedrock-data-automation-runtime = ["mypy-boto3-bedrock-data-automation-runtime (
 bedrock-runtime = ["mypy-boto3-bedrock-runtime (>=1.37.0,<1.38.0)"]
 billing = ["mypy-boto3-billing (>=1.37.0,<1.38.0)"]
 billingconductor = ["mypy-boto3-billingconductor (>=1.37.0,<1.38.0)"]
-boto3 = ["boto3 (==1.37.22)"]
+boto3 = ["boto3 (==1.37.23)"]
 braket = ["mypy-boto3-braket (>=1.37.0,<1.38.0)"]
 budgets = ["mypy-boto3-budgets (>=1.37.0,<1.38.0)"]
 ce = ["mypy-boto3-ce (>=1.37.0,<1.38.0)"]
@@ -943,14 +943,14 @@ xray = ["mypy-boto3-xray (>=1.37.0,<1.38.0)"]

 [[package]]
 name = "botocore"
-version = "1.37.22"
+version = "1.37.23"
 description = "Low-level, data-driven core of boto 3."
 optional = false
 python-versions = ">=3.8"
 groups = ["main"]
 files = [
-    {file = "botocore-1.37.22-py3-none-any.whl", hash = "sha256:184db7c9314d13002bc827f511a5140574b5da1acda342d51e093dad6317de98"},
-    {file = "botocore-1.37.22.tar.gz", hash = "sha256:b3b26f1a90236bcd17d4092f8c85a256b44e9955a16b633319a2f5678d605e9f"},
+    {file = "botocore-1.37.23-py3-none-any.whl", hash = "sha256:ffbe1f5958adb1c50d72d3ad1018cb265fe349248c08782d334601c0814f0e38"},
+    {file = "botocore-1.37.23.tar.gz", hash = "sha256:3a249c950cef9ee9ed7b2278500ad83a4ad6456bc433a43abd1864d1b61b2acb"},
 ]

 [package.dependencies]
@@ -2179,20 +2179,20 @@ typing = ["typing-extensions (>=4.12.2) ; python_version < \"3.11\""]

 [[package]]
 name = "flake8"
-version = "7.1.2"
+version = "7.2.0"
 description = "the modular source code checker: pep8 pyflakes and co"
 optional = false
-python-versions = ">=3.8.1"
+python-versions = ">=3.9"
 groups = ["main", "runtime"]
 files = [
-    {file = "flake8-7.1.2-py2.py3-none-any.whl", hash = "sha256:1cbc62e65536f65e6d754dfe6f1bada7f5cf392d6f5db3c2b85892466c3e7c1a"},
-    {file = "flake8-7.1.2.tar.gz", hash = "sha256:c586ffd0b41540951ae41af572e6790dbd49fc12b3aa2541685d253d9bd504bd"},
+    {file = "flake8-7.2.0-py2.py3-none-any.whl", hash = "sha256:93b92ba5bdb60754a6da14fa3b93a9361fd00a59632ada61fd7b130436c40343"},
+    {file = "flake8-7.2.0.tar.gz", hash = "sha256:fa558ae3f6f7dbf2b4f22663e5343b6b6023620461f8d4ff2019ef4b5ee70426"},
 ]

 [package.dependencies]
 mccabe = ">=0.7.0,<0.8.0"
-pycodestyle = ">=2.12.0,<2.13.0"
-pyflakes = ">=3.2.0,<3.3.0"
+pycodestyle = ">=2.13.0,<2.14.0"
+pyflakes = ">=3.3.0,<3.4.0"

 [[package]]
 name = "flask"
@@ -4393,14 +4393,14 @@ types-tqdm = "*"

 [[package]]
 name = "litellm"
-version = "1.64.1"
+version = "1.65.0"
 description = "Library to easily interface with LLM API providers"
 optional = false
 python-versions = "!=2.7.*,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,!=3.5.*,!=3.6.*,!=3.7.*,>=3.8"
 groups = ["main"]
 files = [
-    {file = "litellm-1.64.1-py3-none-any.whl", hash = "sha256:bd7cb4977dee121551f0322d48b4e51e9a508fc2ac2273e7c5405ca69354e352"},
-    {file = "litellm-1.64.1.tar.gz", hash = "sha256:73bac891b1fbd77ada4d691e967657c53f48c207d9c3ba414ad0ffe3e7ec8f89"},
+    {file = "litellm-1.65.0-py3-none-any.whl", hash = "sha256:bbc211f3d03e1830ed7f4304b40f70fa1fa4a2f9109d006ede5f78e83a189aba"},
+    {file = "litellm-1.65.0.tar.gz", hash = "sha256:147a74d18601ccaaff3ca125eba914ab6e5b5854aff480dce5a52be5b9d52ff8"},
 ]

 [package.dependencies]
@@ -4836,14 +4836,14 @@ files = [

 [[package]]
 name = "modal"
-version = "0.73.131"
+version = "0.73.136"
 description = "Python client library for Modal"
 optional = false
 python-versions = ">=3.9"
 groups = ["main", "evaluation"]
 files = [
-    {file = "modal-0.73.131-py3-none-any.whl", hash = "sha256:cece493c196a5c932602fa84ef91c60737078c263ac8859acc8c7e13257a6215"},
-    {file = "modal-0.73.131.tar.gz", hash = "sha256:26809ad9c9bd66d912370454135599b093e7bc2f450d6257d82dd178d00dab62"},
+    {file = "modal-0.73.136-py3-none-any.whl", hash = "sha256:1f812712ea616cce949c06c5a4b45497d1157879775986de54db9ed2023b79e9"},
+    {file = "modal-0.73.136.tar.gz", hash = "sha256:e8a6d3961c11e6440b2ab9a7f344fb1beb9aae8b8511df871ce3b2399f194af0"},
 ]

 [package.dependencies]
@@ -6210,14 +6210,14 @@ global = ["pybind11-global (==2.13.6)"]

 [[package]]
 name = "pycodestyle"
-version = "2.12.1"
+version = "2.13.0"
 description = "Python style guide checker"
 optional = false
-python-versions = ">=3.8"
+python-versions = ">=3.9"
 groups = ["main", "runtime"]
 files = [
-    {file = "pycodestyle-2.12.1-py2.py3-none-any.whl", hash = "sha256:46f0fb92069a7c28ab7bb558f05bfc0110dac69a0cd23c61ea0040283a9d78b3"},
-    {file = "pycodestyle-2.12.1.tar.gz", hash = "sha256:6838eae08bbce4f6accd5d5572075c63626a15ee3e6f842df996bf62f6d73521"},
+    {file = "pycodestyle-2.13.0-py2.py3-none-any.whl", hash = "sha256:35863c5974a271c7a726ed228a14a4f6daf49df369d8c50cd9a6f58a5e143ba9"},
+    {file = "pycodestyle-2.13.0.tar.gz", hash = "sha256:c8415bf09abe81d9c7f872502a6eee881fbe85d8763dd5b9924bb0a01d67efae"},
 ]

 [[package]]
@@ -6448,14 +6448,14 @@ dev = ["black", "flake8", "flake8-black", "isort", "jupyter-console", "mkdocs",

 [[package]]
 name = "pyflakes"
-version = "3.2.0"
+version = "3.3.2"
 description = "passive checker of Python programs"
 optional = false
-python-versions = ">=3.8"
+python-versions = ">=3.9"
 groups = ["main", "runtime"]
 files = [
-    {file = "pyflakes-3.2.0-py2.py3-none-any.whl", hash = "sha256:84b5be138a2dfbb40689ca07e2152deb896a65c3a3e24c251c5c62489568074a"},
-    {file = "pyflakes-3.2.0.tar.gz", hash = "sha256:1c61603ff154621fb2a9172037d84dca3500def8c8b630657d1701f026f8af3f"},
+    {file = "pyflakes-3.3.2-py2.py3-none-any.whl", hash = "sha256:5039c8339cbb1944045f4ee5466908906180f13cc99cc9949348d10f82a5c32a"},
+    {file = "pyflakes-3.3.2.tar.gz", hash = "sha256:6dfd61d87b97fba5dcfaaf781171ac16be16453be6d816147989e7f6e6a9576b"},
 ]

 [[package]]
@@ -8402,14 +8402,14 @@ test = ["pytest", "tornado (>=4.5)", "typeguard"]

 [[package]]
 name = "termcolor"
-version = "2.5.0"
+version = "3.0.0"
 description = "ANSI color formatting for output in terminal"
 optional = false
 python-versions = ">=3.9"
 groups = ["main"]
 files = [
-    {file = "termcolor-2.5.0-py3-none-any.whl", hash = "sha256:37b17b5fc1e604945c2642c872a3764b5d547a48009871aea3edd3afa180afb8"},
-    {file = "termcolor-2.5.0.tar.gz", hash = "sha256:998d8d27da6d48442e8e1f016119076b690d962507531df4890fcd2db2ef8a6f"},
+    {file = "termcolor-3.0.0-py3-none-any.whl", hash = "sha256:fdfdc9f2bdb71c69fbbbaeb7ceae3afef0461076dd2ee265bf7b7c49ddb05ebb"},
+    {file = "termcolor-3.0.0.tar.gz", hash = "sha256:0cd855c8716383f152ad02bbb39841d6e4694538ff5d424088e56c8b81fde525"},
 ]

 [package.extras]
@@ -0,0 +1,83 @@
+"""Tests for the conversation summary generator."""
+
+from unittest.mock import MagicMock, patch
+
+import pytest
+
+from openhands.core.config import LLMConfig
+from openhands.utils.conversation_summary import generate_conversation_title
+
+
+@pytest.mark.asyncio
+async def test_generate_conversation_title_empty_message():
+    """Test that an empty message returns None."""
+    result = await generate_conversation_title('', MagicMock())
+    assert result is None
+
+    result = await generate_conversation_title('   ', MagicMock())
+    assert result is None
+
+
+@pytest.mark.asyncio
+async def test_generate_conversation_title_success():
+    """Test successful title generation."""
+    # Create a proper mock response
+    mock_response = MagicMock()
+    mock_response.choices = [MagicMock()]
+    mock_response.choices[0].message.content = 'Generated Title'
+
+    # Create a mock LLM instance with a synchronous completion method
+    mock_llm = MagicMock()
+    mock_llm.completion = MagicMock(return_value=mock_response)
+
+    # Patch the LLM class to return our mock
+    with patch('openhands.utils.conversation_summary.LLM', return_value=mock_llm):
+        result = await generate_conversation_title(
+            'Can you help me with Python?', LLMConfig(model='test-model')
+        )
+
+    assert result == 'Generated Title'
+    # Verify the mock was called with the expected arguments
+    mock_llm.completion.assert_called_once()
+
+
+@pytest.mark.asyncio
+async def test_generate_conversation_title_long_title():
+    """Test that long titles are truncated."""
+    # Create a proper mock response with a long title
+    mock_response = MagicMock()
+    mock_response.choices = [MagicMock()]
+    mock_response.choices[
+        0
+    ].message.content = 'This is a very long title that should be truncated because it exceeds the maximum length'
+
+    # Create a mock LLM instance with a synchronous completion method
+    mock_llm = MagicMock()
+    mock_llm.completion = MagicMock(return_value=mock_response)
+
+    # Patch the LLM class to return our mock
+    with patch('openhands.utils.conversation_summary.LLM', return_value=mock_llm):
+        result = await generate_conversation_title(
+            'Can you help me with Python?', LLMConfig(model='test-model'), max_length=30
+        )
+
+    # Verify the title is truncated correctly
+    assert len(result) <= 30
+    assert result.endswith('...')
+
+
+@pytest.mark.asyncio
+async def test_generate_conversation_title_exception():
+    """Test that exceptions are handled gracefully."""
+    # Create a mock LLM instance with a synchronous completion method that raises an exception
+    mock_llm = MagicMock()
+    mock_llm.completion = MagicMock(side_effect=Exception('Test error'))
+
+    # Patch the LLM class to return our mock
+    with patch('openhands.utils.conversation_summary.LLM', return_value=mock_llm):
+        result = await generate_conversation_title(
+            'Can you help me with Python?', LLMConfig(model='test-model')
+        )
+
+    # Verify that None is returned when an exception occurs
+    assert result is None
Author	SHA1	Message	Date
Robert Brennan	155b806bff	update nikolaik	2025-03-31 13:24:09 -04:00
tofarr	6ae2984580	Fix for circular import on ConversationValidator (#7583)	2025-03-31 11:09:10 -06:00
dependabot[bot]	f12bf985ce	chore(deps): bump the version-all group with 6 updates (#7600) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-03-31 15:44:35 +00:00
Xingyao Wang	b6321488bc	Update pre-commit instructions in repository memory (#7595) Co-authored-by: openhands <openhands@all-hands.dev>	2025-03-31 21:15:45 +08:00
Xingyao Wang	54236f9617	[eval] Support SWE-Bench Multimodal (#7122) Co-authored-by: openhands <openhands@all-hands.dev>	2025-03-31 07:42:44 -04:00
Xingyao Wang	2c4496b129	feat: Use LLM-generated natural-language descriptions as conversation title (#7049) Co-authored-by: openhands <openhands@all-hands.dev>	2025-03-30 21:34:07 +00:00
Peter Dave Hello	4b177992f8	Clean up apt temporary files in app Dockerfile (#7590)	2025-03-30 16:37:54 +00:00
mkusaka	fa61e862e0	Fix broken markdown link for Anthropic billing settings (#7589) Co-authored-by: openhands <openhands@all-hands.dev>	2025-03-30 14:23:46 +00:00