Change how changes tab loads items

2026-04-29 03:00:45 -04:00 · 2025-05-27 20:05:11 +04:00
28 changed files with 149 additions and 909 deletions
@@ -26,7 +26,6 @@ Backend:
 - Located in the `openhands` directory
 - Testing:
  - All tests are in `tests/unit/test_*.py`
-  - To run all the unit tests, run `poetry run pytest --forked -n auto -svv ./tests/unit`
  - To test new code, run `poetry run pytest tests/unit/test_xxx.py` where `xxx` is the appropriate file for the current functionality
  - Write all tests with pytest

@@ -1,10 +1,8 @@
 # Development Guide

 This guide is for people working on OpenHands and editing the source code.
-If you wish to contribute your changes, check out the
-[CONTRIBUTING.md](https://github.com/All-Hands-AI/OpenHands/blob/main/CONTRIBUTING.md)
-on how to clone and setup the project initially before moving on. Otherwise,
-you can clone the OpenHands project directly.
+If you wish to contribute your changes, check out the [CONTRIBUTING.md](https://github.com/All-Hands-AI/OpenHands/blob/main/CONTRIBUTING.md) on how to clone and setup the project 
+initially before moving on. Otherwise, you can clone the OpenHands project directly.

 ## Start the Server for Development

@@ -21,20 +19,9 @@ you can clone the OpenHands project directly.

 Make sure you have all these dependencies installed before moving on to `make build`.

-#### Dev container
-
-There is a [dev container](https://containers.dev/) available which provides a
-pre-configured environment with all the necessary dependencies installed if you
-are using a [supported editor or tool](https://containers.dev/supporting). For
-example, if you are using Visual Studio Code (VS Code) with the
-[Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers)
-extension installed, you can open the project in a dev container by using the
-_Dev Container: Reopen in Container_ command from the Command Palette
-(Ctrl+Shift+P).
-
 #### Develop without sudo access

-If you want to develop without system admin/sudo access to upgrade/install `Python` and/or `NodeJs`, you can use
+If you want to develop without system admin/sudo access to upgrade/install `Python` and/or `NodeJs`, you can use 
 `conda` or `mamba` to manage the packages for you:

 ```bash
@@ -50,7 +37,7 @@ mamba install conda-forge::poetry

 ### 2. Build and Setup The Environment

-Begin by building the project which includes setting up the environment and installing dependencies. This step ensures
+Begin by building the project which includes setting up the environment and installing dependencies. This step ensures 
 that OpenHands is ready to run on your system:

 ```bash
@@ -67,11 +54,11 @@ To configure the LM of your choice, run:
 make setup-config
 ```

-This command will prompt you to enter the LLM API key, model name, and other variables ensuring that OpenHands is
-tailored to your specific needs. Note that the model name will apply only when you run headless. If you use the UI,
+This command will prompt you to enter the LLM API key, model name, and other variables ensuring that OpenHands is 
+tailored to your specific needs. Note that the model name will apply only when you run headless. If you use the UI, 
 please set the model in the UI.

-Note: If you have previously run OpenHands using the docker command, you may have already set some environmental
+Note: If you have previously run OpenHands using the docker command, you may have already set some environmental 
 variables in your terminal. The final configurations are set from highest to lowest priority:
 Environment variables > config.toml variables > default variables

@@ -90,14 +77,14 @@ make run

 #### Option B: Individual Server Startup

-- **Start the Backend Server:** If you prefer, you can start the backend server independently to focus on
+- **Start the Backend Server:** If you prefer, you can start the backend server independently to focus on 
 backend-related tasks or configurations.

  ```bash
  make start-backend
  ```

-- **Start the Frontend Server:** Similarly, you can start the frontend server on its own to work on frontend-related
+- **Start the Frontend Server:** Similarly, you can start the frontend server on its own to work on frontend-related 
 components or interface enhancements.
  ```bash
  make start-frontend
@@ -133,7 +120,7 @@ poetry run pytest ./tests/unit/test_*.py

 ### 9. Use existing Docker image

-To reduce build time (e.g., if no changes were made to the client-runtime component), you can use an existing Docker
+To reduce build time (e.g., if no changes were made to the client-runtime component), you can use an existing Docker 
 container image by setting the SANDBOX_RUNTIME_CONTAINER_IMAGE environment variable to the desired Docker image.

 Example: `export SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.39-nikolaik`
@@ -325,15 +325,6 @@ classpath = "my_package.my_module.MyCustomAgent"
 # Useful when deploying OpenHands in a remote machine where you need to expose a specific port.
 #vscode_port = 41234

-# Volume mounts in the format 'host_path:container_path[:mode]'
-# e.g. '/my/host/dir:/workspace:rw'
-# Multiple mounts can be specified using commas
-# e.g. '/path1:/workspace/path1,/path2:/workspace/path2:ro'
-
-# Configure volumes under the [sandbox] section:
-# [sandbox]
-# volumes = "/my/host/dir:/workspace:rw,/path2:/workspace/path2:ro"
-
 #################################### Security ###################################
 # Configuration for security features
 ##############################################################################
@@ -331,8 +331,6 @@ The agent configuration options are defined in the `[agent]` and `[agent.<agent_

 The sandbox configuration options are defined in the `[sandbox]` section of the `config.toml` file.

-
-
 To use these with the docker command, pass in `-e SANDBOX_<option>`. Example: `-e SANDBOX_TIMEOUT`.

 ### Execution
@@ -2,8 +2,6 @@

 This folder contains the evaluation harness that we built on top of the original [SWE-Bench benchmark](https://www.swebench.com/) ([paper](https://arxiv.org/abs/2310.06770)).

-**UPDATE (5/26/2025): We now support running interactive SWE-Bench evaluation (see the paper [here](https://arxiv.org/abs/2502.13069))! For how to run it, checkout [this README](./SWE-Interact.md).**
-
 **UPDATE (4/8/2025): We now support running SWT-Bench evaluation! For more details, checkout [the corresponding section](#SWT-Bench-Evaluation).**

 **UPDATE (03/27/2025): We now support SWE-Bench multimodal evaluation! Simply use "princeton-nlp/SWE-bench_Multimodal" as the dataset name in the `run_infer.sh` script to evaluate on multimodal instances.**
@@ -1,92 +0,0 @@
-# SWE-Interact Benchmark
-
-This document explains how to use the [Interactive SWE-Bench](https://arxiv.org/abs/2502.13069) benchmark scripts for running and evaluating interactive software engineering tasks.
-
-## Setting things up
-After following the [README](./README.md) to set up the environment, you would need to additionally add LLM configurations for simulated human users. In the original [paper](https://arxiv.org/abs/2502.13069), we use gpt-4o as the simulated human user. You can add the following to your `config.toml` file:
-
-```toml
-[llm.fake_user]
-model="litellm_proxy/gpt-4o-2024-08-06"
-api_key="<your-api-key>"
-temperature = 0.0
-base_url = "https://llm-proxy.eval.all-hands.dev"
-```
-
-## Running the Benchmark
-
-The main script for running the benchmark is `run_infer_interact.sh`. Here's how to use it:
-
-```bash
-bash ./evaluation/benchmarks/swe_bench/scripts/run_infer_interact.sh <model_config> <commit_hash> <agent> <eval_limit> <max_iter> <num_workers> <split>
-```
-
-### Parameters:
-
-- `model_config`: Path to the LLM configuration file (e.g., `llm.claude-3-7-sonnet`)
-- `commit_hash`: Git commit hash to use (e.g., `HEAD`)
-- `agent`: The agent class to use (e.g., `CodeActAgent`)
-- `eval_limit`: Number of examples to evaluate (e.g., `500`)
-- `max_iter`: Maximum number of iterations per task (e.g., `100`)
-- `num_workers`: Number of parallel workers (e.g., `1`)
-- `split`: Dataset split to use (e.g., `test`)
-
-### Example:
-
-```bash
-bash ./evaluation/benchmarks/swe_bench/scripts/run_infer_interact.sh llm.claude-3-7-sonnet HEAD CodeActAgent 500 100 1 test
-```
-
-### Additional Environment Variables:
-
-You can customize the behavior using these environment variables:
-
-- `RUN_WITH_BROWSING`: Enable/disable web browsing (default: false)
-- `USE_HINT_TEXT`: Enable/disable hint text (default: false)
-- `EVAL_CONDENSER`: Specify a condenser configuration
-- `EXP_NAME`: Add a custom experiment name to the output
-- `N_RUNS`: Number of runs to perform (default: 1)
-- `SKIP_RUNS`: Comma-separated list of run numbers to skip
-
-## Evaluating Results
-
-After running the benchmark, you can evaluate the results using `eval_infer.sh`:
-
-```bash
-./evaluation/benchmarks/swe_bench/scripts/eval_infer.sh <output_file> <instance_id> <dataset> <split>
-```
-
-### Parameters:
-
-- `output_file`: Path to the output JSONL file
-- `instance_id`: The specific instance ID to evaluate
-- `dataset`: Dataset name (e.g., `cmu-lti/interactive-swe`)
-- `split`: Dataset split (e.g., `test`)
-
-### Example:
-
-```bash
-./evaluation/benchmarks/swe_bench/scripts/eval_infer.sh evaluation/evaluation_outputs/outputs/cmu-lti__interactive-swe-test/CodeActAgent/claude-3-7-sonnet-20250219_maxiter_100_N_v0.39.0-no-hint-run_1/output.jsonl sphinx-doc__sphinx-8721 cmu-lti/interactive-swe test
-```
-
-## Output Structure
-
-The benchmark outputs are stored in the `evaluation/evaluation_outputs/outputs/` directory with the following structure:
-
-```
-evaluation/evaluation_outputs/outputs/
-└── cmu-lti__interactive-swe-{split}/
-    └── {agent}/
-        └── {model}-{date}_maxiter_{max_iter}_N_{version}-{options}-run_{run_number}/
-            └── output.jsonl
-```
-
-Where:
-- `{split}` is the dataset split (e.g., test)
-- `{agent}` is the agent class name
-- `{model}` is the model name
-- `{date}` is the run date
-- `{max_iter}` is the maximum iterations
-- `{version}` is the OpenHands version
-- `{options}` includes any additional options (e.g., no-hint, with-browsing)
-- `{run_number}` is the run number
@@ -1,411 +0,0 @@
-import asyncio
-import json
-import os
-
-import pandas as pd
-from datasets import load_dataset
-from litellm import completion as litellm_completion
-
-import openhands.agenthub
-from evaluation.benchmarks.swe_bench.run_infer import (
-    AgentFinishedCritic,
-    complete_runtime,
-    filter_dataset,
-    get_config,
-    initialize_runtime,
-)
-from evaluation.benchmarks.swe_bench.run_infer import (
-    get_instruction as base_get_instruction,
-)
-from evaluation.utils.shared import (
-    EvalException,
-    EvalMetadata,
-    EvalOutput,
-    make_metadata,
-    prepare_dataset,
-    reset_logger_for_multiprocessing,
-    run_evaluation,
-)
-from openhands.controller.state.state import State
-from openhands.core.config import (
-    get_llm_config_arg,
-    get_parser,
-)
-from openhands.core.config.condenser_config import NoOpCondenserConfig
-from openhands.core.config.utils import get_condenser_config_arg
-from openhands.core.logger import openhands_logger as logger
-from openhands.core.main import create_runtime, run_controller
-from openhands.events.action import MessageAction
-from openhands.events.serialization.event import event_from_dict, event_to_dict
-from openhands.utils.async_utils import call_async_from_sync
-
-USE_HINT_TEXT = os.environ.get('USE_HINT_TEXT', 'false').lower() == 'true'
-USE_INSTANCE_IMAGE = os.environ.get('USE_INSTANCE_IMAGE', 'false').lower() == 'true'
-RUN_WITH_BROWSING = os.environ.get('RUN_WITH_BROWSING', 'false').lower() == 'true'
-
-
-class FakeUser:
-    def __init__(self, issue, hints, files):
-        self.system_message = f"""
-        You are a GitHub user reporting an issue. Here are the details of your issue and environment:
-
-        Issue: {issue}
-
-        Hints: {hints}
-
-        Files relative to your current directory: {files}
-
-        Your task is to respond to questions from a coder who is trying to solve your issue. The coder has a summarized version of the issue you have. Follow these rules:
-        1. If the coder asks a question that is directly related to the information in the issue you have, provide that information.
-        2. Always stay in character as a user reporting an issue, not as an AI assistant.
-        3. Keep your responses concise and to the point.
-        4. The coder has limited turns to solve the issue. Do not interact with the coder beyond 3 turns.
-
-        Respond with "I don't have that information" if the question is unrelated or you're unsure.
-        """
-        self.chat_history = [{'role': 'system', 'content': self.system_message}]
-        self.turns = 0
-        # Get LLM config from config.toml
-        self.llm_config = get_llm_config_arg(
-            'llm.fake_user'
-        )  # You can change 'fake_user' to any config name you want
-
-    def generate_reply(self, question):
-        if self.turns > 3:
-            return 'Please continue working on the task. Do NOT ask for more help.'
-        self.chat_history.append({'role': 'user', 'content': question.content})
-
-        response = litellm_completion(
-            model=self.llm_config.model,
-            messages=self.chat_history,
-            api_key=self.llm_config.api_key.get_secret_value(),
-            temperature=self.llm_config.temperature,
-            base_url=self.llm_config.base_url,
-        )
-
-        reply = response.choices[0].message.content
-        self.chat_history.append({'role': 'assistant', 'content': reply})
-        self.turns += 1
-        return reply
-
-
-# Global variable for fake user
-fake_user = None
-
-
-def get_fake_user_response(state: State) -> str:
-    global fake_user
-    if not fake_user:
-        return 'Please continue working on the task.'
-    last_agent_message = state.get_last_agent_message()
-    if last_agent_message:
-        return fake_user.generate_reply(last_agent_message)
-    return 'Please continue working on the task.'
-
-
-AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {
-    'CodeActAgent': get_fake_user_response,
-}
-
-
-def get_instruction(instance: pd.Series, metadata: EvalMetadata) -> MessageAction:
-    instance_copy = instance.copy()
-    instance_copy.problem_statement = f'{instance.problem_statement}\n\nHints:\nThe user has not provided all the necessary details about the issue, and there are some hidden details that are helpful. Please ask the user specific questions using non-code commands to gather the relevant information that the user has to help you solve the issue. Ensure you have all the details you require to solve the issue.'
-    return base_get_instruction(instance_copy, metadata)
-
-
-def process_instance(
-    instance: pd.Series,
-    metadata: EvalMetadata,
-    reset_logger: bool = True,
-) -> EvalOutput:
-    config = get_config(instance, metadata)
-    global fake_user
-    original_issue = instance.original_issue
-    issue = str(original_issue)
-    fake_user = FakeUser(issue=issue, hints=instance.hints_text, files=instance.files)
-
-    # Setup the logger properly, so you can run multi-processing to parallelize the evaluation
-    if reset_logger:
-        log_dir = os.path.join(metadata.eval_output_dir, 'infer_logs')
-        reset_logger_for_multiprocessing(logger, instance.instance_id, log_dir)
-    else:
-        logger.info(f'Starting evaluation for instance {instance.instance_id}.')
-
-    runtime = create_runtime(config)
-    call_async_from_sync(runtime.connect)
-
-    try:
-        initialize_runtime(runtime, instance, metadata)
-
-        message_action = get_instruction(instance, metadata)
-
-        # Here's how you can run the agent (similar to the `main` function) and get the final task state
-        state: State | None = asyncio.run(
-            run_controller(
-                config=config,
-                initial_user_action=message_action,
-                runtime=runtime,
-                fake_user_response_fn=AGENT_CLS_TO_FAKE_USER_RESPONSE_FN[
-                    metadata.agent_class
-                ],
-            )
-        )
-
-        # if fatal error, throw EvalError to trigger re-run
-        if (
-            state
-            and state.last_error
-            and 'fatal error during agent execution' in state.last_error
-            and 'stuck in a loop' not in state.last_error
-        ):
-            raise EvalException('Fatal error detected: ' + state.last_error)
-
-        # Get git patch
-        return_val = complete_runtime(runtime, instance)
-        git_patch = return_val['git_patch']
-        logger.info(
-            f'Got git diff for instance {instance.instance_id}:\n--------\n{git_patch}\n--------'
-        )
-    finally:
-        runtime.close()
-
-    # Prepare test result
-    test_result = {
-        'git_patch': git_patch,
-    }
-
-    if state is None:
-        raise ValueError('State should not be None.')
-
-    histories = [event_to_dict(event) for event in state.history]
-    metrics = state.metrics.get() if state.metrics else None
-
-    # Save the output
-    instruction = message_action.content
-    if message_action.image_urls:
-        instruction += (
-            '\n\n<image_urls>' + '\n'.join(message_action.image_urls) + '</image_urls>'
-        )
-    output = EvalOutput(
-        instance_id=instance.instance_id,
-        instruction=instruction,
-        instance=instance.to_dict(),
-        test_result=test_result,
-        metadata=metadata,
-        history=histories,
-        metrics=metrics,
-        error=state.last_error if state and state.last_error else None,
-    )
-    return output
-
-
-if __name__ == '__main__':
-    parser = get_parser()
-    parser.add_argument(
-        '--dataset',
-        type=str,
-        default='cmu-lti/interactive-swe',
-        help='dataset to evaluate on',
-    )
-    parser.add_argument(
-        '--split',
-        type=str,
-        default='test',
-        help='split to evaluate on',
-    )
-
-    args, _ = parser.parse_known_args()
-
-    # Load dataset from huggingface datasets
-    dataset = load_dataset(args.dataset, split=args.split)
-    swe_bench_tests = filter_dataset(dataset.to_pandas(), 'instance_id')
-    logger.info(
-        f'Loaded dataset {args.dataset} with split {args.split}: {len(swe_bench_tests)} tasks'
-    )
-    llm_config = None
-    if args.llm_config:
-        llm_config = get_llm_config_arg(args.llm_config)
-        llm_config.log_completions = True
-        # modify_params must be False for evaluation purpose, for reproducibility and accuracy of results
-        llm_config.modify_params = False
-
-    if llm_config is None:
-        raise ValueError(f'Could not find LLM config: --llm_config {args.llm_config}')
-
-    # Get condenser config from environment variable
-    condenser_name = os.environ.get('EVAL_CONDENSER')
-    if condenser_name:
-        condenser_config = get_condenser_config_arg(condenser_name)
-        if condenser_config is None:
-            raise ValueError(
-                f'Could not find Condenser config: EVAL_CONDENSER={condenser_name}'
-            )
-    else:
-        # If no specific condenser config is provided via env var, default to NoOpCondenser
-        condenser_config = NoOpCondenserConfig()
-        logger.debug(
-            'No Condenser config provided via EVAL_CONDENSER, using NoOpCondenser.'
-        )
-
-    details = {'mode': 'interact'}
-    _agent_cls = openhands.agenthub.Agent.get_cls(args.agent_cls)
-
-    dataset_descrption = (
-        args.dataset.replace('/', '__') + '-' + args.split.replace('/', '__')
-    )
-    metadata = make_metadata(
-        llm_config,
-        dataset_descrption,
-        args.agent_cls,
-        args.max_iterations,
-        args.eval_note,
-        args.eval_output_dir,
-        details=details,
-        condenser_config=condenser_config,
-    )
-
-    output_file = os.path.join(metadata.eval_output_dir, 'output.jsonl')
-    print(f'### OUTPUT FILE: {output_file} ###')
-
-    # Run evaluation in iterative mode:
-    # If a rollout fails to output AgentFinishAction, we will try again until it succeeds OR total 3 attempts have been made.
-    ITERATIVE_EVAL_MODE = (
-        os.environ.get('ITERATIVE_EVAL_MODE', 'false').lower() == 'true'
-    )
-    ITERATIVE_EVAL_MODE_MAX_ATTEMPTS = int(
-        os.environ.get('ITERATIVE_EVAL_MODE_MAX_ATTEMPTS', '3')
-    )
-
-    if not ITERATIVE_EVAL_MODE:
-        # load the dataset
-        instances = prepare_dataset(swe_bench_tests, output_file, args.eval_n_limit)
-        if len(instances) > 0 and not isinstance(
-            instances['PASS_TO_PASS'][instances['PASS_TO_PASS'].index[0]], str
-        ):
-            for col in ['PASS_TO_PASS', 'FAIL_TO_PASS']:
-                instances[col] = instances[col].apply(lambda x: str(x))
-        run_evaluation(
-            instances,
-            metadata,
-            output_file,
-            args.eval_num_workers,
-            process_instance,
-            timeout_seconds=8
-            * 60
-            * 60,  # 8 hour PER instance should be more than enough
-            max_retries=5,
-        )
-    else:
-        critic = AgentFinishedCritic()
-
-        def get_cur_output_file_path(attempt: int) -> str:
-            return (
-                f'{output_file.removesuffix(".jsonl")}.critic_attempt_{attempt}.jsonl'
-            )
-
-        eval_ids = None
-        for attempt in range(1, ITERATIVE_EVAL_MODE_MAX_ATTEMPTS + 1):
-            cur_output_file = get_cur_output_file_path(attempt)
-            logger.info(
-                f'Running evaluation with critic {critic.__class__.__name__} for attempt {attempt} of {ITERATIVE_EVAL_MODE_MAX_ATTEMPTS}.'
-            )
-
-            # For deterministic eval, we set temperature to 0.1 for (>1) attempt
-            # so hopefully we get slightly different results
-            if attempt > 1 and metadata.llm_config.temperature == 0:
-                logger.info(
-                    f'Detected temperature is 0 for (>1) attempt {attempt}. Setting temperature to 0.1...'
-                )
-                metadata.llm_config.temperature = 0.1
-
-            # Load instances - at first attempt, we evaluate all instances
-            # On subsequent attempts, we only evaluate the instances that failed the previous attempt determined by critic
-            instances = prepare_dataset(
-                swe_bench_tests, cur_output_file, args.eval_n_limit, eval_ids=eval_ids
-            )
-            if len(instances) > 0 and not isinstance(
-                instances['PASS_TO_PASS'][instances['PASS_TO_PASS'].index[0]], str
-            ):
-                for col in ['PASS_TO_PASS', 'FAIL_TO_PASS']:
-                    instances[col] = instances[col].apply(lambda x: str(x))
-
-            # Run evaluation - but save them to cur_output_file
-            logger.info(
-                f'Evaluating {len(instances)} instances for attempt {attempt}...'
-            )
-            run_evaluation(
-                instances,
-                metadata,
-                cur_output_file,
-                args.eval_num_workers,
-                process_instance,
-                timeout_seconds=8
-                * 60
-                * 60,  # 8 hour PER instance should be more than enough
-                max_retries=5,
-            )
-
-            # When eval is done, we update eval_ids to the instances that failed the current attempt
-            instances_failed = []
-            logger.info(
-                f'Use critic {critic.__class__.__name__} to check {len(instances)} instances for attempt {attempt}...'
-            )
-            with open(cur_output_file, 'r') as f:
-                for line in f:
-                    instance = json.loads(line)
-                    try:
-                        history = [
-                            event_from_dict(event) for event in instance['history']
-                        ]
-                        critic_result = critic.evaluate(
-                            history, instance['test_result'].get('git_patch', '')
-                        )
-                        if not critic_result.success:
-                            instances_failed.append(instance['instance_id'])
-                    except Exception as e:
-                        logger.error(
-                            f'Error loading history for instance {instance["instance_id"]}: {e}'
-                        )
-                        instances_failed.append(instance['instance_id'])
-            logger.info(
-                f'{len(instances_failed)} instances failed the current attempt {attempt}: {instances_failed}'
-            )
-            eval_ids = instances_failed
-
-            # If no instances failed, we break
-            if len(instances_failed) == 0:
-                break
-
-        # Then we should aggregate the results from all attempts into the original output file
-        # and remove the intermediate files
-        logger.info(
-            'Aggregating results from all attempts into the original output file...'
-        )
-        fout = open(output_file, 'w')
-        added_instance_ids = set()
-        for attempt in reversed(range(1, ITERATIVE_EVAL_MODE_MAX_ATTEMPTS + 1)):
-            cur_output_file = get_cur_output_file_path(attempt)
-            if not os.path.exists(cur_output_file):
-                logger.warning(
-                    f'Intermediate output file {cur_output_file} does not exist. Skipping...'
-                )
-                continue
-
-            with open(cur_output_file, 'r') as f:
-                for line in f:
-                    instance = json.loads(line)
-                    # Also make sure git_patch is not empty - otherwise we fall back to previous attempt (empty patch is worse than anything else)
-                    if (
-                        instance['instance_id'] not in added_instance_ids
-                        and instance['test_result'].get('git_patch', '').strip()
-                    ):
-                        fout.write(line)
-                        added_instance_ids.add(instance['instance_id'])
-            logger.info(
-                f'Aggregated instances from {cur_output_file}. Total instances added so far: {len(added_instance_ids)}'
-            )
-        fout.close()
-        logger.info(
-            f'Done! Total {len(added_instance_ids)} instances added to {output_file}'
-        )
@@ -1,131 +0,0 @@
-#!/usr/bin/env bash
-set -eo pipefail
-
-source "evaluation/utils/version_control.sh"
-
-MODEL_CONFIG=$1
-COMMIT_HASH=$2
-AGENT=$3
-EVAL_LIMIT=$4
-MAX_ITER=$5
-NUM_WORKERS=$6
-SPLIT=$7
-N_RUNS=$8
-
-
-if [ -z "$NUM_WORKERS" ]; then
-  NUM_WORKERS=1
-  echo "Number of workers not specified, use default $NUM_WORKERS"
-fi
-checkout_eval_branch
-
-if [ -z "$AGENT" ]; then
-  echo "Agent not specified, use default CodeActAgent"
-  AGENT="CodeActAgent"
-fi
-
-if [ -z "$MAX_ITER" ]; then
-  echo "MAX_ITER not specified, use default 100"
-  MAX_ITER=100
-fi
-
-if [ -z "$RUN_WITH_BROWSING" ]; then
-  echo "RUN_WITH_BROWSING not specified, use default false"
-  RUN_WITH_BROWSING=false
-fi
-
-
-if [ -z "$DATASET" ]; then
-  echo "DATASET not specified, use default cmu-lti/interactive-swe"
-  DATASET="cmu-lti/interactive-swe"
-fi
-
-if [ -z "$SPLIT" ]; then
-  echo "SPLIT not specified, use default test"
-  SPLIT="test"
-fi
-
-if [ -n "$EVAL_CONDENSER" ]; then
-  echo "Using Condenser Config: $EVAL_CONDENSER"
-else
-  echo "No Condenser Config provided via EVAL_CONDENSER, use default (NoOpCondenser)."
-fi
-
-export RUN_WITH_BROWSING=$RUN_WITH_BROWSING
-echo "RUN_WITH_BROWSING: $RUN_WITH_BROWSING"
-
-get_openhands_version
-
-echo "AGENT: $AGENT"
-echo "OPENHANDS_VERSION: $OPENHANDS_VERSION"
-echo "MODEL_CONFIG: $MODEL_CONFIG"
-echo "DATASET: $DATASET"
-echo "SPLIT: $SPLIT"
-echo "MAX_ITER: $MAX_ITER"
-echo "NUM_WORKERS: $NUM_WORKERS"
-echo "COMMIT_HASH: $COMMIT_HASH"
-echo "EVAL_CONDENSER: $EVAL_CONDENSER"
-
-# Default to NOT use Hint
-if [ -z "$USE_HINT_TEXT" ]; then
-  export USE_HINT_TEXT=false
-fi
-echo "USE_HINT_TEXT: $USE_HINT_TEXT"
-EVAL_NOTE="$OPENHANDS_VERSION"
-# if not using Hint, add -no-hint to the eval note
-if [ "$USE_HINT_TEXT" = false ]; then
-  EVAL_NOTE="$EVAL_NOTE-no-hint"
-fi
-
-if [ "$RUN_WITH_BROWSING" = true ]; then
-  EVAL_NOTE="$EVAL_NOTE-with-browsing"
-fi
-
-if [ -n "$EXP_NAME" ]; then
-  EVAL_NOTE="$EVAL_NOTE-$EXP_NAME"
-fi
-# Add condenser config to eval note if provided
-if [ -n "$EVAL_CONDENSER" ]; then
-  EVAL_NOTE="${EVAL_NOTE}-${EVAL_CONDENSER}"
-fi
-
-function run_eval() {
-  local eval_note="${1}"
-  COMMAND="poetry run python evaluation/benchmarks/swe_bench/run_infer_interact.py \
-    --agent-cls $AGENT \
-    --llm-config $MODEL_CONFIG \
-    --max-iterations $MAX_ITER \
-    --eval-num-workers $NUM_WORKERS \
-    --eval-note $eval_note \
-    --dataset $DATASET \
-    --split $SPLIT"
-
-  if [ -n "$EVAL_LIMIT" ]; then
-    echo "EVAL_LIMIT: $EVAL_LIMIT"
-    COMMAND="$COMMAND --eval-n-limit $EVAL_LIMIT"
-  fi
-
-  # Run the command
-  eval $COMMAND
-}
-
-unset SANDBOX_ENV_GITHUB_TOKEN # prevent the agent from using the github token to push
-if [ -z "$N_RUNS" ]; then
-  N_RUNS=1
-  echo "N_RUNS not specified, use default $N_RUNS"
-fi
-
-# Skip runs if the run number is in the SKIP_RUNS list
-# read from env variable SKIP_RUNS as a comma separated list of run numbers
-SKIP_RUNS=(${SKIP_RUNS//,/ })
-for i in $(seq 1 $N_RUNS); do
-  if [[ " ${SKIP_RUNS[@]} " =~ " $i " ]]; then
-    echo "Skipping run $i"
-    continue
-  fi
-  current_eval_note="$EVAL_NOTE-run_$i"
-  echo "EVAL_NOTE: $current_eval_note"
-  run_eval $current_eval_note
-done
-
-checkout_original_branch
@@ -26,6 +26,7 @@ import { downloadTrajectory } from "#/utils/download-trajectory";
 import { displayErrorToast } from "#/utils/custom-toast-handlers";
 import { useOptimisticUserMessage } from "#/hooks/use-optimistic-user-message";
 import { useWSErrorMessage } from "#/hooks/use-ws-error-message";
+import i18n from "#/i18n";
 import { ErrorMessageBanner } from "./error-message-banner";
 import { shouldRenderEvent } from "./event-content-helpers/should-render-event";

@@ -180,7 +181,11 @@ export function ChatInterface() {
          {!hitBottom && <ScrollToBottomButton onClick={scrollDomToBottom} />}
        </div>

-        {errorMessage && <ErrorMessageBanner message={errorMessage} />}
+        {errorMessage && (
+          <ErrorMessageBanner
+            message={i18n.exists(errorMessage) ? t(errorMessage) : errorMessage}
+          />
+        )}

        <InteractiveChatBox
          onSubmit={handleSendMessage}
@@ -1,7 +1,3 @@
-import { Trans } from "react-i18next";
-import { Link } from "react-router";
-import i18n from "#/i18n";
-
 interface ErrorMessageBannerProps {
  message: string;
 }
@@ -9,23 +5,7 @@ interface ErrorMessageBannerProps {
 export function ErrorMessageBanner({ message }: ErrorMessageBannerProps) {
  return (
    <div className="w-full rounded-lg p-2 text-black border border-red-800 bg-red-500">
-      {i18n.exists(message) ? (
-        <Trans
-          i18nKey={message}
-          components={{
-            a: (
-              <Link
-                className="underline font-bold cursor-pointer"
-                to="/settings/billing"
-              >
-                link
-              </Link>
-            ),
-          }}
-        />
-      ) : (
-        message
-      )}
+      {message}
    </div>
  );
 }
@@ -1,11 +1,6 @@
 import { OpenHandsAction } from "#/types/core/actions";
 import { OpenHandsEventType } from "#/types/core/base";
-import {
-  isCommandAction,
-  isCommandObservation,
-  isOpenHandsAction,
-  isOpenHandsObservation,
-} from "#/types/core/guards";
+import { isOpenHandsAction, isOpenHandsObservation } from "#/types/core/guards";
 import { OpenHandsObservation } from "#/types/core/observations";

 const COMMON_NO_RENDER_LIST: OpenHandsEventType[] = [
@@ -20,21 +15,11 @@ export const shouldRenderEvent = (
  event: OpenHandsAction | OpenHandsObservation,
 ) => {
  if (isOpenHandsAction(event)) {
-    if (isCommandAction(event) && event.source === "user") {
-      // For user commands, we always hide them from the chat interface
-      return false;
-    }
-
    const noRenderList = COMMON_NO_RENDER_LIST.concat(ACTION_NO_RENDER_LIST);
    return !noRenderList.includes(event.action);
  }

  if (isOpenHandsObservation(event)) {
-    if (isCommandObservation(event) && event.source === "user") {
-      // For user commands, we always hide them from the chat interface
-      return false;
-    }
-
    return !COMMON_NO_RENDER_LIST.includes(event.observation);
  }

@@ -2,10 +2,32 @@ import React from "react";
 import { OpenHandsAction } from "#/types/core/actions";
 import { OpenHandsObservation } from "#/types/core/observations";
 import { isOpenHandsAction, isOpenHandsObservation } from "#/types/core/guards";
+import { OpenHandsEventType } from "#/types/core/base";
 import { EventMessage } from "./event-message";
 import { ChatMessage } from "./chat-message";
 import { useOptimisticUserMessage } from "#/hooks/use-optimistic-user-message";

+const COMMON_NO_RENDER_LIST: OpenHandsEventType[] = [
+  "system",
+  "agent_state_changed",
+  "change_agent_state",
+];
+
+const ACTION_NO_RENDER_LIST: OpenHandsEventType[] = ["recall"];
+
+const shouldRenderEvent = (event: OpenHandsAction | OpenHandsObservation) => {
+  if (isOpenHandsAction(event)) {
+    const noRenderList = COMMON_NO_RENDER_LIST.concat(ACTION_NO_RENDER_LIST);
+    return !noRenderList.includes(event.action);
+  }
+
+  if (isOpenHandsObservation(event)) {
+    return !COMMON_NO_RENDER_LIST.includes(event.observation);
+  }
+
+  return true;
+};
+
 interface MessagesProps {
  messages: (OpenHandsAction | OpenHandsObservation)[];
  isAwaitingUserConfirmation: boolean;
@@ -32,7 +54,7 @@ export const Messages: React.FC<MessagesProps> = React.memo(

    return (
      <>
-        {messages.map((message, index) => (
+        {messages.filter(shouldRenderEvent).map((message, index) => (
          <EventMessage
            key={index}
            event={message}
@@ -213,18 +213,14 @@ export function WsClientProvider({

      // Invalidate diffs cache when a file is edited or written
      if (
-        isFileEditAction(event) ||
-        isFileWriteAction(event) ||
-        isCommandAction(event)
+        !messageRateHandler.isUnderThreshold &&
+        (isFileEditAction(event) ||
+          isFileWriteAction(event) ||
+          isCommandAction(event))
      ) {
-        queryClient.invalidateQueries(
-          {
-            queryKey: ["file_changes", conversationId],
-          },
-          // Do not refetch if we are still receiving messages at a high rate (e.g., loading an existing conversation)
-          // This prevents unnecessary refetches when the user is still receiving messages
-          { cancelRefetch: false },
-        );
+        queryClient.invalidateQueries({
+          queryKey: ["file_changes", conversationId],
+        });

        // Invalidate file diff cache when a file is edited or written
        if (!isCommandAction(event)) {
@@ -1,14 +1,21 @@
 import { useQueries, useQuery } from "@tanstack/react-query";
 import axios from "axios";
 import React from "react";
+import { useSelector } from "react-redux";
 import OpenHands from "#/api/open-hands";
+import { RUNTIME_INACTIVE_STATES } from "#/types/agent-state";
+import { RootState } from "#/store";
 import { useConversationId } from "#/hooks/use-conversation-id";
-import { useRuntimeIsReady } from "#/hooks/use-runtime-is-ready";
+import { useActiveConversation } from "./use-active-conversation";

 export const useActiveHost = () => {
+  const { curAgentState } = useSelector((state: RootState) => state.agent);
  const [activeHost, setActiveHost] = React.useState<string | null>(null);
  const { conversationId } = useConversationId();
-  const runtimeIsReady = useRuntimeIsReady();
+  const { data: conversation } = useActiveConversation();
+  const enabled =
+    conversation?.status === "RUNNING" &&
+    !RUNTIME_INACTIVE_STATES.includes(curAgentState);

  const { data } = useQuery({
    queryKey: [conversationId, "hosts"],
@@ -16,7 +23,7 @@ export const useActiveHost = () => {
      const hosts = await OpenHands.getWebHosts(conversationId);
      return { hosts };
    },
-    enabled: runtimeIsReady && !!conversationId,
+    enabled,
    initialData: { hosts: [] },
    meta: {
      disableToast: true,
@@ -1,15 +1,23 @@
 import { useQuery } from "@tanstack/react-query";
 import React from "react";
+import { useSelector } from "react-redux";
 import OpenHands from "#/api/open-hands";
 import { useConversationId } from "#/hooks/use-conversation-id";
 import { GitChange } from "#/api/open-hands.types";
-import { useRuntimeIsReady } from "#/hooks/use-runtime-is-ready";
+import { RootState } from "#/store";
+import { RUNTIME_INACTIVE_STATES } from "#/types/agent-state";
+import { useActiveConversation } from "./use-active-conversation";

 export const useGetGitChanges = () => {
  const { conversationId } = useConversationId();
+  const { data: conversation } = useActiveConversation();
  const [orderedChanges, setOrderedChanges] = React.useState<GitChange[]>([]);
  const previousDataRef = React.useRef<GitChange[]>(null);
-  const runtimeIsReady = useRuntimeIsReady();
+
+  const { curAgentState } = useSelector((state: RootState) => state.agent);
+  const enabled =
+    conversation?.status === "RUNNING" &&
+    !RUNTIME_INACTIVE_STATES.includes(curAgentState);

  const result = useQuery({
    queryKey: ["file_changes", conversationId],
@@ -17,7 +25,7 @@ export const useGetGitChanges = () => {
    retry: false,
    staleTime: 1000 * 60 * 5, // 5 minutes
    gcTime: 1000 * 60 * 15, // 15 minutes
-    enabled: runtimeIsReady && !!conversationId,
+    enabled,
    meta: {
      disableToast: true,
    },
@@ -1,10 +1,13 @@
 import { useQuery } from "@tanstack/react-query";
 import { useTranslation } from "react-i18next";
+import { useSelector } from "react-redux";
 import OpenHands from "#/api/open-hands";
 import { useConversationId } from "#/hooks/use-conversation-id";
 import { I18nKey } from "#/i18n/declaration";
+import { RootState } from "#/store";
+import { RUNTIME_INACTIVE_STATES } from "#/types/agent-state";
 import { transformVSCodeUrl } from "#/utils/vscode-url-helper";
-import { useRuntimeIsReady } from "#/hooks/use-runtime-is-ready";
+import { useActiveConversation } from "./use-active-conversation";

 // Define the return type for the VS Code URL query
 interface VSCodeUrlResult {
@@ -15,7 +18,11 @@ interface VSCodeUrlResult {
 export const useVSCodeUrl = () => {
  const { t } = useTranslation();
  const { conversationId } = useConversationId();
-  const runtimeIsReady = useRuntimeIsReady();
+  const { data: conversation } = useActiveConversation();
+  const { curAgentState } = useSelector((state: RootState) => state.agent);
+  const enabled =
+    conversation?.status === "RUNNING" &&
+    !RUNTIME_INACTIVE_STATES.includes(curAgentState);

  return useQuery<VSCodeUrlResult>({
    queryKey: ["vscode_url", conversationId],
@@ -33,7 +40,7 @@ export const useVSCodeUrl = () => {
        error: t(I18nKey.VSCODE$URL_NOT_AVAILABLE),
      };
    },
-    enabled: runtimeIsReady && !!conversationId,
+    enabled,
    refetchOnMount: true,
    retry: 3,
  });
@@ -1,19 +0,0 @@
-import { useSelector } from "react-redux";
-import { RootState } from "#/store";
-import { RUNTIME_INACTIVE_STATES } from "#/types/agent-state";
-import { useActiveConversation } from "./query/use-active-conversation";
-
-/**
- * Hook to determine if the runtime is ready for operations
- *
- * @returns boolean indicating if the runtime is ready
- */
-export const useRuntimeIsReady = (): boolean => {
-  const { data: conversation } = useActiveConversation();
-  const { curAgentState } = useSelector((state: RootState) => state.agent);
-
-  return (
-    conversation?.status === "RUNNING" &&
-    !RUNTIME_INACTIVE_STATES.includes(curAgentState)
-  );
-};
@@ -6400,20 +6400,20 @@
        "uk": "Запит не вдалося виконати через внутрішню помилку сервера."
    },
    "STATUS$ERROR_LLM_OUT_OF_CREDITS": {
-        "en": "You're out of OpenHands Credits. <a>Add funds</a>",
-        "ja": "OpenHandsクレジットが不足しています。<a>資金を追加</a>",
-        "zh-CN": "您的OpenHands点数已用完。<a>添加资金</a>",
-        "zh-TW": "您的OpenHands點數已用完。<a>添加資金</a>",
-        "ko-KR": "OpenHands 크레딧이 소진되었습니다. <a>자금 추가</a>",
-        "no": "Du er tom for OpenHands-kreditter. <a>Legg til midler</a>",
-        "it": "Hai esaurito i crediti OpenHands. <a>Aggiungi fondi</a>",
-        "pt": "Você está sem créditos OpenHands. <a>Adicionar fundos</a>",
-        "es": "Te has quedado sin créditos de OpenHands. <a>Añadir fondos</a>",
-        "ar": "لقد نفدت رصيدك من OpenHands. <a>إضافة رصيد</a>",
-        "fr": "Vous n'avez plus de crédits OpenHands. <a>Ajouter des fonds</a>",
-        "tr": "OpenHands kredileriniz tükendi. <a>Bakiye ekle</a>",
-        "de": "Ihre OpenHands-Guthaben sind aufgebraucht. <a>Guthaben hinzufügen</a>",
-        "uk": "У вас закінчилися кредити OpenHands. <a>Додати кошти</a>"
+        "en": "You're out of OpenHands Credits",
+        "ja": "OpenHandsクレジットが不足しています",
+        "zh-CN": "您的OpenHands点数已用完",
+        "zh-TW": "您的OpenHands點數已用完",
+        "ko-KR": "OpenHands 크레딧이 소진되었습니다",
+        "no": "Du er tom for OpenHands-kreditter",
+        "it": "Hai esaurito i crediti OpenHands",
+        "pt": "Você está sem créditos OpenHands",
+        "es": "Te has quedado sin créditos de OpenHands",
+        "ar": "لقد نفدت رصيدك من OpenHands",
+        "fr": "Vous n'avez plus de crédits OpenHands",
+        "tr": "OpenHands kredileriniz tükendi",
+        "de": "Ihre OpenHands-Guthaben sind aufgebraucht",
+        "uk": "У вас закінчилися кредити OpenHands"
    },
    "STATUS$ERROR_LLM_CONTENT_POLICY_VIOLATION": {
        "en": "Content policy violation. The output was blocked by content filtering policy.",
@@ -8780,7 +8780,7 @@
        "ar": "إرسال...",
        "fr": "Envoi...",
        "tr": "Gönderiliyor...",
-        "de": "Senden...",
+        "de": "Senden...",  
        "uk": "Відправляємо..."
    },
    "FEEDBACK$SUBMITTING_MESSAGE": {
@@ -20,7 +20,7 @@ export interface SystemMessageAction extends OpenHandsActionEvent<"system"> {
 }

 export interface CommandAction extends OpenHandsActionEvent<"run"> {
-  source: "agent" | "user";
+  source: "agent";
  args: {
    command: string;
    security_risk: ActionSecurityRisk;
@@ -4,7 +4,6 @@ import {
  AssistantMessageAction,
  OpenHandsAction,
  SystemMessageAction,
-  CommandAction,
 } from "./actions";
 import {
  AgentStateChangeObservation,
@@ -42,10 +41,6 @@ export const isErrorObservation = (
 ): event is ErrorObservation =>
  isOpenHandsObservation(event) && event.observation === "error";

-export const isCommandAction = (
-  event: OpenHandsParsedEvent,
-): event is CommandAction => isOpenHandsAction(event) && event.action === "run";
-
 export const isAgentStateChangeObservation = (
  event: OpenHandsParsedEvent,
 ): event is AgentStateChangeObservation =>
@@ -11,7 +11,7 @@ export interface AgentStateChangeObservation
 }

 export interface CommandObservation extends OpenHandsObservationEvent<"run"> {
-  source: "agent" | "user";
+  source: "agent";
  extras: {
    command: string;
    hidden?: boolean;
@@ -582,7 +582,7 @@ def _extract_and_validate_params(
    found_params = set()
    for param_match in param_matches:
        param_name = param_match.group(1)
-        param_value = param_match.group(2)
+        param_value = param_match.group(2).strip()

        # Validate parameter is allowed
        if allowed_params and param_name not in allowed_params:
@@ -1013,12 +1013,12 @@ if __name__ == '__main__':

        if not os.path.exists(full_path):
            # if user just removed a folder, prevent server error 500 in UI
-            return JSONResponse(content=[])
+            return []

        try:
            # Check if the directory exists
            if not os.path.exists(full_path) or not os.path.isdir(full_path):
-                return JSONResponse(content=[])
+                return []

            entries = os.listdir(full_path)

@@ -1047,11 +1047,11 @@ if __name__ == '__main__':

            # Combine sorted directories and files
            sorted_entries = directories + files
-            return JSONResponse(content=sorted_entries)
+            return sorted_entries

        except Exception as e:
            logger.error(f'Error listing files: {e}')
-            return JSONResponse(content=[])
+            return []

    logger.debug(f'Starting action execution API on port {args.port}')
    run(app, host='0.0.0.0', port=args.port)
@@ -48,34 +48,6 @@ def create_provider_tokens_object(
    return MappingProxyType(provider_information)


-async def setup_init_convo_settings(
-    user_id: str | None, providers_set: list[ProviderType]
-) -> ConversationInitData:
-    settings_store = await SettingsStoreImpl.get_instance(config, user_id)
-    settings = await settings_store.load()
-
-    secrets_store = await SecretsStoreImpl.get_instance(config, user_id)
-    user_secrets: UserSecrets | None = await secrets_store.load()
-
-    if not settings:
-        raise ConnectionRefusedError(
-            'Settings not found', {'msg_id': 'CONFIGURATION$SETTINGS_NOT_FOUND'}
-        )
-
-    session_init_args: dict = {}
-    session_init_args = {**settings.__dict__, **session_init_args}
-
-    git_provider_tokens = create_provider_tokens_object(providers_set)
-    if server_config.app_mode != AppMode.SAAS and user_secrets:
-        git_provider_tokens = user_secrets.provider_tokens
-
-    session_init_args['git_provider_tokens'] = git_provider_tokens
-    if user_secrets:
-        session_init_args['custom_secrets'] = user_secrets.custom_secrets
-
-    return ConversationInitData(**session_init_args)
-
-
 @sio.event
 async def connect(connection_id: str, environ: dict) -> None:
    try:
@@ -113,7 +85,30 @@ async def connect(connection_id: str, environ: dict) -> None:
            conversation_id, cookies_str, authorization_header
        )

-        conversation_init_data = await setup_init_convo_settings(user_id, providers_set)
+        settings_store = await SettingsStoreImpl.get_instance(config, user_id)
+        settings = await settings_store.load()
+
+        secrets_store = await SecretsStoreImpl.get_instance(config, user_id)
+        user_secrets: UserSecrets | None = await secrets_store.load()
+
+        if not settings:
+            raise ConnectionRefusedError(
+                'Settings not found', {'msg_id': 'CONFIGURATION$SETTINGS_NOT_FOUND'}
+            )
+        session_init_args: dict = {}
+        if settings:
+            session_init_args = {**settings.__dict__, **session_init_args}
+
+        git_provider_tokens = create_provider_tokens_object(providers_set)
+        if server_config.app_mode != AppMode.SAAS and user_secrets:
+            git_provider_tokens = user_secrets.provider_tokens
+
+        session_init_args['git_provider_tokens'] = git_provider_tokens
+        if user_secrets:
+            session_init_args['custom_secrets'] = user_secrets.custom_secrets
+
+        conversation_init_data = ConversationInitData(**session_init_args)
+
        agent_loop_info = await conversation_manager.join_conversation(
            conversation_id,
            connection_id,
@@ -188,7 +188,10 @@ async def load_custom_secrets_names(
 ) -> GETCustomSecrets | JSONResponse:
    try:
        if not user_secrets:
-            return GETCustomSecrets(custom_secrets=[])
+            return JSONResponse(
+                status_code=status.HTTP_404_NOT_FOUND,
+                content={'error': 'User secrets not found'},
+            )

        custom_secrets: list[CustomSecretWithoutValueModel] = []
        if user_secrets.custom_secrets:
@@ -217,30 +220,31 @@ async def create_custom_secret(
 ) -> JSONResponse:
    try:
        existing_secrets = await secrets_store.load()
-        custom_secrets = dict(existing_secrets.custom_secrets) if existing_secrets else {}
+        if existing_secrets:
+            custom_secrets = dict(existing_secrets.custom_secrets)

-        secret_name = incoming_secret.name
-        secret_value = incoming_secret.value
-        secret_description = incoming_secret.description
+            secret_name = incoming_secret.name
+            secret_value = incoming_secret.value
+            secret_description = incoming_secret.description

-        if secret_name in custom_secrets:
-            return JSONResponse(
-                status_code=status.HTTP_400_BAD_REQUEST,
-                content={'message': f'Secret {secret_name} already exists'},
+            if secret_name in custom_secrets:
+                return JSONResponse(
+                    status_code=status.HTTP_400_BAD_REQUEST,
+                    content={'message': f'Secret {secret_name} already exists'},
+                )
+
+            custom_secrets[secret_name] = CustomSecret(
+                secret=secret_value,
+                description=secret_description or '',
            )

-        custom_secrets[secret_name] = CustomSecret(
-            secret=secret_value,
-            description=secret_description or '',
-        )
+            # Create a new UserSecrets that preserves provider tokens
+            updated_user_secrets = UserSecrets(
+                custom_secrets=custom_secrets,
+                provider_tokens=existing_secrets.provider_tokens,
+            )

-        # Create a new UserSecrets that preserves provider tokens
-        updated_user_secrets = UserSecrets(
-            custom_secrets=custom_secrets,
-            provider_tokens=existing_secrets.provider_tokens if existing_secrets else {},
-        )
-
-        await secrets_store.store(updated_user_secrets)
+            await secrets_store.store(updated_user_secrets)

        return JSONResponse(
            status_code=status.HTTP_201_CREATED,
@@ -683,29 +683,6 @@ def test_agent_config_condenser_with_no_enabled():
    assert isinstance(agent_config.condenser, NoOpCondenserConfig)


-def test_sandbox_volumes_toml(default_config, temp_toml_file):
-    """Test that volumes configuration under [sandbox] works correctly."""
-    with open(temp_toml_file, 'w', encoding='utf-8') as toml_file:
-        toml_file.write("""
-[sandbox]
-volumes = "/home/user/mydir:/workspace:rw,/data:/data:ro"
-timeout = 1
-""")
-
-    load_from_toml(default_config, temp_toml_file)
-    finalize_config(default_config)
-
-    # Check that sandbox.volumes is set correctly
-    assert (
-        default_config.sandbox.volumes
-        == '/home/user/mydir:/workspace:rw,/data:/data:ro'
-    )
-    assert default_config.workspace_mount_path == '/home/user/mydir'
-    assert default_config.workspace_mount_path_in_sandbox == '/workspace'
-    assert default_config.workspace_base == '/home/user/mydir'
-    assert default_config.sandbox.timeout == 1
-
-
 def test_condenser_config_from_toml_basic(default_config, temp_toml_file):
    """Test loading basic condenser configuration from TOML."""
    with open(temp_toml_file, 'w', encoding='utf-8') as toml_file:
@@ -652,34 +652,6 @@ NON_FNCALL_RESPONSE_MESSAGE = {
 <parameter=command>view</parameter>
 <parameter=path>/test/file.py</parameter>
 <parameter=view_range>[1, 10]</parameter>
-</function>""",
-        ),
-        # Test case with indented code block to verify indentation is preserved
-        (
-            [
-                {
-                    'index': 1,
-                    'function': {
-                        'arguments': '{"command": "str_replace", "path": "/test/file.py", "old_str": "def example():\\n    pass", "new_str": "def example():\\n    # This is indented\\n    print(\\"hello\\")\\n    return True"}',
-                        'name': 'str_replace_editor',
-                    },
-                    'id': 'test_id',
-                    'type': 'function',
-                }
-            ],
-            """<function=str_replace_editor>
-<parameter=command>str_replace</parameter>
-<parameter=path>/test/file.py</parameter>
-<parameter=old_str>
-def example():
-    pass
-</parameter>
-<parameter=new_str>
-def example():
-    # This is indented
-    print("hello")
-    return True
-</parameter>
 </function>""",
        ),
    ],
@@ -138,39 +138,6 @@ async def test_add_custom_secret(test_client, file_secrets_store):
    )


-@pytest.mark.asyncio
-async def test_create_custom_secret_with_no_existing_secrets(
-    test_client, file_secrets_store
-):
-    """Test creating a custom secret when there are no existing secrets at all."""
-
-    # Don't store any initial settings - this simulates a completely new user
-    # or a situation where the secrets store is empty
-
-    # Make the POST request to add a custom secret
-    add_secret_data = {
-        'name': 'NEW_API_KEY',
-        'value': 'new-api-key-value',
-        'description': 'Test API Key',
-    }
-    response = test_client.post('/api/secrets', json=add_secret_data)
-    assert response.status_code == 201
-
-    # Verify that the settings were stored with the new secret
-    stored_settings = await file_secrets_store.load()
-
-    # Check that the secret was added
-    assert 'NEW_API_KEY' in stored_settings.custom_secrets
-    assert (
-        stored_settings.custom_secrets['NEW_API_KEY'].secret.get_secret_value()
-        == 'new-api-key-value'
-    )
-    assert stored_settings.custom_secrets['NEW_API_KEY'].description == 'Test API Key'
-
-    # Check that provider_tokens is an empty dict, not None
-    assert stored_settings.provider_tokens == {}
-
-
 @pytest.mark.asyncio
 async def test_update_existing_custom_secret(test_client, file_secrets_store):
    """Test updating an existing custom secret's name and description (cannot change value once set)."""