mirror of https://github.com/All-Hands-AI/OpenHands.git, synced 2026-04-29 03:00:45 -04:00
Compare commits: 1 commit (SHA1 da500222d5)
@@ -26,7 +26,6 @@ Backend:
|
||||
- Located in the `openhands` directory
|
||||
- Testing:
|
||||
- All tests are in `tests/unit/test_*.py`
|
||||
- To run all the unit tests, run `poetry run pytest --forked -n auto -svv ./tests/unit`
|
||||
- To test new code, run `poetry run pytest tests/unit/test_xxx.py` where `xxx` is the appropriate file for the current functionality
|
||||
- Write all tests with pytest
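
As a minimal sketch of what a file under `tests/unit/` might look like, the example below uses plain `assert` statements in functions named `test_*`, which pytest collects automatically. The file name `test_example.py` and the helper `normalize_model_name` are hypothetical and are not part of OpenHands.

```python
# tests/unit/test_example.py  (hypothetical file name, for illustration only)


def normalize_model_name(name: str) -> str:
    """Hypothetical helper under test: trim whitespace and lowercase."""
    return name.strip().lower()


def test_normalize_model_name_strips_and_lowercases():
    # pytest reports the compared values if this bare assert fails
    assert normalize_model_name('  GPT-4o ') == 'gpt-4o'


def test_normalize_model_name_is_idempotent():
    # Applying the helper twice should give the same result as applying it once
    once = normalize_model_name('Claude')
    assert normalize_model_name(once) == once
```

Such a file would be run with `poetry run pytest tests/unit/test_example.py`, following the pattern given above.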
|
||||
|
||||
|
||||
+10
-23
@@ -1,10 +1,8 @@
|
||||
# Development Guide
|
||||
|
||||
This guide is for people working on OpenHands and editing the source code.
|
||||
If you wish to contribute your changes, check out the
|
||||
[CONTRIBUTING.md](https://github.com/All-Hands-AI/OpenHands/blob/main/CONTRIBUTING.md)
|
||||
on how to clone and set up the project initially before moving on. Otherwise,
|
||||
you can clone the OpenHands project directly.
|
||||
If you wish to contribute your changes, check out the [CONTRIBUTING.md](https://github.com/All-Hands-AI/OpenHands/blob/main/CONTRIBUTING.md) on how to clone and set up the project
|
||||
initially before moving on. Otherwise, you can clone the OpenHands project directly.
|
||||
|
||||
## Start the Server for Development
|
||||
|
||||
@@ -21,20 +19,9 @@ you can clone the OpenHands project directly.
|
||||
|
||||
Make sure you have all these dependencies installed before moving on to `make build`.
|
||||
|
||||
#### Dev container
|
||||
|
||||
There is a [dev container](https://containers.dev/) available which provides a
|
||||
pre-configured environment with all the necessary dependencies installed if you
|
||||
are using a [supported editor or tool](https://containers.dev/supporting). For
|
||||
example, if you are using Visual Studio Code (VS Code) with the
|
||||
[Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers)
|
||||
extension installed, you can open the project in a dev container by using the
|
||||
_Dev Container: Reopen in Container_ command from the Command Palette
|
||||
(Ctrl+Shift+P).
|
||||
|
||||
#### Develop without sudo access
|
||||
|
||||
If you want to develop without system admin/sudo access to upgrade/install `Python` and/or `NodeJs`, you can use
|
||||
If you want to develop without system admin/sudo access to upgrade/install `Python` and/or `NodeJs`, you can use
|
||||
`conda` or `mamba` to manage the packages for you:
|
||||
|
||||
```bash
|
||||
@@ -50,7 +37,7 @@ mamba install conda-forge::poetry
|
||||
|
||||
### 2. Build and Setup The Environment
|
||||
|
||||
Begin by building the project which includes setting up the environment and installing dependencies. This step ensures
|
||||
Begin by building the project which includes setting up the environment and installing dependencies. This step ensures
|
||||
that OpenHands is ready to run on your system:
|
||||
|
||||
```bash
|
||||
@@ -67,11 +54,11 @@ To configure the LM of your choice, run:
|
||||
make setup-config
|
||||
```
|
||||
|
||||
This command will prompt you to enter the LLM API key, model name, and other variables ensuring that OpenHands is
|
||||
tailored to your specific needs. Note that the model name will apply only when you run headless. If you use the UI,
|
||||
This command will prompt you to enter the LLM API key, model name, and other variables ensuring that OpenHands is
|
||||
tailored to your specific needs. Note that the model name will apply only when you run headless. If you use the UI,
|
||||
please set the model in the UI.
|
||||
|
||||
Note: If you have previously run OpenHands using the docker command, you may have already set some environmental
|
||||
Note: If you have previously run OpenHands using the docker command, you may have already set some environmental
|
||||
variables in your terminal. The final configurations are set from highest to lowest priority:
|
||||
Environment variables > config.toml variables > default variables
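
The priority order above can be sketched as a simple lookup chain. This is an illustration only, not the OpenHands config loader: the function `resolve_setting`, the `LLM_` prefix, and the values in `DEFAULTS` are assumptions made for the example.

```python
import os

# Illustrative defaults only; the real defaults live in the OpenHands config code.
DEFAULTS = {'model': 'gpt-4o', 'timeout': '120'}


def resolve_setting(key, toml_values, env=None, env_prefix='LLM_'):
    """Return the value for `key`: environment first, then config.toml, then defaults.

    `env_prefix` is an assumption for this sketch.
    """
    env = os.environ if env is None else env
    env_key = f'{env_prefix}{key.upper()}'
    if env_key in env:
        return env[env_key]  # highest priority
    if key in toml_values:
        return toml_values[key]  # middle priority
    return DEFAULTS[key]  # lowest priority


# A stale variable left over from an earlier docker run wins over config.toml
print(resolve_setting('model', {'model': 'from-toml'}, env={'LLM_MODEL': 'from-env'}))
```

This is why a variable exported in an earlier terminal session can silently override what `make setup-config` wrote to `config.toml`.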
|
||||
|
||||
@@ -90,14 +77,14 @@ make run
|
||||
|
||||
#### Option B: Individual Server Startup
|
||||
|
||||
- **Start the Backend Server:** If you prefer, you can start the backend server independently to focus on
|
||||
- **Start the Backend Server:** If you prefer, you can start the backend server independently to focus on
|
||||
backend-related tasks or configurations.
|
||||
|
||||
```bash
|
||||
make start-backend
|
||||
```
|
||||
|
||||
- **Start the Frontend Server:** Similarly, you can start the frontend server on its own to work on frontend-related
|
||||
- **Start the Frontend Server:** Similarly, you can start the frontend server on its own to work on frontend-related
|
||||
components or interface enhancements.
|
||||
```bash
|
||||
make start-frontend
|
||||
@@ -133,7 +120,7 @@ poetry run pytest ./tests/unit/test_*.py
|
||||
|
||||
### 9. Use existing Docker image
|
||||
|
||||
To reduce build time (e.g., if no changes were made to the client-runtime component), you can use an existing Docker
|
||||
To reduce build time (e.g., if no changes were made to the client-runtime component), you can use an existing Docker
|
||||
container image by setting the SANDBOX_RUNTIME_CONTAINER_IMAGE environment variable to the desired Docker image.
|
||||
|
||||
Example: `export SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.39-nikolaik`
|
||||
|
||||
@@ -325,15 +325,6 @@ classpath = "my_package.my_module.MyCustomAgent"
|
||||
# Useful when deploying OpenHands in a remote machine where you need to expose a specific port.
|
||||
#vscode_port = 41234
|
||||
|
||||
# Volume mounts in the format 'host_path:container_path[:mode]'
|
||||
# e.g. '/my/host/dir:/workspace:rw'
|
||||
# Multiple mounts can be specified using commas
|
||||
# e.g. '/path1:/workspace/path1,/path2:/workspace/path2:ro'
|
||||
|
||||
# Configure volumes under the [sandbox] section:
|
||||
# [sandbox]
|
||||
# volumes = "/my/host/dir:/workspace:rw,/path2:/workspace/path2:ro"
|
||||
|
||||
#################################### Security ###################################
|
||||
# Configuration for security features
|
||||
##############################################################################
|
||||
|
||||
@@ -331,8 +331,6 @@ The agent configuration options are defined in the `[agent]` and `[agent.<agent_
|
||||
|
||||
The sandbox configuration options are defined in the `[sandbox]` section of the `config.toml` file.
|
||||
|
||||
|
||||
|
||||
To use these with the docker command, pass in `-e SANDBOX_<option>`. Example: `-e SANDBOX_TIMEOUT`.
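
A minimal sketch of the naming rule, assuming each `[sandbox]` option maps to an upper-cased `SANDBOX_<option>` variable. The helper `sandbox_env_flags` is hypothetical and only builds the argument list; it does not invoke docker.

```python
def sandbox_env_flags(options: dict[str, str]) -> list[str]:
    """Turn sandbox options into `-e SANDBOX_<OPTION>=<value>` docker arguments.

    Hypothetical helper for illustration; it assumes option names are simply
    upper-cased and prefixed with `SANDBOX_`.
    """
    flags: list[str] = []
    for name, value in options.items():
        flags += ['-e', f'SANDBOX_{name.upper()}={value}']
    return flags


print(sandbox_env_flags({'timeout': '120'}))
```

Passing `-e SANDBOX_TIMEOUT` without a value, as in the example above, instead forwards whatever value the variable already has in the calling shell.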
|
||||
|
||||
### Execution
|
||||
|
||||
@@ -2,8 +2,6 @@
|
||||
|
||||
This folder contains the evaluation harness that we built on top of the original [SWE-Bench benchmark](https://www.swebench.com/) ([paper](https://arxiv.org/abs/2310.06770)).
|
||||
|
||||
**UPDATE (5/26/2025): We now support running interactive SWE-Bench evaluation (see the paper [here](https://arxiv.org/abs/2502.13069))! To run it, check out [this README](./SWE-Interact.md).**
|
||||
|
||||
**UPDATE (4/8/2025): We now support running SWT-Bench evaluation! For more details, check out [the corresponding section](#SWT-Bench-Evaluation).**
|
||||
|
||||
**UPDATE (03/27/2025): We now support SWE-Bench multimodal evaluation! Simply use "princeton-nlp/SWE-bench_Multimodal" as the dataset name in the `run_infer.sh` script to evaluate on multimodal instances.**
|
||||
|
||||
@@ -1,92 +0,0 @@
|
||||
# SWE-Interact Benchmark
|
||||
|
||||
This document explains how to use the [Interactive SWE-Bench](https://arxiv.org/abs/2502.13069) benchmark scripts for running and evaluating interactive software engineering tasks.
|
||||
|
||||
## Setting things up
|
||||
After following the [README](./README.md) to set up the environment, you also need to add an LLM configuration for the simulated human user. In the original [paper](https://arxiv.org/abs/2502.13069), we use gpt-4o as the simulated human user. You can add the following to your `config.toml` file:
|
||||
|
||||
```toml
|
||||
[llm.fake_user]
|
||||
model="litellm_proxy/gpt-4o-2024-08-06"
|
||||
api_key="<your-api-key>"
|
||||
temperature = 0.0
|
||||
base_url = "https://llm-proxy.eval.all-hands.dev"
|
||||
```
|
||||
|
||||
## Running the Benchmark
|
||||
|
||||
The main script for running the benchmark is `run_infer_interact.sh`. Here's how to use it:
|
||||
|
||||
```bash
|
||||
bash ./evaluation/benchmarks/swe_bench/scripts/run_infer_interact.sh <model_config> <commit_hash> <agent> <eval_limit> <max_iter> <num_workers> <split>
|
||||
```
|
||||
|
||||
### Parameters:
|
||||
|
||||
- `model_config`: Name of the LLM config section in `config.toml` (e.g., `llm.claude-3-7-sonnet`)
|
||||
- `commit_hash`: Git commit hash to use (e.g., `HEAD`)
|
||||
- `agent`: The agent class to use (e.g., `CodeActAgent`)
|
||||
- `eval_limit`: Number of examples to evaluate (e.g., `500`)
|
||||
- `max_iter`: Maximum number of iterations per task (e.g., `100`)
|
||||
- `num_workers`: Number of parallel workers (e.g., `1`)
|
||||
- `split`: Dataset split to use (e.g., `test`)
|
||||
|
||||
### Example:
|
||||
|
||||
```bash
|
||||
bash ./evaluation/benchmarks/swe_bench/scripts/run_infer_interact.sh llm.claude-3-7-sonnet HEAD CodeActAgent 500 100 1 test
|
||||
```
|
||||
|
||||
### Additional Environment Variables:
|
||||
|
||||
You can customize the behavior using these environment variables:
|
||||
|
||||
- `RUN_WITH_BROWSING`: Enable/disable web browsing (default: false)
|
||||
- `USE_HINT_TEXT`: Enable/disable hint text (default: false)
|
||||
- `EVAL_CONDENSER`: Specify a condenser configuration
|
||||
- `EXP_NAME`: Add a custom experiment name to the output
|
||||
- `N_RUNS`: Number of runs to perform (default: 1)
|
||||
- `SKIP_RUNS`: Comma-separated list of run numbers to skip
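
How `N_RUNS` and `SKIP_RUNS` combine can be sketched as follows. The function `runs_to_execute` is a hypothetical stand-in for the loop in `run_infer_interact.sh`, not code from the repository.

```python
def runs_to_execute(n_runs: int = 1, skip_runs: str = '') -> list[int]:
    """Return the 1-based run numbers that would actually be executed.

    `skip_runs` is a comma-separated string such as '1,3', mirroring the
    SKIP_RUNS environment variable; an empty string skips nothing.
    """
    skipped = {int(part) for part in skip_runs.split(',') if part.strip()}
    return [i for i in range(1, n_runs + 1) if i not in skipped]


# N_RUNS=4 SKIP_RUNS=1,3 leaves runs 2 and 4
print(runs_to_execute(4, '1,3'))
```

Each run that is not skipped gets a `-run_<i>` suffix appended to its evaluation note, so results from different runs land in separate output directories.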
|
||||
|
||||
## Evaluating Results
|
||||
|
||||
After running the benchmark, you can evaluate the results using `eval_infer.sh`:
|
||||
|
||||
```bash
|
||||
./evaluation/benchmarks/swe_bench/scripts/eval_infer.sh <output_file> <instance_id> <dataset> <split>
|
||||
```
|
||||
|
||||
### Parameters:
|
||||
|
||||
- `output_file`: Path to the output JSONL file
|
||||
- `instance_id`: The specific instance ID to evaluate
|
||||
- `dataset`: Dataset name (e.g., `cmu-lti/interactive-swe`)
|
||||
- `split`: Dataset split (e.g., `test`)
|
||||
|
||||
### Example:
|
||||
|
||||
```bash
|
||||
./evaluation/benchmarks/swe_bench/scripts/eval_infer.sh evaluation/evaluation_outputs/outputs/cmu-lti__interactive-swe-test/CodeActAgent/claude-3-7-sonnet-20250219_maxiter_100_N_v0.39.0-no-hint-run_1/output.jsonl sphinx-doc__sphinx-8721 cmu-lti/interactive-swe test
|
||||
```
|
||||
|
||||
## Output Structure
|
||||
|
||||
The benchmark outputs are stored in the `evaluation/evaluation_outputs/outputs/` directory with the following structure:
|
||||
|
||||
```
|
||||
evaluation/evaluation_outputs/outputs/
└── cmu-lti__interactive-swe-{split}/
    └── {agent}/
        └── {model}-{date}_maxiter_{max_iter}_N_{version}-{options}-run_{run_number}/
            └── output.jsonl
|
||||
```
|
||||
|
||||
Where:
|
||||
- `{split}` is the dataset split (e.g., test)
|
||||
- `{agent}` is the agent class name
|
||||
- `{model}` is the model name
|
||||
- `{date}` is the run date
|
||||
- `{max_iter}` is the maximum iterations
|
||||
- `{version}` is the OpenHands version
|
||||
- `{options}` includes any additional options (e.g., no-hint, with-browsing)
|
||||
- `{run_number}` is the run number
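
To locate a result file programmatically, the layout above can be assembled from its parts. This is a sketch based only on the structure described here; the function `output_path` and its parameter names are hypothetical.

```python
from pathlib import PurePosixPath

OUTPUT_ROOT = 'evaluation/evaluation_outputs/outputs'


def output_path(split, agent, model, date, max_iter, version, options, run_number):
    """Build the expected path of output.jsonl from the documented placeholders."""
    run_dir = (
        f'{model}-{date}_maxiter_{max_iter}_N_{version}'
        f'-{options}-run_{run_number}'
    )
    return PurePosixPath(
        OUTPUT_ROOT,
        f'cmu-lti__interactive-swe-{split}',
        agent,
        run_dir,
        'output.jsonl',
    )


print(
    output_path(
        'test', 'CodeActAgent', 'claude-3-7-sonnet', '20250219',
        100, 'v0.39.0', 'no-hint', 1,
    )
)
```

With these arguments the result matches the path passed to `eval_infer.sh` in the evaluation example above.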
|
||||
@@ -1,411 +0,0 @@
|
||||
import asyncio
|
||||
import json
|
||||
import os
|
||||
|
||||
import pandas as pd
|
||||
from datasets import load_dataset
|
||||
from litellm import completion as litellm_completion
|
||||
|
||||
import openhands.agenthub
|
||||
from evaluation.benchmarks.swe_bench.run_infer import (
|
||||
AgentFinishedCritic,
|
||||
complete_runtime,
|
||||
filter_dataset,
|
||||
get_config,
|
||||
initialize_runtime,
|
||||
)
|
||||
from evaluation.benchmarks.swe_bench.run_infer import (
|
||||
get_instruction as base_get_instruction,
|
||||
)
|
||||
from evaluation.utils.shared import (
|
||||
EvalException,
|
||||
EvalMetadata,
|
||||
EvalOutput,
|
||||
make_metadata,
|
||||
prepare_dataset,
|
||||
reset_logger_for_multiprocessing,
|
||||
run_evaluation,
|
||||
)
|
||||
from openhands.controller.state.state import State
|
||||
from openhands.core.config import (
|
||||
get_llm_config_arg,
|
||||
get_parser,
|
||||
)
|
||||
from openhands.core.config.condenser_config import NoOpCondenserConfig
|
||||
from openhands.core.config.utils import get_condenser_config_arg
|
||||
from openhands.core.logger import openhands_logger as logger
|
||||
from openhands.core.main import create_runtime, run_controller
|
||||
from openhands.events.action import MessageAction
|
||||
from openhands.events.serialization.event import event_from_dict, event_to_dict
|
||||
from openhands.utils.async_utils import call_async_from_sync
|
||||
|
||||
USE_HINT_TEXT = os.environ.get('USE_HINT_TEXT', 'false').lower() == 'true'
|
||||
USE_INSTANCE_IMAGE = os.environ.get('USE_INSTANCE_IMAGE', 'false').lower() == 'true'
|
||||
RUN_WITH_BROWSING = os.environ.get('RUN_WITH_BROWSING', 'false').lower() == 'true'
|
||||
|
||||
|
||||
class FakeUser:
    def __init__(self, issue, hints, files):
        self.system_message = f"""
You are a GitHub user reporting an issue. Here are the details of your issue and environment:

Issue: {issue}

Hints: {hints}

Files relative to your current directory: {files}

Your task is to respond to questions from a coder who is trying to solve your issue. The coder has a summarized version of the issue you have. Follow these rules:
1. If the coder asks a question that is directly related to the information in the issue you have, provide that information.
2. Always stay in character as a user reporting an issue, not as an AI assistant.
3. Keep your responses concise and to the point.
4. The coder has limited turns to solve the issue. Do not interact with the coder beyond 3 turns.

Respond with "I don't have that information" if the question is unrelated or you're unsure.
"""
        self.chat_history = [{'role': 'system', 'content': self.system_message}]
        self.turns = 0
        # Get LLM config from config.toml
        self.llm_config = get_llm_config_arg(
            'llm.fake_user'
        )  # You can change 'fake_user' to any config name you want

    def generate_reply(self, question):
        if self.turns > 3:
            return 'Please continue working on the task. Do NOT ask for more help.'
        self.chat_history.append({'role': 'user', 'content': question.content})

        response = litellm_completion(
            model=self.llm_config.model,
            messages=self.chat_history,
            api_key=self.llm_config.api_key.get_secret_value(),
            temperature=self.llm_config.temperature,
            base_url=self.llm_config.base_url,
        )

        reply = response.choices[0].message.content
        self.chat_history.append({'role': 'assistant', 'content': reply})
        self.turns += 1
        return reply
|
||||
|
||||
|
||||
# Global variable for fake user
|
||||
fake_user = None
|
||||
|
||||
|
||||
def get_fake_user_response(state: State) -> str:
    global fake_user
    if not fake_user:
        return 'Please continue working on the task.'
    last_agent_message = state.get_last_agent_message()
    if last_agent_message:
        return fake_user.generate_reply(last_agent_message)
    return 'Please continue working on the task.'
|
||||
|
||||
|
||||
AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {
|
||||
'CodeActAgent': get_fake_user_response,
|
||||
}
|
||||
|
||||
|
||||
def get_instruction(instance: pd.Series, metadata: EvalMetadata) -> MessageAction:
|
||||
instance_copy = instance.copy()
|
||||
instance_copy.problem_statement = f'{instance.problem_statement}\n\nHints:\nThe user has not provided all the necessary details about the issue, and there are some hidden details that are helpful. Please ask the user specific questions using non-code commands to gather the relevant information that the user has to help you solve the issue. Ensure you have all the details you require to solve the issue.'
|
||||
return base_get_instruction(instance_copy, metadata)
|
||||
|
||||
|
||||
def process_instance(
|
||||
instance: pd.Series,
|
||||
metadata: EvalMetadata,
|
||||
reset_logger: bool = True,
|
||||
) -> EvalOutput:
|
||||
config = get_config(instance, metadata)
|
||||
global fake_user
|
||||
original_issue = instance.original_issue
|
||||
issue = str(original_issue)
|
||||
fake_user = FakeUser(issue=issue, hints=instance.hints_text, files=instance.files)
|
||||
|
||||
# Setup the logger properly, so you can run multi-processing to parallelize the evaluation
|
||||
if reset_logger:
|
||||
log_dir = os.path.join(metadata.eval_output_dir, 'infer_logs')
|
||||
reset_logger_for_multiprocessing(logger, instance.instance_id, log_dir)
|
||||
else:
|
||||
logger.info(f'Starting evaluation for instance {instance.instance_id}.')
|
||||
|
||||
runtime = create_runtime(config)
|
||||
call_async_from_sync(runtime.connect)
|
||||
|
||||
try:
|
||||
initialize_runtime(runtime, instance, metadata)
|
||||
|
||||
message_action = get_instruction(instance, metadata)
|
||||
|
||||
# Here's how you can run the agent (similar to the `main` function) and get the final task state
|
||||
state: State | None = asyncio.run(
|
||||
run_controller(
|
||||
config=config,
|
||||
initial_user_action=message_action,
|
||||
runtime=runtime,
|
||||
fake_user_response_fn=AGENT_CLS_TO_FAKE_USER_RESPONSE_FN[
|
||||
metadata.agent_class
|
||||
],
|
||||
)
|
||||
)
|
||||
|
||||
# if fatal error, throw EvalError to trigger re-run
|
||||
if (
|
||||
state
|
||||
and state.last_error
|
||||
and 'fatal error during agent execution' in state.last_error
|
||||
and 'stuck in a loop' not in state.last_error
|
||||
):
|
||||
raise EvalException('Fatal error detected: ' + state.last_error)
|
||||
|
||||
# Get git patch
|
||||
return_val = complete_runtime(runtime, instance)
|
||||
git_patch = return_val['git_patch']
|
||||
logger.info(
|
||||
f'Got git diff for instance {instance.instance_id}:\n--------\n{git_patch}\n--------'
|
||||
)
|
||||
finally:
|
||||
runtime.close()
|
||||
|
||||
# Prepare test result
|
||||
test_result = {
|
||||
'git_patch': git_patch,
|
||||
}
|
||||
|
||||
if state is None:
|
||||
raise ValueError('State should not be None.')
|
||||
|
||||
histories = [event_to_dict(event) for event in state.history]
|
||||
metrics = state.metrics.get() if state.metrics else None
|
||||
|
||||
# Save the output
|
||||
instruction = message_action.content
|
||||
if message_action.image_urls:
|
||||
instruction += (
|
||||
'\n\n<image_urls>' + '\n'.join(message_action.image_urls) + '</image_urls>'
|
||||
)
|
||||
output = EvalOutput(
|
||||
instance_id=instance.instance_id,
|
||||
instruction=instruction,
|
||||
instance=instance.to_dict(),
|
||||
test_result=test_result,
|
||||
metadata=metadata,
|
||||
history=histories,
|
||||
metrics=metrics,
|
||||
error=state.last_error if state and state.last_error else None,
|
||||
)
|
||||
return output
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
parser = get_parser()
|
||||
parser.add_argument(
|
||||
'--dataset',
|
||||
type=str,
|
||||
default='cmu-lti/interactive-swe',
|
||||
help='dataset to evaluate on',
|
||||
)
|
||||
parser.add_argument(
|
||||
'--split',
|
||||
type=str,
|
||||
default='test',
|
||||
help='split to evaluate on',
|
||||
)
|
||||
|
||||
args, _ = parser.parse_known_args()
|
||||
|
||||
# Load dataset from huggingface datasets
|
||||
dataset = load_dataset(args.dataset, split=args.split)
|
||||
swe_bench_tests = filter_dataset(dataset.to_pandas(), 'instance_id')
|
||||
logger.info(
|
||||
f'Loaded dataset {args.dataset} with split {args.split}: {len(swe_bench_tests)} tasks'
|
||||
)
|
||||
llm_config = None
|
||||
if args.llm_config:
|
||||
llm_config = get_llm_config_arg(args.llm_config)
|
||||
llm_config.log_completions = True
|
||||
        # modify_params must be False for evaluation, for reproducibility and accuracy of results
|
||||
llm_config.modify_params = False
|
||||
|
||||
if llm_config is None:
|
||||
raise ValueError(f'Could not find LLM config: --llm_config {args.llm_config}')
|
||||
|
||||
# Get condenser config from environment variable
|
||||
condenser_name = os.environ.get('EVAL_CONDENSER')
|
||||
if condenser_name:
|
||||
condenser_config = get_condenser_config_arg(condenser_name)
|
||||
if condenser_config is None:
|
||||
raise ValueError(
|
||||
f'Could not find Condenser config: EVAL_CONDENSER={condenser_name}'
|
||||
)
|
||||
else:
|
||||
# If no specific condenser config is provided via env var, default to NoOpCondenser
|
||||
condenser_config = NoOpCondenserConfig()
|
||||
logger.debug(
|
||||
'No Condenser config provided via EVAL_CONDENSER, using NoOpCondenser.'
|
||||
)
|
||||
|
||||
details = {'mode': 'interact'}
|
||||
_agent_cls = openhands.agenthub.Agent.get_cls(args.agent_cls)
|
||||
|
||||
dataset_descrption = (
|
||||
args.dataset.replace('/', '__') + '-' + args.split.replace('/', '__')
|
||||
)
|
||||
metadata = make_metadata(
|
||||
llm_config,
|
||||
dataset_descrption,
|
||||
args.agent_cls,
|
||||
args.max_iterations,
|
||||
args.eval_note,
|
||||
args.eval_output_dir,
|
||||
details=details,
|
||||
condenser_config=condenser_config,
|
||||
)
|
||||
|
||||
output_file = os.path.join(metadata.eval_output_dir, 'output.jsonl')
|
||||
print(f'### OUTPUT FILE: {output_file} ###')
|
||||
|
||||
# Run evaluation in iterative mode:
|
||||
# If a rollout fails to output AgentFinishAction, we will try again until it succeeds OR total 3 attempts have been made.
|
||||
ITERATIVE_EVAL_MODE = (
|
||||
os.environ.get('ITERATIVE_EVAL_MODE', 'false').lower() == 'true'
|
||||
)
|
||||
ITERATIVE_EVAL_MODE_MAX_ATTEMPTS = int(
|
||||
os.environ.get('ITERATIVE_EVAL_MODE_MAX_ATTEMPTS', '3')
|
||||
)
|
||||
|
||||
if not ITERATIVE_EVAL_MODE:
|
||||
# load the dataset
|
||||
instances = prepare_dataset(swe_bench_tests, output_file, args.eval_n_limit)
|
||||
if len(instances) > 0 and not isinstance(
|
||||
instances['PASS_TO_PASS'][instances['PASS_TO_PASS'].index[0]], str
|
||||
):
|
||||
for col in ['PASS_TO_PASS', 'FAIL_TO_PASS']:
|
||||
instances[col] = instances[col].apply(lambda x: str(x))
|
||||
run_evaluation(
|
||||
instances,
|
||||
metadata,
|
||||
output_file,
|
||||
args.eval_num_workers,
|
||||
process_instance,
|
||||
timeout_seconds=8
|
||||
* 60
|
||||
* 60, # 8 hour PER instance should be more than enough
|
||||
max_retries=5,
|
||||
)
|
||||
else:
|
||||
critic = AgentFinishedCritic()
|
||||
|
||||
def get_cur_output_file_path(attempt: int) -> str:
|
||||
return (
|
||||
f'{output_file.removesuffix(".jsonl")}.critic_attempt_{attempt}.jsonl'
|
||||
)
|
||||
|
||||
eval_ids = None
|
||||
for attempt in range(1, ITERATIVE_EVAL_MODE_MAX_ATTEMPTS + 1):
|
||||
cur_output_file = get_cur_output_file_path(attempt)
|
||||
logger.info(
|
||||
f'Running evaluation with critic {critic.__class__.__name__} for attempt {attempt} of {ITERATIVE_EVAL_MODE_MAX_ATTEMPTS}.'
|
||||
)
|
||||
|
||||
# For deterministic eval, we set temperature to 0.1 for (>1) attempt
|
||||
# so hopefully we get slightly different results
|
||||
if attempt > 1 and metadata.llm_config.temperature == 0:
|
||||
logger.info(
|
||||
f'Detected temperature is 0 for (>1) attempt {attempt}. Setting temperature to 0.1...'
|
||||
)
|
||||
metadata.llm_config.temperature = 0.1
|
||||
|
||||
# Load instances - at first attempt, we evaluate all instances
|
||||
# On subsequent attempts, we only evaluate the instances that failed the previous attempt determined by critic
|
||||
instances = prepare_dataset(
|
||||
swe_bench_tests, cur_output_file, args.eval_n_limit, eval_ids=eval_ids
|
||||
)
|
||||
if len(instances) > 0 and not isinstance(
|
||||
instances['PASS_TO_PASS'][instances['PASS_TO_PASS'].index[0]], str
|
||||
):
|
||||
for col in ['PASS_TO_PASS', 'FAIL_TO_PASS']:
|
||||
instances[col] = instances[col].apply(lambda x: str(x))
|
||||
|
||||
# Run evaluation - but save them to cur_output_file
|
||||
logger.info(
|
||||
f'Evaluating {len(instances)} instances for attempt {attempt}...'
|
||||
)
|
||||
run_evaluation(
|
||||
instances,
|
||||
metadata,
|
||||
cur_output_file,
|
||||
args.eval_num_workers,
|
||||
process_instance,
|
||||
timeout_seconds=8
|
||||
* 60
|
||||
* 60, # 8 hour PER instance should be more than enough
|
||||
max_retries=5,
|
||||
)
|
||||
|
||||
# When eval is done, we update eval_ids to the instances that failed the current attempt
|
||||
instances_failed = []
|
||||
logger.info(
|
||||
f'Use critic {critic.__class__.__name__} to check {len(instances)} instances for attempt {attempt}...'
|
||||
)
|
||||
with open(cur_output_file, 'r') as f:
|
||||
for line in f:
|
||||
instance = json.loads(line)
|
||||
try:
|
||||
history = [
|
||||
event_from_dict(event) for event in instance['history']
|
||||
]
|
||||
critic_result = critic.evaluate(
|
||||
history, instance['test_result'].get('git_patch', '')
|
||||
)
|
||||
if not critic_result.success:
|
||||
instances_failed.append(instance['instance_id'])
|
||||
except Exception as e:
|
||||
logger.error(
|
||||
f'Error loading history for instance {instance["instance_id"]}: {e}'
|
||||
)
|
||||
instances_failed.append(instance['instance_id'])
|
||||
logger.info(
|
||||
f'{len(instances_failed)} instances failed the current attempt {attempt}: {instances_failed}'
|
||||
)
|
||||
eval_ids = instances_failed
|
||||
|
||||
# If no instances failed, we break
|
||||
if len(instances_failed) == 0:
|
||||
break
|
||||
|
||||
# Then we should aggregate the results from all attempts into the original output file
|
||||
# and remove the intermediate files
|
||||
logger.info(
|
||||
'Aggregating results from all attempts into the original output file...'
|
||||
)
|
||||
fout = open(output_file, 'w')
|
||||
added_instance_ids = set()
|
||||
for attempt in reversed(range(1, ITERATIVE_EVAL_MODE_MAX_ATTEMPTS + 1)):
|
||||
cur_output_file = get_cur_output_file_path(attempt)
|
||||
if not os.path.exists(cur_output_file):
|
||||
logger.warning(
|
||||
f'Intermediate output file {cur_output_file} does not exist. Skipping...'
|
||||
)
|
||||
continue
|
||||
|
||||
with open(cur_output_file, 'r') as f:
|
||||
for line in f:
|
||||
instance = json.loads(line)
|
||||
# Also make sure git_patch is not empty - otherwise we fall back to previous attempt (empty patch is worse than anything else)
|
||||
if (
|
||||
instance['instance_id'] not in added_instance_ids
|
||||
and instance['test_result'].get('git_patch', '').strip()
|
||||
):
|
||||
fout.write(line)
|
||||
added_instance_ids.add(instance['instance_id'])
|
||||
logger.info(
|
||||
f'Aggregated instances from {cur_output_file}. Total instances added so far: {len(added_instance_ids)}'
|
||||
)
|
||||
fout.close()
|
||||
logger.info(
|
||||
f'Done! Total {len(added_instance_ids)} instances added to {output_file}'
|
||||
)
|
||||
@@ -1,131 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
set -eo pipefail
|
||||
|
||||
source "evaluation/utils/version_control.sh"
|
||||
|
||||
MODEL_CONFIG=$1
|
||||
COMMIT_HASH=$2
|
||||
AGENT=$3
|
||||
EVAL_LIMIT=$4
|
||||
MAX_ITER=$5
|
||||
NUM_WORKERS=$6
|
||||
SPLIT=$7
N_RUNS=$8
|
||||
|
||||
|
||||
if [ -z "$NUM_WORKERS" ]; then
|
||||
NUM_WORKERS=1
|
||||
echo "Number of workers not specified, use default $NUM_WORKERS"
|
||||
fi
|
||||
checkout_eval_branch
|
||||
|
||||
if [ -z "$AGENT" ]; then
|
||||
echo "Agent not specified, use default CodeActAgent"
|
||||
AGENT="CodeActAgent"
|
||||
fi
|
||||
|
||||
if [ -z "$MAX_ITER" ]; then
|
||||
echo "MAX_ITER not specified, use default 100"
|
||||
MAX_ITER=100
|
||||
fi
|
||||
|
||||
if [ -z "$RUN_WITH_BROWSING" ]; then
|
||||
echo "RUN_WITH_BROWSING not specified, use default false"
|
||||
RUN_WITH_BROWSING=false
|
||||
fi
|
||||
|
||||
|
||||
if [ -z "$DATASET" ]; then
|
||||
echo "DATASET not specified, use default cmu-lti/interactive-swe"
|
||||
DATASET="cmu-lti/interactive-swe"
|
||||
fi
|
||||
|
||||
if [ -z "$SPLIT" ]; then
|
||||
echo "SPLIT not specified, use default test"
|
||||
SPLIT="test"
|
||||
fi
|
||||
|
||||
if [ -n "$EVAL_CONDENSER" ]; then
|
||||
echo "Using Condenser Config: $EVAL_CONDENSER"
|
||||
else
|
||||
echo "No Condenser Config provided via EVAL_CONDENSER, use default (NoOpCondenser)."
|
||||
fi
|
||||
|
||||
export RUN_WITH_BROWSING=$RUN_WITH_BROWSING
|
||||
echo "RUN_WITH_BROWSING: $RUN_WITH_BROWSING"
|
||||
|
||||
get_openhands_version
|
||||
|
||||
echo "AGENT: $AGENT"
|
||||
echo "OPENHANDS_VERSION: $OPENHANDS_VERSION"
|
||||
echo "MODEL_CONFIG: $MODEL_CONFIG"
|
||||
echo "DATASET: $DATASET"
|
||||
echo "SPLIT: $SPLIT"
|
||||
echo "MAX_ITER: $MAX_ITER"
|
||||
echo "NUM_WORKERS: $NUM_WORKERS"
|
||||
echo "COMMIT_HASH: $COMMIT_HASH"
|
||||
echo "EVAL_CONDENSER: $EVAL_CONDENSER"
|
||||
|
||||
# Default to NOT use Hint
|
||||
if [ -z "$USE_HINT_TEXT" ]; then
|
||||
export USE_HINT_TEXT=false
|
||||
fi
|
||||
echo "USE_HINT_TEXT: $USE_HINT_TEXT"
|
||||
EVAL_NOTE="$OPENHANDS_VERSION"
|
||||
# if not using Hint, add -no-hint to the eval note
|
||||
if [ "$USE_HINT_TEXT" = false ]; then
|
||||
EVAL_NOTE="$EVAL_NOTE-no-hint"
|
||||
fi
|
||||
|
||||
if [ "$RUN_WITH_BROWSING" = true ]; then
|
||||
EVAL_NOTE="$EVAL_NOTE-with-browsing"
|
||||
fi
|
||||
|
||||
if [ -n "$EXP_NAME" ]; then
|
||||
EVAL_NOTE="$EVAL_NOTE-$EXP_NAME"
|
||||
fi
|
||||
# Add condenser config to eval note if provided
|
||||
if [ -n "$EVAL_CONDENSER" ]; then
|
||||
EVAL_NOTE="${EVAL_NOTE}-${EVAL_CONDENSER}"
|
||||
fi
|
||||
|
||||
function run_eval() {
|
||||
local eval_note="${1}"
|
||||
COMMAND="poetry run python evaluation/benchmarks/swe_bench/run_infer_interact.py \
|
||||
--agent-cls $AGENT \
|
||||
--llm-config $MODEL_CONFIG \
|
||||
--max-iterations $MAX_ITER \
|
||||
--eval-num-workers $NUM_WORKERS \
|
||||
--eval-note $eval_note \
|
||||
--dataset $DATASET \
|
||||
--split $SPLIT"
|
||||
|
||||
if [ -n "$EVAL_LIMIT" ]; then
|
||||
echo "EVAL_LIMIT: $EVAL_LIMIT"
|
||||
COMMAND="$COMMAND --eval-n-limit $EVAL_LIMIT"
|
||||
fi
|
||||
|
||||
# Run the command
|
||||
eval $COMMAND
|
||||
}
|
||||
|
||||
unset SANDBOX_ENV_GITHUB_TOKEN # prevent the agent from using the github token to push
|
||||
if [ -z "$N_RUNS" ]; then
|
||||
N_RUNS=1
|
||||
echo "N_RUNS not specified, use default $N_RUNS"
|
||||
fi
|
||||
|
||||
# Skip runs if the run number is in the SKIP_RUNS list
|
||||
# read from env variable SKIP_RUNS as a comma separated list of run numbers
|
||||
SKIP_RUNS=(${SKIP_RUNS//,/ })
|
||||
for i in $(seq 1 $N_RUNS); do
|
||||
if [[ " ${SKIP_RUNS[@]} " =~ " $i " ]]; then
|
||||
echo "Skipping run $i"
|
||||
continue
|
||||
fi
|
||||
current_eval_note="$EVAL_NOTE-run_$i"
|
||||
echo "EVAL_NOTE: $current_eval_note"
|
||||
run_eval $current_eval_note
|
||||
done
|
||||
|
||||
checkout_original_branch
|
||||
@@ -26,6 +26,7 @@ import { downloadTrajectory } from "#/utils/download-trajectory";
|
||||
import { displayErrorToast } from "#/utils/custom-toast-handlers";
|
||||
import { useOptimisticUserMessage } from "#/hooks/use-optimistic-user-message";
|
||||
import { useWSErrorMessage } from "#/hooks/use-ws-error-message";
|
||||
import i18n from "#/i18n";
|
||||
import { ErrorMessageBanner } from "./error-message-banner";
|
||||
import { shouldRenderEvent } from "./event-content-helpers/should-render-event";
|
||||
|
||||
@@ -180,7 +181,11 @@ export function ChatInterface() {
           {!hitBottom && <ScrollToBottomButton onClick={scrollDomToBottom} />}
         </div>

-        {errorMessage && <ErrorMessageBanner message={errorMessage} />}
+        {errorMessage && (
+          <ErrorMessageBanner
+            message={i18n.exists(errorMessage) ? t(errorMessage) : errorMessage}
+          />
+        )}

         <InteractiveChatBox
           onSubmit={handleSendMessage}
@@ -1,7 +1,3 @@
-import { Trans } from "react-i18next";
-import { Link } from "react-router";
-import i18n from "#/i18n";
-
 interface ErrorMessageBannerProps {
   message: string;
 }
@@ -9,23 +5,7 @@ interface ErrorMessageBannerProps {
 export function ErrorMessageBanner({ message }: ErrorMessageBannerProps) {
   return (
     <div className="w-full rounded-lg p-2 text-black border border-red-800 bg-red-500">
-      {i18n.exists(message) ? (
-        <Trans
-          i18nKey={message}
-          components={{
-            a: (
-              <Link
-                className="underline font-bold cursor-pointer"
-                to="/settings/billing"
-              >
-                link
-              </Link>
-            ),
-          }}
-        />
-      ) : (
-        message
-      )}
+      {message}
     </div>
   );
 }
@@ -1,11 +1,6 @@
 import { OpenHandsAction } from "#/types/core/actions";
 import { OpenHandsEventType } from "#/types/core/base";
-import {
-  isCommandAction,
-  isCommandObservation,
-  isOpenHandsAction,
-  isOpenHandsObservation,
-} from "#/types/core/guards";
+import { isOpenHandsAction, isOpenHandsObservation } from "#/types/core/guards";
 import { OpenHandsObservation } from "#/types/core/observations";

 const COMMON_NO_RENDER_LIST: OpenHandsEventType[] = [
@@ -20,21 +15,11 @@ export const shouldRenderEvent = (
   event: OpenHandsAction | OpenHandsObservation,
 ) => {
   if (isOpenHandsAction(event)) {
-    if (isCommandAction(event) && event.source === "user") {
-      // For user commands, we always hide them from the chat interface
-      return false;
-    }
-
     const noRenderList = COMMON_NO_RENDER_LIST.concat(ACTION_NO_RENDER_LIST);
     return !noRenderList.includes(event.action);
   }

   if (isOpenHandsObservation(event)) {
-    if (isCommandObservation(event) && event.source === "user") {
-      // For user commands, we always hide them from the chat interface
-      return false;
-    }
-
     return !COMMON_NO_RENDER_LIST.includes(event.observation);
   }

@@ -2,10 +2,32 @@ import React from "react";
 import { OpenHandsAction } from "#/types/core/actions";
 import { OpenHandsObservation } from "#/types/core/observations";
 import { isOpenHandsAction, isOpenHandsObservation } from "#/types/core/guards";
+import { OpenHandsEventType } from "#/types/core/base";
 import { EventMessage } from "./event-message";
 import { ChatMessage } from "./chat-message";
 import { useOptimisticUserMessage } from "#/hooks/use-optimistic-user-message";

+const COMMON_NO_RENDER_LIST: OpenHandsEventType[] = [
+  "system",
+  "agent_state_changed",
+  "change_agent_state",
+];
+
+const ACTION_NO_RENDER_LIST: OpenHandsEventType[] = ["recall"];
+
+const shouldRenderEvent = (event: OpenHandsAction | OpenHandsObservation) => {
+  if (isOpenHandsAction(event)) {
+    const noRenderList = COMMON_NO_RENDER_LIST.concat(ACTION_NO_RENDER_LIST);
+    return !noRenderList.includes(event.action);
+  }
+
+  if (isOpenHandsObservation(event)) {
+    return !COMMON_NO_RENDER_LIST.includes(event.observation);
+  }
+
+  return true;
+};
+
 interface MessagesProps {
   messages: (OpenHandsAction | OpenHandsObservation)[];
   isAwaitingUserConfirmation: boolean;
@@ -32,7 +54,7 @@ export const Messages: React.FC<MessagesProps> = React.memo(

     return (
       <>
-        {messages.map((message, index) => (
+        {messages.filter(shouldRenderEvent).map((message, index) => (
           <EventMessage
             key={index}
             event={message}
@@ -213,18 +213,14 @@ export function WsClientProvider({

       // Invalidate diffs cache when a file is edited or written
       if (
-        isFileEditAction(event) ||
-        isFileWriteAction(event) ||
-        isCommandAction(event)
+        !messageRateHandler.isUnderThreshold &&
+        (isFileEditAction(event) ||
+          isFileWriteAction(event) ||
+          isCommandAction(event))
       ) {
-        queryClient.invalidateQueries(
-          {
-            queryKey: ["file_changes", conversationId],
-          },
-          // Do not refetch if we are still receiving messages at a high rate (e.g., loading an existing conversation)
-          // This prevents unnecessary refetches when the user is still receiving messages
-          { cancelRefetch: false },
-        );
+        queryClient.invalidateQueries({
+          queryKey: ["file_changes", conversationId],
+        });

         // Invalidate file diff cache when a file is edited or written
         if (!isCommandAction(event)) {
@@ -1,14 +1,21 @@
 import { useQueries, useQuery } from "@tanstack/react-query";
 import axios from "axios";
 import React from "react";
+import { useSelector } from "react-redux";
 import OpenHands from "#/api/open-hands";
+import { RUNTIME_INACTIVE_STATES } from "#/types/agent-state";
+import { RootState } from "#/store";
 import { useConversationId } from "#/hooks/use-conversation-id";
-import { useRuntimeIsReady } from "#/hooks/use-runtime-is-ready";
+import { useActiveConversation } from "./use-active-conversation";

 export const useActiveHost = () => {
+  const { curAgentState } = useSelector((state: RootState) => state.agent);
   const [activeHost, setActiveHost] = React.useState<string | null>(null);
   const { conversationId } = useConversationId();
-  const runtimeIsReady = useRuntimeIsReady();
+  const { data: conversation } = useActiveConversation();
+  const enabled =
+    conversation?.status === "RUNNING" &&
+    RUNTIME_INACTIVE_STATES.includes(curAgentState);

   const { data } = useQuery({
     queryKey: [conversationId, "hosts"],
@@ -16,7 +23,7 @@ export const useActiveHost = () => {
       const hosts = await OpenHands.getWebHosts(conversationId);
       return { hosts };
     },
-    enabled: runtimeIsReady && !!conversationId,
+    enabled,
     initialData: { hosts: [] },
     meta: {
       disableToast: true,
@@ -1,15 +1,23 @@
 import { useQuery } from "@tanstack/react-query";
 import React from "react";
+import { useSelector } from "react-redux";
 import OpenHands from "#/api/open-hands";
 import { useConversationId } from "#/hooks/use-conversation-id";
 import { GitChange } from "#/api/open-hands.types";
-import { useRuntimeIsReady } from "#/hooks/use-runtime-is-ready";
+import { RootState } from "#/store";
+import { RUNTIME_INACTIVE_STATES } from "#/types/agent-state";
+import { useActiveConversation } from "./use-active-conversation";

 export const useGetGitChanges = () => {
   const { conversationId } = useConversationId();
+  const { data: conversation } = useActiveConversation();
   const [orderedChanges, setOrderedChanges] = React.useState<GitChange[]>([]);
   const previousDataRef = React.useRef<GitChange[]>(null);
-  const runtimeIsReady = useRuntimeIsReady();

+  const { curAgentState } = useSelector((state: RootState) => state.agent);
+  const enabled =
+    conversation?.status === "RUNNING" &&
+    RUNTIME_INACTIVE_STATES.includes(curAgentState);
+
   const result = useQuery({
     queryKey: ["file_changes", conversationId],
@@ -17,7 +25,7 @@ export const useGetGitChanges = () => {
     retry: false,
     staleTime: 1000 * 60 * 5, // 5 minutes
     gcTime: 1000 * 60 * 15, // 15 minutes
-    enabled: runtimeIsReady && !!conversationId,
+    enabled,
     meta: {
       disableToast: true,
     },
@@ -1,10 +1,13 @@
 import { useQuery } from "@tanstack/react-query";
 import { useTranslation } from "react-i18next";
+import { useSelector } from "react-redux";
 import OpenHands from "#/api/open-hands";
 import { useConversationId } from "#/hooks/use-conversation-id";
 import { I18nKey } from "#/i18n/declaration";
+import { RootState } from "#/store";
+import { RUNTIME_INACTIVE_STATES } from "#/types/agent-state";
 import { transformVSCodeUrl } from "#/utils/vscode-url-helper";
-import { useRuntimeIsReady } from "#/hooks/use-runtime-is-ready";
+import { useActiveConversation } from "./use-active-conversation";

 // Define the return type for the VS Code URL query
 interface VSCodeUrlResult {
@@ -15,7 +18,11 @@ interface VSCodeUrlResult {
 export const useVSCodeUrl = () => {
   const { t } = useTranslation();
   const { conversationId } = useConversationId();
-  const runtimeIsReady = useRuntimeIsReady();
+  const { data: conversation } = useActiveConversation();
+  const { curAgentState } = useSelector((state: RootState) => state.agent);
+  const enabled =
+    conversation?.status === "RUNNING" &&
+    RUNTIME_INACTIVE_STATES.includes(curAgentState);

   return useQuery<VSCodeUrlResult>({
     queryKey: ["vscode_url", conversationId],
@@ -33,7 +40,7 @@ export const useVSCodeUrl = () => {
         error: t(I18nKey.VSCODE$URL_NOT_AVAILABLE),
       };
     },
-    enabled: runtimeIsReady && !!conversationId,
+    enabled,
     refetchOnMount: true,
     retry: 3,
   });
@@ -1,19 +0,0 @@
-import { useSelector } from "react-redux";
-import { RootState } from "#/store";
-import { RUNTIME_INACTIVE_STATES } from "#/types/agent-state";
-import { useActiveConversation } from "./query/use-active-conversation";
-
-/**
- * Hook to determine if the runtime is ready for operations
- *
- * @returns boolean indicating if the runtime is ready
- */
-export const useRuntimeIsReady = (): boolean => {
-  const { data: conversation } = useActiveConversation();
-  const { curAgentState } = useSelector((state: RootState) => state.agent);
-
-  return (
-    conversation?.status === "RUNNING" &&
-    !RUNTIME_INACTIVE_STATES.includes(curAgentState)
-  );
-};
@@ -6400,20 +6400,20 @@
         "uk": "Запит не вдалося виконати через внутрішню помилку сервера."
     },
     "STATUS$ERROR_LLM_OUT_OF_CREDITS": {
-        "en": "You're out of OpenHands Credits. <a>Add funds</a>",
-        "ja": "OpenHandsクレジットが不足しています。<a>資金を追加</a>",
-        "zh-CN": "您的OpenHands点数已用完。<a>添加资金</a>",
-        "zh-TW": "您的OpenHands點數已用完。<a>添加資金</a>",
-        "ko-KR": "OpenHands 크레딧이 소진되었습니다. <a>자금 추가</a>",
-        "no": "Du er tom for OpenHands-kreditter. <a>Legg til midler</a>",
-        "it": "Hai esaurito i crediti OpenHands. <a>Aggiungi fondi</a>",
-        "pt": "Você está sem créditos OpenHands. <a>Adicionar fundos</a>",
-        "es": "Te has quedado sin créditos de OpenHands. <a>Añadir fondos</a>",
-        "ar": "لقد نفدت رصيدك من OpenHands. <a>إضافة رصيد</a>",
-        "fr": "Vous n'avez plus de crédits OpenHands. <a>Ajouter des fonds</a>",
-        "tr": "OpenHands kredileriniz tükendi. <a>Bakiye ekle</a>",
-        "de": "Ihre OpenHands-Guthaben sind aufgebraucht. <a>Guthaben hinzufügen</a>",
-        "uk": "У вас закінчилися кредити OpenHands. <a>Додати кошти</a>"
+        "en": "You're out of OpenHands Credits",
+        "ja": "OpenHandsクレジットが不足しています",
+        "zh-CN": "您的OpenHands点数已用完",
+        "zh-TW": "您的OpenHands點數已用完",
+        "ko-KR": "OpenHands 크레딧이 소진되었습니다",
+        "no": "Du er tom for OpenHands-kreditter",
+        "it": "Hai esaurito i crediti OpenHands",
+        "pt": "Você está sem créditos OpenHands",
+        "es": "Te has quedado sin créditos de OpenHands",
+        "ar": "لقد نفدت رصيدك من OpenHands",
+        "fr": "Vous n'avez plus de crédits OpenHands",
+        "tr": "OpenHands kredileriniz tükendi",
+        "de": "Ihre OpenHands-Guthaben sind aufgebraucht",
+        "uk": "У вас закінчилися кредити OpenHands"
     },
     "STATUS$ERROR_LLM_CONTENT_POLICY_VIOLATION": {
         "en": "Content policy violation. The output was blocked by content filtering policy.",
@@ -8780,7 +8780,7 @@
         "ar": "إرسال...",
         "fr": "Envoi...",
         "tr": "Gönderiliyor...",
-        "de": "Senden...",
+        "de": "Senden...",
         "uk": "Відправляємо..."
     },
     "FEEDBACK$SUBMITTING_MESSAGE": {
@@ -20,7 +20,7 @@ export interface SystemMessageAction extends OpenHandsActionEvent<"system"> {
 }

 export interface CommandAction extends OpenHandsActionEvent<"run"> {
-  source: "agent" | "user";
+  source: "agent";
   args: {
     command: string;
     security_risk: ActionSecurityRisk;
@@ -4,7 +4,6 @@ import {
   AssistantMessageAction,
   OpenHandsAction,
   SystemMessageAction,
-  CommandAction,
 } from "./actions";
 import {
   AgentStateChangeObservation,
@@ -42,10 +41,6 @@ export const isErrorObservation = (
 ): event is ErrorObservation =>
   isOpenHandsObservation(event) && event.observation === "error";

-export const isCommandAction = (
-  event: OpenHandsParsedEvent,
-): event is CommandAction => isOpenHandsAction(event) && event.action === "run";
-
 export const isAgentStateChangeObservation = (
   event: OpenHandsParsedEvent,
 ): event is AgentStateChangeObservation =>
@@ -11,7 +11,7 @@ export interface AgentStateChangeObservation
 }

 export interface CommandObservation extends OpenHandsObservationEvent<"run"> {
-  source: "agent" | "user";
+  source: "agent";
   extras: {
     command: string;
     hidden?: boolean;
@@ -582,7 +582,7 @@ def _extract_and_validate_params(
     found_params = set()
     for param_match in param_matches:
         param_name = param_match.group(1)
-        param_value = param_match.group(2)
+        param_value = param_match.group(2).strip()

         # Validate parameter is allowed
         if allowed_params and param_name not in allowed_params:
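The two variants in this hunk, `param_match.group(2)` and `param_match.group(2).strip()`, differ only in whether whitespace around a parameter value survives parsing. A minimal sketch of that difference follows. The `PARAM_PATTERN` regex and the `extract_params` helper are hypothetical stand-ins, since the converter's actual pattern is not shown in this diff.

```python
import re

# Hypothetical pattern for illustration only; the real converter's regex
# is not part of this diff.
PARAM_PATTERN = re.compile(r"<parameter=([^>]+)>(.*?)</parameter>", re.DOTALL)


def extract_params(fn_body: str, strip_values: bool) -> dict[str, str]:
    """Collect parameter values, optionally stripping surrounding whitespace."""
    params: dict[str, str] = {}
    for param_match in PARAM_PATTERN.finditer(fn_body):
        param_name = param_match.group(1)
        param_value = param_match.group(2)
        if strip_values:
            param_value = param_value.strip()
        params[param_name] = param_value
    return params


fn_body = "<parameter=new_str>\n    return True\n</parameter>"

raw = extract_params(fn_body, strip_values=False)
stripped = extract_params(fn_body, strip_values=True)

print(repr(raw["new_str"]))       # surrounding newlines and indentation kept
print(repr(stripped["new_str"]))  # leading indentation of the first line lost
```

Stripping is convenient for short values such as paths, but it also drops the leading indentation of a multi-line value's first line, which can matter when the value is later matched against file contents exactly.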
@@ -1013,12 +1013,12 @@ if __name__ == '__main__':

         if not os.path.exists(full_path):
             # if user just removed a folder, prevent server error 500 in UI
-            return JSONResponse(content=[])
+            return []

         try:
             # Check if the directory exists
             if not os.path.exists(full_path) or not os.path.isdir(full_path):
-                return JSONResponse(content=[])
+                return []

             entries = os.listdir(full_path)

@@ -1047,11 +1047,11 @@ if __name__ == '__main__':

             # Combine sorted directories and files
             sorted_entries = directories + files
-            return JSONResponse(content=sorted_entries)
+            return sorted_entries

         except Exception as e:
             logger.error(f'Error listing files: {e}')
-            return JSONResponse(content=[])
+            return []

     logger.debug(f'Starting action execution API on port {args.port}')
     run(app, host='0.0.0.0', port=args.port)
@@ -48,34 +48,6 @@ def create_provider_tokens_object(
     return MappingProxyType(provider_information)


-async def setup_init_convo_settings(
-    user_id: str | None, providers_set: list[ProviderType]
-) -> ConversationInitData:
-    settings_store = await SettingsStoreImpl.get_instance(config, user_id)
-    settings = await settings_store.load()
-
-    secrets_store = await SecretsStoreImpl.get_instance(config, user_id)
-    user_secrets: UserSecrets | None = await secrets_store.load()
-
-    if not settings:
-        raise ConnectionRefusedError(
-            'Settings not found', {'msg_id': 'CONFIGURATION$SETTINGS_NOT_FOUND'}
-        )
-
-    session_init_args: dict = {}
-    session_init_args = {**settings.__dict__, **session_init_args}
-
-    git_provider_tokens = create_provider_tokens_object(providers_set)
-    if server_config.app_mode != AppMode.SAAS and user_secrets:
-        git_provider_tokens = user_secrets.provider_tokens
-
-    session_init_args['git_provider_tokens'] = git_provider_tokens
-    if user_secrets:
-        session_init_args['custom_secrets'] = user_secrets.custom_secrets
-
-    return ConversationInitData(**session_init_args)
-
-
 @sio.event
 async def connect(connection_id: str, environ: dict) -> None:
     try:
@@ -113,7 +85,30 @@ async def connect(connection_id: str, environ: dict) -> None:
             conversation_id, cookies_str, authorization_header
         )

-        conversation_init_data = await setup_init_convo_settings(user_id, providers_set)
+        settings_store = await SettingsStoreImpl.get_instance(config, user_id)
+        settings = await settings_store.load()
+
+        secrets_store = await SecretsStoreImpl.get_instance(config, user_id)
+        user_secrets: UserSecrets | None = await secrets_store.load()
+
+        if not settings:
+            raise ConnectionRefusedError(
+                'Settings not found', {'msg_id': 'CONFIGURATION$SETTINGS_NOT_FOUND'}
+            )
+        session_init_args: dict = {}
+        if settings:
+            session_init_args = {**settings.__dict__, **session_init_args}
+
+        git_provider_tokens = create_provider_tokens_object(providers_set)
+        if server_config.app_mode != AppMode.SAAS and user_secrets:
+            git_provider_tokens = user_secrets.provider_tokens
+
+        session_init_args['git_provider_tokens'] = git_provider_tokens
+        if user_secrets:
+            session_init_args['custom_secrets'] = user_secrets.custom_secrets
+
+        conversation_init_data = ConversationInitData(**session_init_args)
+
         agent_loop_info = await conversation_manager.join_conversation(
             conversation_id,
             connection_id,
@@ -188,7 +188,10 @@ async def load_custom_secrets_names(
 ) -> GETCustomSecrets | JSONResponse:
     try:
         if not user_secrets:
-            return GETCustomSecrets(custom_secrets=[])
+            return JSONResponse(
+                status_code=status.HTTP_404_NOT_FOUND,
+                content={'error': 'User secrets not found'},
+            )

         custom_secrets: list[CustomSecretWithoutValueModel] = []
         if user_secrets.custom_secrets:
@@ -217,30 +220,31 @@ async def create_custom_secret(
 ) -> JSONResponse:
     try:
         existing_secrets = await secrets_store.load()
-        custom_secrets = dict(existing_secrets.custom_secrets) if existing_secrets else {}
+        if existing_secrets:
+            custom_secrets = dict(existing_secrets.custom_secrets)

-        secret_name = incoming_secret.name
-        secret_value = incoming_secret.value
-        secret_description = incoming_secret.description
+            secret_name = incoming_secret.name
+            secret_value = incoming_secret.value
+            secret_description = incoming_secret.description

-        if secret_name in custom_secrets:
-            return JSONResponse(
-                status_code=status.HTTP_400_BAD_REQUEST,
-                content={'message': f'Secret {secret_name} already exists'},
+            if secret_name in custom_secrets:
+                return JSONResponse(
+                    status_code=status.HTTP_400_BAD_REQUEST,
+                    content={'message': f'Secret {secret_name} already exists'},
+                )
+
+            custom_secrets[secret_name] = CustomSecret(
+                secret=secret_value,
+                description=secret_description or '',
             )

-        custom_secrets[secret_name] = CustomSecret(
-            secret=secret_value,
-            description=secret_description or '',
-        )
+            # Create a new UserSecrets that preserves provider tokens
+            updated_user_secrets = UserSecrets(
+                custom_secrets=custom_secrets,
+                provider_tokens=existing_secrets.provider_tokens,
+            )

-        # Create a new UserSecrets that preserves provider tokens
-        updated_user_secrets = UserSecrets(
-            custom_secrets=custom_secrets,
-            provider_tokens=existing_secrets.provider_tokens if existing_secrets else {},
-        )
-
-        await secrets_store.store(updated_user_secrets)
+            await secrets_store.store(updated_user_secrets)

         return JSONResponse(
             status_code=status.HTTP_201_CREATED,
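Both variants of `create_custom_secret` in this hunk have to cope with `secrets_store.load()` returning nothing for a user who has never stored secrets. The sketch below illustrates the variant that uses `... if existing_secrets else {}`. It is an assumption-laden stand-in, not the project's code: `SecretsSketch` is a hypothetical dataclass replacing the real `UserSecrets` model, and plain strings replace `CustomSecret`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SecretsSketch:
    """Hypothetical stand-in for UserSecrets; not the real model."""

    custom_secrets: dict = field(default_factory=dict)
    provider_tokens: dict = field(default_factory=dict)


def add_custom_secret(
    existing: Optional[SecretsSketch], name: str, value: str
) -> SecretsSketch:
    # Fall back to an empty mapping when nothing has been stored yet.
    custom_secrets = dict(existing.custom_secrets) if existing else {}
    if name in custom_secrets:
        raise ValueError(f"Secret {name} already exists")
    custom_secrets[name] = value
    # Carry provider tokens over when there are any to preserve.
    return SecretsSketch(
        custom_secrets=custom_secrets,
        provider_tokens=existing.provider_tokens if existing else {},
    )


# A brand-new user: nothing stored yet, so `existing` is None.
first = add_custom_secret(None, "NEW_API_KEY", "new-api-key-value")
# A later call builds on the previously stored object.
second = add_custom_secret(first, "OTHER_KEY", "other-value")
```

Written this way, the first secret can be created even when the store is empty, which is the scenario covered by `test_create_custom_secret_with_no_existing_secrets` elsewhere in this diff.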
@@ -683,29 +683,6 @@ def test_agent_config_condenser_with_no_enabled():
     assert isinstance(agent_config.condenser, NoOpCondenserConfig)


-def test_sandbox_volumes_toml(default_config, temp_toml_file):
-    """Test that volumes configuration under [sandbox] works correctly."""
-    with open(temp_toml_file, 'w', encoding='utf-8') as toml_file:
-        toml_file.write("""
-[sandbox]
-volumes = "/home/user/mydir:/workspace:rw,/data:/data:ro"
-timeout = 1
-""")
-
-    load_from_toml(default_config, temp_toml_file)
-    finalize_config(default_config)
-
-    # Check that sandbox.volumes is set correctly
-    assert (
-        default_config.sandbox.volumes
-        == '/home/user/mydir:/workspace:rw,/data:/data:ro'
-    )
-    assert default_config.workspace_mount_path == '/home/user/mydir'
-    assert default_config.workspace_mount_path_in_sandbox == '/workspace'
-    assert default_config.workspace_base == '/home/user/mydir'
-    assert default_config.sandbox.timeout == 1
-
-
 def test_condenser_config_from_toml_basic(default_config, temp_toml_file):
     """Test loading basic condenser configuration from TOML."""
     with open(temp_toml_file, 'w', encoding='utf-8') as toml_file:
|
||||
@@ -652,34 +652,6 @@ NON_FNCALL_RESPONSE_MESSAGE = {
|
||||
<parameter=command>view</parameter>
|
||||
<parameter=path>/test/file.py</parameter>
|
||||
<parameter=view_range>[1, 10]</parameter>
|
||||
</function>""",
|
||||
),
|
||||
# Test case with indented code block to verify indentation is preserved
|
||||
(
|
||||
[
|
||||
{
|
||||
'index': 1,
|
||||
'function': {
|
||||
'arguments': '{"command": "str_replace", "path": "/test/file.py", "old_str": "def example():\\n pass", "new_str": "def example():\\n # This is indented\\n print(\\"hello\\")\\n return True"}',
|
||||
'name': 'str_replace_editor',
|
||||
},
|
||||
'id': 'test_id',
|
||||
'type': 'function',
|
||||
}
|
||||
],
|
||||
"""<function=str_replace_editor>
|
||||
<parameter=command>str_replace</parameter>
|
||||
<parameter=path>/test/file.py</parameter>
|
||||
<parameter=old_str>
|
||||
def example():
|
||||
pass
|
||||
</parameter>
|
||||
<parameter=new_str>
|
||||
def example():
|
||||
# This is indented
|
||||
print("hello")
|
||||
return True
|
||||
</parameter>
|
||||
</function>""",
|
||||
),
|
||||
],
|
||||
|
||||
@@ -138,39 +138,6 @@ async def test_add_custom_secret(test_client, file_secrets_store):
     )


-@pytest.mark.asyncio
-async def test_create_custom_secret_with_no_existing_secrets(
-    test_client, file_secrets_store
-):
-    """Test creating a custom secret when there are no existing secrets at all."""
-
-    # Don't store any initial settings - this simulates a completely new user
-    # or a situation where the secrets store is empty
-
-    # Make the POST request to add a custom secret
-    add_secret_data = {
-        'name': 'NEW_API_KEY',
-        'value': 'new-api-key-value',
-        'description': 'Test API Key',
-    }
-    response = test_client.post('/api/secrets', json=add_secret_data)
-    assert response.status_code == 201
-
-    # Verify that the settings were stored with the new secret
-    stored_settings = await file_secrets_store.load()
-
-    # Check that the secret was added
-    assert 'NEW_API_KEY' in stored_settings.custom_secrets
-    assert (
-        stored_settings.custom_secrets['NEW_API_KEY'].secret.get_secret_value()
-        == 'new-api-key-value'
-    )
-    assert stored_settings.custom_secrets['NEW_API_KEY'].description == 'Test API Key'
-
-    # Check that provider_tokens is an empty dict, not None
-    assert stored_settings.provider_tokens == {}
-
-
 @pytest.mark.asyncio
 async def test_update_existing_custom_secret(test_client, file_secrets_store):
     """Test updating an existing custom secret's name and description (cannot change value once set)."""