mirror of
https://github.com/All-Hands-AI/OpenHands.git
synced 2026-04-29 03:00:45 -04:00
Compare commits
1 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| 312993339f |
+1
-1
@@ -159,7 +159,7 @@ poetry run pytest ./tests/unit/test_*.py
|
||||
To reduce build time (e.g., if no changes were made to the client-runtime component), you can use an existing Docker
|
||||
container image by setting the SANDBOX_RUNTIME_CONTAINER_IMAGE environment variable to the desired Docker image.
|
||||
|
||||
Example: `export SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.52-nikolaik`
|
||||
Example: `export SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.51-nikolaik`
|
||||
|
||||
## Develop inside Docker container
|
||||
|
||||
|
||||
@@ -62,17 +62,17 @@ system requirements and more information.
|
||||
|
||||
|
||||
```bash
|
||||
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.52-nikolaik
|
||||
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.51-nikolaik
|
||||
|
||||
docker run -it --rm --pull=always \
|
||||
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.52-nikolaik \
|
||||
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.51-nikolaik \
|
||||
-e LOG_ALL_EVENTS=true \
|
||||
-v /var/run/docker.sock:/var/run/docker.sock \
|
||||
-v ~/.openhands:/.openhands \
|
||||
-p 3000:3000 \
|
||||
--add-host host.docker.internal:host-gateway \
|
||||
--name openhands-app \
|
||||
docker.all-hands.dev/all-hands-ai/openhands:0.52
|
||||
docker.all-hands.dev/all-hands-ai/openhands:0.51
|
||||
```
|
||||
|
||||
> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
|
||||
|
||||
+3
-3
@@ -51,17 +51,17 @@ OpenHands也可以使用Docker在本地系统上运行。
|
||||
|
||||
|
||||
```bash
|
||||
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.52-nikolaik
|
||||
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.51-nikolaik
|
||||
|
||||
docker run -it --rm --pull=always \
|
||||
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.52-nikolaik \
|
||||
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.51-nikolaik \
|
||||
-e LOG_ALL_EVENTS=true \
|
||||
-v /var/run/docker.sock:/var/run/docker.sock \
|
||||
-v ~/.openhands:/.openhands \
|
||||
-p 3000:3000 \
|
||||
--add-host host.docker.internal:host-gateway \
|
||||
--name openhands-app \
|
||||
docker.all-hands.dev/all-hands-ai/openhands:0.52
|
||||
docker.all-hands.dev/all-hands-ai/openhands:0.51
|
||||
```
|
||||
|
||||
> **注意**: 如果您在0.44版本之前使用过OpenHands,您可能需要运行 `mv ~/.openhands-state ~/.openhands` 来将对话历史迁移到新位置。
|
||||
|
||||
+3
-3
@@ -42,17 +42,17 @@ OpenHandsはDockerを利用してローカル環境でも実行できます。
|
||||
> 公共ネットワークで実行していますか?[Hardened Docker Installation Guide](https://docs.all-hands.dev/usage/runtimes/docker#hardened-docker-installation)を参照して、ネットワークバインディングの制限や追加のセキュリティ対策を実施してください。
|
||||
|
||||
```bash
|
||||
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.52-nikolaik
|
||||
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.51-nikolaik
|
||||
|
||||
docker run -it --rm --pull=always \
|
||||
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.52-nikolaik \
|
||||
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.51-nikolaik \
|
||||
-e LOG_ALL_EVENTS=true \
|
||||
-v /var/run/docker.sock:/var/run/docker.sock \
|
||||
-v ~/.openhands:/.openhands \
|
||||
-p 3000:3000 \
|
||||
--add-host host.docker.internal:host-gateway \
|
||||
--name openhands-app \
|
||||
docker.all-hands.dev/all-hands-ai/openhands:0.52
|
||||
docker.all-hands.dev/all-hands-ai/openhands:0.51
|
||||
```
|
||||
|
||||
**注**: バージョン0.44以前のOpenHandsを使用していた場合は、会話履歴を移行するために `mv ~/.openhands-state ~/.openhands` を実行してください。
|
||||
|
||||
@@ -12,7 +12,7 @@ services:
|
||||
- SANDBOX_API_HOSTNAME=host.docker.internal
|
||||
- DOCKER_HOST_ADDR=host.docker.internal
|
||||
#
|
||||
- SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-ghcr.io/all-hands-ai/runtime:0.52-nikolaik}
|
||||
- SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-ghcr.io/all-hands-ai/runtime:0.51-nikolaik}
|
||||
- SANDBOX_USER_ID=${SANDBOX_USER_ID:-1234}
|
||||
- WORKSPACE_MOUNT_PATH=${WORKSPACE_BASE:-$PWD/workspace}
|
||||
ports:
|
||||
|
||||
+1
-1
@@ -7,7 +7,7 @@ services:
|
||||
image: openhands:latest
|
||||
container_name: openhands-app-${DATE:-}
|
||||
environment:
|
||||
- SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-docker.all-hands.dev/all-hands-ai/runtime:0.52-nikolaik}
|
||||
- SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-docker.all-hands.dev/all-hands-ai/runtime:0.51-nikolaik}
|
||||
#- SANDBOX_USER_ID=${SANDBOX_USER_ID:-1234} # enable this only if you want a specific non-root sandbox user but you will have to manually adjust permissions of ~/.openhands for this user
|
||||
- WORKSPACE_MOUNT_PATH=${WORKSPACE_BASE:-$PWD/workspace}
|
||||
ports:
|
||||
|
||||
@@ -103,7 +103,7 @@ The conversation history will be saved in `~/.openhands/sessions`.
|
||||
```bash
|
||||
docker run -it \
|
||||
--pull=always \
|
||||
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.52-nikolaik \
|
||||
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.51-nikolaik \
|
||||
-e SANDBOX_USER_ID=$(id -u) \
|
||||
-e SANDBOX_VOLUMES=$SANDBOX_VOLUMES \
|
||||
-e LLM_API_KEY=$LLM_API_KEY \
|
||||
@@ -112,7 +112,7 @@ docker run -it \
|
||||
-v ~/.openhands:/.openhands \
|
||||
--add-host host.docker.internal:host-gateway \
|
||||
--name openhands-app-$(date +%Y%m%d%H%M%S) \
|
||||
docker.all-hands.dev/all-hands-ai/openhands:0.52 \
|
||||
docker.all-hands.dev/all-hands-ai/openhands:0.51 \
|
||||
python -m openhands.cli.main --override-cli-mode true
|
||||
```
|
||||
|
||||
|
||||
@@ -61,7 +61,7 @@ export GITHUB_TOKEN="your-token" # Required for repository operations
|
||||
# Run OpenHands
|
||||
docker run -it \
|
||||
--pull=always \
|
||||
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.52-nikolaik \
|
||||
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.51-nikolaik \
|
||||
-e SANDBOX_USER_ID=$(id -u) \
|
||||
-e SANDBOX_VOLUMES=$SANDBOX_VOLUMES \
|
||||
-e LLM_API_KEY=$LLM_API_KEY \
|
||||
@@ -73,7 +73,7 @@ docker run -it \
|
||||
-v ~/.openhands:/.openhands \
|
||||
--add-host host.docker.internal:host-gateway \
|
||||
--name openhands-app-$(date +%Y%m%d%H%M%S) \
|
||||
docker.all-hands.dev/all-hands-ai/openhands:0.52 \
|
||||
docker.all-hands.dev/all-hands-ai/openhands:0.51 \
|
||||
python -m openhands.core.main -t "write a bash script that prints hi"
|
||||
```
|
||||
|
||||
|
||||
@@ -68,23 +68,23 @@ Download and install the LM Studio desktop app from [lmstudio.ai](https://lmstud
|
||||
1. Check [the installation guide](/usage/local-setup) and ensure all prerequisites are met before running OpenHands, then run:
|
||||
|
||||
```bash
|
||||
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.52-nikolaik
|
||||
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.51-nikolaik
|
||||
|
||||
docker run -it --rm --pull=always \
|
||||
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.52-nikolaik \
|
||||
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.51-nikolaik \
|
||||
-e LOG_ALL_EVENTS=true \
|
||||
-v /var/run/docker.sock:/var/run/docker.sock \
|
||||
-v ~/.openhands:/.openhands \
|
||||
-p 3000:3000 \
|
||||
--add-host host.docker.internal:host-gateway \
|
||||
--name openhands-app \
|
||||
docker.all-hands.dev/all-hands-ai/openhands:0.52
|
||||
docker.all-hands.dev/all-hands-ai/openhands:0.51
|
||||
```
|
||||
|
||||
2. Wait until the server is running (see log below):
|
||||
```
|
||||
Digest: sha256:e72f9baecb458aedb9afc2cd5bc935118d1868719e55d50da73190d3a85c674f
|
||||
Status: Image is up to date for docker.all-hands.dev/all-hands-ai/openhands:0.52
|
||||
Status: Image is up to date for docker.all-hands.dev/all-hands-ai/openhands:0.51
|
||||
Starting OpenHands...
|
||||
Running OpenHands as root
|
||||
14:22:13 - openhands:INFO: server_config.py:50 - Using config class None
|
||||
|
||||
@@ -91,17 +91,17 @@ This will automatically handle Docker requirements checking, image pulling, and
|
||||
#### Option 2: Using Docker Directly
|
||||
|
||||
```bash
|
||||
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.52-nikolaik
|
||||
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.51-nikolaik
|
||||
|
||||
docker run -it --rm --pull=always \
|
||||
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.52-nikolaik \
|
||||
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.51-nikolaik \
|
||||
-e LOG_ALL_EVENTS=true \
|
||||
-v /var/run/docker.sock:/var/run/docker.sock \
|
||||
-v ~/.openhands:/.openhands \
|
||||
-p 3000:3000 \
|
||||
--add-host host.docker.internal:host-gateway \
|
||||
--name openhands-app \
|
||||
docker.all-hands.dev/all-hands-ai/openhands:0.52
|
||||
docker.all-hands.dev/all-hands-ai/openhands:0.51
|
||||
```
|
||||
|
||||
> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
|
||||
|
||||
@@ -2,8 +2,6 @@
|
||||
|
||||
This folder contains the evaluation harness that we built on top of the original [SWE-Bench benchmark](https://www.swebench.com/) ([paper](https://arxiv.org/abs/2310.06770)).
|
||||
|
||||
**UPDATE (8/12/2025): We now support running SWE-rebench evaluation (see the paper [here](https://arxiv.org/abs/2505.20411))! For how to run it, checkout [this README](./SWE-rebench.md).**
|
||||
|
||||
**UPDATE (6/15/2025): We now support running SWE-bench-Live evaluation (see the paper [here](https://arxiv.org/abs/2505.23419))! For how to run it, checkout [this README](./SWE-bench-Live.md).**
|
||||
|
||||
**UPDATE (5/26/2025): We now support running interactive SWE-Bench evaluation (see the paper [here](https://arxiv.org/abs/2502.13069))! For how to run it, checkout [this README](./SWE-Interact.md).**
|
||||
|
||||
@@ -1,84 +0,0 @@
|
||||
# SWE-rebench
|
||||
|
||||
<p align="center">
|
||||
<a href="https://arxiv.org/abs/2505.20411">📃 Paper</a>
|
||||
•
|
||||
<a href="https://huggingface.co/datasets/nebius/SWE-rebench">🤗 HuggingFace</a>
|
||||
•
|
||||
<a href="https://swe-rebench.com/leaderboard">📊 Leaderboard</a>
|
||||
</p>
|
||||
|
||||
SWE-rebench is a large-scale dataset for verifiable software engineering tasks.
|
||||
It comes in **two datasets**:
|
||||
|
||||
* **[`nebius/SWE-rebench-leaderboard`](https://huggingface.co/datasets/nebius/SWE-rebench-leaderboard)** – updatable benchmark used for [leaderboard evaluation](https://swe-rebench.com/leaderboard).
|
||||
* **[`nebius/SWE-rebench`](https://huggingface.co/datasets/nebius/SWE-rebench)** – full dataset with **21,302 tasks**, suitable for training or large-scale offline evaluation.
|
||||
|
||||
This document explains how to run OpenHands on SWE-rebench, using the leaderboard split as the main example.
|
||||
To run on the full dataset, simply replace the dataset name.
|
||||
|
||||
|
||||
## Setting Up
|
||||
|
||||
Set up your development environment and configure your LLM provider by following the [SWE-bench README](README.md) in this directory.
|
||||
|
||||
|
||||
## Running Inference
|
||||
|
||||
Use the existing SWE-bench inference script, changing the dataset to `nebius/SWE-rebench-leaderboard` and selecting the split (`test` for leaderboard submission):
|
||||
|
||||
```bash
|
||||
./evaluation/benchmarks/swe_bench/scripts/run_infer.sh \
|
||||
llm.your_llm HEAD CodeActAgent 30 50 1 nebius/SWE-rebench-leaderboard test
|
||||
```
|
||||
|
||||
Arguments:
|
||||
|
||||
* `llm.your_llm` – your model configuration key
|
||||
* `HEAD` – commit reference for reproducibility
|
||||
* `CodeActAgent` – agent type
|
||||
* `10` – number of examples to evaluate
|
||||
* `50` – maximum iterations per task (increase if needed)
|
||||
* `1` – number of workers
|
||||
* `nebius/SWE-rebench-leaderboard` – Hugging Face dataset name
|
||||
* `test` – dataset split
|
||||
|
||||
**Tip:** To run on the **full 21k dataset**, replace `nebius/SWE-rebench-leaderboard` with `nebius/SWE-rebench`.
|
||||
|
||||
|
||||
## Evaluating Results
|
||||
|
||||
After inference completes, evaluate using the [SWE-bench-fork evaluation harness](https://github.com/SWE-rebench/SWE-bench-fork).
|
||||
|
||||
1. Convert the OpenHands output to SWE-bench evaluation format:
|
||||
|
||||
```bash
|
||||
python evaluation/benchmarks/swe_bench/scripts/live/convert.py \
|
||||
--output_jsonl path/to/evaluation/output.jsonl > preds.jsonl
|
||||
```
|
||||
|
||||
2. Clone the SWE-bench-fork repo (https://github.com/SWE-rebench/SWE-bench-fork) and follow its README to install dependencies.
|
||||
|
||||
|
||||
3. Run the evaluation using the fork:
|
||||
|
||||
```bash
|
||||
python -m swebench.harness.run_evaluation \
|
||||
--dataset_name nebius/SWE-rebench-leaderboard \
|
||||
--split test \
|
||||
--predictions_path preds.jsonl \
|
||||
--max_workers 10 \
|
||||
--run_id openhands
|
||||
```
|
||||
|
||||
|
||||
## Citation
|
||||
|
||||
```bibtex
|
||||
@article{badertdinov2025swerebench,
|
||||
title={SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents},
|
||||
author={Badertdinov, Ibragim and Golubev, Alexander and Nekrashevich, Maksim and Shevtsov, Anton and Karasik, Simon and Andriushchenko, Andrei and Trofimova, Maria and Litvintseva, Daria and Yangel, Boris},
|
||||
journal={arXiv preprint arXiv:2505.20411},
|
||||
year={2025}
|
||||
}
|
||||
```
|
||||
@@ -0,0 +1,65 @@
|
||||
<uploaded_files>
|
||||
/workspace/{{ workspace_dir_name }}
|
||||
</uploaded_files>
|
||||
|
||||
I've uploaded a python code repository in the directory {{ workspace_dir_name }}. Consider the following issue description:
|
||||
|
||||
<issue_description>
|
||||
{{ instance.problem_statement }}
|
||||
</issue_description>
|
||||
|
||||
Can you help me implement the necessary changes to the repository so that the requirements specified in the <issue_description> are met?
|
||||
I've already taken care of all changes to any of the test files described in the <issue_description>. This means you DON'T have to modify the testing logic or any of the tests in any way!
|
||||
Also the development Python environment is already set up for you (i.e., all dependencies already installed), so you don't need to install other packages.
|
||||
Your task is to make the minimal changes to non-test files in the /workspace/{{ workspace_dir_name }} directory to ensure the <issue_description> is satisfied.
|
||||
|
||||
Follow these phases to resolve the issue:
|
||||
|
||||
Phase 1. READING: read the problem and reword it in clearer terms
|
||||
1.1 If there are code or config snippets. Express in words any best practices or conventions in them.
|
||||
1.2 Hightlight message errors, method names, variables, file names, stack traces, and technical details.
|
||||
1.3 Explain the problem in clear terms.
|
||||
1.4 Enumerate the steps to reproduce the problem.
|
||||
1.5 Hightlight any best practices to take into account when testing and fixing the issue
|
||||
|
||||
Phase 2. RUNNING: install and run the tests on the repository
|
||||
2.1 Follow the readme
|
||||
2.2 Install the environment and anything needed
|
||||
2.2 Iterate and figure out how to run the tests
|
||||
|
||||
Phase 3. EXPLORATION: find the files that are related to the problem and possible solutions
|
||||
3.1 Use `grep` to search for relevant methods, classes, keywords and error messages.
|
||||
3.2 Identify all files related to the problem statement.
|
||||
3.3 Propose the methods and files to fix the issue and explain why.
|
||||
3.4 From the possible file locations, select the most likely location to fix the issue.
|
||||
|
||||
Phase 4. TEST CREATION: before implementing any fix, create a script to reproduce and verify the issue.
|
||||
4.1 Look at existing test files in the repository to understand the test format/structure.
|
||||
4.2 Create a minimal reproduction script that reproduces the located issue.
|
||||
4.3 Run the reproduction script to confirm you are reproducing the issue.
|
||||
4.4 Adjust the reproduction script as necessary.
|
||||
|
||||
Phase 5. FIX ANALYSIS: state clearly the problem and how to fix it
|
||||
5.1 State clearly what the problem is.
|
||||
5.2 State clearly where the problem is located.
|
||||
5.3 State clearly how the test reproduces the issue.
|
||||
5.4 State clearly the best practices to take into account in the fix.
|
||||
5.5 State clearly how to fix the problem.
|
||||
|
||||
Phase 6. FIX IMPLEMENTATION: Edit the source code to implement your chosen solution.
|
||||
6.1 Make minimal, focused changes to fix the issue.
|
||||
|
||||
Phase 7. VERIFICATION: Test your implementation thoroughly.
|
||||
7.1 Run your reproduction script to verify the fix works.
|
||||
7.2 Add edge cases to your test script to ensure comprehensive coverage.
|
||||
7.3 Run existing tests related to the modified code to ensure you haven't broken anything.
|
||||
|
||||
8. FINAL REVIEW: Carefully re-read the problem description and compare your changes with the base commit {{ instance.base_commit }}.
|
||||
8.1 Ensure you've fully addressed all requirements.
|
||||
8.2 Run any tests in the repository related to:
|
||||
8.2.1 The issue you are fixing
|
||||
8.2.2 The files you modified
|
||||
8.2.3 The functions you changed
|
||||
8.3 If any tests fail, revise your implementation until all tests pass
|
||||
|
||||
Be thorough in your exploration, testing, and reasoning. It's fine if your thinking process is lengthy - quality and completeness are more important than brevity.
|
||||
@@ -0,0 +1,45 @@
|
||||
# Task: Fix Issue in Python Repository
|
||||
|
||||
## Repository Context
|
||||
You are provided with a Python code repository that contains an issue requiring your attention. The repository is located in a sandboxed environment, and you have access to the codebase to implement the necessary changes.
|
||||
The code repository is located at: `/workspace/{{ workspace_dir_name }}`
|
||||
(This path is provided for context; use file system tools to confirm paths before access).
|
||||
|
||||
## Goal
|
||||
Your goal is to fix the issue described in the **Issue Description** section below. Implement the necessary changes to **non-test files only** within the repository, ensuring that **all relevant tests pass** after your changes.
|
||||
|
||||
## Key Requirements & Constraints
|
||||
|
||||
1. **Understand the problem** very well: it is a bug report, and you know humans don't always write good descriptions. Explore the codebase to understand the related code and the problem in depth. It is possible that the solution needs to be a bit more extensive than just the stated text. Don't exagerate though: don't do unrelated refactoring, but also don't interpret the description too strictly.
|
||||
2. **Focus on the issues:** Implement the fix focusing on non-test files related to the issue.
|
||||
2. **Environment Ready:** The Python environment is pre-configured with all dependencies. Do not install packages.
|
||||
3. **Mandatory Testing Procedure:**
|
||||
* **Create Test to Reproduce the Issue:** *Before* implementing any fix, you MUST create a *new test* (separate from existing tests) that specifically reproduces the issue.
|
||||
* Take existing tests as example to understand the testing format/structure.
|
||||
* Enhance this test with edge cases.
|
||||
* Run this test to confirm reproduction.
|
||||
* **Verify Fix:** After implementing the fix, run your test again to verify the issue is resolved.
|
||||
* **Identify ALL Relevant Tests:** You MUST perform a **dedicated search and analysis** to identify **all** existing unit tests potentially affected by your changes. This includes:
|
||||
* Tests in the same module/directory as the changed files (e.g., `tests/` subdirectories).
|
||||
* Tests explicitly importing or using the modified code/classes/functions.
|
||||
* Tests mentioned in the issue description or related documentation.
|
||||
* Tests covering functionalities that *depend on* the modified code (analyze callers/dependencies if necessary).
|
||||
**If you cannot confidently identify a specific subset, you MUST identify and plan to run the entire test suite for the modified application or module(s). State your identified test scope clearly.**
|
||||
* **Run Identified Relevant Tests:** You MUST execute the **complete set** of relevant existing unit tests you identified in the previous step. Ensure you are running the *correct and comprehensive set* of tests. You MUST NOT modify these existing tests.
|
||||
* **Final Check & Verification:** Before finishing, ensure **all** identified relevant existing tests pass. **Explicitly confirm that you have considered potential omissions in your test selection and believe the executed tests comprehensively cover the impact of your changes.** Failing to identify and run the *complete* relevant set constitutes a failure. If any identified tests fail, revise your fix. Passing all relevant tests is the primary measure of success.
|
||||
4. **Defensive Programming:** Actively practice defensive programming: anticipate and handle potential edge cases, unexpected inputs, and different ways the affected code might be called **to ensure the fix works reliably and allows relevant tests to pass.** Analyze the potential impact on other parts of the codebase.
|
||||
5. **Final Review:** Compare your solution against the original issue and the base commit ({{ instance.base_commit }}) to ensure completeness and test passage.
|
||||
|
||||
## General Workflow Guidance
|
||||
|
||||
* Prioritize understanding the problem, exploring the code, planning your fix, implementing it carefully using the required diff format, and **thoroughly testing** according to the **Mandatory Testing Procedure**.
|
||||
* Consider trade-offs between different solutions. The goal is a **robust change that makes the relevant tests pass.** Quality, correctness, and reliability are key.
|
||||
* Actively practice defensive programming: anticipate and handle potential edge cases, unexpected inputs, and different ways the affected code might be called **to ensure the fix works reliably and allows relevant tests to pass.** Analyze the potential impact on other parts of the codebase.
|
||||
|
||||
* IMPORTANT: Your solution will be tested by additional hidden tests, so do not assume the task is complete just because visible tests pass! Refine the solution until you are confident that it is robust and comprehensive according to the **Defensive Programming** requirement.
|
||||
|
||||
## Final Note
|
||||
Be thorough in your exploration, testing, and reasoning. It's fine if your thinking process is lengthy - quality and completeness are more important than brevity.
|
||||
|
||||
## Issue Description
|
||||
{{ instance.problem_statement }}
|
||||
@@ -80,8 +80,6 @@ def set_dataset_type(dataset_name: str) -> str:
|
||||
DATASET_TYPE = 'SWE-Gym'
|
||||
elif 'swe-bench-live' in name_lower:
|
||||
DATASET_TYPE = 'SWE-bench-Live'
|
||||
elif 'swe-rebench' in name_lower:
|
||||
DATASET_TYPE = 'SWE-rebench'
|
||||
elif 'multimodal' in name_lower:
|
||||
DATASET_TYPE = 'Multimodal'
|
||||
else:
|
||||
@@ -111,7 +109,9 @@ def get_instruction(instance: pd.Series, metadata: EvalMetadata) -> MessageActio
|
||||
if mode.startswith('swt'):
|
||||
template_name = 'swt.j2'
|
||||
elif mode == 'swe':
|
||||
if 'gpt-4.1' in llm_model:
|
||||
if 'claude' in llm_model:
|
||||
template_name = 'swe_default.j2'
|
||||
elif 'gpt-4.1' in llm_model:
|
||||
template_name = 'swe_gpt4.j2'
|
||||
else:
|
||||
template_name = (
|
||||
@@ -180,8 +180,6 @@ def get_instance_docker_image(
|
||||
docker_image_prefix = 'docker.io/starryzhang/'
|
||||
elif DATASET_TYPE == 'SWE-bench':
|
||||
docker_image_prefix = 'docker.io/swebench/'
|
||||
elif DATASET_TYPE == 'SWE-rebench':
|
||||
docker_image_prefix = 'docker.io/swerebench/'
|
||||
repo, name = instance_id.split('__')
|
||||
image_name = f'{docker_image_prefix.rstrip("/")}/sweb.eval.x86_64.{repo}_1776_{name}:latest'.lower()
|
||||
logger.debug(f'Using official SWE-Bench image: {image_name}')
|
||||
@@ -322,8 +320,6 @@ def initialize_runtime(
|
||||
# inject the instance swe entry
|
||||
if DATASET_TYPE == 'SWE-bench-Live':
|
||||
entry_script_path = 'instance_swe_entry_live.sh'
|
||||
elif DATASET_TYPE == 'SWE-rebench':
|
||||
entry_script_path = 'instance_swe_entry_rebench.sh'
|
||||
else:
|
||||
entry_script_path = 'instance_swe_entry.sh'
|
||||
runtime.copy_to(
|
||||
|
||||
@@ -1,45 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
|
||||
source ~/.bashrc
|
||||
SWEUTIL_DIR=/swe_util
|
||||
|
||||
# FIXME: Cannot read SWE_INSTANCE_ID from the environment variable
|
||||
# SWE_INSTANCE_ID=django__django-11099
|
||||
if [ -z "$SWE_INSTANCE_ID" ]; then
|
||||
echo "Error: SWE_INSTANCE_ID is not set." >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Read the swe-bench-test-lite.json file and extract the required item based on instance_id
|
||||
item=$(jq --arg INSTANCE_ID "$SWE_INSTANCE_ID" '.[] | select(.instance_id == $INSTANCE_ID)' $SWEUTIL_DIR/eval_data/instances/swe-bench-instance.json)
|
||||
|
||||
if [[ -z "$item" ]]; then
|
||||
echo "No item found for the provided instance ID."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
|
||||
WORKSPACE_NAME=$(echo "$item" | jq -r '(.repo | tostring) + "__" + (.version | tostring) | gsub("/"; "__")')
|
||||
|
||||
echo "WORKSPACE_NAME: $WORKSPACE_NAME"
|
||||
|
||||
# Clear the workspace
|
||||
if [ -d /workspace ]; then
|
||||
rm -rf /workspace/*
|
||||
else
|
||||
mkdir /workspace
|
||||
fi
|
||||
# Copy repo to workspace
|
||||
if [ -d /workspace/$WORKSPACE_NAME ]; then
|
||||
rm -rf /workspace/$WORKSPACE_NAME
|
||||
fi
|
||||
mkdir -p /workspace
|
||||
cp -r /testbed /workspace/$WORKSPACE_NAME
|
||||
|
||||
# Activate instance-specific environment
|
||||
if [ -d /opt/miniconda3 ]; then
|
||||
. /opt/miniconda3/etc/profile.d/conda.sh
|
||||
conda activate testbed
|
||||
fi
|
||||
|
||||
export PATH=/opt/conda/envs/testbed/bin:$PATH
|
||||
@@ -263,20 +263,19 @@ def prepare_dataset(
|
||||
f'Randomly sampling {eval_n_limit} unique instances with random seed 42.'
|
||||
)
|
||||
|
||||
def make_serializable(instance_dict: dict) -> dict:
|
||||
def make_serializable(instance: pd.Series) -> dict:
|
||||
import numpy as np
|
||||
|
||||
instance_dict = instance.to_dict()
|
||||
for k, v in instance_dict.items():
|
||||
if isinstance(v, np.ndarray):
|
||||
instance_dict[k] = v.tolist()
|
||||
elif isinstance(v, pd.Timestamp):
|
||||
instance_dict[k] = str(v)
|
||||
elif isinstance(v, dict):
|
||||
instance_dict[k] = make_serializable(v)
|
||||
return instance_dict
|
||||
|
||||
new_dataset = [
|
||||
make_serializable(instance.to_dict())
|
||||
make_serializable(instance)
|
||||
for _, instance in dataset.iterrows()
|
||||
if str(instance[id_column]) not in finished_ids
|
||||
]
|
||||
|
||||
Generated
+2
-2
@@ -1,12 +1,12 @@
|
||||
{
|
||||
"name": "openhands-frontend",
|
||||
"version": "0.52.0",
|
||||
"version": "0.51.1",
|
||||
"lockfileVersion": 3,
|
||||
"requires": true,
|
||||
"packages": {
|
||||
"": {
|
||||
"name": "openhands-frontend",
|
||||
"version": "0.52.0",
|
||||
"version": "0.51.1",
|
||||
"dependencies": {
|
||||
"@heroui/react": "^2.8.2",
|
||||
"@heroui/use-infinite-scroll": "^2.2.10",
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "openhands-frontend",
|
||||
"version": "0.52.0",
|
||||
"version": "0.51.1",
|
||||
"private": true,
|
||||
"type": "module",
|
||||
"engines": {
|
||||
|
||||
@@ -147,7 +147,6 @@ export enum I18nKey {
|
||||
SUGGESTIONS$CLEAN_DEPENDENCIES = "SUGGESTIONS$CLEAN_DEPENDENCIES",
|
||||
SETTINGS$LLM_SETTINGS = "SETTINGS$LLM_SETTINGS",
|
||||
SETTINGS$GIT_SETTINGS = "SETTINGS$GIT_SETTINGS",
|
||||
SETTINGS$GIT_SETTINGS_DESCRIPTION = "SETTINGS$GIT_SETTINGS_DESCRIPTION",
|
||||
SETTINGS$SOUND_NOTIFICATIONS = "SETTINGS$SOUND_NOTIFICATIONS",
|
||||
SETTINGS$MAX_BUDGET_PER_TASK = "SETTINGS$MAX_BUDGET_PER_TASK",
|
||||
SETTINGS$MAX_BUDGET_PER_CONVERSATION = "SETTINGS$MAX_BUDGET_PER_CONVERSATION",
|
||||
|
||||
@@ -2351,22 +2351,6 @@
|
||||
"tr": "Git Ayarları",
|
||||
"uk": "Git налаштування"
|
||||
},
|
||||
"SETTINGS$GIT_SETTINGS_DESCRIPTION": {
|
||||
"en": "Configure the username and email that OpenHands uses to commit changes.",
|
||||
"ja": "OpenHandsがコミットに使用するユーザー名とメールを設定します。",
|
||||
"zh-CN": "配置OpenHands用于提交更改的用户名和电子邮件。",
|
||||
"zh-TW": "配置OpenHands用於提交更改的用戶名和電子郵件。",
|
||||
"ko-KR": "OpenHands가 변경 사항을 커밋할 때 사용하는 사용자 이름과 이메일을 구성합니다.",
|
||||
"de": "Konfigurieren Sie den Benutzernamen und die E-Mail, die OpenHands zum Committen von Änderungen verwendet.",
|
||||
"no": "Konfigurer brukernavnet og e-posten som OpenHands bruker for å committe endringer.",
|
||||
"it": "Configura il nome utente e l'email che OpenHands utilizza per committare le modifiche.",
|
||||
"pt": "Configure o nome de usuário e o email que o OpenHands usa para fazer commits de alterações.",
|
||||
"es": "Configure el nombre de usuario y el correo electrónico que OpenHands utiliza para confirmar cambios.",
|
||||
"ar": "قم بتكوين اسم المستخدم والبريد الإلكتروني الذي يستخدمه OpenHands لارتكاب التغييرات.",
|
||||
"fr": "Configurez le nom d'utilisateur et l'email qu'OpenHands utilise pour valider les modifications.",
|
||||
"tr": "OpenHands'ın değişiklikleri commit etmek için kullandığı kullanıcı adını ve e-postayı yapılandırın.",
|
||||
"uk": "Налаштуйте ім'я користувача та електронну пошту, які OpenHands використовує для фіксації змін."
|
||||
},
|
||||
"SETTINGS$SOUND_NOTIFICATIONS": {
|
||||
"en": "Sound Notifications",
|
||||
"ja": "サウンド通知",
|
||||
|
||||
@@ -250,12 +250,9 @@ function AppSettingsScreen() {
|
||||
/>
|
||||
|
||||
<div className="border-t border-t-tertiary pt-6 mt-2 hidden">
|
||||
<h3 className="text-lg font-medium mb-2">
|
||||
<h3 className="text-lg font-medium mb-4">
|
||||
{t(I18nKey.SETTINGS$GIT_SETTINGS)}
|
||||
</h3>
|
||||
<p className="text-sm text-secondary mb-4">
|
||||
{t(I18nKey.SETTINGS$GIT_SETTINGS_DESCRIPTION)}
|
||||
</p>
|
||||
<div className="flex flex-col gap-6">
|
||||
<SettingsInput
|
||||
testId="git-user-name-input"
|
||||
|
||||
@@ -66,11 +66,6 @@ Your primary role is to assist users by executing commands, modifying code, and
|
||||
* Use APIs to work with GitHub or other platforms, unless the user asks otherwise or your task requires browsing.
|
||||
</SECURITY>
|
||||
|
||||
<EXTERNAL_SERVICES>
|
||||
* When interacting with external services like GitHub, GitLab, or Bitbucket, use their respective APIs instead of browser-based interactions whenever possible.
|
||||
* Only resort to browser-based interactions with these services if specifically requested by the user or if the required operation cannot be performed via API.
|
||||
</EXTERNAL_SERVICES>
|
||||
|
||||
<ENVIRONMENT_SETUP>
|
||||
* When user asks you to run an application, don't stop if the application is not installed. Instead, please install the application and run the command again.
|
||||
* If you encounter missing dependencies:
|
||||
|
||||
@@ -40,7 +40,7 @@ Two configuration options are required to use the Kubernetes runtime:
|
||||
2. **Runtime Container Image**: Specify the container image to use for the runtime environment
|
||||
```toml
|
||||
[sandbox]
|
||||
runtime_container_image = "docker.all-hands.dev/all-hands-ai/runtime:0.52-nikolaik"
|
||||
runtime_container_image = "docker.all-hands.dev/all-hands-ai/runtime:0.51-nikolaik"
|
||||
```
|
||||
|
||||
#### Additional Kubernetes Options
|
||||
|
||||
@@ -280,11 +280,6 @@ def prep_build_folder(
|
||||
),
|
||||
)
|
||||
|
||||
# Copy the 'microagents' directory (Microagents)
|
||||
shutil.copytree(
|
||||
Path(project_root, 'microagents'), Path(build_folder, 'code', 'microagents')
|
||||
)
|
||||
|
||||
# Copy pyproject.toml and poetry.lock files
|
||||
for file in ['pyproject.toml', 'poetry.lock']:
|
||||
src = Path(openhands_source_dir, file)
|
||||
|
||||
@@ -239,8 +239,7 @@ COPY ./code/pyproject.toml ./code/poetry.lock /openhands/code/
|
||||
# ================================================================
|
||||
RUN if [ -d /openhands/code/openhands ]; then rm -rf /openhands/code/openhands; fi
|
||||
COPY ./code/pyproject.toml ./code/poetry.lock /openhands/code/
|
||||
RUN if [ -d /openhands/code/microagents ]; then rm -rf /openhands/code/microagents; fi
|
||||
COPY ./code/microagents /openhands/code/microagents
|
||||
|
||||
COPY ./code/openhands /openhands/code/openhands
|
||||
RUN chmod a+rwx /openhands/code/openhands/__init__.py
|
||||
|
||||
|
||||
@@ -36,7 +36,6 @@ class Settings(BaseModel):
|
||||
enable_default_condenser: bool = True
|
||||
enable_sound_notifications: bool = False
|
||||
enable_proactive_conversation_starters: bool = True
|
||||
enable_solvability_analysis: bool = True
|
||||
user_consents_to_analytics: bool | None = None
|
||||
sandbox_base_container_image: str | None = None
|
||||
sandbox_runtime_container_image: str | None = None
|
||||
|
||||
+1
-1
@@ -6,7 +6,7 @@ requires = [
|
||||
|
||||
[tool.poetry]
|
||||
name = "openhands-ai"
|
||||
version = "0.52.0"
|
||||
version = "0.51.1"
|
||||
description = "OpenHands: Code Less, Make More"
|
||||
authors = [ "OpenHands" ]
|
||||
license = "MIT"
|
||||
|
||||
@@ -89,8 +89,8 @@ def test_prep_build_folder(temp_dir):
|
||||
extra_deps=None,
|
||||
)
|
||||
|
||||
# make sure that the code (openhands/) and microagents folder were copied
|
||||
assert shutil_mock.copytree.call_count == 2
|
||||
# make sure that the code was copied
|
||||
shutil_mock.copytree.assert_called_once()
|
||||
assert shutil_mock.copy2.call_count == 2
|
||||
|
||||
# Now check dockerfile is in the folder
|
||||
|
||||
Reference in New Issue
Block a user