Compare commits

..

60 Commits

Author SHA1 Message Date
Xingyao Wang
8067ae85c3 Merge branch 'main' into fix-git-coauthorship-cli-runtime 2025-08-22 09:41:09 -04:00
Xingyao Wang
4507a25b85 Evaluation: redirect sessions to repo-local .eval_sessions via helper; apply across entrypoints; add tests (#10540)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-22 13:34:02 +00:00
llamantino
d9cf5b7302 ci: add GitHub Action to post welcome message on good first issues (#9707)
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2025-08-22 09:09:45 -04:00
Xingyao Wang
2a86e32263 fix(CI): Pin @modelcontextprotocol/server-filesystem to version 2025.8.18 (#10561)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-22 05:00:11 +08:00
Engel Nyst
b311ae6e15 fix: normalize malformed <parameter> tags (Qwen3) (#10539) 2025-08-21 19:03:20 +02:00
Ryan H. Tran
adb773789a Upgrade aci to 0.3.2: clamp view_range end to file length and emit warning instead of error (#10502) 2025-08-21 23:01:54 +07:00
Engel Nyst
91d3d1d20a Fix: expose aggregated LLM metrics in State for evaluation scripts (#10537)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-21 17:43:09 +02:00
llamantino
e9e2c98946 fix(tests): increase hard timeout in test_bash_server to avoid timeout on Windows (#9930) 2025-08-21 17:12:42 +02:00
Engel Nyst
7861c1ddf7 fix(anthropic): disable extended thinking for Opus 4.1 (#10532)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-21 00:13:15 +02:00
Engel Nyst
5ce5469bfa docs: update OpenAPI specification to include all current endpoints (#10412)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-20 21:58:35 +02:00
Xingyao Wang
4a3f5dd9b4 fix(runtime): correctly set session_api_key for local runtime (#10506) 2025-08-21 03:51:19 +08:00
Joe O'Connor
bc8b995dd3 Add additional networks (#9566)
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2025-08-20 18:52:31 +00:00
chuckbutkus
07c4742496 Add useful tools jq and gettext to image (#10531) 2025-08-20 18:27:09 +00:00
mamoodi
b5887f8a9d Fix CLI docs command (#10520) 2025-08-20 14:53:15 +00:00
mamoodi
0166df6575 Release 0.54.0 (#10465) 2025-08-20 10:29:15 -04:00
Ryan H. Tran
e03a1f4e37 Move TASKS.md to session-specific directory in ~/.openhands (#10493)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-20 22:26:55 +08:00
sp.wack
c763f0e368 chroe(vscode): Refresh vscode integration lockfile (#9965)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2025-08-20 15:33:11 +02:00
Engel Nyst
bb0e24d23b Centralize model feature checks (#10414)
Co-authored-by: OpenHands-GPT-5 <openhands@all-hands.dev>
2025-08-19 20:30:07 +00:00
sp.wack
aa6b454772 fix: Enhance GitHub repository search to include user organizations (#10324)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-19 15:56:15 +00:00
openhands
4e300b24b7 fix(runtime): ensure git safe.directory is configured for root user
When running DockerRuntime with run_as_openhands=False (i.e., as root),
the git safe.directory configuration was not being set up, causing
'dubious ownership' errors when git commands were executed.

This fix extracts the git configuration logic into a separate function
and ensures it's called for both root and non-root users, preventing
the 'fatal: detected dubious ownership in repository' error.

Fixes tests/runtime/test_bash.py::test_bash_remove_prefix[DockerRuntime-False]

Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-19 14:51:08 +00:00
sp.wack
0297b3da18 Fix conversation ID validation to return 400 instead of 500 for long IDs (#10496) 2025-08-19 18:03:05 +04:00
Hiep Le
476954f3a4 refactor(frontend): update the styling for the microagent management page. (#10494) 2025-08-19 19:50:42 +07:00
dependabot[bot]
f296d7bde5 chore(deps): bump abatilo/actions-poetry from 3 to 4 (#10487)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-19 13:58:39 +02:00
Zacharias Fisches
f866b3f8ea Update modal runtime for modal>=1.0 (#10479)
Co-authored-by: Ryan H. Tran <descience.thh10@gmail.com>
2025-08-19 10:33:03 +00:00
Zacharias Fisches
36d31b74f7 fix jinja / dockerfile syntax by removing newlines (#10476)
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
2025-08-19 02:50:41 +00:00
openhands
6ceae397d7 tests(runtime): align timeout assertion and robust git remote setup in test_bash\n\n- get_timeout_suffix(): assert on stable prefix only\n- test_bash_remove_prefix: tolerate existing origin via add-or-set-url\n\nCo-authored-by: openhands <openhands@all-hands.dev> 2025-08-19 02:46:34 +00:00
openhands
f894c25597 runtime(docker/cli): fix git co-authorship + git safe.directory and workspace ownership
- Ensure workspace ownership and permissions are set after (or alongside) user creation
- Add defensive guard for UID=0 for non-root users (use 1000)
- Configure git safe.directory for /workspace to avoid ‘dubious ownership’ errors
- Set global core.hooksPath and init.templateDir to /openhands/git-hooks
- Ship prepare-commit-msg hook at runtime (copy from code or generate fallback) to always append
  ‘Co-authored-by: openhands <openhands@all-hands.dev>’
- BashSession: start shell in correct working dir for target user
- Command startup: never pass UID 0 to openhands user

This fixes tests/runtime/test_bash.py::test_git_co_authorship_runtime_setup for DockerRuntime and CLIRuntime.

Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-18 23:19:34 +00:00
Engel Nyst
634a7691a2 tests: reorganize unit tests into subdirectories mirroring source modules (#10484)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-19 01:11:07 +02:00
openhands
87b936b04a bash: normalize timeout suffix format to 1 decimal for hard timeouts
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-18 21:51:05 +00:00
Xingyao Wang
81ba4399fa fix(frontend): fix MCP tab in frontend unit tests (#10481)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-18 21:25:09 +00:00
openhands
6068e4298b cli: ensure git co-authorship works in CLIRuntime
- Always prefix PATH for subprocess to use wrapper
- Configure global prepare-commit-msg hook for fallback

Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-18 21:21:53 +00:00
Rohit Malhotra
875036d920 (Hotfix): Fix logs and filestore init for llm registry (#10470) 2025-08-18 20:57:08 +00:00
Xingyao Wang
39333dd5de feat: enable MCP in SaaS (#10480) 2025-08-18 20:40:42 +00:00
Rohit Malhotra
3660933d59 refactor: replace 'convo' naming with 'conversation' (#10473)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-18 15:10:32 -04:00
Xingyao Wang
baf2cc5c7e Pin OpenAI Python SDK to 1.99.9 to avoid LiteLLM import breakage (BerriAI/litellm#13711) (#10471)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Rohit Malhotra <rohitvinodmalhotra@gmail.com>
2025-08-18 18:45:34 +00:00
Rohit Malhotra
7b31d57a2f Update conversation stats filename (#10472) 2025-08-18 18:09:13 +00:00
Rohit Malhotra
61d90c31eb (Hotfix): Fix eval pipeline (#10466) 2025-08-18 12:51:51 -04:00
Xingyao Wang
3fea7fd2fc feat: improve MCP config UI with comprehensive add/edit/delete functionality (#10145)
Co-authored-by: OpenHands <openhands@all-hands.dev>
2025-08-18 16:33:27 +00:00
suixinio
c64b1ae111 fix(openrouter): Force string serialization for openrouter/anthropic/claude-sonnet-4 model (#10454) 2025-08-18 17:50:01 +02:00
Kevin Musgrave
74ba21bad0 feat(evaluation): Added INSTRUCTION_TEMPLATE_NAME to run_infer.py in swe_bench (#10270)
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
Co-authored-by: mamoodi <mamoodiha@gmail.com>
2025-08-18 14:18:08 +00:00
Engel Nyst
bef6b1afee cli: fix Ubuntu white-on-white model autocomplete by merging default prompt_toolkit UI style (#10347) 2025-08-18 20:32:09 +08:00
Graham Neubig
ad85e3249a test(e2e): Add web browsing catchphrase E2E for #10378 and wire into CI (#10401)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-18 08:28:42 -04:00
Engel Nyst
822ce86150 Ensure .bashrc exists (#10461) 2025-08-18 20:18:11 +08:00
Graham Neubig
305caf1257 Implement configurable base URL for E2E tests (#10394)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-18 07:44:07 -04:00
Rohit Malhotra
25d9cf2890 [Refactor]: Add LLMRegistry for llm services (#9589)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2025-08-18 02:11:20 -04:00
Engel Nyst
17b1a21296 chore(ci): enhance lint-fix workflow for FE (#10448)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-18 05:21:13 +02:00
Engel Nyst
97bcb2162d Add instruction to use existing repository labels in PR/MR microagents (#10446)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-18 04:35:20 +02:00
Engel Nyst
8401641f7e Docs + Code: rename ‘convo’ to ‘conversation’ across codebase and docs (#10447) 2025-08-18 04:35:02 +02:00
Engel Nyst
e2343c0927 Runtime-backend docs update (arch) - cron agent run (#10423)
Co-authored-by: OpenHands-GPT-5 <openhands@all-hands.dev>
2025-08-18 02:04:31 +02:00
Xingyao Wang
277064720c chore: remove timeout >600s warning log in Event.set_hard_timeout (#10444)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-17 23:25:13 +02:00
Xingyao Wang
ef3e0c8dfe Fix think observation redundant rendering in frontend (#10409)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-17 10:55:03 +08:00
Engel Nyst
315d391414 Revert "tests: reorganize unit tests into subdirectories mirroring source modules" (#10437) 2025-08-17 00:33:17 +00:00
openhands
388e3ba496 Revert "Fix git config tests by ensuring local module imports"
This reverts commit 2fd68cef2f.
2025-08-15 22:01:28 +00:00
openhands
2fd68cef2f Fix git config tests by ensuring local module imports
The tests were failing because the poetry environment was using an
installed version of the package from /openhands/code/ instead of the
local development version. This caused the tests to use the old version
of get_action_execution_server_startup_command that didn't include git
configuration arguments.

Fixed by adding the current directory to sys.path at the beginning of
the test file to ensure local modules are imported first.

Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-15 21:46:04 +00:00
openhands
e1788a74c5 Fix Docker build: Move git hook setup to separate RUN command
- Move git hook setup after source code is copied
- Separate RUN command prevents build failure when trying to copy hooks before source exists
- Git hooks are now set up after COPY ./code/openhands command
- This ensures the prepare-commit-msg file exists before trying to copy it

Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-13 22:15:35 +00:00
Xingyao Wang
f046982d41 Update openhands/runtime/impl/cli/cli_runtime.py 2025-08-14 06:14:09 +08:00
openhands
e600225f0f Move CLI git wrapper to ~/.openhands/bin
- Use ~/.openhands/bin instead of workspace .openhands_bin directory
- This follows standard user binary patterns and persists across workspaces
- Avoids cluttering workspace with runtime infrastructure
- Updated test to check new location

Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-13 22:05:06 +00:00
openhands
dd8401cc98 Update test to expect co-authorship for all runtimes
All runtimes have git hooks installed via Dockerfile.j2, so they should all
automatically add co-authorship. CLI runtime has additional PATH-based wrapper
but the base hook functionality works universally.

Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-13 22:03:02 +00:00
openhands
e99f41372a Fix test to not manually set up git hooks
- Remove manual git hook setup from test - runtime should handle this
- Rename test to reflect it tests runtime setup, not just hooks
- Make test work with different runtime types (CLI uses wrapper, others may differ)
- Test should verify runtime's co-authorship mechanism, not set it up

Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-13 21:59:49 +00:00
openhands
e43f73f643 Implement automatic git co-authorship in CLI runtime using PATH-based wrapper
- Replace conditional environment variable check with always-enabled git co-authorship
- Use PATH manipulation instead of command wrapping to handle chained commands
- Create modified git wrapper that uses full path to real git executable to avoid recursion
- Update tests to reflect always-enabled behavior
- Add comprehensive documentation for git hooks and wrapper functionality

Fixes https://github.com/All-Hands-AI/OpenHands/issues/9957

Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-13 21:51:38 +00:00
225 changed files with 24232 additions and 16956 deletions

View File

@@ -22,7 +22,7 @@ jobs:
uses: actions/checkout@v4
- name: Install poetry via pipx
uses: abatilo/actions-poetry@v3
uses: abatilo/actions-poetry@v4
with:
poetry-version: 2.1.3
@@ -183,7 +183,11 @@ jobs:
# Run the tests with detailed output
cd tests/e2e
poetry run python -m pytest test_settings.py::test_github_token_configuration test_conversation.py::test_conversation_start -v --no-header --capture=no --timeout=600
poetry run python -m pytest \
test_settings.py::test_github_token_configuration \
test_conversation.py::test_conversation_start \
test_browsing_catchphrase.py::test_browsing_catchphrase \
-v --no-header --capture=no --timeout=900
- name: Upload test results
if: always()

View File

@@ -29,6 +29,12 @@ jobs:
run: |
cd frontend
npm install --frozen-lockfile
- name: Generate i18n and route types
run: |
cd frontend
npm run make-i18n
npx react-router typegen || true
- name: Fix frontend lint issues
run: |
cd frontend
@@ -45,7 +51,7 @@ jobs:
git config --local user.email "openhands@all-hands.dev"
git config --local user.name "OpenHands Bot"
git add -A
git commit -m "🤖 Auto-fix frontend linting issues"
git commit -m "🤖 Auto-fix frontend linting issues" --no-verify
git push
# Python lint fixes
@@ -87,5 +93,5 @@ jobs:
git config --local user.email "openhands@all-hands.dev"
git config --local user.name "OpenHands Bot"
git add -A
git commit -m "🤖 Auto-fix Python linting issues"
git commit -m "🤖 Auto-fix Python linting issues" --no-verify
git push

View File

@@ -0,0 +1,50 @@
name: Welcome Good First Issue
on:
issues:
types: [labeled]
permissions:
issues: write
jobs:
comment-on-good-first-issue:
if: github.event.label.name == 'good first issue'
runs-on: ubuntu-latest
steps:
- name: Check if welcome comment already exists
id: check_comment
uses: actions/github-script@v7
with:
result-encoding: string
script: |
const issueNumber = context.issue.number;
const comments = await github.rest.issues.listComments({
...context.repo,
issue_number: issueNumber
});
const alreadyCommented = comments.data.some(
(comment) =>
comment.body.includes('<!-- auto-comment:good-first-issue -->')
);
return alreadyCommented ? 'true' : 'false';
- name: Leave welcome comment
if: steps.check_comment.outputs.result == 'false'
uses: actions/github-script@v7
with:
script: |
const repoUrl = `https://github.com/${context.repo.owner}/${context.repo.repo}`;
await github.rest.issues.createComment({
...context.repo,
issue_number: context.issue.number,
body: "🙌 **Hey there, future contributor!** 🙌\n\n" +
"This issue has been labeled as **good first issue**, which means it's a great place to get started with the OpenHands project.\n\n" +
"If you're interested in working on it, feel free to! No need to ask for permission.\n\n" +
"Be sure to check out our [development setup guide](" + repoUrl + "/blob/main/Development.md) to get your environment set up, and follow our [contribution guidelines](" + repoUrl + "/blob/main/CONTRIBUTING.md) when you're ready to submit a fix.\n\n" +
"🙌 Happy hacking! 🙌\n\n" +
"<!-- auto-comment:good-first-issue -->"
});

2
.gitignore vendored
View File

@@ -257,3 +257,5 @@ containers/runtime/code
# test results
test-results
.eval_sessions

View File

@@ -87,6 +87,8 @@ VSCode Extension:
If you are starting a pull request (PR), please follow the template in `.github/pull_request_template.md`.
If you need to add labels when opening a PR, check the existing labels defined on that repository and select from existing ones. Do not invent your own labels.
## Implementation Details
These details may or may not be useful for your current task.
@@ -142,6 +144,35 @@ Your specialized knowledge and instructions here...
- Add the setting to the `Settings` model in `openhands/storage/data_models/settings.py`
- Update any relevant backend code to apply the setting (e.g., in session creation)
#### Settings UI Patterns:
There are two main patterns for saving settings in the OpenHands frontend:
**Pattern 1: Entity-based Resources (Immediate Save)**
- Used for: API Keys, Secrets, MCP Servers
- Behavior: Changes are saved immediately when user performs actions (add/edit/delete)
- Implementation:
- No "Save Changes" button
- No local state management or `isDirty` tracking
- Uses dedicated mutation hooks for each operation (e.g., `use-add-mcp-server.ts`, `use-delete-mcp-server.ts`)
- Each mutation triggers immediate API call with query invalidation for UI updates
- Example: MCP settings, API Keys & Secrets tabs
- Benefits: Simpler UX, no risk of losing changes, consistent with modern web app patterns
**Pattern 2: Form-based Settings (Manual Save)**
- Used for: Application settings, LLM configuration
- Behavior: Changes are accumulated locally and saved when user clicks "Save Changes"
- Implementation:
- Has "Save Changes" button that becomes enabled when changes are detected
- Uses local state management with `isDirty` tracking
- Uses `useSaveSettings` hook to save all changes at once
- Example: LLM tab, Application tab
- Benefits: Allows bulk changes, explicit save action, can validate all fields before saving
**When to use each pattern:**
- Use Pattern 1 (Immediate Save) for entity management where each item is independent
- Use Pattern 2 (Manual Save) for configuration forms where settings are interdependent or need validation
### Adding New LLM Models
To add a new LLM model to OpenHands, you need to update multiple files across both frontend and backend:

47
.pre-commit-config.yaml Normal file
View File

@@ -0,0 +1,47 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
- id: trailing-whitespace
exclude: ^(docs/|modules/|python/|openhands-ui/|third_party/)
- id: end-of-file-fixer
exclude: ^(docs/|modules/|python/|openhands-ui/|third_party/)
- id: check-yaml
args: ["--allow-multiple-documents"]
- id: debug-statements
- repo: https://github.com/tox-dev/pyproject-fmt
rev: v2.5.1
hooks:
- id: pyproject-fmt
- repo: https://github.com/abravalheri/validate-pyproject
rev: v0.24.1
hooks:
- id: validate-pyproject
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: v0.11.8
hooks:
# Run the linter.
- id: ruff
entry: ruff check --config dev_config/python/ruff.toml
types_or: [python, pyi, jupyter]
args: [--fix, --unsafe-fixes]
exclude: third_party/
# Run the formatter.
- id: ruff-format
entry: ruff format --config dev_config/python/ruff.toml
types_or: [python, pyi, jupyter]
exclude: third_party/
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.15.0
hooks:
- id: mypy
additional_dependencies:
[types-requests, types-setuptools, types-pyyaml, types-toml, types-docker, types-Markdown, pydantic, lxml]
# To see gaps add `--html-report mypy-report/`
entry: mypy --config-file dev_config/python/mypy.ini openhands/
always_run: true
pass_filenames: false

View File

@@ -159,7 +159,7 @@ poetry run pytest ./tests/unit/test_*.py
To reduce build time (e.g., if no changes were made to the client-runtime component), you can use an existing Docker
container image by setting the SANDBOX_RUNTIME_CONTAINER_IMAGE environment variable to the desired Docker image.
Example: `export SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.53-nikolaik`
Example: `export SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.54-nikolaik`
## Develop inside Docker container

View File

@@ -79,17 +79,17 @@ You'll find OpenHands running at [http://localhost:3000](http://localhost:3000)
You can also run OpenHands directly with Docker:
```bash
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.53-nikolaik
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.54-nikolaik
docker run -it --rm --pull=always \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.53-nikolaik \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.54-nikolaik \
-e LOG_ALL_EVENTS=true \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ~/.openhands:/.openhands \
-p 3000:3000 \
--add-host host.docker.internal:host-gateway \
--name openhands-app \
docker.all-hands.dev/all-hands-ai/openhands:0.53
docker.all-hands.dev/all-hands-ai/openhands:0.54
```
</details>

View File

@@ -51,17 +51,17 @@ OpenHands也可以使用Docker在本地系统上运行。
```bash
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.53-nikolaik
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.54-nikolaik
docker run -it --rm --pull=always \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.53-nikolaik \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.54-nikolaik \
-e LOG_ALL_EVENTS=true \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ~/.openhands:/.openhands \
-p 3000:3000 \
--add-host host.docker.internal:host-gateway \
--name openhands-app \
docker.all-hands.dev/all-hands-ai/openhands:0.53
docker.all-hands.dev/all-hands-ai/openhands:0.54
```
> **注意**: 如果您在0.44版本之前使用过OpenHands您可能需要运行 `mv ~/.openhands-state ~/.openhands` 来将对话历史迁移到新位置。

View File

@@ -42,17 +42,17 @@ OpenHandsはDockerを利用してローカル環境でも実行できます。
> 公共ネットワークで実行していますか?[Hardened Docker Installation Guide](https://docs.all-hands.dev/usage/runtimes/docker#hardened-docker-installation)を参照して、ネットワークバインディングの制限や追加のセキュリティ対策を実施してください。
```bash
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.53-nikolaik
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.54-nikolaik
docker run -it --rm --pull=always \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.53-nikolaik \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.54-nikolaik \
-e LOG_ALL_EVENTS=true \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ~/.openhands:/.openhands \
-p 3000:3000 \
--add-host host.docker.internal:host-gateway \
--name openhands-app \
docker.all-hands.dev/all-hands-ai/openhands:0.53
docker.all-hands.dev/all-hands-ai/openhands:0.54
```
**注**: バージョン0.44以前のOpenHandsを使用していた場合は、会話履歴を移行するために `mv ~/.openhands-state ~/.openhands` を実行してください。

View File

@@ -21,7 +21,7 @@ ENV POETRY_NO_INTERACTION=1 \
POETRY_CACHE_DIR=/tmp/poetry_cache
RUN apt-get update -y \
&& apt-get install -y curl make git build-essential \
&& apt-get install -y curl make git build-essential jq gettext \
&& python3 -m pip install poetry --break-system-packages
COPY pyproject.toml poetry.lock ./

View File

@@ -12,7 +12,7 @@ services:
- SANDBOX_API_HOSTNAME=host.docker.internal
- DOCKER_HOST_ADDR=host.docker.internal
#
- SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-ghcr.io/all-hands-ai/runtime:0.53-nikolaik}
- SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-ghcr.io/all-hands-ai/runtime:0.54-nikolaik}
- SANDBOX_USER_ID=${SANDBOX_USER_ID:-1234}
- WORKSPACE_MOUNT_PATH=${WORKSPACE_BASE:-$PWD/workspace}
ports:

View File

@@ -7,7 +7,7 @@ services:
image: openhands:latest
container_name: openhands-app-${DATE:-}
environment:
- SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-docker.all-hands.dev/all-hands-ai/runtime:0.53-nikolaik}
- SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-docker.all-hands.dev/all-hands-ai/runtime:0.54-nikolaik}
#- SANDBOX_USER_ID=${SANDBOX_USER_ID:-1234} # enable this only if you want a specific non-root sandbox user but you will have to manually adjust permissions of ~/.openhands for this user
- WORKSPACE_MOUNT_PATH=${WORKSPACE_BASE:-$PWD/workspace}
ports:

File diff suppressed because it is too large Load Diff

View File

Before

Width:  |  Height:  |  Size: 113 KiB

After

Width:  |  Height:  |  Size: 113 KiB

View File

@@ -2,55 +2,102 @@
title: Backend Architecture
---
<div style={{ textAlign: 'center' }}>
<img src="https://github.com/All-Hands-AI/OpenHands/assets/16201837/97d747e3-29d8-4ccb-8d34-6ad1adb17f38" alt="OpenHands System Architecture Diagram Jul 4 2024" />
<p><em>OpenHands System Architecture Diagram (July 4, 2024)</em></p>
</div>
This is a high-level overview of the system architecture. The system is divided into two main components: the frontend and the backend. The frontend is responsible for handling user interactions and displaying the results. The backend is responsible for handling the business logic and executing the agents.
# Frontend architecture
# System overview
![system_architecture.svg](/static/img/system_architecture.svg)
```mermaid
flowchart LR
U["User"] --> FE["Frontend (SPA)"]
FE -- "HTTP/WS" --> BE["OpenHands Backend"]
BE --> ES["EventStream"]
BE --> ST["Storage"]
BE --> RT["Runtime Interface"]
BE --> LLM["LLM Providers"]
subgraph Runtime
direction TB
RT --> DRT["Docker Runtime"]
RT --> LRT["Local Runtime"]
RT --> RRT["Remote Runtime"]
DRT --> AES["Action Execution Server"]
LRT --> AES
RRT --> AES
AES --> Bash["Bash Session"]
AES --> Jupyter["Jupyter Plugin"]
AES --> Browser["BrowserEnv"]
end
```
This Overview is simplified to show the main components and their interactions. For a more detailed view of the backend architecture, see the Backend Architecture section below.
# Backend Architecture
_**Disclaimer**: The backend architecture is a work in progress and is subject to change. The following diagram shows the current architecture of the backend based on the commit that is shown in the footer of the diagram._
![backend_architecture.svg](/static/img/backend_architecture.svg)
```mermaid
classDiagram
class Agent {
<<abstract>>
+sandbox_plugins: list[PluginRequirement]
}
class CodeActAgent {
+tools
}
Agent <|-- CodeActAgent
class EventStream
class Observation
class Action
Action --> Observation
Agent --> EventStream
class Runtime {
+connect()
+send_action_for_execution()
}
class ActionExecutionClient {
+_send_action_server_request()
}
class DockerRuntime
class LocalRuntime
class RemoteRuntime
Runtime <|-- ActionExecutionClient
ActionExecutionClient <|-- DockerRuntime
ActionExecutionClient <|-- LocalRuntime
ActionExecutionClient <|-- RemoteRuntime
class ActionExecutionServer {
+/execute_action
+/alive
}
class BashSession
class JupyterPlugin
class BrowserEnv
ActionExecutionServer --> BashSession
ActionExecutionServer --> JupyterPlugin
ActionExecutionServer --> BrowserEnv
Agent --> Runtime
Runtime ..> ActionExecutionServer : REST
```
<details>
<summary>Updating this Diagram</summary>
<div>
The generation of the backend architecture diagram is partially automated.
The diagram is generated from the type hints in the code using the py2puml
tool. The diagram is then manually reviewed, adjusted and exported to PNG
and SVG.
We maintain architecture diagrams inline with Mermaid in this MDX.
## Prerequisites
- Running python environment in which openhands is executable
(according to the instructions in the README.md file in the root of the repository)
- [py2puml](https://github.com/lucsorel/py2puml) installed
## Steps
1. Autogenerate the diagram by running the following command from the root of the repository:
`py2puml openhands openhands > docs/architecture/backend_architecture.puml`
2. Open the generated file in a PlantUML editor, e.g. Visual Studio Code with the PlantUML extension or [PlantText](https://www.planttext.com/)
3. Review the generated PUML and make all necessary adjustments to the diagram (add missing parts, fix mistakes, improve positioning).
_py2puml creates the diagram based on the type hints in the code, so missing or incorrect type hints may result in an incomplete or incorrect diagram._
4. Review the diff between the new and the previous diagram and manually check if the changes are correct.
_Make sure not to remove parts that were manually added to the diagram in the past and are still relevant._
5. Add the commit hash of the commit that was used to generate the diagram to the diagram footer.
6. Export the diagram as PNG and SVG files and replace the existing diagrams in the `docs/architecture` directory. This can be done with (e.g. [PlantText](https://www.planttext.com/))
Guidance:
- Edit the Mermaid blocks directly (flowchart/classDiagram).
- Quote labels and edge text for GitHub preview compatibility.
- Keep relationships concise and reflect stable abstractions (agents, runtime client/server, plugins).
- Verify accuracy against code:
- openhands/runtime/impl/action_execution/action_execution_client.py
- openhands/runtime/impl/docker/docker_runtime.py
- openhands/runtime/impl/local/local_runtime.py
- openhands/runtime/action_execution_server.py
- openhands/runtime/plugins/*
- Build docs locally or view on GitHub to confirm diagrams render.
</div>
</details>

View File

@@ -52,7 +52,7 @@ graph TD
2. Image Building: OpenHands builds a new Docker image (the "OH runtime image") based on the user-provided image. This new image includes OpenHands-specific code, primarily the "runtime client"
3. Container Launch: When OpenHands starts, it launches a Docker container using the OH runtime image
4. Action Execution Server Initialization: The action execution server initializes an `ActionExecutor` inside the container, setting up necessary components like a bash shell and loading any specified plugins
5. Communication: The OpenHands backend (`openhands/runtime/impl/eventstream/eventstream_runtime.py`) communicates with the action execution server over RESTful API, sending actions and receiving observations
5. Communication: The OpenHands backend (client: `openhands/runtime/impl/action_execution/action_execution_client.py`; runtimes: `openhands/runtime/impl/docker/docker_runtime.py`, `openhands/runtime/impl/local/local_runtime.py`) communicates with the action execution server over RESTful API, sending actions and receiving observations
6. Action Execution: The runtime client receives actions from the backend, executes them in the sandboxed environment, and sends back observations
7. Observation Return: The action execution server sends execution results back to the OpenHands backend as observations
@@ -72,7 +72,7 @@ Check out the [relevant code](https://github.com/All-Hands-AI/OpenHands/blob/mai
### Image Tagging System
OpenHands uses a three-tag system for its runtime images to balance reproducibility with flexibility.
Tags may be in one of 2 formats:
The tags are:
- **Versioned Tag**: `oh_v{openhands_version}_{base_image}` (e.g.: `oh_v0.9.9_nikolaik_s_python-nodejs_t_python3.12-nodejs22`)
- **Lock Tag**: `oh_v{openhands_version}_{16_digit_lock_hash}` (e.g.: `oh_v0.9.9_1234567890abcdef`)
@@ -119,18 +119,52 @@ This tagging approach allows OpenHands to efficiently manage both development an
2. The system can quickly rebuild images when minor changes occur (by leveraging recent compatible images)
3. The **lock** tag (e.g., `runtime:oh_v0.9.3_1234567890abcdef`) always points to the latest build for a particular base image, dependency, and OpenHands version combination
## Volume mounts: named volumes and overlay
OpenHands supports both bind mounts and Docker named volumes in SandboxConfig.volumes:
- Bind mount: "/abs/host/path:/container/path[:mode]"
- Named volume: "volume:<name>:/container/path[:mode]" or any non-absolute host spec treated as a named volume
Overlay mode (copy-on-write layer) is supported for bind mounts by appending ":overlay" to the mode (e.g., ":ro,overlay").
To enable overlay COW, set SANDBOX_VOLUME_OVERLAYS to a writable host directory; per-container upper/work dirs are created under it. If SANDBOX_VOLUME_OVERLAYS is unset, overlay mounts are skipped.
Implementation references:
- openhands/runtime/impl/docker/docker_runtime.py (named volumes in _build_docker_run_args; overlay mounts in _process_overlay_mounts)
- openhands/core/config/sandbox_config.py (volumes field)
## Runtime Plugin System
The OpenHands Runtime supports a plugin system that allows for extending functionality and customizing the runtime environment. Plugins are initialized when the runtime client starts up.
The OpenHands Runtime supports a plugin system that allows for extending functionality and customizing the runtime environment. Plugins are initialized when the action execution server starts up inside the runtime.
Check [an example of Jupyter plugin here](https://github.com/All-Hands-AI/OpenHands/blob/ecf4aed28b0cf7c18d4d8ff554883ba182fc6bdd/openhands/runtime/plugins/jupyter/__init__.py#L21-L55) if you want to implement your own plugin.
## Ports and URLs
*More details about the Plugin system are still under construction - contributions are welcomed!*
- Host port allocation uses file-locked ranges for stability and concurrency:
- Main runtime port: find_available_port_with_lock on configured range
- VSCode port: SandboxConfig.sandbox.vscode_port if provided, else find_available_port_with_lock in VSCODE_PORT_RANGE
- App ports: two additional ranges for plugin/web apps
- DOCKER_HOST_ADDR (if set) adjusts how URLs are formed for LocalRuntime/Docker environments.
- VSCode URL is exposed with a connection token from the action execution server endpoint /vscode/connection_token and rendered as:
- Docker/Local: http://localhost:{port}/?tkn={token}&folder={workspace_mount_path_in_sandbox}
- RemoteRuntime: scheme://vscode-{host}/?tkn={token}&folder={workspace_mount_path_in_sandbox}
References:
- openhands/runtime/impl/docker/docker_runtime.py (port ranges, locking, DOCKER_HOST_ADDR, vscode_url)
- openhands/runtime/impl/local/local_runtime.py (vscode_url factory)
- openhands/runtime/impl/remote/remote_runtime.py (vscode_url mapping)
- openhands/runtime/action_execution_server.py (/vscode/connection_token)
Examples:
- Jupyter: openhands/runtime/plugins/jupyter/__init__.py (JupyterPlugin, Kernel Gateway)
- VS Code: openhands/runtime/plugins/vscode/* (VSCodePlugin, exposes tokenized URL)
- Agent Skills: openhands/runtime/plugins/agent_skills/*
Key aspects of the plugin system:
1. Plugin Definition: Plugins are defined as Python classes that inherit from a base `Plugin` class
2. Plugin Registration: Available plugins are registered in an `ALL_PLUGINS` dictionary
2. Plugin Registration: Available plugins are registered in `openhands/runtime/plugins/__init__.py` via `ALL_PLUGINS`
3. Plugin Specification: Plugins are associated with `Agent.sandbox_plugins: list[PluginRequirement]`. Users can specify which plugins to load when initializing the runtime
4. Initialization: Plugins are initialized asynchronously when the runtime client starts
5. Usage: The runtime client can use initialized plugins to extend its capabilities (e.g., the JupyterPlugin for running IPython cells)
4. Initialization: Plugins are initialized asynchronously when the runtime starts and are accessible to actions
5. Usage: Plugins extend capabilities (e.g., Jupyter for IPython cells); the server exposes any web endpoints (ports) via host port mapping

View File

@@ -65,7 +65,7 @@ To send follow-up messages for the same conversation, mention `@openhands` in a
Conversation is started by mentioning `@openhands`.
![slack-create-convo.png](/static/img/slack-create-convo.png)
![slack-create-conversation.png](/static/img/slack-create-conversation.png)
### See agent response and send follow up messages

View File

@@ -119,7 +119,7 @@ The conversation history will be saved in `~/.openhands/sessions`.
```bash
docker run -it \
--pull=always \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.53-nikolaik \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.54-nikolaik \
-e SANDBOX_USER_ID=$(id -u) \
-e SANDBOX_VOLUMES=$SANDBOX_VOLUMES \
-e LLM_API_KEY=$LLM_API_KEY \
@@ -128,8 +128,8 @@ docker run -it \
-v ~/.openhands:/.openhands \
--add-host host.docker.internal:host-gateway \
--name openhands-app-$(date +%Y%m%d%H%M%S) \
docker.all-hands.dev/all-hands-ai/openhands:0.53 \
python -m openhands.cli.main --override-cli-mode true
docker.all-hands.dev/all-hands-ai/openhands:0.54 \
python -m openhands.cli.entry --override-cli-mode true
```
<Note>

View File

@@ -61,7 +61,7 @@ export GITHUB_TOKEN="your-token" # Required for repository operations
# Run OpenHands
docker run -it \
--pull=always \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.53-nikolaik \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.54-nikolaik \
-e SANDBOX_USER_ID=$(id -u) \
-e SANDBOX_VOLUMES=$SANDBOX_VOLUMES \
-e LLM_API_KEY=$LLM_API_KEY \
@@ -73,7 +73,7 @@ docker run -it \
-v ~/.openhands:/.openhands \
--add-host host.docker.internal:host-gateway \
--name openhands-app-$(date +%Y%m%d%H%M%S) \
docker.all-hands.dev/all-hands-ai/openhands:0.53 \
docker.all-hands.dev/all-hands-ai/openhands:0.54 \
python -m openhands.core.main -t "write a bash script that prints hi"
```

View File

@@ -68,23 +68,23 @@ Download and install the LM Studio desktop app from [lmstudio.ai](https://lmstud
1. Check [the installation guide](/usage/local-setup) and ensure all prerequisites are met before running OpenHands, then run:
```bash
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.53-nikolaik
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.54-nikolaik
docker run -it --rm --pull=always \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.53-nikolaik \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.54-nikolaik \
-e LOG_ALL_EVENTS=true \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ~/.openhands:/.openhands \
-p 3000:3000 \
--add-host host.docker.internal:host-gateway \
--name openhands-app \
docker.all-hands.dev/all-hands-ai/openhands:0.53
docker.all-hands.dev/all-hands-ai/openhands:0.54
```
2. Wait until the server is running (see log below):
```
Digest: sha256:e72f9baecb458aedb9afc2cd5bc935118d1868719e55d50da73190d3a85c674f
Status: Image is up to date for docker.all-hands.dev/all-hands-ai/openhands:0.53
Status: Image is up to date for docker.all-hands.dev/all-hands-ai/openhands:0.54
Starting OpenHands...
Running OpenHands as root
14:22:13 - openhands:INFO: server_config.py:50 - Using config class None

View File

@@ -109,17 +109,17 @@ Note that you'll still need `uv` installed for the default MCP servers to work p
<Accordion title="Docker Command (Click to expand)">
```bash
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.53-nikolaik
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.54-nikolaik
docker run -it --rm --pull=always \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.53-nikolaik \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.54-nikolaik \
-e LOG_ALL_EVENTS=true \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ~/.openhands:/.openhands \
-p 3000:3000 \
--add-host host.docker.internal:host-gateway \
--name openhands-app \
docker.all-hands.dev/all-hands-ai/openhands:0.53
docker.all-hands.dev/all-hands-ai/openhands:0.54
```
</Accordion>

View File

@@ -130,3 +130,28 @@ docker run # ... \
<Note>
**Docker Desktop Required**: Network isolation features, including custom networks and `host.docker.internal` routing, require Docker Desktop. Docker Engine alone does not support these features on localhost across custom networks. If you're using Docker Engine without Docker Desktop, network isolation may not work as expected.
</Note>
### Sidecar Containers
If you want to run sidecar containers to the sandbox 'runner' containers without exposing the sandbox containers to the host network, you can use the `SANDBOX_ADDITIONAL_NETWORKS` environment variable to specify additional Docker network names that should be added to the sandbox containers.
```bash
docker network create openhands-sccache
docker run -d \
--hostname openhandsredis \
--network openhands-sccache \
redis
docker run # ...
-e SANDBOX_ADDITIONAL_NETWORKS='["openhands-sccache"]' \
# ...
```
Then all sandbox instances will have to access a shared redis instance at `openhandsredis:6379`.
#### Docker Compose gotcha
Note that Docker Compose adds a prefix (a scope) by default to created networks, which is not taken into account by the additional networks config. Therefore when using docker compose you have to either:
- specify a network name via the `name` field to remove the scoping (https://docs.docker.com/reference/compose-file/networks/#name)
- or provide the scope within the given config (e.g. `SANDBOX_ADDITIONAL_NETWORKS: '["myscope_openhands-sccache"]'` where `myscope` is the docker-compose assigned prefix).

View File

@@ -9,7 +9,8 @@ from evaluation.utils.shared import (
EvalMetadata,
EvalOutput,
compatibility_for_eval_history_pairs,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -60,18 +61,15 @@ AGENT_CLS_TO_INST_SUFFIX = {
def get_config(
metadata: EvalMetadata,
) -> OpenHandsConfig:
sandbox_config = get_default_sandbox_config_for_eval()
sandbox_config.base_container_image = 'python:3.12-bookworm'
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
# Create config with EDA-specific container image
config = get_openhands_config_for_eval(
metadata=metadata,
runtime='docker',
max_iterations=metadata.max_iterations,
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
)
# Override the container image for EDA
config.sandbox.base_container_image = 'python:3.12-bookworm'
config.set_llm_config(metadata.llm_config)
agent_config = config.get_agent_config(metadata.agent_class)
agent_config.enable_prompt_extensions = False
@@ -146,7 +144,7 @@ def process_instance(
logger.info(f'Final message: {final_message} | Ground truth: {instance["text"]}')
test_result = game.reward()
metrics = state.metrics.get() if state.metrics else None
metrics = get_metrics(state)
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)
# for compatibility with the existing output format, we can remake the pairs here

View File

@@ -17,7 +17,8 @@ from evaluation.utils.shared import (
EvalMetadata,
EvalOutput,
compatibility_for_eval_history_pairs,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -40,19 +41,12 @@ from openhands.utils.async_utils import call_async_from_sync
def get_config(
metadata: EvalMetadata,
) -> OpenHandsConfig:
sandbox_config = get_default_sandbox_config_for_eval()
sandbox_config.base_container_image = 'python:3.12-slim'
# Create config with agent_bench-specific container image
config = get_openhands_config_for_eval(metadata=metadata)
# Override the container image for agent_bench
config.sandbox.base_container_image = 'python:3.12-slim'
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
runtime=os.environ.get('RUNTIME', 'docker'),
max_iterations=metadata.max_iterations,
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
)
config.set_llm_config(metadata.llm_config)
agent_config = config.get_agent_config(metadata.agent_class)
agent_config.enable_prompt_extensions = False
@@ -273,7 +267,7 @@ def process_instance(
# remove when it becomes unnecessary
histories = compatibility_for_eval_history_pairs(state.history)
metrics = state.metrics.get() if state.metrics else None
metrics = get_metrics(state)
# Save the output
output = EvalOutput(

View File

@@ -17,6 +17,8 @@ from evaluation.utils.shared import (
EvalOutput,
compatibility_for_eval_history_pairs,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -49,15 +51,10 @@ def get_config(
) -> OpenHandsConfig:
sandbox_config = get_default_sandbox_config_for_eval()
sandbox_config.base_container_image = 'python:3.11-bookworm'
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
config = get_openhands_config_for_eval(
metadata=metadata,
sandbox_config=sandbox_config,
runtime=os.environ.get('RUNTIME', 'docker'),
max_iterations=metadata.max_iterations,
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
)
config.set_llm_config(metadata.llm_config)
agent_config = config.get_agent_config(metadata.agent_class)
@@ -246,7 +243,7 @@ def process_instance(
# for compatibility with the existing output format, we can remake the pairs here
# remove when it becomes unnecessary
histories = compatibility_for_eval_history_pairs(state.history)
metrics = state.metrics.get() if state.metrics else None
metrics = get_metrics(state)
# Save the output
output = EvalOutput(

View File

@@ -15,6 +15,8 @@ from evaluation.utils.shared import (
codeact_user_response,
compatibility_for_eval_history_pairs,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -60,15 +62,10 @@ def get_config(
sandbox_config = get_default_sandbox_config_for_eval()
sandbox_config.base_container_image = BIOCODER_BENCH_CONTAINER_IMAGE
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
config = get_openhands_config_for_eval(
metadata=metadata,
runtime='docker',
max_iterations=metadata.max_iterations,
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
sandbox_config=sandbox_config,
)
config.set_llm_config(metadata.llm_config)
agent_config = config.get_agent_config(metadata.agent_class)
@@ -294,7 +291,7 @@ def process_instance(
raise ValueError('State should not be None.')
test_result = complete_runtime(runtime, instance)
metrics = state.metrics.get() if state.metrics else None
metrics = get_metrics(state)
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)
# for compatibility with the existing output format, we can remake the pairs here
# remove when it becomes unnecessary

View File

@@ -18,6 +18,8 @@ from evaluation.utils.shared import (
EvalOutput,
compatibility_for_eval_history_pairs,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -74,15 +76,10 @@ def get_config(
sandbox_config = get_default_sandbox_config_for_eval()
sandbox_config.base_container_image = 'python:3.12-bookworm'
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
config = get_openhands_config_for_eval(
metadata=metadata,
runtime='docker',
max_iterations=metadata.max_iterations,
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
sandbox_config=sandbox_config,
)
config.set_llm_config(metadata.llm_config)
agent_config = config.get_agent_config(metadata.agent_class)
@@ -422,7 +419,7 @@ def process_instance(
# You can simply get the LAST `MessageAction` from the returned `state.history` and parse it for evaluation.
if state is None:
raise ValueError('State should not be None.')
metrics = state.metrics.get() if state.metrics else None
metrics = get_metrics(state)
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)
# for compatibility with the existing output format, we can remake the pairs here

View File

@@ -11,6 +11,8 @@ from evaluation.utils.shared import (
EvalOutput,
compatibility_for_eval_history_pairs,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -39,14 +41,8 @@ def get_config(
)
sandbox_config = get_default_sandbox_config_for_eval()
sandbox_config.base_container_image = 'python:3.12-bookworm'
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
runtime='docker',
max_iterations=metadata.max_iterations,
sandbox=sandbox_config,
workspace_base=None,
workspace_mount_path=None,
config = get_openhands_config_for_eval(
metadata=metadata, runtime='docker', sandbox_config=sandbox_config
)
config.set_llm_config(metadata.llm_config)
agent_config = config.get_agent_config(metadata.agent_class)
@@ -88,7 +84,7 @@ def process_instance(
if state is None:
raise ValueError('State should not be None.')
metrics = state.metrics.get() if state.metrics else None
metrics = get_metrics(state)
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)
# for compatibility with the existing output format, we can remake the pairs here
# remove when it becomes unnecessary

View File

@@ -16,6 +16,8 @@ from evaluation.utils.shared import (
assert_and_raise,
codeact_user_response,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -113,16 +115,11 @@ def get_config(
sandbox_config = get_default_sandbox_config_for_eval()
sandbox_config.base_container_image = base_container_image
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
max_iterations=metadata.max_iterations,
enable_browser=RUN_WITH_BROWSING,
config = get_openhands_config_for_eval(
metadata=metadata,
sandbox_config=sandbox_config,
runtime=os.environ.get('RUNTIME', 'docker'),
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
enable_browser=RUN_WITH_BROWSING,
)
config.set_llm_config(
update_llm_config_for_completions_logging(
@@ -480,7 +477,7 @@ def process_instance(
# NOTE: this is NO LONGER the event stream, but an agent history that includes delegate agent's events
histories = [event_to_dict(event) for event in state.history]
metrics = state.metrics.get() if state.metrics else None
metrics = get_metrics(state)
# Save the output
output = EvalOutput(

View File

@@ -17,6 +17,8 @@ from evaluation.utils.shared import (
codeact_user_response,
compatibility_for_eval_history_pairs,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -64,15 +66,10 @@ def get_config(
) -> OpenHandsConfig:
sandbox_config = get_default_sandbox_config_for_eval()
sandbox_config.base_container_image = 'python:3.12-bookworm'
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
config = get_openhands_config_for_eval(
metadata=metadata,
runtime='docker',
max_iterations=metadata.max_iterations,
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
sandbox_config=sandbox_config,
)
config.set_llm_config(metadata.llm_config)
agent_config = config.get_agent_config(metadata.agent_class)
@@ -294,7 +291,7 @@ def process_instance(
if state is None:
raise ValueError('State should not be None.')
metrics = state.metrics.get() if state.metrics else None
metrics = get_metrics(state)
test_result = complete_runtime(state)
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)

View File

@@ -22,6 +22,8 @@ from evaluation.utils.shared import (
codeact_user_response,
compatibility_for_eval_history_pairs,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -59,15 +61,10 @@ def get_config(
) -> OpenHandsConfig:
sandbox_config = get_default_sandbox_config_for_eval()
sandbox_config.base_container_image = 'nikolaik/python-nodejs:python3.12-nodejs22'
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
config = get_openhands_config_for_eval(
metadata=metadata,
sandbox_config=sandbox_config,
runtime='docker',
max_iterations=metadata.max_iterations,
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
)
config.set_llm_config(metadata.llm_config)
if metadata.agent_config:
@@ -269,7 +266,7 @@ Here is the task:
'model_answer': model_answer,
'ground_truth': instance['Final answer'],
}
metrics = state.metrics.get() if state.metrics else None
metrics = get_metrics(state)
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)
# for compatibility with the existing output format, we can remake the pairs here

View File

@@ -12,6 +12,8 @@ from evaluation.utils.shared import (
codeact_user_response,
compatibility_for_eval_history_pairs,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -42,15 +44,10 @@ def get_config(
) -> OpenHandsConfig:
sandbox_config = get_default_sandbox_config_for_eval()
sandbox_config.base_container_image = 'python:3.12-bookworm'
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
config = get_openhands_config_for_eval(
metadata=metadata,
runtime='docker',
max_iterations=metadata.max_iterations,
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
sandbox_config=sandbox_config,
)
config.set_llm_config(metadata.llm_config)
agent_config = config.get_agent_config(metadata.agent_class)
@@ -108,7 +105,7 @@ def process_instance(
# attempt to parse model_answer
ast_eval_fn = instance['ast_eval']
correct, hallucination = ast_eval_fn(instance_id, model_answer_raw)
metrics = state.metrics.get() if state.metrics else None
metrics = get_metrics(state)
logger.info(
f'Final message: {model_answer_raw} | Correctness: {correct} | Hallucination: {hallucination}'
)

View File

@@ -30,6 +30,8 @@ from evaluation.utils.shared import (
EvalOutput,
compatibility_for_eval_history_pairs,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -63,15 +65,10 @@ def get_config(
) -> OpenHandsConfig:
sandbox_config = get_default_sandbox_config_for_eval()
sandbox_config.base_container_image = 'python:3.12-bookworm'
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
config = get_openhands_config_for_eval(
metadata=metadata,
runtime='docker',
max_iterations=metadata.max_iterations,
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
sandbox_config=sandbox_config,
)
config.set_llm_config(metadata.llm_config)
agent_config = config.get_agent_config(metadata.agent_class)
@@ -292,7 +289,7 @@ Ok now its time to start solving the question. Good luck!
if state is None:
raise ValueError('State should not be None.')
metrics = state.metrics.get() if state.metrics else None
metrics = get_metrics(state)
# Save the output
output = EvalOutput(

View File

@@ -23,6 +23,8 @@ from evaluation.utils.shared import (
codeact_user_response,
compatibility_for_eval_history_pairs,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -84,15 +86,10 @@ def get_config(
) -> OpenHandsConfig:
sandbox_config = get_default_sandbox_config_for_eval()
sandbox_config.base_container_image = 'python:3.12-bookworm'
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
config = get_openhands_config_for_eval(
metadata=metadata,
runtime='docker',
max_iterations=metadata.max_iterations,
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
sandbox_config=sandbox_config,
)
config.set_llm_config(metadata.llm_config)
agent_config = config.get_agent_config(metadata.agent_class)
@@ -248,7 +245,7 @@ def process_instance(
if state is None:
raise ValueError('State should not be None.')
metrics = state.metrics.get() if state.metrics else None
metrics = get_metrics(state)
test_result = complete_runtime(runtime, instance)
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)

View File

@@ -16,6 +16,7 @@ import ruamel.yaml
from evaluation.utils.shared import (
EvalMetadata,
get_default_sandbox_config_for_eval,
get_openhands_config_for_eval,
make_metadata,
)
from openhands.core.config import (
@@ -37,15 +38,10 @@ def get_config(
) -> OpenHandsConfig:
sandbox_config = get_default_sandbox_config_for_eval()
sandbox_config.base_container_image = 'python:3.12-bookworm'
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
config = get_openhands_config_for_eval(
metadata=metadata,
runtime='docker',
max_iterations=metadata.max_iterations,
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
sandbox_config=sandbox_config,
)
config.set_llm_config(metadata.llm_config)
agent_config = config.get_agent_config(metadata.agent_class)

View File

@@ -22,6 +22,8 @@ from evaluation.utils.shared import (
codeact_user_response,
compatibility_for_eval_history_pairs,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -47,15 +49,10 @@ def get_config(
) -> OpenHandsConfig:
sandbox_config = get_default_sandbox_config_for_eval()
sandbox_config.base_container_image = 'python:3.12-bookworm'
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
config = get_openhands_config_for_eval(
metadata=metadata,
runtime='docker',
max_iterations=metadata.max_iterations,
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
sandbox_config=sandbox_config,
)
config.set_llm_config(metadata.llm_config)
agent_config = config.get_agent_config(metadata.agent_class)
@@ -335,7 +332,7 @@ Be thorough in your exploration, testing, and reasoning. It's fine if your think
)
)
assert state is not None
metrics = state.metrics.get() if state.metrics else {}
metrics = get_metrics(state)
test_result = complete_runtime(runtime, instance)

View File

@@ -10,6 +10,8 @@ from evaluation.utils.shared import (
codeact_user_response,
compatibility_for_eval_history_pairs,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -51,15 +53,10 @@ def get_config(
'$OH_INTERPRETER_PATH -m pip install scitools-pyke'
)
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
config = get_openhands_config_for_eval(
metadata=metadata,
runtime='docker',
max_iterations=metadata.max_iterations,
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
sandbox_config=sandbox_config,
)
config.set_llm_config(metadata.llm_config)
agent_config = config.get_agent_config(metadata.agent_class)
@@ -247,7 +244,7 @@ def process_instance(
)
test_result['final_message'] = final_message
metrics = state.metrics.get() if state.metrics else None
metrics = get_metrics(state)
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)
# for compatibility with the existing output format, we can remake the pairs here
# remove when it becomes unnecessary

View File

@@ -13,6 +13,8 @@ from evaluation.utils.shared import (
codeact_user_response,
compatibility_for_eval_history_pairs,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -57,15 +59,10 @@ def get_config(
) -> OpenHandsConfig:
sandbox_config = get_default_sandbox_config_for_eval()
sandbox_config.base_container_image = 'xingyaoww/od-eval-miniwob:v1.0'
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
config = get_openhands_config_for_eval(
metadata=metadata,
runtime=os.environ.get('RUNTIME', 'docker'),
max_iterations=metadata.max_iterations,
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
sandbox_config=sandbox_config,
)
config.set_llm_config(
update_llm_config_for_completions_logging(
@@ -174,7 +171,7 @@ def process_instance(
if state is None:
raise ValueError('State should not be None.')
metrics = state.metrics.get() if state.metrics else None
metrics = get_metrics(state)
# Instruction is the first message from the USER
instruction = ''

View File

@@ -15,6 +15,8 @@ from evaluation.utils.shared import (
EvalOutput,
compatibility_for_eval_history_pairs,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -109,15 +111,10 @@ def get_config(
f'$OH_INTERPRETER_PATH -m pip install {" ".join(MINT_DEPENDENCIES)}'
)
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
config = get_openhands_config_for_eval(
metadata=metadata,
runtime='docker',
max_iterations=metadata.max_iterations,
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
sandbox_config=sandbox_config,
)
config.set_llm_config(metadata.llm_config)
agent_config = config.get_agent_config(metadata.agent_class)
@@ -205,7 +202,7 @@ def process_instance(
task_state = state.extra_data['task_state']
logger.info('Task state: ' + str(task_state.to_dict()))
metrics = state.metrics.get() if state.metrics else None
metrics = get_metrics(state)
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)
# for compatibility with the existing output format, we can remake the pairs here

View File

@@ -26,6 +26,8 @@ from evaluation.utils.shared import (
codeact_user_response,
compatibility_for_eval_history_pairs,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -79,15 +81,10 @@ def get_config(
) -> OpenHandsConfig:
sandbox_config = get_default_sandbox_config_for_eval()
sandbox_config.base_container_image = 'public.ecr.aws/i5g0m1f6/ml-bench'
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
config = get_openhands_config_for_eval(
metadata=metadata,
runtime='docker',
max_iterations=metadata.max_iterations,
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
sandbox_config=sandbox_config,
)
config.set_llm_config(metadata.llm_config)
agent_config = config.get_agent_config(metadata.agent_class)
@@ -250,7 +247,7 @@ def process_instance(instance: Any, metadata: EvalMetadata, reset_logger: bool =
)
)
assert state is not None
metrics = state.metrics.get() if state.metrics else {}
metrics = get_metrics(state)
test_result = complete_runtime(runtime)

View File

@@ -23,6 +23,7 @@ from evaluation.utils.shared import (
EvalMetadata,
EvalOutput,
get_default_sandbox_config_for_eval,
get_openhands_config_for_eval,
prepare_dataset,
reset_logger_for_multiprocessing,
run_evaluation,
@@ -87,13 +88,9 @@ def get_config(metadata: EvalMetadata, instance: pd.Series) -> OpenHandsConfig:
dataset_name=metadata.dataset,
instance_id=instance['instance_id'],
)
config = OpenHandsConfig(
run_as_openhands=False,
config = get_openhands_config_for_eval(
runtime=os.environ.get('RUNTIME', 'docker'),
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
sandbox_config=sandbox_config,
)
return config

View File

@@ -21,6 +21,7 @@ from evaluation.utils.shared import (
codeact_user_response,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
is_fatal_evaluation_error,
make_metadata,
prepare_dataset,
@@ -341,16 +342,11 @@ def get_config(
instance_id=instance['instance_id'],
)
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
max_iterations=metadata.max_iterations,
config = get_openhands_config_for_eval(
metadata=metadata,
enable_browser=RUN_WITH_BROWSING,
runtime=os.environ.get('RUNTIME', 'docker'),
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
sandbox_config=sandbox_config,
)
config.set_llm_config(
update_llm_config_for_completions_logging(

View File

@@ -31,6 +31,7 @@ from evaluation.utils.shared import (
codeact_user_response,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
is_fatal_evaluation_error,
make_metadata,
prepare_dataset,
@@ -174,15 +175,10 @@ def get_config(
instance_id=instance['instance_id'],
)
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
max_iterations=metadata.max_iterations,
config = get_openhands_config_for_eval(
metadata=metadata,
runtime=os.environ.get('RUNTIME', 'docker'),
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
sandbox_config=sandbox_config,
)
config.set_llm_config(

View File

@@ -12,6 +12,8 @@ from evaluation.utils.shared import (
codeact_user_response,
compatibility_for_eval_history_pairs,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -63,16 +65,10 @@ def get_config(
sandbox_config.base_container_image = (
'docker.io/xingyaoww/openhands-eval-scienceagentbench'
)
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
config = get_openhands_config_for_eval(
metadata=metadata,
runtime=os.environ.get('RUNTIME', 'docker'),
max_budget_per_task=4,
max_iterations=metadata.max_iterations,
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
sandbox_config=sandbox_config,
)
config.set_llm_config(
update_llm_config_for_completions_logging(
@@ -218,7 +214,7 @@ If the program uses some packages that are incompatible, please figure out alter
# You can simply get the LAST `MessageAction` from the returned `state.history` and parse it for evaluation.
if state is None:
raise ValueError('State should not be None.')
metrics = state.metrics.get() if state.metrics else None
metrics = get_metrics(state)
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)
# for compatibility with the existing output format, we can remake the pairs here

View File

@@ -93,6 +93,9 @@ export USE_HINT_TEXT=true # Ignore this if you are not sure.
# Specify a condenser configuration for memory management (default: NoOpCondenser)
export EVAL_CONDENSER=summarizer_for_eval # Name of the condenser config group in config.toml
# Specify the instruction prompt template file name
export INSTRUCTION_TEMPLATE_NAME=swe_custom.j2 # Name of the file in the swe_bench/prompts folder.
```
Let's say you'd like to run 10 instances using `llm.eval_gpt4_1106_preview` and CodeActAgent,

View File

@@ -19,6 +19,7 @@ from evaluation.utils.shared import (
EvalMetadata,
EvalOutput,
get_default_sandbox_config_for_eval,
get_openhands_config_for_eval,
prepare_dataset,
reset_logger_for_multiprocessing,
run_evaluation,
@@ -83,13 +84,9 @@ def get_config(metadata: EvalMetadata, instance: pd.Series) -> OpenHandsConfig:
dataset_name=metadata.dataset,
instance_id=instance['instance_id'],
)
config = OpenHandsConfig(
run_as_openhands=False,
config = get_openhands_config_for_eval(
runtime=os.environ.get('RUNTIME', 'docker'),
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
sandbox_config=sandbox_config,
)
return config

View File

@@ -32,6 +32,7 @@ from evaluation.utils.shared import (
codeact_user_response,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
is_fatal_evaluation_error,
make_metadata,
prepare_dataset,
@@ -108,7 +109,9 @@ def get_instruction(instance: pd.Series, metadata: EvalMetadata) -> MessageActio
llm_model = metadata.llm_config.model
# Determine the template file based on mode and LLM
if mode.startswith('swt'):
if metadata.instruction_template_name:
template_name = metadata.instruction_template_name
elif mode.startswith('swt'):
template_name = 'swt.j2'
elif mode == 'swe':
if 'gpt-4.1' in llm_model:
@@ -122,6 +125,7 @@ def get_instruction(instance: pd.Series, metadata: EvalMetadata) -> MessageActio
logger.error(f'Unexpected evaluation mode: {mode}. Falling back to default.')
template_name = 'swe_default.j2'
logger.debug(f'Using instruction template file: {template_name}')
# Set up Jinja2 environment
# Assuming templates are in 'evaluation/benchmarks/swe_bench/prompts' relative to this script
prompts_dir = os.path.join(os.path.dirname(__file__), 'prompts')
@@ -224,16 +228,11 @@ def get_config(
instance_id=instance['instance_id'],
)
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
max_iterations=metadata.max_iterations,
config = get_openhands_config_for_eval(
metadata=metadata,
enable_browser=RUN_WITH_BROWSING,
runtime=os.environ.get('RUNTIME', 'docker'),
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
sandbox_config=sandbox_config,
)
config.set_llm_config(

View File

@@ -21,6 +21,7 @@ from evaluation.utils.shared import (
EvalException,
EvalMetadata,
EvalOutput,
get_metrics,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -179,7 +180,7 @@ def process_instance(
raise ValueError('State should not be None.')
histories = [event_to_dict(event) for event in state.history]
metrics = state.metrics.get() if state.metrics else None
metrics = get_metrics(state)
# Save the output
instruction = message_action.content

View File

@@ -20,6 +20,7 @@ from evaluation.utils.shared import (
codeact_user_response,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
is_fatal_evaluation_error,
make_metadata,
prepare_dataset,
@@ -199,16 +200,11 @@ def get_config(
'REPO_PATH': f'/workspace/{workspace_dir_name}/',
}
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
max_iterations=metadata.max_iterations,
config = get_openhands_config_for_eval(
metadata=metadata,
enable_browser=RUN_WITH_BROWSING,
runtime=os.environ.get('RUNTIME', 'docker'),
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
sandbox_config=sandbox_config,
)
config.set_llm_config(
update_llm_config_for_completions_logging(

View File

@@ -37,6 +37,7 @@ from evaluation.benchmarks.testgeneval.utils import load_testgeneval_dataset
from evaluation.utils.shared import (
EvalMetadata,
EvalOutput,
get_openhands_config_for_eval,
prepare_dataset,
reset_logger_for_multiprocessing,
run_evaluation,
@@ -58,20 +59,21 @@ def get_config(instance: pd.Series) -> OpenHandsConfig:
f'Invalid container image for instance {instance["instance_id_swebench"]}.'
)
logger.info(f'Using instance container image: {base_container_image}.')
return OpenHandsConfig(
run_as_openhands=False,
runtime=os.environ.get('RUNTIME', 'eventstream'),
sandbox=SandboxConfig(
base_container_image=base_container_image,
use_host_network=False,
timeout=1800,
api_key=os.environ.get('ALLHANDS_API_KEY'),
remote_runtime_api_url=os.environ.get(
'SANDBOX_REMOTE_RUNTIME_API_URL', 'http://localhost:8000'
),
# Create custom sandbox config for testgeneval with specific requirements
sandbox_config = SandboxConfig(
base_container_image=base_container_image,
use_host_network=False,
timeout=1800, # Longer timeout than default (300)
api_key=os.environ.get('ALLHANDS_API_KEY'),
remote_runtime_api_url=os.environ.get(
'SANDBOX_REMOTE_RUNTIME_API_URL', 'http://localhost:8000'
),
workspace_base=None,
workspace_mount_path=None,
)
return get_openhands_config_for_eval(
sandbox_config=sandbox_config,
runtime=os.environ.get('RUNTIME', 'docker'), # Different default runtime
)

View File

@@ -25,6 +25,7 @@ from evaluation.utils.shared import (
assert_and_raise,
codeact_user_response,
get_metrics,
get_openhands_config_for_eval,
is_fatal_evaluation_error,
make_metadata,
prepare_dataset,
@@ -126,29 +127,26 @@ def get_config(
f'Submit an issue on https://github.com/All-Hands-AI/OpenHands if you run into any issues.'
)
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
max_iterations=metadata.max_iterations,
runtime=os.environ.get('RUNTIME', 'eventstream'),
sandbox=SandboxConfig(
base_container_image=base_container_image,
enable_auto_lint=True,
use_host_network=False,
# large enough timeout, since some testcases take very long to run
timeout=300,
# Add platform to the sandbox config to solve issue 4401
platform='linux/amd64',
api_key=os.environ.get('ALLHANDS_API_KEY', None),
remote_runtime_api_url=os.environ.get(
'SANDBOX_REMOTE_RUNTIME_API_URL', 'http://localhost:8000'
),
keep_runtime_alive=False,
remote_runtime_init_timeout=3600,
sandbox_config = SandboxConfig(
base_container_image=base_container_image,
enable_auto_lint=True,
use_host_network=False,
# large enough timeout, since some testcases take very long to run
timeout=300,
# Add platform to the sandbox config to solve issue 4401
platform='linux/amd64',
api_key=os.environ.get('ALLHANDS_API_KEY', None),
remote_runtime_api_url=os.environ.get(
'SANDBOX_REMOTE_RUNTIME_API_URL', 'http://localhost:8000'
),
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
keep_runtime_alive=False,
remote_runtime_init_timeout=3600,
)
config = get_openhands_config_for_eval(
metadata=metadata,
sandbox_config=sandbox_config,
runtime=os.environ.get('RUNTIME', 'docker'),
)
config.set_llm_config(
update_llm_config_for_completions_logging(

View File

@@ -12,7 +12,10 @@ import tempfile
import yaml
from browsing import pre_login
from evaluation.utils.shared import get_default_sandbox_config_for_eval
from evaluation.utils.shared import (
get_default_sandbox_config_for_eval,
get_openhands_config_for_eval,
)
from openhands.controller.state.state import State
from openhands.core.config import (
LLMConfig,
@@ -42,19 +45,17 @@ def get_config(
sandbox_config.enable_auto_lint = True
# If the web services are running on the host machine, this must be set to True
sandbox_config.use_host_network = True
config = OpenHandsConfig(
run_as_openhands=False,
max_budget_per_task=4,
config = get_openhands_config_for_eval(
max_iterations=100,
save_trajectory_path=os.path.join(
mount_path_on_host, f'traj_{task_short_name}.json'
),
sandbox=sandbox_config,
# we mount trajectories path so that trajectories, generated by OpenHands
# controller, can be accessible to the evaluator file in the runtime container
sandbox_config=sandbox_config,
workspace_mount_path=mount_path_on_host,
workspace_mount_path_in_sandbox='/outputs',
)
config.save_trajectory_path = os.path.join(
mount_path_on_host, f'traj_{task_short_name}.json'
)
config.max_budget_per_task = 4
config.set_llm_config(llm_config)
if agent_config:
config.set_agent_config(agent_config)

View File

@@ -11,6 +11,8 @@ from evaluation.utils.shared import (
codeact_user_response,
compatibility_for_eval_history_pairs,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -43,15 +45,10 @@ def get_config(
) -> OpenHandsConfig:
sandbox_config = get_default_sandbox_config_for_eval()
sandbox_config.base_container_image = 'python:3.12-bookworm'
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
config = get_openhands_config_for_eval(
metadata=metadata,
runtime='docker',
max_iterations=metadata.max_iterations,
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
sandbox_config=sandbox_config,
)
config.set_llm_config(metadata.llm_config)
agent_config = config.get_agent_config(metadata.agent_class)
@@ -134,7 +131,7 @@ def process_instance(instance: Any, metadata: EvalMetadata, reset_logger: bool =
correct = eval_answer(str(model_answer_raw), str(answer))
logger.info(f'Final message: {model_answer_raw} | Correctness: {correct}')
metrics = state.metrics.get() if state.metrics else None
metrics = get_metrics(state)
# history is now available as a stream of events, rather than list of pairs of (Action, Observation)
# for compatibility with the existing output format, we can remake the pairs here

View File

@@ -20,6 +20,7 @@ from evaluation.utils.shared import (
codeact_user_response,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
is_fatal_evaluation_error,
make_metadata,
prepare_dataset,
@@ -160,16 +161,11 @@ def get_config(
instance_id=instance['instance_id'],
)
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
max_iterations=metadata.max_iterations,
config = get_openhands_config_for_eval(
metadata=metadata,
enable_browser=RUN_WITH_BROWSING,
runtime=os.environ.get('RUNTIME', 'docker'),
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
sandbox_config=sandbox_config,
)
config.set_llm_config(
update_llm_config_for_completions_logging(

View File

@@ -12,6 +12,8 @@ from evaluation.utils.shared import (
EvalOutput,
compatibility_for_eval_history_pairs,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -72,16 +74,10 @@ def get_config(
'VWA_WIKIPEDIA': f'{base_url}:8888',
'VWA_HOMEPAGE': f'{base_url}:4399',
}
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
config = get_openhands_config_for_eval(
metadata=metadata,
runtime='docker',
max_iterations=metadata.max_iterations,
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
attach_to_existing=True,
sandbox_config=sandbox_config,
)
config.set_llm_config(
update_llm_config_for_completions_logging(
@@ -179,7 +175,7 @@ def process_instance(
if state is None:
raise ValueError('State should not be None.')
metrics = state.metrics.get() if state.metrics else None
metrics = get_metrics(state)
# Instruction obtained from the first message from the USER
instruction = ''

View File

@@ -12,6 +12,8 @@ from evaluation.utils.shared import (
EvalOutput,
compatibility_for_eval_history_pairs,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -64,15 +66,10 @@ def get_config(
'MAP': f'{base_url}:3000',
'HOMEPAGE': f'{base_url}:4399',
}
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
config = get_openhands_config_for_eval(
metadata=metadata,
runtime='docker',
max_iterations=metadata.max_iterations,
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
sandbox_config=sandbox_config,
)
config.set_llm_config(metadata.llm_config)
agent_config = config.get_agent_config(metadata.agent_class)
@@ -163,7 +160,7 @@ def process_instance(
if state is None:
raise ValueError('State should not be None.')
metrics = state.metrics.get() if state.metrics else None
metrics = get_metrics(state)
# Instruction is the first message from the USER
instruction = ''

View File

@@ -9,6 +9,8 @@ from evaluation.utils.shared import (
EvalMetadata,
EvalOutput,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
@@ -44,18 +46,12 @@ def get_config(
) -> OpenHandsConfig:
sandbox_config = get_default_sandbox_config_for_eval()
sandbox_config.platform = 'linux/amd64'
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
config = get_openhands_config_for_eval(
metadata=metadata,
runtime=os.environ.get('RUNTIME', 'docker'),
max_iterations=metadata.max_iterations,
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
# debug
debug=True,
sandbox_config=sandbox_config,
)
config.debug = True
config.set_llm_config(
update_llm_config_for_completions_logging(
metadata.llm_config, metadata.eval_output_dir, instance_id
@@ -135,7 +131,7 @@ def process_instance(
assert len(histories) > 0, 'History should not be empty'
test_result: TestResult = test_class.verify_result(runtime, histories)
metrics = state.metrics.get() if state.metrics else None
metrics = get_metrics(state)
finally:
runtime.close()

View File

@@ -53,6 +53,7 @@ class EvalMetadata(BaseModel):
data_split: str | None = None
details: dict[str, Any] | None = None
condenser_config: CondenserConfig | None = None
instruction_template_name: str | None = None
class EvalOutput(BaseModel):
@@ -205,6 +206,7 @@ def make_metadata(
condenser_config=condenser_config
if condenser_config
else NoOpCondenserConfig(),
instruction_template_name=os.environ.get('INSTRUCTION_TEMPLATE_NAME'),
)
metadata_json = metadata.model_dump_json()
logger.info(f'Metadata: {metadata_json}')
@@ -666,8 +668,23 @@ def is_fatal_runtime_error(error: str | None) -> bool:
def get_metrics(state: State) -> dict[str, Any]:
"""Extract metrics from the state."""
metrics = state.metrics.get() if state.metrics else {}
"""Extract metrics for evaluations.
Prefer ConversationStats (source of truth) and fall back to state.metrics for
backward compatibility.
"""
metrics: dict[str, Any]
try:
if getattr(state, 'conversation_stats', None):
combined = state.conversation_stats.get_combined_metrics()
metrics = combined.get()
elif getattr(state, 'metrics', None):
metrics = state.metrics.get()
else:
metrics = {}
except Exception:
metrics = state.metrics.get() if getattr(state, 'metrics', None) else {}
metrics['condenser'] = get_condensation_metadata(state)
return metrics
@@ -686,3 +703,79 @@ def get_default_sandbox_config_for_eval() -> SandboxConfig:
remote_runtime_enable_retries=True,
remote_runtime_class='sysbox',
)
def get_openhands_config_for_eval(
metadata: EvalMetadata | None = None,
sandbox_config: SandboxConfig | None = None,
runtime: str | None = None,
max_iterations: int | None = None,
default_agent: str | None = None,
enable_browser: bool = False,
workspace_base: str | None = None,
workspace_mount_path: str | None = None,
):
"""Create an OpenHandsConfig with common patterns used across evaluation scripts.
This function provides a standardized way to create OpenHands configurations
for evaluation runs, with sensible defaults that match the patterns used in
most run_infer.py scripts. Individual evaluation scripts can override specific
attributes as needed.
Args:
metadata: EvalMetadata containing agent class, max iterations, etc.
sandbox_config: Custom sandbox config. If None, uses get_default_sandbox_config_for_eval()
runtime: Runtime type. If None, uses environment RUNTIME or 'docker'
max_iterations: Max iterations for the agent. If None, uses metadata.max_iterations
default_agent: Agent class name. If None, uses metadata.agent_class
enable_browser: Whether to enable browser functionality
workspace_base: Workspace base path. Defaults to None
workspace_mount_path: Workspace mount path. Defaults to None
Returns:
OpenHandsConfig: Configured for evaluation with eval-specific overrides applied
"""
# Defer import to avoid circular imports at module load time
from openhands.core.config.openhands_config import (
OpenHandsConfig as _OHConfig, # type: ignore
)
# Use provided sandbox config or get default
if sandbox_config is None:
sandbox_config = get_default_sandbox_config_for_eval()
# Extract values from metadata if provided
if metadata is not None:
if max_iterations is None:
max_iterations = metadata.max_iterations
if default_agent is None:
default_agent = metadata.agent_class
# Use environment runtime or default
if runtime is None:
runtime = os.environ.get('RUNTIME', 'docker')
# Provide sensible defaults if still None
if default_agent is None:
default_agent = 'CodeActAgent'
if max_iterations is None:
max_iterations = 50
# Always use repo-local .eval_sessions directory (absolute path)
eval_store = os.path.abspath(os.path.join(os.getcwd(), '.eval_sessions'))
# Create the base config with evaluation-specific overrides
config = _OHConfig(
default_agent=default_agent,
run_as_openhands=False,
runtime=runtime,
max_iterations=max_iterations,
enable_browser=enable_browser,
sandbox=sandbox_config,
workspace_base=workspace_base,
workspace_mount_path=workspace_mount_path,
file_store='local',
file_store_path=eval_store,
)
return config

View File

@@ -232,13 +232,16 @@ describe("RepositorySelectionForm", () => {
renderForm();
const dropdown = await screen.findByTestId("repo-dropdown");
const input = dropdown.querySelector('input[type="text"]') as HTMLInputElement;
const input = dropdown.querySelector(
'input[type="text"]',
) as HTMLInputElement;
expect(input).toBeInTheDocument();
await userEvent.type(input, "https://github.com/kubernetes/kubernetes");
expect(searchGitReposSpy).toHaveBeenLastCalledWith(
"kubernetes/kubernetes",
3,
"github",
);
});
@@ -268,13 +271,16 @@ describe("RepositorySelectionForm", () => {
renderForm();
const dropdown = await screen.findByTestId("repo-dropdown");
const input = dropdown.querySelector('input[type="text"]') as HTMLInputElement;
const input = dropdown.querySelector(
'input[type="text"]',
) as HTMLInputElement;
expect(input).toBeInTheDocument();
await userEvent.type(input, "https://github.com/kubernetes/kubernetes");
expect(searchGitReposSpy).toHaveBeenLastCalledWith(
"kubernetes/kubernetes",
3,
"github",
);
});
});

View File

@@ -444,28 +444,38 @@ describe("MicroagentManagement", () => {
expect(filePath2).toBeInTheDocument();
});
it("should display add microagent button in repository accordion", async () => {
it("should render add microagent button", async () => {
renderMicroagentManagement();
// Wait for repositories to be loaded
// Wait for repositories to be loaded and processed
await waitFor(() => {
expect(mockUseUserRepositories).toHaveBeenCalled();
});
// Wait for repositories to be displayed in the accordion
await waitFor(() => {
expect(screen.getByTestId("repository-name-tooltip")).toBeInTheDocument();
});
// Check that add microagent buttons are present
const addButtons = screen.getAllByTestId("add-microagent-button");
expect(addButtons.length).toBeGreaterThan(0);
});
it("should open add microagent modal when add button is clicked", async () => {
it("should open modal when add button is clicked", async () => {
const user = userEvent.setup();
renderMicroagentManagement();
// Wait for repositories to be loaded
// Wait for repositories to be loaded and processed
await waitFor(() => {
expect(mockUseUserRepositories).toHaveBeenCalled();
});
// Wait for repositories to be displayed in the accordion
await waitFor(() => {
expect(screen.getByTestId("repository-name-tooltip")).toBeInTheDocument();
});
// Find and click the first add microagent button
const addButtons = screen.getAllByTestId("add-microagent-button");
await user.click(addButtons[0]);
@@ -1292,11 +1302,18 @@ describe("MicroagentManagement", () => {
it("should render add microagent button", async () => {
renderMicroagentManagement();
// Wait for repositories to be loaded
// Wait for repositories to be loaded and processed
await waitFor(() => {
expect(mockUseUserRepositories).toHaveBeenCalled();
});
// Wait for repositories to be displayed in the accordion
await waitFor(() => {
expect(
screen.getByTestId("repository-name-tooltip"),
).toBeInTheDocument();
});
// Check that add microagent buttons are present
const addButtons = screen.getAllByTestId("add-microagent-button");
expect(addButtons.length).toBeGreaterThan(0);
@@ -1306,11 +1323,18 @@ describe("MicroagentManagement", () => {
const user = userEvent.setup();
renderMicroagentManagement();
// Wait for repositories to be loaded
// Wait for repositories to be loaded and processed
await waitFor(() => {
expect(mockUseUserRepositories).toHaveBeenCalled();
});
// Wait for repositories to be displayed in the accordion
await waitFor(() => {
expect(
screen.getByTestId("repository-name-tooltip"),
).toBeInTheDocument();
});
// Find and click the first add microagent button
const addButtons = screen.getAllByTestId("add-microagent-button");
await user.click(addButtons[0]);
@@ -1361,11 +1385,18 @@ describe("MicroagentManagement", () => {
const user = userEvent.setup();
renderMicroagentManagement();
// Wait for repositories to be loaded
// Wait for repositories to be loaded and processed
await waitFor(() => {
expect(mockUseUserRepositories).toHaveBeenCalled();
});
// Wait for repositories to be displayed in the accordion
await waitFor(() => {
expect(
screen.getByTestId("repository-name-tooltip"),
).toBeInTheDocument();
});
// Find and click the first add microagent button
const addButtons = screen.getAllByTestId("add-microagent-button");
await user.click(addButtons[0]);
@@ -1385,11 +1416,18 @@ describe("MicroagentManagement", () => {
const user = userEvent.setup();
renderMicroagentManagement();
// Wait for repositories to be loaded
// Wait for repositories to be loaded and processed
await waitFor(() => {
expect(mockUseUserRepositories).toHaveBeenCalled();
});
// Wait for repositories to be displayed in the accordion
await waitFor(() => {
expect(
screen.getByTestId("repository-name-tooltip"),
).toBeInTheDocument();
});
// Find and click the first add microagent button
const addButtons = screen.getAllByTestId("add-microagent-button");
await user.click(addButtons[0]);
@@ -1408,11 +1446,18 @@ describe("MicroagentManagement", () => {
const user = userEvent.setup();
renderMicroagentManagement();
// Wait for repositories to be loaded
// Wait for repositories to be loaded and processed
await waitFor(() => {
expect(mockUseUserRepositories).toHaveBeenCalled();
});
// Wait for repositories to be displayed in the accordion
await waitFor(() => {
expect(
screen.getByTestId("repository-name-tooltip"),
).toBeInTheDocument();
});
// Find and click the first add microagent button
const addButtons = screen.getAllByTestId("add-microagent-button");
await user.click(addButtons[0]);
@@ -1441,11 +1486,18 @@ describe("MicroagentManagement", () => {
const user = userEvent.setup();
renderMicroagentManagement();
// Wait for repositories to be loaded
// Wait for repositories to be loaded and processed
await waitFor(() => {
expect(mockUseUserRepositories).toHaveBeenCalled();
});
// Wait for repositories to be displayed in the accordion
await waitFor(() => {
expect(
screen.getByTestId("repository-name-tooltip"),
).toBeInTheDocument();
});
// Find and click the first add microagent button
const addButtons = screen.getAllByTestId("add-microagent-button");
await user.click(addButtons[0]);
@@ -1468,11 +1520,18 @@ describe("MicroagentManagement", () => {
const user = userEvent.setup();
renderMicroagentManagement();
// Wait for repositories to be loaded
// Wait for repositories to be loaded and processed
await waitFor(() => {
expect(mockUseUserRepositories).toHaveBeenCalled();
});
// Wait for repositories to be displayed in the accordion
await waitFor(() => {
expect(
screen.getByTestId("repository-name-tooltip"),
).toBeInTheDocument();
});
// Find and click the first add microagent button
const addButtons = screen.getAllByTestId("add-microagent-button");
await user.click(addButtons[0]);
@@ -1494,11 +1553,18 @@ describe("MicroagentManagement", () => {
const user = userEvent.setup();
renderMicroagentManagement();
// Wait for repositories to be loaded
// Wait for repositories to be loaded and processed
await waitFor(() => {
expect(mockUseUserRepositories).toHaveBeenCalled();
});
// Wait for repositories to be displayed in the accordion
await waitFor(() => {
expect(
screen.getByTestId("repository-name-tooltip"),
).toBeInTheDocument();
});
// Find and click the first add microagent button
const addButtons = screen.getAllByTestId("add-microagent-button");
await user.click(addButtons[0]);

View File

@@ -136,7 +136,7 @@ describe("Settings Screen", () => {
"secrets",
"api keys",
];
const sectionsToExclude = ["llm", "mcp"];
const sectionsToExclude = ["llm"];
renderSettingsScreen();

View File

@@ -1,12 +1,12 @@
{
"name": "openhands-frontend",
"version": "0.53.0",
"version": "0.54.0",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "openhands-frontend",
"version": "0.53.0",
"version": "0.54.0",
"dependencies": {
"@heroui/react": "^2.8.2",
"@heroui/use-infinite-scroll": "^2.2.10",

View File

@@ -1,6 +1,6 @@
{
"name": "openhands-frontend",
"version": "0.53.0",
"version": "0.54.0",
"private": true,
"type": "module",
"engines": {

View File

@@ -1,7 +1,9 @@
import { useCallback, useMemo, useRef } from "react";
import { useCallback, useMemo, useState } from "react";
import { useTranslation } from "react-i18next";
import { Provider } from "../../types/settings";
import { useGitRepositories } from "../../hooks/query/use-git-repositories";
import { useSearchRepositories } from "../../hooks/query/use-search-repositories";
import { useDebounce } from "../../hooks/use-debounce";
import OpenHands from "../../api/open-hands";
import { GitRepository } from "../../types/git";
import {
@@ -19,10 +21,6 @@ export interface GitRepositoryDropdownProps {
onChange?: (repository?: GitRepository) => void;
}
interface SearchCache {
[key: string]: GitRepository[];
}
export function GitRepositoryDropdown({
provider,
value,
@@ -33,6 +31,20 @@ export function GitRepositoryDropdown({
onChange,
}: GitRepositoryDropdownProps) {
const { t } = useTranslation();
const [searchInput, setSearchInput] = useState("");
const debouncedSearchInput = useDebounce(searchInput, 300);
// Process search input to handle URLs
const processedSearchInput = useMemo(() => {
if (debouncedSearchInput.startsWith("https://")) {
const match = debouncedSearchInput.match(
/https:\/\/[^/]+\/([^/]+\/[^/]+)/,
);
return match ? match[1] : debouncedSearchInput;
}
return debouncedSearchInput;
}, [debouncedSearchInput]);
const {
data,
fetchNextPage,
@@ -45,6 +57,10 @@ export function GitRepositoryDropdown({
enabled: !disabled,
});
// Search query for processed input (handles URLs)
const { data: searchData, isLoading: isSearchLoading } =
useSearchRepositories(processedSearchInput, provider);
const allOptions: AsyncSelectOption[] = useMemo(
() =>
data?.pages
@@ -58,75 +74,83 @@ export function GitRepositoryDropdown({
[data],
);
// Keep track of search results
const searchCache = useRef<SearchCache>({});
const searchOptions: AsyncSelectOption[] = useMemo(
() =>
searchData
? searchData.map((repo) => ({
value: repo.id,
label: repo.full_name,
}))
: [],
[searchData],
);
const selectedOption = useMemo(() => {
// First check in loaded pages
const option = allOptions.find((opt) => opt.value === value);
if (option) return option;
// If not found, check in search cache
const repo = Object.values(searchCache.current)
.flat()
.find((r) => r.id === value);
if (repo) {
return {
value: repo.id,
label: repo.full_name,
};
}
// If not found, check in search results
const searchOption = searchOptions.find((opt) => opt.value === value);
if (searchOption) return searchOption;
return null;
}, [allOptions, value]);
}, [allOptions, searchOptions, value]);
const loadOptions = useCallback(
async (inputValue: string): Promise<AsyncSelectOption[]> => {
// Update search input to trigger debounced search
setSearchInput(inputValue);
// If empty input, show all loaded options
if (!inputValue.trim()) {
return allOptions;
}
// If it looks like a URL, extract the repo name and search
// For very short inputs, do local filtering
if (inputValue.length < 2) {
return allOptions.filter((option) =>
option.label.toLowerCase().includes(inputValue.toLowerCase()),
);
}
// Handle URL inputs by performing direct search
if (inputValue.startsWith("https://")) {
const match = inputValue.match(/https:\/\/[^/]+\/([^/]+\/[^/]+)/);
if (match) {
const repoName = match[1];
const searchResults = await OpenHands.searchGitRepositories(
repoName,
3,
);
// Cache the search results
searchCache.current[repoName] = searchResults;
return searchResults.map((repo) => ({
value: repo.id,
label: repo.full_name,
}));
try {
// Perform direct search for URL-based inputs
const repositories = await OpenHands.searchGitRepositories(
repoName,
3,
provider,
);
return repositories.map((repo) => ({
value: repo.full_name,
label: repo.full_name,
data: repo,
}));
} catch (error) {
// Fall back to local filtering if search fails
return allOptions.filter((option) =>
option.label.toLowerCase().includes(repoName.toLowerCase()),
);
}
}
}
// For any other input, search via API
if (inputValue.length >= 2) {
// Only search if at least 2 characters
const searchResults = await OpenHands.searchGitRepositories(
inputValue,
10,
);
// Cache the search results
searchCache.current[inputValue] = searchResults;
return searchResults.map((repo) => ({
value: repo.id,
label: repo.full_name,
}));
// For regular text inputs, use hook-based search results if available
if (searchOptions.length > 0 && processedSearchInput === inputValue) {
return searchOptions;
}
// For very short inputs, do local filtering
// Fallback to local filtering while search is loading
return allOptions.filter((option) =>
option.label.toLowerCase().includes(inputValue.toLowerCase()),
);
},
[allOptions],
[allOptions, searchOptions, processedSearchInput, provider],
);
const handleChange = (option: AsyncSelectOption | null) => {
@@ -142,9 +166,7 @@ export function GitRepositoryDropdown({
// If not found, check in search results
if (!repo) {
repo = Object.values(searchCache.current)
.flat()
.find((r) => r.id === option.value);
repo = searchData?.find((r) => r.id === option.value);
}
onChange?.(repo);
@@ -167,7 +189,7 @@ export function GitRepositoryDropdown({
errorMessage={errorMessage}
disabled={disabled}
isClearable={false}
isLoading={isLoading || isLoading || isFetchingNextPage}
isLoading={isLoading || isFetchingNextPage || isSearchLoading}
cacheOptions
defaultOptions={allOptions}
onChange={handleChange}

View File

@@ -16,6 +16,8 @@ const COMMON_NO_RENDER_LIST: OpenHandsEventType[] = [
const ACTION_NO_RENDER_LIST: OpenHandsEventType[] = ["recall"];
const OBSERVATION_NO_RENDER_LIST: OpenHandsEventType[] = ["think"];
export const shouldRenderEvent = (
event: OpenHandsAction | OpenHandsObservation,
) => {
@@ -35,7 +37,10 @@ export const shouldRenderEvent = (
return false;
}
return !COMMON_NO_RENDER_LIST.includes(event.observation);
const noRenderList = COMMON_NO_RENDER_LIST.concat(
OBSERVATION_NO_RENDER_LIST,
);
return !noRenderList.includes(event.observation);
}
return true;

View File

@@ -117,7 +117,7 @@ export function EventMessage({
}
if (hasObservationPair && isOpenHandsAction(event)) {
if (hasThoughtProperty(event.args)) {
if (hasThoughtProperty(event.args) && event.action !== "think") {
return (
<div>
<ChatMessage
@@ -243,9 +243,11 @@ export function EventMessage({
return (
<div>
{isOpenHandsAction(event) && hasThoughtProperty(event.args) && (
<ChatMessage type="agent" message={event.args.thought} />
)}
{isOpenHandsAction(event) &&
hasThoughtProperty(event.args) &&
event.action !== "think" && (
<ChatMessage type="agent" message={event.args.thought} />
)}
<GenericEventMessage
title={getEventContent(event).title}

View File

@@ -17,7 +17,7 @@ export function MicroagentManagementAccordionTitle({
<TooltipButton
tooltip={repository.full_name}
ariaLabel={repository.full_name}
className="text-white text-base font-normal bg-transparent p-0 min-w-0 h-auto cursor-pointer truncate max-w-[232px]"
className="text-white text-base font-normal bg-transparent p-0 min-w-0 h-auto cursor-pointer truncate max-w-[200px] translate-y-[-1px]"
testId="repository-name-tooltip"
placement="bottom"
>

View File

@@ -7,8 +7,6 @@ import {
} from "#/state/microagent-management-slice";
import { RootState } from "#/store";
import { GitRepository } from "#/types/git";
import PlusIcon from "#/icons/plus.svg?react";
import { TooltipButton } from "#/components/shared/buttons/tooltip-button";
interface MicroagentManagementAddMicroagentButtonProps {
repository: GitRepository;
@@ -25,23 +23,22 @@ export function MicroagentManagementAddMicroagentButton({
const dispatch = useDispatch();
const handleClick = (e: React.MouseEvent<HTMLDivElement>) => {
const handleClick = (e: React.MouseEvent<HTMLButtonElement>) => {
e.stopPropagation();
dispatch(setAddMicroagentModalVisible(!addMicroagentModalVisible));
dispatch(setSelectedRepository(repository));
};
return (
<div onClick={handleClick}>
<TooltipButton
tooltip={t(I18nKey.COMMON$ADD_MICROAGENT)}
ariaLabel={t(I18nKey.COMMON$ADD_MICROAGENT)}
className="p-0 min-w-0 h-6 w-6 flex items-center justify-center bg-transparent cursor-pointer"
testId="add-microagent-button"
placement="bottom"
>
<PlusIcon width={22} height={22} />
</TooltipButton>
</div>
<button
type="button"
onClick={handleClick}
className="translate-y-[-1px]"
data-testid="add-microagent-button"
>
<span className="text-sm font-normal leading-5 text-[#8480FF] cursor-pointer hover:text-[#6C63FF] transition-colors duration-200">
{t(I18nKey.COMMON$ADD_MICROAGENT)}
</span>
</button>
);
}

View File

@@ -1,4 +1,5 @@
import React, { useEffect, useState } from "react";
import { useTranslation } from "react-i18next";
import { useDispatch, useSelector } from "react-redux";
import { MicroagentManagementSidebar } from "./microagent-management-sidebar";
import { MicroagentManagementMain } from "./microagent-management-main";
@@ -25,6 +26,12 @@ import { GitRepository } from "#/types/git";
import { queryClient } from "#/query-client-config";
import { Provider } from "#/types/settings";
import { MicroagentManagementLearnThisRepoModal } from "./microagent-management-learn-this-repo-modal";
import {
displaySuccessToast,
displayErrorToast,
} from "#/utils/custom-toast-handlers";
import { getFirstPRUrl } from "#/utils/parse-pr-url";
import { I18nKey } from "#/i18n/declaration";
// Handle error events
const isErrorEvent = (evt: unknown): evt is { error: true; message: string } =>
@@ -112,6 +119,8 @@ export function MicroagentManagementContent() {
learnThisRepoModalVisible,
} = useSelector((state: RootState) => state.microagentManagement);
const { t } = useTranslation();
const dispatch = useDispatch();
const { createConversationAndSubscribe, isPending } =
@@ -159,6 +168,37 @@ export function MicroagentManagementContent() {
? (selectedRepository as GitRepository).full_name
: "";
// Check if agent is running and ready to work
if (
isOpenHandsEvent(socketEvent) &&
isAgentStateChangeObservation(socketEvent) &&
socketEvent.extras.agent_state === AgentState.RUNNING
) {
displaySuccessToast(
t(I18nKey.MICROAGENT_MANAGEMENT$OPENING_PR_TO_CREATE_MICROAGENT),
);
}
// Check if agent has finished and we have a PR
if (isOpenHandsEvent(socketEvent) && isFinishAction(socketEvent)) {
const prUrl = getFirstPRUrl(socketEvent.args.final_thought || "");
if (prUrl) {
displaySuccessToast(
t(I18nKey.MICROAGENT_MANAGEMENT$PR_READY_FOR_REVIEW),
);
} else {
// Agent finished but no PR found
displaySuccessToast(t(I18nKey.MICROAGENT_MANAGEMENT$PR_NOT_CREATED));
}
}
// Handle error events
if (isErrorEvent(socketEvent) || isAgentStatusError(socketEvent)) {
displayErrorToast(
t(I18nKey.MICROAGENT_MANAGEMENT$ERROR_CREATING_MICROAGENT),
);
}
if (shouldInvalidateConversationsList(socketEvent)) {
invalidateConversationsList(repositoryName);
}

View File

@@ -65,6 +65,18 @@ export function MicroagentManagementRepoMicroagents({
}
}, [conversations]);
useEffect(
() => () => {
dispatch(
setSelectedMicroagentItem({
microagent: null,
conversation: null,
}),
);
},
[],
);
// Show loading only when both queries are loading
const isLoading = isLoadingMicroagents || isLoadingConversations;
@@ -82,7 +94,7 @@ export function MicroagentManagementRepoMicroagents({
// If there's an error with microagents, show the learn this repo component
if (isError) {
return (
<div className="pb-4">
<div>
<MicroagentManagementLearnThisRepo repository={repository} />
</div>
);
@@ -93,7 +105,7 @@ export function MicroagentManagementRepoMicroagents({
const totalItems = numberOfMicroagents + numberOfConversations;
return (
<div className="pb-4">
<div>
{totalItems === 0 && (
<MicroagentManagementLearnThisRepo repository={repository} />
)}

View File

@@ -97,8 +97,10 @@ export function MicroagentManagementRepositories({
variant="splitted"
className="w-full px-0 gap-3"
itemClasses={{
base: "shadow-none bg-transparent border border-[#ffffff40] rounded-[6px] cursor-pointer",
trigger: "cursor-pointer gap-1",
base: "shadow-none bg-transparent cursor-pointer px-0",
trigger: "cursor-pointer gap-2 py-3",
indicator:
"flex items-center justify-center p-0.5 pr-[3px] text-white hover:bg-[#454545] rounded transition-colors duration-200 rotate-180",
}}
selectionMode="multiple"
>

View File

@@ -0,0 +1,110 @@
import { render, screen, fireEvent } from "@testing-library/react";
import { describe, it, expect, vi } from "vitest";
import { MCPServerForm } from "../mcp-server-form";
// i18n mock
vi.mock("react-i18next", () => ({
useTranslation: () => ({
t: (key: string) => key,
}),
}));
describe("MCPServerForm validation", () => {
const noop = () => {};
it("rejects invalid env var lines and allows blank lines", () => {
const onSubmit = vi.fn();
render(
<MCPServerForm
mode="add"
server={{ id: "tmp", type: "stdio" }}
existingServers={[]}
onSubmit={onSubmit}
onCancel={noop}
/>,
);
// Fill required fields
fireEvent.change(screen.getByTestId("name-input"), {
target: { value: "my-server" },
});
fireEvent.change(screen.getByTestId("command-input"), {
target: { value: "npx" },
});
// Invalid env entries mixed with blank lines
fireEvent.change(screen.getByTestId("env-input"), {
target: { value: "invalid\n\nKEY=value\n=novalue\nKEY_ONLY=" },
});
fireEvent.click(screen.getByTestId("submit-button"));
// Should show invalid env format error
expect(
screen.getByText("SETTINGS$MCP_ERROR_ENV_INVALID_FORMAT"),
).toBeInTheDocument();
// Fix env with valid lines and blank lines
fireEvent.change(screen.getByTestId("env-input"), {
target: { value: "KEY=value\n\nANOTHER=123" },
});
fireEvent.click(screen.getByTestId("submit-button"));
// No error; submit should be called
expect(onSubmit).toHaveBeenCalledTimes(1);
});
it("rejects duplicate URLs across sse/shttp types", () => {
const onSubmit = vi.fn();
const existingServers = [
{ id: "sse-1", type: "sse" as const, url: "https://api.example.com" },
{ id: "shttp-1", type: "shttp" as const, url: "https://x.example.com" },
];
const r1 = render(
<MCPServerForm
mode="add"
server={{ id: "tmp", type: "sse" }}
existingServers={existingServers}
onSubmit={onSubmit}
onCancel={noop}
/>,
);
fireEvent.change(screen.getAllByTestId("url-input")[0], {
target: { value: "https://api.example.com" },
});
fireEvent.click(screen.getAllByTestId("submit-button")[0]);
expect(
screen.getByText("SETTINGS$MCP_ERROR_URL_DUPLICATE"),
).toBeInTheDocument();
// Unmount first form, then check shttp duplicate
r1.unmount();
const r2 = render(
<MCPServerForm
mode="add"
server={{ id: "tmp2", type: "shttp" }}
existingServers={existingServers}
onSubmit={onSubmit}
onCancel={noop}
/>,
);
fireEvent.change(screen.getAllByTestId("url-input")[0], {
target: { value: "https://api.example.com" },
});
fireEvent.click(screen.getAllByTestId("submit-button")[0]);
expect(
screen.getByText("SETTINGS$MCP_ERROR_URL_DUPLICATE"),
).toBeInTheDocument();
r2.unmount();
});
});

View File

@@ -0,0 +1,158 @@
import { render, screen } from "@testing-library/react";
import { describe, it, expect, vi } from "vitest";
import { MCPServerList } from "../mcp-server-list";
// Mock react-i18next
vi.mock("react-i18next", () => ({
useTranslation: () => ({
t: (key: string) => key,
}),
}));
const mockServers = [
{
id: "sse-0",
type: "sse" as const,
url: "https://very-long-url-that-could-cause-layout-overflow.example.com/api/v1/mcp/server/endpoint/with/many/path/segments",
},
{
id: "stdio-0",
type: "stdio" as const,
name: "test-stdio-server",
command: "python",
args: ["-m", "test_server"],
},
];
describe("MCPServerList", () => {
it("should render servers with proper layout structure", () => {
const mockOnEdit = vi.fn();
const mockOnDelete = vi.fn();
render(
<MCPServerList
servers={mockServers}
onEdit={mockOnEdit}
onDelete={mockOnDelete}
/>,
);
// Check that the table structure is rendered
const table = screen.getByRole("table");
expect(table).toBeInTheDocument();
expect(table).toHaveClass("w-full");
// Check that server items are rendered
const serverItems = screen.getAllByTestId("mcp-server-item");
expect(serverItems).toHaveLength(2);
// Check that action buttons are present for each server
const editButtons = screen.getAllByTestId("edit-mcp-server-button");
const deleteButtons = screen.getAllByTestId("delete-mcp-server-button");
expect(editButtons).toHaveLength(2);
expect(deleteButtons).toHaveLength(2);
});
it("should render empty state when no servers", () => {
const mockOnEdit = vi.fn();
const mockOnDelete = vi.fn();
render(
<MCPServerList
servers={[]}
onEdit={mockOnEdit}
onDelete={mockOnDelete}
/>,
);
expect(screen.getByText("SETTINGS$MCP_NO_SERVERS")).toBeInTheDocument();
});
it("should handle long URLs without breaking layout", () => {
const longUrlServer = {
id: "sse-0",
type: "sse" as const,
url: "https://extremely-long-url-that-would-previously-cause-layout-overflow-and-push-action-buttons-out-of-view.example.com/api/v1/mcp/server/endpoint/with/many/path/segments/and/query/parameters?param1=value1&param2=value2&param3=value3",
};
const mockOnEdit = vi.fn();
const mockOnDelete = vi.fn();
render(
<MCPServerList
servers={[longUrlServer]}
onEdit={mockOnEdit}
onDelete={mockOnDelete}
/>,
);
// Check that action buttons are still present and accessible
const editButton = screen.getByTestId("edit-mcp-server-button");
const deleteButton = screen.getByTestId("delete-mcp-server-button");
expect(editButton).toBeInTheDocument();
expect(deleteButton).toBeInTheDocument();
// Check that the URL is properly displayed with title attribute for accessibility
const detailsCells = screen.getAllByTitle(longUrlServer.url);
expect(detailsCells).toHaveLength(2); // Name and Details columns both have the URL
// Check that both name and details cells use truncation and have title for tooltip
const [nameCell, detailsCell] = detailsCells;
expect(nameCell).toHaveClass("truncate");
expect(detailsCell).toHaveClass("truncate");
});
it("should display command and arguments for STDIO servers", () => {
const stdioServer = {
id: "stdio-1",
type: "stdio" as const,
name: "test-server",
command: "python",
args: ["-m", "test_module", "--verbose"],
};
const mockOnEdit = vi.fn();
const mockOnDelete = vi.fn();
render(
<MCPServerList
servers={[stdioServer]}
onEdit={mockOnEdit}
onDelete={mockOnDelete}
/>,
);
// Check that the server details show command + arguments
const expectedDetails = "python -m test_module --verbose";
expect(screen.getByTitle(expectedDetails)).toBeInTheDocument();
expect(screen.getByText(expectedDetails)).toBeInTheDocument();
});
it("should fallback to server name for STDIO servers without command", () => {
const stdioServer = {
id: "stdio-2",
type: "stdio" as const,
name: "fallback-server",
};
const mockOnEdit = vi.fn();
const mockOnDelete = vi.fn();
render(
<MCPServerList
servers={[stdioServer]}
onEdit={mockOnEdit}
onDelete={mockOnDelete}
/>,
);
// Check that the server details show the server name as fallback
// Both name and details columns will have the same value, so we expect 2 elements
const fallbackElements = screen.getAllByTitle("fallback-server");
expect(fallbackElements).toHaveLength(2);
const fallbackTextElements = screen.getAllByText("fallback-server");
expect(fallbackTextElements).toHaveLength(2);
});
});

View File

@@ -1,78 +0,0 @@
import React, { useState } from "react";
import { useTranslation } from "react-i18next";
import { MCPConfig } from "#/types/settings";
import { I18nKey } from "#/i18n/declaration";
import { MCPSSEServers } from "./mcp-sse-servers";
import { MCPStdioServers } from "./mcp-stdio-servers";
import { MCPJsonEditor } from "./mcp-json-editor";
import { BrandButton } from "../brand-button";
interface MCPConfigEditorProps {
mcpConfig?: MCPConfig;
onChange: (config: MCPConfig) => void;
}
export function MCPConfigEditor({ mcpConfig, onChange }: MCPConfigEditorProps) {
const { t } = useTranslation();
const [isEditing, setIsEditing] = useState(false);
const handleConfigChange = (newConfig: MCPConfig) => {
onChange(newConfig);
setIsEditing(false);
};
const config = mcpConfig || { sse_servers: [], stdio_servers: [] };
return (
<div>
<div className="flex flex-col gap-2 mb-6">
<div className="text-sm font-medium">
{t(I18nKey.SETTINGS$MCP_TITLE)}
</div>
<p className="text-xs text-[#A3A3A3]">
{t(I18nKey.SETTINGS$MCP_DESCRIPTION)}
</p>
</div>
{!isEditing && (
<div className="flex justify-between items-center mb-4">
<div className="flex items-center">
<BrandButton
type="button"
variant="primary"
onClick={() => setIsEditing(true)}
>
{t(I18nKey.SETTINGS$MCP_EDIT_CONFIGURATION)}
</BrandButton>
</div>
</div>
)}
<div>
{isEditing ? (
<MCPJsonEditor
mcpConfig={mcpConfig}
onChange={handleConfigChange}
onCancel={() => setIsEditing(false)}
/>
) : (
<>
<div className="flex flex-col gap-6">
<div>
<MCPSSEServers servers={config.sse_servers} />
</div>
<div>
<MCPStdioServers servers={config.stdio_servers} />
</div>
</div>
{config.sse_servers.length === 0 &&
config.stdio_servers.length === 0 && (
<div className="mt-4 p-2 bg-yellow-50 border border-yellow-200 rounded-md text-sm text-yellow-700">
{t(I18nKey.SETTINGS$MCP_NO_SERVERS_CONFIGURED)}
</div>
)}
</>
)}
</div>
</div>
);
}

View File

@@ -1,139 +0,0 @@
import React, { useState, useRef, useEffect } from "react";
import { useTranslation, Trans } from "react-i18next";
import { MCPConfig } from "#/types/settings";
import { I18nKey } from "#/i18n/declaration";
import { BrandButton } from "../brand-button";
import { cn } from "#/utils/utils";
interface MCPJsonEditorProps {
mcpConfig?: MCPConfig;
onChange: (config: MCPConfig) => void;
onCancel: () => void;
}
const MCP_DEFAULT_CONFIG: MCPConfig = {
sse_servers: [],
stdio_servers: [],
};
export function MCPJsonEditor({
mcpConfig,
onChange,
onCancel,
}: MCPJsonEditorProps) {
const { t } = useTranslation();
const [configText, setConfigText] = useState(() =>
mcpConfig
? JSON.stringify(mcpConfig, null, 2)
: JSON.stringify(MCP_DEFAULT_CONFIG, null, 2),
);
const [error, setError] = useState<string | null>(null);
const textareaRef = useRef<HTMLTextAreaElement>(null);
useEffect(() => {
textareaRef.current?.focus();
}, []);
const handleTextChange = (e: React.ChangeEvent<HTMLTextAreaElement>) => {
setConfigText(e.target.value);
};
const handleSave = () => {
try {
const newConfig = JSON.parse(configText);
// Validate the structure
if (!newConfig.sse_servers || !Array.isArray(newConfig.sse_servers)) {
throw new Error(t(I18nKey.SETTINGS$MCP_ERROR_SSE_ARRAY));
}
if (!newConfig.stdio_servers || !Array.isArray(newConfig.stdio_servers)) {
throw new Error(t(I18nKey.SETTINGS$MCP_ERROR_STDIO_ARRAY));
}
// Validate SSE servers
for (const server of newConfig.sse_servers) {
if (
typeof server !== "string" &&
(!server.url || typeof server.url !== "string")
) {
throw new Error(t(I18nKey.SETTINGS$MCP_ERROR_SSE_URL));
}
}
// Validate stdio servers
for (const server of newConfig.stdio_servers) {
if (!server.name || !server.command) {
throw new Error(t(I18nKey.SETTINGS$MCP_ERROR_STDIO_PROPS));
}
}
onChange(newConfig);
setError(null);
} catch (e) {
setError(
e instanceof Error
? e.message
: t(I18nKey.SETTINGS$MCP_ERROR_INVALID_JSON),
);
}
};
return (
<div>
<p className="mb-2 text-sm text-gray-400">
<Trans
i18nKey={I18nKey.SETTINGS$MCP_CONFIG_DESCRIPTION}
components={{
a: (
<a
href="https://docs.all-hands.dev/usage/mcp"
target="_blank"
rel="noopener noreferrer"
className="text-blue-400 hover:underline"
>
documentation
</a>
),
}}
/>
</p>
<textarea
ref={textareaRef}
className={cn(
"w-full h-64 resize-y p-2 rounded-sm text-sm font-mono",
"bg-tertiary border border-[#717888]",
"placeholder:italic placeholder:text-tertiary-alt",
"focus:outline-none focus:ring-1 focus:ring-primary",
"disabled:bg-[#2D2F36] disabled:border-[#2D2F36] disabled:cursor-not-allowed",
)}
value={configText}
onChange={handleTextChange}
spellCheck="false"
/>
{error && (
<div className="mt-2 p-2 bg-red-100 border border-red-300 rounded-md text-sm text-red-700">
<strong>{t(I18nKey.SETTINGS$MCP_CONFIG_ERROR)}</strong> {error}
</div>
)}
<div className="mt-2 text-sm text-gray-400">
<strong>{t(I18nKey.SETTINGS$MCP_CONFIG_EXAMPLE)}</strong>{" "}
<code>
{
'{ "sse_servers": ["https://example-mcp-server.com/sse"], "stdio_servers": [{ "name": "fetch", "command": "uvx", "args": ["mcp-server-fetch"] }] }'
}
</code>
</div>
<div className="mt-4 flex justify-end gap-3">
<BrandButton type="button" variant="secondary" onClick={onCancel}>
{t(I18nKey.BUTTON$CANCEL)}
</BrandButton>
<BrandButton type="button" variant="primary" onClick={handleSave}>
{t(I18nKey.SETTINGS$MCP_PREVIEW_CHANGES)}
</BrandButton>
</div>
</div>
);
}

View File

@@ -0,0 +1,376 @@
import React from "react";
import { useTranslation } from "react-i18next";
import { I18nKey } from "#/i18n/declaration";
import { SettingsInput } from "../settings-input";
import { SettingsDropdownInput } from "../settings-dropdown-input";
import { BrandButton } from "../brand-button";
import { OptionalTag } from "../optional-tag";
import { cn } from "#/utils/utils";
type MCPServerType = "sse" | "stdio" | "shttp";
interface MCPServerConfig {
id: string;
type: MCPServerType;
name?: string;
url?: string;
api_key?: string;
command?: string;
args?: string[];
env?: Record<string, string>;
}
interface MCPServerFormProps {
mode: "add" | "edit";
server?: MCPServerConfig;
existingServers?: MCPServerConfig[];
onSubmit: (server: MCPServerConfig) => void;
onCancel: () => void;
}
export function MCPServerForm({
mode,
server,
existingServers,
onSubmit,
onCancel,
}: MCPServerFormProps) {
const { t } = useTranslation();
const [serverType, setServerType] = React.useState<MCPServerType>(
server?.type || "sse",
);
const [error, setError] = React.useState<string | null>(null);
const serverTypeOptions = [
{ key: "sse", label: t(I18nKey.SETTINGS$MCP_SERVER_TYPE_SSE) },
{ key: "stdio", label: t(I18nKey.SETTINGS$MCP_SERVER_TYPE_STDIO) },
{ key: "shttp", label: t(I18nKey.SETTINGS$MCP_SERVER_TYPE_SHTTP) },
];
const validateUrl = (url: string): string | null => {
if (!url) return t(I18nKey.SETTINGS$MCP_ERROR_URL_REQUIRED);
try {
const urlObj = new URL(url);
if (!["http:", "https:"].includes(urlObj.protocol)) {
return t(I18nKey.SETTINGS$MCP_ERROR_URL_INVALID_PROTOCOL);
}
} catch {
return t(I18nKey.SETTINGS$MCP_ERROR_URL_INVALID);
}
return null;
};
const validateName = (name: string): string | null => {
if (!name) return t(I18nKey.SETTINGS$MCP_ERROR_NAME_REQUIRED);
if (!/^[a-zA-Z0-9_-]+$/.test(name)) {
return t(I18nKey.SETTINGS$MCP_ERROR_NAME_INVALID);
}
return null;
};
const validateNameUniqueness = (name: string): string | null => {
if (!existingServers) return null;
const shouldCheckUniqueness =
mode === "add" || (mode === "edit" && server?.name !== name);
if (!shouldCheckUniqueness) return null;
const existingStdioNames = existingServers
.filter((s) => s.type === "stdio")
.map((s) => s.name)
.filter(Boolean);
if (existingStdioNames.includes(name)) {
return t(I18nKey.SETTINGS$MCP_ERROR_NAME_DUPLICATE);
}
return null;
};
const validateCommand = (command: string): string | null => {
if (!command) return t(I18nKey.SETTINGS$MCP_ERROR_COMMAND_REQUIRED);
if (command.includes(" ")) {
return t(I18nKey.SETTINGS$MCP_ERROR_COMMAND_NO_SPACES);
}
return null;
};
const validateUrlUniqueness = (url: string): string | null => {
if (!existingServers) return null;
const originalUrl = server?.url;
const changed = mode === "add" || (mode === "edit" && originalUrl !== url);
if (!changed) return null;
// For URL-based servers (sse/shttp), ensure URL is unique across both types
const exists = existingServers.some(
(s) => (s.type === "sse" || s.type === "shttp") && s.url === url,
);
if (exists) return t(I18nKey.SETTINGS$MCP_ERROR_URL_DUPLICATE);
return null;
};
const validateEnvFormat = (envString: string): string | null => {
if (!envString.trim()) return null;
const lines = envString.split("\n");
for (let i = 0; i < lines.length; i += 1) {
const trimmed = lines[i].trim();
if (trimmed) {
const eq = trimmed.indexOf("=");
if (eq === -1) return t(I18nKey.SETTINGS$MCP_ERROR_ENV_INVALID_FORMAT);
const key = trimmed.substring(0, eq).trim();
if (!key) return t(I18nKey.SETTINGS$MCP_ERROR_ENV_INVALID_FORMAT);
}
}
return null;
};
const validateStdioServer = (formData: FormData): string | null => {
const name = formData.get("name")?.toString().trim() || "";
const command = formData.get("command")?.toString().trim() || "";
const envString = formData.get("env")?.toString() || "";
const nameError = validateName(name);
if (nameError) return nameError;
const uniquenessError = validateNameUniqueness(name);
if (uniquenessError) return uniquenessError;
const commandError = validateCommand(command);
if (commandError) return commandError;
// Validate environment variable format
const envError = validateEnvFormat(envString);
if (envError) return envError;
return null;
};
const validateForm = (formData: FormData): string | null => {
if (serverType === "sse" || serverType === "shttp") {
const url = formData.get("url")?.toString().trim() || "";
const urlError = validateUrl(url);
if (urlError) return urlError;
const urlDupError = validateUrlUniqueness(url);
if (urlDupError) return urlDupError;
return null;
}
if (serverType === "stdio") {
return validateStdioServer(formData);
}
return null;
};
const parseEnvironmentVariables = (
envString: string,
): Record<string, string> => {
const env: Record<string, string> = {};
const input = envString.trim();
if (!input) return env;
for (const line of input.split("\n")) {
const trimmed = line.trim();
const eq = trimmed.indexOf("=");
const key = eq >= 0 ? trimmed.substring(0, eq).trim() : "";
if (trimmed && eq !== -1 && key) {
env[key] = trimmed.substring(eq + 1).trim();
}
}
return env;
};
const formatEnvironmentVariables = (env?: Record<string, string>): string => {
if (!env) return "";
return Object.entries(env)
.map(([key, value]) => `${key}=${value}`)
.join("\n");
};
const handleSubmit = (event: React.FormEvent<HTMLFormElement>) => {
event.preventDefault();
setError(null);
const formData = new FormData(event.currentTarget);
const validationError = validateForm(formData);
if (validationError) {
setError(validationError);
return;
}
const baseConfig = {
id: server?.id || `${serverType}-${Date.now()}`,
type: serverType,
};
if (serverType === "sse" || serverType === "shttp") {
const url = formData.get("url")?.toString().trim();
const apiKey = formData.get("api_key")?.toString().trim();
onSubmit({
...baseConfig,
url: url!,
...(apiKey && { api_key: apiKey }),
});
} else if (serverType === "stdio") {
const name = formData.get("name")?.toString().trim();
const command = formData.get("command")?.toString().trim();
const argsString = formData.get("args")?.toString().trim();
const envString = formData.get("env")?.toString().trim();
const args = argsString
? argsString
.split("\n")
.map((arg) => arg.trim())
.filter(Boolean)
: [];
const env = parseEnvironmentVariables(envString || "");
onSubmit({
...baseConfig,
name: name!,
command: command!,
...(args.length > 0 && { args }),
...(Object.keys(env).length > 0 && { env }),
});
}
};
const formTestId =
mode === "add" ? "add-mcp-server-form" : "edit-mcp-server-form";
return (
<form
data-testid={formTestId}
onSubmit={handleSubmit}
className="flex flex-col items-start gap-6"
>
{mode === "add" && (
<SettingsDropdownInput
testId="server-type-dropdown"
name="server-type"
label={t(I18nKey.SETTINGS$MCP_SERVER_TYPE)}
items={serverTypeOptions}
selectedKey={serverType}
onSelectionChange={(key) => setServerType(key as MCPServerType)}
onInputChange={() => {}} // Prevent input changes
isClearable={false}
allowsCustomValue={false}
required
wrapperClassName={cn("w-full", "max-w-[680px]")}
/>
)}
{error && <p className="text-red-500 text-sm">{error}</p>}
{(serverType === "sse" || serverType === "shttp") && (
<>
<SettingsInput
testId="url-input"
name="url"
type="url"
label={t(I18nKey.SETTINGS$MCP_URL)}
className="w-full max-w-[680px]"
required
defaultValue={server?.url || ""}
placeholder="https://api.example.com"
/>
<SettingsInput
testId="api-key-input"
name="api_key"
type="password"
label={t(I18nKey.SETTINGS$MCP_API_KEY)}
className="w-full max-w-[680px]"
showOptionalTag
defaultValue={server?.api_key || ""}
placeholder={t(I18nKey.SETTINGS$MCP_API_KEY_PLACEHOLDER)}
/>
</>
)}
{serverType === "stdio" && (
<>
<SettingsInput
testId="name-input"
name="name"
type="text"
label={t(I18nKey.SETTINGS$MCP_NAME)}
className="w-full max-w-[680px]"
required
defaultValue={server?.name || ""}
placeholder="my-mcp-server"
pattern="^[a-zA-Z0-9_-]+$"
/>
<SettingsInput
testId="command-input"
name="command"
type="text"
label={t(I18nKey.SETTINGS$MCP_COMMAND)}
className="w-full max-w-[680px]"
required
defaultValue={server?.command || ""}
placeholder="npx"
/>
<label className="flex flex-col gap-2.5 w-full max-w-[680px]">
<div className="flex items-center gap-2">
<span className="text-sm">
{t(I18nKey.SETTINGS$MCP_COMMAND_ARGUMENTS)}
</span>
<OptionalTag />
</div>
<textarea
data-testid="args-input"
name="args"
rows={3}
defaultValue={server?.args?.join("\n") || ""}
placeholder="arg1&#10;arg2&#10;arg3"
className={cn(
"bg-tertiary border border-[#717888] w-full rounded-sm p-2 placeholder:italic placeholder:text-tertiary-alt resize-none",
"disabled:bg-[#2D2F36] disabled:border-[#2D2F36] disabled:cursor-not-allowed",
)}
/>
<p className="text-xs text-tertiary-alt">
{t(I18nKey.SETTINGS$MCP_COMMAND_ARGUMENTS_HELP)}
</p>
</label>
<label className="flex flex-col gap-2.5 w-full max-w-[680px]">
<div className="flex items-center gap-2">
<span className="text-sm">
{t(I18nKey.SETTINGS$MCP_ENVIRONMENT_VARIABLES)}
</span>
<OptionalTag />
</div>
<textarea
data-testid="env-input"
name="env"
rows={4}
defaultValue={formatEnvironmentVariables(server?.env)}
placeholder="KEY1=value1&#10;KEY2=value2"
className={cn(
"resize-none",
"bg-tertiary border border-[#717888] rounded-sm p-2 placeholder:italic placeholder:text-tertiary-alt",
"disabled:bg-[#2D2F36] disabled:border-[#2D2F36] disabled:cursor-not-allowed",
)}
/>
</label>
</>
)}
<div className="flex items-center gap-4">
<BrandButton
testId="cancel-button"
type="button"
variant="secondary"
onClick={onCancel}
>
{t(I18nKey.BUTTON$CANCEL)}
</BrandButton>
<BrandButton testId="submit-button" type="submit" variant="primary">
{mode === "add" && t(I18nKey.SETTINGS$MCP_ADD_SERVER)}
{mode === "edit" && t(I18nKey.SETTINGS$MCP_SAVE_SERVER)}
</BrandButton>
</div>
</form>
);
}

View File

@@ -0,0 +1,110 @@
import { FaPencil, FaTrash } from "react-icons/fa6";
import { useTranslation } from "react-i18next";
import { I18nKey } from "#/i18n/declaration";
interface MCPServerConfig {
id: string;
type: "sse" | "stdio" | "shttp";
name?: string;
url?: string;
api_key?: string;
command?: string;
args?: string[];
env?: Record<string, string>;
}
export function MCPServerListItem({
server,
onEdit,
onDelete,
}: {
server: MCPServerConfig;
onEdit: () => void;
onDelete: () => void;
}) {
const { t } = useTranslation();
const getServerTypeLabel = (type: string) => {
switch (type) {
case "sse":
return t(I18nKey.SETTINGS$MCP_SERVER_TYPE_SSE);
case "stdio":
return t(I18nKey.SETTINGS$MCP_SERVER_TYPE_STDIO);
case "shttp":
return t(I18nKey.SETTINGS$MCP_SERVER_TYPE_SHTTP);
default:
return type.toUpperCase();
}
};
const getServerDescription = (serverConfig: MCPServerConfig) => {
if (serverConfig.type === "stdio") {
if (serverConfig.command) {
const args =
serverConfig.args && serverConfig.args.length > 0
? ` ${serverConfig.args.join(" ")}`
: "";
return `${serverConfig.command}${args}`;
}
return serverConfig.name || "";
}
if (
(serverConfig.type === "sse" || serverConfig.type === "shttp") &&
serverConfig.url
) {
return serverConfig.url;
}
return "";
};
const serverName = server.type === "stdio" ? server.name : server.url;
const serverDescription = getServerDescription(server);
return (
<tr
data-testid="mcp-server-item"
className="grid grid-cols-[minmax(0,0.25fr)_120px_minmax(0,1fr)_120px] gap-4 items-start border-t border-tertiary"
>
<td
className="p-3 text-sm text-content-2 truncate min-w-0"
title={serverName}
>
{serverName}
</td>
<td className="p-3 text-sm text-content-2 whitespace-nowrap">
{getServerTypeLabel(server.type)}
</td>
<td
className="p-3 text-sm text-content-2 opacity-80 italic min-w-0 truncate"
title={serverDescription}
>
<span className="inline-block max-w-full align-bottom">
{serverDescription}
</span>
</td>
<td className="p-3 flex items-start justify-end gap-4 whitespace-nowrap">
<button
data-testid="edit-mcp-server-button"
type="button"
onClick={onEdit}
aria-label={`Edit ${serverName}`}
className="cursor-pointer hover:text-content-1 transition-colors"
>
<FaPencil size={16} />
</button>
<button
data-testid="delete-mcp-server-button"
type="button"
onClick={onDelete}
aria-label={`Delete ${serverName}`}
className="cursor-pointer hover:text-content-1 transition-colors"
>
<FaTrash size={16} />
</button>
</td>
</tr>
);
}

View File

@@ -0,0 +1,71 @@
import { useTranslation } from "react-i18next";
import { MCPServerListItem } from "./mcp-server-list-item";
import { I18nKey } from "#/i18n/declaration";
interface MCPServerConfig {
id: string;
type: "sse" | "stdio" | "shttp";
name?: string;
url?: string;
api_key?: string;
command?: string;
args?: string[];
env?: Record<string, string>;
}
interface MCPServerListProps {
servers: MCPServerConfig[];
onEdit: (server: MCPServerConfig) => void;
onDelete: (serverId: string) => void;
}
export function MCPServerList({
servers,
onEdit,
onDelete,
}: MCPServerListProps) {
const { t } = useTranslation();
if (servers.length === 0) {
return (
<div className="border border-tertiary rounded-md p-8 text-center">
<p className="text-content-2 text-sm">
{t(I18nKey.SETTINGS$MCP_NO_SERVERS)}
</p>
</div>
);
}
return (
<div className="border border-tertiary rounded-md overflow-hidden">
<table className="w-full">
<thead className="bg-base-tertiary">
<tr className="grid grid-cols-[minmax(0,0.25fr)_120px_minmax(0,1fr)_120px] gap-4 items-start">
<th className="text-left p-3 text-sm font-medium">
{t(I18nKey.SETTINGS$NAME)}
</th>
<th className="text-left p-3 text-sm font-medium">
{t(I18nKey.SETTINGS$MCP_SERVER_TYPE)}
</th>
<th className="text-left p-3 text-sm font-medium">
{t(I18nKey.SETTINGS$MCP_SERVER_DETAILS)}
</th>
<th className="text-right p-3 text-sm font-medium">
{t(I18nKey.SETTINGS$ACTIONS)}
</th>
</tr>
</thead>
<tbody>
{servers.map((server) => (
<MCPServerListItem
key={server.id}
server={server}
onEdit={() => onEdit(server)}
onDelete={() => onDelete(server.id)}
/>
))}
</tbody>
</table>
</div>
);
}

View File

@@ -23,7 +23,7 @@ export function ModalBackdrop({ children, onClose }: ModalBackdropProps) {
<div className="fixed inset-0 flex items-center justify-center z-20">
<div
onClick={handleClick}
className="fixed inset-0 bg-black bg-opacity-80"
className="fixed inset-0 bg-black opacity-60"
/>
<div className="relative">{children}</div>
</div>

View File

@@ -0,0 +1,67 @@
import { useMutation, useQueryClient } from "@tanstack/react-query";
import { useSettings } from "#/hooks/query/use-settings";
import OpenHands from "#/api/open-hands";
import { MCPSSEServer, MCPStdioServer, MCPSHTTPServer } from "#/types/settings";
type MCPServerType = "sse" | "stdio" | "shttp";
interface MCPServerConfig {
type: MCPServerType;
name?: string;
url?: string;
api_key?: string;
command?: string;
args?: string[];
env?: Record<string, string>;
}
export function useAddMcpServer() {
const queryClient = useQueryClient();
const { data: settings } = useSettings();
return useMutation({
mutationFn: async (server: MCPServerConfig): Promise<void> => {
if (!settings) return;
const currentConfig = settings.MCP_CONFIG || {
sse_servers: [],
stdio_servers: [],
shttp_servers: [],
};
const newConfig = { ...currentConfig };
if (server.type === "sse") {
const sseServer: MCPSSEServer = {
url: server.url!,
...(server.api_key && { api_key: server.api_key }),
};
newConfig.sse_servers.push(sseServer);
} else if (server.type === "stdio") {
const stdioServer: MCPStdioServer = {
name: server.name!,
command: server.command!,
...(server.args && { args: server.args }),
...(server.env && { env: server.env }),
};
newConfig.stdio_servers.push(stdioServer);
} else if (server.type === "shttp") {
const shttpServer: MCPSHTTPServer = {
url: server.url!,
...(server.api_key && { api_key: server.api_key }),
};
newConfig.shttp_servers.push(shttpServer);
}
const apiSettings = {
mcp_config: newConfig,
};
await OpenHands.saveSettings(apiSettings);
},
onSuccess: () => {
// Invalidate the settings query to trigger a refetch
queryClient.invalidateQueries({ queryKey: ["settings"] });
},
});
}

View File

@@ -0,0 +1,37 @@
import { useMutation, useQueryClient } from "@tanstack/react-query";
import { useSettings } from "#/hooks/query/use-settings";
import OpenHands from "#/api/open-hands";
import { MCPConfig } from "#/types/settings";
export function useDeleteMcpServer() {
const queryClient = useQueryClient();
const { data: settings } = useSettings();
return useMutation({
mutationFn: async (serverId: string): Promise<void> => {
if (!settings?.MCP_CONFIG) return;
const newConfig: MCPConfig = { ...settings.MCP_CONFIG };
const [serverType, indexStr] = serverId.split("-");
const index = parseInt(indexStr, 10);
if (serverType === "sse") {
newConfig.sse_servers.splice(index, 1);
} else if (serverType === "stdio") {
newConfig.stdio_servers.splice(index, 1);
} else if (serverType === "shttp") {
newConfig.shttp_servers.splice(index, 1);
}
const apiSettings = {
mcp_config: newConfig,
};
await OpenHands.saveSettings(apiSettings);
},
onSuccess: () => {
// Invalidate the settings query to trigger a refetch
queryClient.invalidateQueries({ queryKey: ["settings"] });
},
});
}

View File

@@ -0,0 +1,69 @@
import { useMutation, useQueryClient } from "@tanstack/react-query";
import { useSettings } from "#/hooks/query/use-settings";
import OpenHands from "#/api/open-hands";
import { MCPSSEServer, MCPStdioServer, MCPSHTTPServer } from "#/types/settings";
type MCPServerType = "sse" | "stdio" | "shttp";
interface MCPServerConfig {
type: MCPServerType;
name?: string;
url?: string;
api_key?: string;
command?: string;
args?: string[];
env?: Record<string, string>;
}
export function useUpdateMcpServer() {
const queryClient = useQueryClient();
const { data: settings } = useSettings();
return useMutation({
mutationFn: async ({
serverId,
server,
}: {
serverId: string;
server: MCPServerConfig;
}): Promise<void> => {
if (!settings?.MCP_CONFIG) return;
const newConfig = { ...settings.MCP_CONFIG };
const [serverType, indexStr] = serverId.split("-");
const index = parseInt(indexStr, 10);
if (serverType === "sse") {
const sseServer: MCPSSEServer = {
url: server.url!,
...(server.api_key && { api_key: server.api_key }),
};
newConfig.sse_servers[index] = sseServer;
} else if (serverType === "stdio") {
const stdioServer: MCPStdioServer = {
name: server.name!,
command: server.command!,
...(server.args && { args: server.args }),
...(server.env && { env: server.env }),
};
newConfig.stdio_servers[index] = stdioServer;
} else if (serverType === "shttp") {
const shttpServer: MCPSHTTPServer = {
url: server.url!,
...(server.api_key && { api_key: server.api_key }),
};
newConfig.shttp_servers[index] = shttpServer;
}
const apiSettings = {
mcp_config: newConfig,
};
await OpenHands.saveSettings(apiSettings);
},
onSuccess: () => {
// Invalidate the settings query to trigger a refetch
queryClient.invalidateQueries({ queryKey: ["settings"] });
},
});
}

View File

@@ -781,4 +781,37 @@ export enum I18nKey {
PROJECT_MANAGEMENT$SVC_ACC_EMAIL_VALIDATION_ERROR = "PROJECT_MANAGEMENT$SVC_ACC_EMAIL_VALIDATION_ERROR",
PROJECT_MANAGEMENT$SVC_ACC_API_KEY_VALIDATION_ERROR = "PROJECT_MANAGEMENT$SVC_ACC_API_KEY_VALIDATION_ERROR",
MICROAGENT_MANAGEMENT$ERROR_LOADING_MICROAGENT_CONTENT = "MICROAGENT_MANAGEMENT$ERROR_LOADING_MICROAGENT_CONTENT",
SETTINGS$MCP_ERROR_ENV_INVALID_FORMAT = "SETTINGS$MCP_ERROR_ENV_INVALID_FORMAT",
SETTINGS$MCP_ERROR_URL_DUPLICATE = "SETTINGS$MCP_ERROR_URL_DUPLICATE",
SETTINGS$MCP_SERVER_TYPE_SSE = "SETTINGS$MCP_SERVER_TYPE_SSE",
SETTINGS$MCP_SERVER_TYPE_STDIO = "SETTINGS$MCP_SERVER_TYPE_STDIO",
SETTINGS$MCP_SERVER_TYPE_SHTTP = "SETTINGS$MCP_SERVER_TYPE_SHTTP",
SETTINGS$MCP_ERROR_URL_REQUIRED = "SETTINGS$MCP_ERROR_URL_REQUIRED",
SETTINGS$MCP_ERROR_URL_INVALID_PROTOCOL = "SETTINGS$MCP_ERROR_URL_INVALID_PROTOCOL",
SETTINGS$MCP_ERROR_URL_INVALID = "SETTINGS$MCP_ERROR_URL_INVALID",
SETTINGS$MCP_ERROR_NAME_REQUIRED = "SETTINGS$MCP_ERROR_NAME_REQUIRED",
SETTINGS$MCP_ERROR_NAME_INVALID = "SETTINGS$MCP_ERROR_NAME_INVALID",
SETTINGS$MCP_ERROR_NAME_DUPLICATE = "SETTINGS$MCP_ERROR_NAME_DUPLICATE",
SETTINGS$MCP_ERROR_COMMAND_REQUIRED = "SETTINGS$MCP_ERROR_COMMAND_REQUIRED",
SETTINGS$MCP_ERROR_COMMAND_NO_SPACES = "SETTINGS$MCP_ERROR_COMMAND_NO_SPACES",
SETTINGS$MCP_SERVER_TYPE = "SETTINGS$MCP_SERVER_TYPE",
SETTINGS$MCP_API_KEY_PLACEHOLDER = "SETTINGS$MCP_API_KEY_PLACEHOLDER",
SETTINGS$MCP_COMMAND_ARGUMENTS = "SETTINGS$MCP_COMMAND_ARGUMENTS",
SETTINGS$MCP_COMMAND_ARGUMENTS_HELP = "SETTINGS$MCP_COMMAND_ARGUMENTS_HELP",
SETTINGS$MCP_ENVIRONMENT_VARIABLES = "SETTINGS$MCP_ENVIRONMENT_VARIABLES",
SETTINGS$MCP_ADD_SERVER = "SETTINGS$MCP_ADD_SERVER",
SETTINGS$MCP_SAVE_SERVER = "SETTINGS$MCP_SAVE_SERVER",
SETTINGS$MCP_NO_SERVERS = "SETTINGS$MCP_NO_SERVERS",
SETTINGS$MCP_SERVER_DETAILS = "SETTINGS$MCP_SERVER_DETAILS",
SETTINGS$MCP_CONFIRM_DELETE = "SETTINGS$MCP_CONFIRM_DELETE",
SETTINGS$MCP_CONFIRM_CHANGES = "SETTINGS$MCP_CONFIRM_CHANGES",
SETTINGS$MCP_DEFAULT_CONFIG = "SETTINGS$MCP_DEFAULT_CONFIG",
PROJECT_MANAGEMENT$WORKSPACE_NAME_PLACEHOLDER = "PROJECT_MANAGEMENT$WORKSPACE_NAME_PLACEHOLDER",
PROJECT_MANAGEMENT$CONFIGURE_MODAL_DESCRIPTION = "PROJECT_MANAGEMENT$CONFIGURE_MODAL_DESCRIPTION",
PROJECT_MANAGEMENT$IMPORTANT_WORKSPACE_INTEGRATION = "PROJECT_MANAGEMENT$IMPORTANT_WORKSPACE_INTEGRATION",
SETTINGS = "SETTINGS",
MICROAGENT_MANAGEMENT$OPENING_PR_TO_CREATE_MICROAGENT = "MICROAGENT_MANAGEMENT$OPENING_PR_TO_CREATE_MICROAGENT",
MICROAGENT_MANAGEMENT$PR_READY_FOR_REVIEW = "MICROAGENT_MANAGEMENT$PR_READY_FOR_REVIEW",
MICROAGENT_MANAGEMENT$PR_NOT_CREATED = "MICROAGENT_MANAGEMENT$PR_NOT_CREATED",
MICROAGENT_MANAGEMENT$ERROR_CREATING_MICROAGENT = "MICROAGENT_MANAGEMENT$ERROR_CREATING_MICROAGENT",
}

File diff suppressed because it is too large Load Diff

View File

@@ -1,86 +1,191 @@
import React, { useState, useEffect } from "react";
import React, { useState } from "react";
import { useTranslation } from "react-i18next";
import posthog from "posthog-js";
import { useSettings } from "#/hooks/query/use-settings";
import { useSaveSettings } from "#/hooks/mutation/use-save-settings";
import { MCPConfig } from "#/types/settings";
import { MCPConfigEditor } from "#/components/features/settings/mcp-settings/mcp-config-editor";
import { BrandButton } from "#/components/features/settings/brand-button";
import { useDeleteMcpServer } from "#/hooks/mutation/use-delete-mcp-server";
import { useAddMcpServer } from "#/hooks/mutation/use-add-mcp-server";
import { useUpdateMcpServer } from "#/hooks/mutation/use-update-mcp-server";
import { I18nKey } from "#/i18n/declaration";
import {
displayErrorToast,
displaySuccessToast,
} from "#/utils/custom-toast-handlers";
import { retrieveAxiosErrorMessage } from "#/utils/retrieve-axios-error-message";
import { MCPServerList } from "#/components/features/settings/mcp-settings/mcp-server-list";
import { MCPServerForm } from "#/components/features/settings/mcp-settings/mcp-server-form";
import { ConfirmationModal } from "#/components/shared/modals/confirmation-modal";
import { BrandButton } from "#/components/features/settings/brand-button";
import { MCPConfig } from "#/types/settings";
type MCPServerType = "sse" | "stdio" | "shttp";
interface MCPServerConfig {
id: string;
type: MCPServerType;
name?: string;
url?: string;
api_key?: string;
command?: string;
args?: string[];
env?: Record<string, string>;
}
function MCPSettingsScreen() {
const { t } = useTranslation();
const { data: settings, isLoading } = useSettings();
const { mutate: saveSettings, isPending } = useSaveSettings();
const { mutate: deleteMcpServer } = useDeleteMcpServer();
const { mutate: addMcpServer } = useAddMcpServer();
const { mutate: updateMcpServer } = useUpdateMcpServer();
const [mcpConfig, setMcpConfig] = useState<MCPConfig | undefined>(undefined);
const [isDirty, setIsDirty] = useState(false);
const [view, setView] = useState<"list" | "add" | "edit">("list");
const [editingServer, setEditingServer] = useState<MCPServerConfig | null>(
null,
);
const [confirmationModalIsVisible, setConfirmationModalIsVisible] =
useState(false);
const [serverToDelete, setServerToDelete] = useState<string | null>(null);
useEffect(() => {
if (!mcpConfig && settings?.MCP_CONFIG) {
setMcpConfig(settings.MCP_CONFIG);
}
}, [settings, mcpConfig]);
const handleConfigChange = (config: MCPConfig) => {
setMcpConfig(config);
setIsDirty(true);
const mcpConfig: MCPConfig = settings?.MCP_CONFIG || {
sse_servers: [],
stdio_servers: [],
shttp_servers: [],
};
const formAction = () => {
if (!settings) return;
// Convert servers to a unified format for display
const allServers: MCPServerConfig[] = [
...mcpConfig.sse_servers.map((server, index) => ({
id: `sse-${index}`,
type: "sse" as const,
url: typeof server === "string" ? server : server.url,
api_key: typeof server === "object" ? server.api_key : undefined,
})),
...mcpConfig.stdio_servers.map((server, index) => ({
id: `stdio-${index}`,
type: "stdio" as const,
name: server.name,
command: server.command,
args: server.args,
env: server.env,
})),
...mcpConfig.shttp_servers.map((server, index) => ({
id: `shttp-${index}`,
type: "shttp" as const,
url: typeof server === "string" ? server : server.url,
api_key: typeof server === "object" ? server.api_key : undefined,
})),
];
saveSettings(
{ MCP_CONFIG: mcpConfig },
const handleAddServer = (serverConfig: MCPServerConfig) => {
addMcpServer(serverConfig, {
onSuccess: () => {
setView("list");
},
});
};
const handleEditServer = (serverConfig: MCPServerConfig) => {
updateMcpServer(
{
serverId: serverConfig.id,
server: serverConfig,
},
{
onSuccess: () => {
displaySuccessToast(t(I18nKey.SETTINGS$SAVED));
posthog.capture("settings_saved", {
HAS_MCP_CONFIG: mcpConfig ? "YES" : "NO",
MCP_SSE_SERVERS_COUNT: mcpConfig?.sse_servers?.length || 0,
MCP_STDIO_SERVERS_COUNT: mcpConfig?.stdio_servers?.length || 0,
});
setIsDirty(false);
},
onError: (error) => {
const errorMessage = retrieveAxiosErrorMessage(error);
displayErrorToast(errorMessage || t(I18nKey.ERROR$GENERIC));
setView("list");
},
},
);
};
const handleDeleteServer = (serverId: string) => {
deleteMcpServer(serverId, {
onSuccess: () => {
setConfirmationModalIsVisible(false);
},
});
};
const handleEditClick = (server: MCPServerConfig) => {
setEditingServer(server);
setView("edit");
};
const handleDeleteClick = (serverId: string) => {
setServerToDelete(serverId);
setConfirmationModalIsVisible(true);
};
const handleConfirmDelete = () => {
if (serverToDelete) {
handleDeleteServer(serverToDelete);
setServerToDelete(null);
}
};
const handleCancelDelete = () => {
setConfirmationModalIsVisible(false);
setServerToDelete(null);
};
if (isLoading) {
return <div className="p-9">{t(I18nKey.HOME$LOADING)}</div>;
return (
<div className="px-11 py-9 flex flex-col gap-5">
<div className="animate-pulse">
<div className="h-6 bg-gray-300 rounded w-1/4 mb-4" />
<div className="h-4 bg-gray-300 rounded w-1/2 mb-8" />
<div className="h-10 bg-gray-300 rounded w-32" />
</div>
</div>
);
}
return (
<form
data-testid="mcp-settings-screen"
action={formAction}
className="flex flex-col h-full justify-between"
>
<div className="p-9 flex flex-col gap-12">
<MCPConfigEditor mcpConfig={mcpConfig} onChange={handleConfigChange} />
</div>
<div className="px-11 py-9 flex flex-col gap-5">
{view === "list" && (
<>
<BrandButton
testId="add-mcp-server-button"
type="button"
variant="primary"
onClick={() => setView("add")}
isDisabled={isLoading}
>
{t(I18nKey.SETTINGS$MCP_ADD_SERVER)}
</BrandButton>
<div className="flex gap-6 p-6 justify-end border-t border-t-tertiary">
<BrandButton
testId="submit-button"
type="submit"
variant="primary"
isDisabled={!isDirty || isPending}
>
{!isPending && t(I18nKey.SETTINGS$SAVE_CHANGES)}
{isPending && t(I18nKey.SETTINGS$SAVING)}
</BrandButton>
</div>
</form>
<MCPServerList
servers={allServers}
onEdit={handleEditClick}
onDelete={handleDeleteClick}
/>
</>
)}
{view === "add" && (
<MCPServerForm
mode="add"
existingServers={allServers}
onSubmit={handleAddServer}
onCancel={() => setView("list")}
/>
)}
{view === "edit" && editingServer && (
<MCPServerForm
mode="edit"
server={editingServer}
existingServers={allServers}
onSubmit={handleEditServer}
onCancel={() => {
setView("list");
setEditingServer(null);
}}
/>
)}
{confirmationModalIsVisible && (
<ConfirmationModal
text={t(I18nKey.SETTINGS$MCP_CONFIRM_DELETE)}
onConfirm={handleConfirmDelete}
onCancel={handleCancelDelete}
/>
)}
</div>
);
}

View File

@@ -23,6 +23,7 @@ const SAAS_NAV_ITEMS = [
{ to: "/settings/billing", text: "SETTINGS$NAV_CREDITS" },
{ to: "/settings/secrets", text: "SETTINGS$NAV_SECRETS" },
{ to: "/settings/api-keys", text: "SETTINGS$NAV_API_KEYS" },
{ to: "/settings/mcp", text: "SETTINGS$NAV_MCP" },
];
const OSS_NAV_ITEMS = [

View File

@@ -26,6 +26,7 @@ export const DEFAULT_SETTINGS: Settings = {
MCP_CONFIG: {
sse_servers: [],
stdio_servers: [],
shttp_servers: [],
},
GIT_USER_NAME: "openhands",
GIT_USER_EMAIL: "openhands@all-hands.dev",

View File

@@ -24,9 +24,15 @@ export type MCPStdioServer = {
env?: Record<string, string>;
};
export type MCPSHTTPServer = {
url: string;
api_key?: string;
};
export type MCPConfig = {
sse_servers: (string | MCPSSEServer)[];
stdio_servers: MCPStdioServer[];
shttp_servers: (string | MCPSHTTPServer)[];
};
export type Settings = {
@@ -77,6 +83,7 @@ export type ApiSettings = {
mcp_config?: {
sse_servers: (string | MCPSSEServer)[];
stdio_servers: MCPStdioServer[];
shttp_servers: (string | MCPSHTTPServer)[];
};
email?: string;
email_verified?: boolean;

View File

@@ -26,6 +26,7 @@ Here are some instructions for pushing, but ONLY do this if the user asks you to
* Use the `create_pr` tool to create a pull request, if you haven't already
* Once you've created your own branch or a pull request, continue to update it. Do NOT create a new one unless you are explicitly asked to. Update the PR title and description as necessary, but don't change the branch name.
* Use the main branch as the base branch, unless the user requests otherwise
* If you need to add labels when opening a PR, check the existing labels defined on that repository and select from existing ones. Do not invent your own labels.
* After opening or updating a pull request, send the user a short message with a link to the pull request.
* Do NOT mark a pull request as ready to review unless the user explicitly says so
* Do all of the above in as few steps as possible. E.g. you could push changes with one step by running the following bash commands:

View File

@@ -26,6 +26,7 @@ Here are some instructions for pushing, but ONLY do this if the user asks you to
* Use the `create_mr` tool to create a merge request, if you haven't already
* Once you've created your own branch or a merge request, continue to update it. Do NOT create a new one unless you are explicitly asked to. Update the PR title and description as necessary, but don't change the branch name.
* Use the main branch as the base branch, unless the user requests otherwise
* If you need to add labels when opening a MR, check the existing labels defined on that repository and select from existing ones. Do not invent your own labels.
* After opening or updating a merge request, send the user a short message with a link to the merge request.
* Do all of the above in as few steps as possible. E.g. you could push changes with one step by running the following bash commands:
```bash

View File

@@ -18,7 +18,7 @@ from openhands.events.action import (
from openhands.events.event import EventSource
from openhands.events.observation import BrowserOutputObservation
from openhands.events.observation.observation import Observation
from openhands.llm.llm import LLM
from openhands.llm.llm_registry import LLMRegistry
from openhands.runtime.plugins import (
PluginRequirement,
)
@@ -102,15 +102,15 @@ class BrowsingAgent(Agent):
def __init__(
self,
llm: LLM,
config: AgentConfig,
llm_registry: LLMRegistry,
) -> None:
"""Initializes a new instance of the BrowsingAgent class.
Parameters:
- llm (LLM): The llm to be used by this agent
"""
super().__init__(llm, config)
super().__init__(config, llm_registry)
# define a configurable action space, with chat functionality, web navigation, and webpage grounding using accessibility tree and HTML.
# see https://github.com/ServiceNow/BrowserGym/blob/main/core/src/browsergym/core/action/highlevel.py for more details
action_subsets = ['chat', 'bid']

View File

@@ -3,6 +3,8 @@ import sys
from collections import deque
from typing import TYPE_CHECKING
from openhands.llm.llm_registry import LLMRegistry
if TYPE_CHECKING:
from litellm import ChatCompletionToolParam
@@ -32,7 +34,6 @@ from openhands.core.logger import openhands_logger as logger
from openhands.core.message import Message
from openhands.events.action import AgentFinishAction, MessageAction
from openhands.events.event import Event
from openhands.llm.llm import LLM
from openhands.llm.llm_utils import check_tools
from openhands.memory.condenser import Condenser
from openhands.memory.condenser.condenser import Condensation, View
@@ -74,18 +75,13 @@ class CodeActAgent(Agent):
JupyterRequirement(),
]
def __init__(
self,
llm: LLM,
config: AgentConfig,
) -> None:
def __init__(self, config: AgentConfig, llm_registry: LLMRegistry) -> None:
"""Initializes a new instance of the CodeActAgent class.
Parameters:
- llm (LLM): The llm to be used by this agent
- config (AgentConfig): The configuration for this agent
"""
super().__init__(llm, config)
super().__init__(config, llm_registry)
self.pending_actions: deque['Action'] = deque()
self.reset()
self.tools = self._get_tools()
@@ -93,7 +89,7 @@ class CodeActAgent(Agent):
# Create a ConversationMemory instance
self.conversation_memory = ConversationMemory(self.config, self.prompt_manager)
self.condenser = Condenser.from_config(self.config.condenser)
self.condenser = Condenser.from_config(self.config.condenser, llm_registry)
logger.debug(f'Using condenser: {type(self.condenser)}')
@property

View File

@@ -22,7 +22,7 @@ from openhands.events.observation import (
Observation,
)
from openhands.events.serialization.event import event_to_dict
from openhands.llm.llm import LLM
from openhands.llm.llm_registry import LLMRegistry
"""
FIXME: There are a few problems this surfaced
@@ -42,8 +42,12 @@ class DummyAgent(Agent):
without making any LLM calls.
"""
def __init__(self, llm: LLM, config: AgentConfig):
super().__init__(llm, config)
def __init__(
self,
config: AgentConfig,
llm_registry: LLMRegistry,
):
super().__init__(config, llm_registry)
self.steps: list[ActionObs] = [
{
'action': MessageAction('Time to get started!'),

View File

@@ -4,7 +4,7 @@ import openhands.agenthub.loc_agent.function_calling as locagent_function_callin
from openhands.agenthub.codeact_agent import CodeActAgent
from openhands.core.config import AgentConfig
from openhands.core.logger import openhands_logger as logger
from openhands.llm.llm import LLM
from openhands.llm.llm_registry import LLMRegistry
if TYPE_CHECKING:
from openhands.events.action import Action
@@ -16,8 +16,8 @@ class LocAgent(CodeActAgent):
def __init__(
self,
llm: LLM,
config: AgentConfig,
llm_registry: LLMRegistry,
) -> None:
"""Initializes a new instance of the LocAgent class.
@@ -25,7 +25,8 @@ class LocAgent(CodeActAgent):
- llm (LLM): The llm to be used by this agent
- config (AgentConfig): The configuration for the agent
"""
super().__init__(llm, config)
super().__init__(config, llm_registry)
self.tools = locagent_function_calling.get_tools()
logger.debug(

View File

@@ -1,6 +1,4 @@
from openhands.agenthub.readonly_agent.readonly_agent import ReadOnlyPlanningAgent
from openhands.agenthub.readonly_agent.readonly_agent import ReadOnlyAgent
from openhands.controller.agent import Agent
Agent.register('ReadOnlyPlanningAgent', ReadOnlyPlanningAgent)
# Backward-compat alias for tests and existing references
Agent.register('ReadOnlyAgent', ReadOnlyPlanningAgent)
Agent.register('ReadOnlyAgent', ReadOnlyAgent)

View File

@@ -22,7 +22,6 @@ from openhands.agenthub.readonly_agent.tools import (
GlobTool,
GrepTool,
ViewTool,
create_plan_md_tool,
)
from openhands.core.exceptions import (
FunctionCallNotExistsError,
@@ -34,12 +33,11 @@ from openhands.events.action import (
AgentFinishAction,
AgentThinkAction,
CmdRunAction,
FileEditAction,
FileReadAction,
MCPAction,
MessageAction,
)
from openhands.events.event import FileEditSource, FileReadSource
from openhands.events.event import FileReadSource
from openhands.events.tool import ToolCallMetadata
@@ -195,42 +193,6 @@ def response_to_actions(
glob_cmd = glob_to_cmdrun(pattern, path)
action = CmdRunAction(command=glob_cmd, is_input=False)
# ================================================
# Plan MD Tool (ACI: FileReadAction/FileEditAction)
# ================================================
elif tool_call.function.name == 'plan_md':
cmd = arguments.get('command')
if cmd not in ['view', 'create', 'edit']:
raise FunctionCallValidationError(
f'Invalid command for plan_md: {cmd}'
)
plan_path = '/workspace/PLAN.md'
if cmd == 'view':
action = FileReadAction(
path=plan_path, impl_source=FileReadSource.OH_ACI
)
elif cmd == 'create':
file_text = arguments.get('file_text', '')
action = FileEditAction(
path=plan_path,
command='create',
file_text=file_text,
impl_source=FileEditSource.OH_ACI,
)
else: # edit
old_str = arguments.get('old_str')
new_str = arguments.get('new_str')
if old_str is None or new_str is None:
raise FunctionCallValidationError(
'plan_md.edit requires both old_str and new_str (replace full content). Use plan_md.view first to get exact old_str.'
)
action = FileEditAction(
path=plan_path,
command='str_replace',
old_str=old_str,
new_str=new_str,
impl_source=FileEditSource.OH_ACI,
)
# ================================================
# MCPAction (MCP)
@@ -283,5 +245,4 @@ def get_tools() -> list[ChatCompletionToolParam]:
GrepTool,
GlobTool,
ViewTool,
create_plan_md_tool(),
]

View File

@@ -1,29 +1,34 @@
You are the ReadOnlyPlanningAgent, a planning-focused assistant.
Your SOLE GOAL is to produce and iterate on a single planning document for the current task: /workspace/PLAN.md.
You must not perform implementation work; when the plan is ready, finish the session.
You are OpenHands ReadOnlyAgent, a helpful AI assistant focused on code analysis and exploration. You can:
- Explore and analyze codebases
- Browse the web for relevant information
- Plan potential changes
- Answer questions about code
CAPABILITIES
- grep: search patterns in files to understand the codebase
- glob: list matching files
- view: read files to gather context
- plan_md: manage the planning doc PLAN.md (only commands: view, create, edit)
- think: reason about findings
- finish: end the planning session when ready to execute
<CAPABILITIES>
✓ READ-ONLY TOOLS:
- view: Read file contents
- grep: Search for patterns
- glob: List matching files
- think: Analyze information
- web_read: Access web resources
- finish: Complete current task
STRICT RESTRICTIONS
- Do NOT run shell commands or execute code
- Do NOT modify any files except /workspace/PLAN.md via plan_md
- Do NOT create or edit any other files
RESTRICTIONS:
- Cannot modify any files
- Cannot execute state-changing commands
</CAPABILITIES>
PLANNING GUIDELINES
- Produce a concise, numbered plan with rationale and assumptions
- Capture open questions and risks
- Use plan_md.view to inspect current content, plan_md.create to create the file, and plan_md.edit to replace content
- For plan_md.edit, first view the file to obtain its current exact contents; then replace the entire file by setting old_str to the exact current contents and new_str to the updated full plan
- Iterate as needed until the plan is solid
- When done, use finish to end the planning session with a short summary of the plan
<GUIDELINES>
1. When analyzing code or answering questions:
- Be thorough and methodical
- Prioritize accuracy over speed
- Provide detailed explanations
OUTPUT STYLE
- Keep the plan readable and structured
- Prefer bullets and numbered lists
- Defer implementation details and code changes to the execution phase
2. For file operations:
- Always verify file locations before accessing
- Don't assume paths are relative to current directory
3. If asked to make changes:
- Explain you are read-only
- Recommend using CodeActAgent instead
</GUIDELINES>

View File

@@ -1,14 +1,10 @@
"""ReadOnlyPlanningAgent - A specialized planning agent for read-only research plus maintaining PLAN.md.
This agent is derived from CodeActAgent and constrained to:
- Use read-only research tools (grep, glob, view) to explore the repository
- Maintain a single planning document at /workspace/PLAN.md via a dedicated plan editor tool
- Avoid any other code execution or edits outside PLAN.md
"""
"""ReadOnlyAgent - A specialized version of CodeActAgent that only uses read-only tools."""
import os
from typing import TYPE_CHECKING
from openhands.llm.llm_registry import LLMRegistry
if TYPE_CHECKING:
from litellm import ChatCompletionToolParam
@@ -21,36 +17,40 @@ from openhands.agenthub.readonly_agent import (
)
from openhands.core.config import AgentConfig
from openhands.core.logger import openhands_logger as logger
from openhands.llm.llm import LLM
from openhands.utils.prompt import PromptManager
class ReadOnlyPlanningAgent(CodeActAgent):
class ReadOnlyAgent(CodeActAgent):
VERSION = '1.0'
"""
The ReadOnlyPlanningAgent is designed for planning large features safely.
It provides:
- Read-only repo exploration: grep, glob, view
- PLAN.md maintenance via a specialized plan editor tool (create/view/edit only)
- No other file modifications or command execution
The ReadOnlyAgent is a specialized version of CodeActAgent that only uses read-only tools.
This agent is designed for safely exploring codebases without making any changes.
It only has access to tools that don't modify the system: grep, glob, view, think, finish, web_read.
Use this agent when you want to:
1. Explore a codebase to understand its structure
2. Search for specific patterns or code
3. Research without making any changes
When you're ready to make changes, switch to the regular CodeActAgent.
"""
def __init__(
self,
llm: LLM,
config: AgentConfig,
llm_registry: LLMRegistry,
) -> None:
"""Initializes a new instance of the ReadOnlyAgent class.
Parameters:
- llm (LLM): The llm to be used by this agent
- config (AgentConfig): The configuration for this agent
"""
# Initialize the CodeActAgent class; some of it is overridden with class methods
super().__init__(llm, config)
super().__init__(config, llm_registry)
logger.debug(
f'TOOLS loaded for ReadOnlyPlanningAgent: {", ".join([tool.get("function").get("name") for tool in self.tools])}'
f'TOOLS loaded for ReadOnlyAgent: {", ".join([tool.get("function").get("name") for tool in self.tools])}'
)
@property
@@ -74,7 +74,7 @@ class ReadOnlyPlanningAgent(CodeActAgent):
- mcp_tools (list[dict]): The list of MCP tools.
"""
logger.warning(
'ReadOnlyPlanningAgent does not support MCP tools. MCP tools will be ignored by the agent.'
'ReadOnlyAgent does not support MCP tools. MCP tools will be ignored by the agent.'
)
def response_to_actions(self, response: 'ModelResponse') -> list['Action']:

View File

@@ -5,20 +5,17 @@ This module defines the read-only tools for the ReadOnlyAgent.
from .glob import GlobTool
from .grep import GrepTool
from .plan_md import create_plan_md_tool
from .view import ViewTool
__all__ = [
'ViewTool',
'GrepTool',
'GlobTool',
'create_plan_md_tool',
]
# Define the list of read-only tools for exploration and planning
# Define the list of read-only tools
READ_ONLY_TOOLS = [
ViewTool,
GrepTool,
GlobTool,
# plan_md tool is exported as a factory; included via function_calling.get_tools
]

Some files were not shown because too many files have changed in this diff Show More