Compare commits

25 Commits

Author SHA1 Message Date
openhands
adbfae2600 Fix SubprocessBashSession to allow multiple commands by default
- Add allow_multiple_commands parameter to SubprocessBashSession constructor
- Default to True to maintain compatibility with original CLIRuntime behavior
- When False, rejects multiple commands separated by newlines for security
- Fixes test_cliruntime_multiple_newline_commands test failure
- Maintains security by allowing fine-grained control over command execution
2025-06-18 18:32:29 +00:00
openhands
213d2dc056 Fix CLI runtime tests: handle interactive input and background processes
- Move interactive input handling back to CLIRuntime.run() method
- Return ErrorObservation for is_input=True actions with proper error message
- Fix timeout message format for CLIRuntime (simpler format)
- Add background process handling in SubprocessBashSession
- Detect commands ending with '&' and handle them appropriately
- Fix working directory metadata for CLIRuntime observations
- All CLI runtime tests now pass
2025-06-18 00:54:09 +00:00
openhands
da38890aaf Fix linting issues: trailing whitespace and formatting
- Remove trailing whitespace from CLI runtime and bash session files
- Fix string quote consistency and line formatting
- All pre-commit hooks now pass successfully
2025-06-18 00:46:43 +00:00
openhands
edb373cea8 Simplify implementation: Add SubprocessBashSession directly to bash.py
- Add INTERRUPTED status to BashCommandStatus enum
- Add SubprocessBashSession class inheriting from BashSession
- Update CLIRuntime to use SubprocessBashSession instead of subprocess directly
- Maintain all original BashSession (tmux) functionality
- Clean implementation with minimal diff changes
- Remove complex inheritance hierarchy files

This approach minimizes negative diffs by keeping original code in place
and adding new functionality alongside existing implementation.
2025-06-18 00:10:23 +00:00
openhands
f8c5be917c Fix pre-commit issues
- Fix import order in bash.py
- Make cwd property abstract in base class to match implementations
- Address trailing whitespace and formatting issues
2025-06-17 21:59:13 +00:00
Ray Myers
b7efeb11d9 Bump version to 0.44.0 (#9163)
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
2025-06-17 21:13:17 +00:00
Graham Neubig
7d0aadf8ed Rename ~/.openhands-state to ~/.openhands (#9135)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
2025-06-17 20:44:52 +00:00
Mislav Lukach
78af1de870 chore(analytics): improve label clarity (#9161)
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
2025-06-17 20:33:52 +00:00
llamantino
6a9065960d fix(devcontainer): mark workspace as safe dir (#9136) 2025-06-18 04:22:42 +08:00
Maxim Evtush
653a8a7ce2 Refactor: Improve Consistency in Function Signatures and Regex Usage in compute_ism_pm_score.py (#9145) 2025-06-18 04:22:16 +08:00
Graham Neubig
3591c7a79f Add uvx installation option to CLI documentation (#9186)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-18 04:19:18 +08:00
Ivan Dagelic
bae6bd77f4 fix: daytona runtime sandbox handling (#9187)
Signed-off-by: Ivan Dagelic <dagelic.ivan@gmail.com>
2025-06-18 04:18:46 +08:00
Rohit Malhotra
30c71776e7 [Fix]: Loading microagents for integrations (#9189)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-17 16:16:19 -04:00
Robert Brennan
147ffb7e42 Suppress pydub warning about ffmpeg/avconv not found (#8940)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-17 14:44:32 -04:00
Tim O'Farrell
237037cee9 Fix remote runtime status (#9190) 2025-06-18 02:34:41 +08:00
Xingyao Wang
567af43a71 Fix deprecation warning: Replace get_events with search_events (#9188)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-18 00:54:29 +08:00
Rohit Malhotra
65071550b6 Fix grammar issues in Slack documentation (#9180)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-17 23:53:55 +08:00
Alexander
d81d2f62cb docs: local serving with ollama documented (#8807)
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2025-06-17 07:18:18 -04:00
Ryan H. Tran
ddaa186971 [GAIA] Add prompt improvement to alleviate solution parsing issue & support Tavily search tools (#9057) 2025-06-17 13:16:50 +07:00
Graham Neubig
e6e0f4673f docs: Add "Running OpenHands with OpenHands" section for recursive development (#9146)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-16 20:57:52 -04:00
Graham Neubig
7d78b65a1a docs: Add Python version requirement to CLI documentation (#9164)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-16 20:14:10 +00:00
Rohit Malhotra
1f90086030 (Hotfix): Slack app installation flow (#9162)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-16 19:33:43 +00:00
Xingyao Wang
2c4ecd02f7 feat(frontend): add user feedback Likert scale for agent performance rating (only on OH Cloud) (#8992)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>
2025-06-16 19:26:24 +00:00
Rohit Malhotra
2fd1fdcd7e [Refactor, Fix]: Agent controller state/metrics management (#9012)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-16 11:24:13 -04:00
Graham Neubig
cbe32a1a12 Fix bash timeout issue caused by interactive git clone prompts (#9148)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-16 08:39:28 -04:00
100 changed files with 2985 additions and 1137 deletions

@@ -1,5 +1,9 @@
#!/bin/bash
# Mark the current repository as safe for Git to prevent "dubious ownership" errors,
# which can occur in containerized environments when directory ownership doesn't match the current user.
git config --global --add safe.directory "$(realpath .)"
# Install `nc`
sudo apt update && sudo apt install netcat -y

@@ -5,6 +5,18 @@ This repository contains the code for OpenHands, an automated AI software engine
To set up the entire repo, including frontend and backend, run `make build`.
You don't need to do this unless the user asks you to, or if you're trying to run the entire application.
## Running OpenHands with OpenHands:
To run the full application for development or self-improvement:
```bash
export INSTALL_DOCKER=0
export RUNTIME=local
make build && make run
```
For external access (cloud environments), use:
```bash
make run FRONTEND_PORT=12000 FRONTEND_HOST=0.0.0.0 BACKEND_HOST=0.0.0.0
```
IMPORTANT: Before making any changes to the codebase, ALWAYS run `make install-pre-commit-hooks` to ensure pre-commit hooks are properly installed.
Before pushing any changes, you MUST ensure that any lint errors or simple test errors have been fixed.

@@ -103,6 +103,29 @@ components or interface enhancements.
make start-frontend
```
### 5. Running OpenHands with OpenHands
You can use OpenHands to develop and improve OpenHands itself! This is a powerful way to leverage AI assistance for contributing to the project.
#### Quick Start
1. **Build and run OpenHands:**
```bash
export INSTALL_DOCKER=0
export RUNTIME=local
make build && make run
```
2. **Access the interface:**
- Local development: http://localhost:3001
- Remote/cloud environments: Use the appropriate external URL
3. **Configure for external access (if needed):**
```bash
# For external access (e.g., cloud environments)
make run FRONTEND_PORT=12000 FRONTEND_HOST=0.0.0.0 BACKEND_HOST=0.0.0.0
```
### 6. LLM Debugging
If you encounter any issues with the Language Model (LM) or you're simply curious, export DEBUG=1 in the environment and restart the backend.
@@ -136,7 +159,7 @@ poetry run pytest ./tests/unit/test_*.py
To reduce build time (e.g., if no changes were made to the client-runtime component), you can use an existing Docker
container image by setting the SANDBOX_RUNTIME_CONTAINER_IMAGE environment variable to the desired Docker image.
-Example: `export SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.43-nikolaik`
+Example: `export SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.44-nikolaik`
## Develop inside Docker container

@@ -62,19 +62,21 @@ system requirements and more information.
```bash
-docker pull docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik
docker run -it --rm --pull=always \
--e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
+-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik \
-e LOG_ALL_EVENTS=true \
-v /var/run/docker.sock:/var/run/docker.sock \
--v ~/.openhands-state:/.openhands-state \
+-v ~/.openhands:/.openhands \
-p 3000:3000 \
--add-host host.docker.internal:host-gateway \
--name openhands-app \
-docker.all-hands.dev/all-hands-ai/openhands:0.43
+docker.all-hands.dev/all-hands-ai/openhands:0.44
```
> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
You'll find OpenHands running at [http://localhost:3000](http://localhost:3000)!
When you open the application, you'll be asked to choose an LLM provider and add an API key.
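`mv ~/.openhands-state ~/.openhands` only renames the directory when `~/.openhands` does not exist yet. If `~/.openhands` already exists, `mv` moves the old directory *inside* it as `~/.openhands/.openhands-state`. That can happen, for example, if a 0.44 container was already started with the new `-v ~/.openhands:/.openhands` mount. Below is a minimal Python sketch that handles both cases without overwriting anything. `migrate_state_dir` is a hypothetical helper, not part of OpenHands.

```python
import shutil
from pathlib import Path


def migrate_state_dir(old: Path, new: Path) -> list[str]:
    """Move entries from `old` into `new`, never overwriting existing entries.

    Returns the sorted names of top-level entries that now live in `new`
    because of this call.
    """
    if not old.is_dir():
        return []  # nothing to migrate
    if not new.exists():
        # Same effect as `mv old new` when `new` is absent: a plain rename.
        shutil.move(str(old), str(new))
        return sorted(p.name for p in new.iterdir())
    moved = []
    for entry in sorted(old.iterdir()):
        target = new / entry.name
        if target.exists():
            continue  # keep whatever is already in the new location
        shutil.move(str(entry), str(target))
        moved.append(entry.name)
    # Entries still in `old` conflicted with existing ones; inspect them by hand.
    return moved
```

Typical call: `migrate_state_dir(Path.home() / '.openhands-state', Path.home() / '.openhands')`.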

@@ -51,19 +51,21 @@ OpenHands can also be run on your local system using Docker.
```bash
-docker pull docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik
docker run -it --rm --pull=always \
--e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
+-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik \
-e LOG_ALL_EVENTS=true \
-v /var/run/docker.sock:/var/run/docker.sock \
--v ~/.openhands-state:/.openhands-state \
+-v ~/.openhands:/.openhands \
-p 3000:3000 \
--add-host host.docker.internal:host-gateway \
--name openhands-app \
-docker.all-hands.dev/all-hands-ai/openhands:0.43
+docker.all-hands.dev/all-hands-ai/openhands:0.44
```
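> **Note**: If you used OpenHands before version 0.44, you may need to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
You will find OpenHands running at [http://localhost:3000](http://localhost:3000)!
When you open the application, you will be asked to choose an LLM provider and add an API key.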

@@ -44,7 +44,7 @@ ENV WORKSPACE_BASE=/opt/workspace_base
ENV OPENHANDS_BUILD_VERSION=$OPENHANDS_BUILD_VERSION
ENV SANDBOX_USER_ID=0
ENV FILE_STORE=local
-ENV FILE_STORE_PATH=/.openhands-state
+ENV FILE_STORE_PATH=/.openhands
RUN mkdir -p $FILE_STORE_PATH
RUN mkdir -p $WORKSPACE_BASE

@@ -12,7 +12,7 @@ services:
- SANDBOX_API_HOSTNAME=host.docker.internal
- DOCKER_HOST_ADDR=host.docker.internal
#
-- SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-ghcr.io/all-hands-ai/runtime:0.43-nikolaik}
+- SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-ghcr.io/all-hands-ai/runtime:0.44-nikolaik}
- SANDBOX_USER_ID=${SANDBOX_USER_ID:-1234}
- WORKSPACE_MOUNT_PATH=${WORKSPACE_BASE:-$PWD/workspace}
ports:

@@ -7,8 +7,8 @@ services:
image: openhands:latest
container_name: openhands-app-${DATE:-}
environment:
-- SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik}
-#- SANDBOX_USER_ID=${SANDBOX_USER_ID:-1234} # enable this only if you want a specific non-root sandbox user but you will have to manually adjust permissions of openhands-state for this user
+- SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik}
+#- SANDBOX_USER_ID=${SANDBOX_USER_ID:-1234} # enable this only if you want a specific non-root sandbox user but you will have to manually adjust permissions of ~/.openhands for this user
- WORKSPACE_MOUNT_PATH=${WORKSPACE_BASE:-$PWD/workspace}
ports:
- "3000:3000"
@@ -16,7 +16,7 @@ services:
- "host.docker.internal:host-gateway"
volumes:
- /var/run/docker.sock:/var/run/docker.sock
-- ~/.openhands-state:/.openhands-state
+- ~/.openhands:/.openhands
- ${WORKSPACE_BASE:-$PWD/workspace}:/opt/workspace_base
pull_policy: build
stdin_open: true

@@ -5,15 +5,38 @@ description: This guide walks you through installing the OpenHands Slack app.
## Prerequisites
- You are a slack workspace admin
- Access to OpenHands Cloud
## Installation Steps
1. Log in to [OpenHands Cloud](https://app.all-hands.dev)
2. Click the button below to OpenHands Slack App <a target="_blank" href="https://slack.com/oauth/v2/authorize?client_id=7477886716822.8729519890534&scope=app_mentions:read,chat:write,users:read,channels:history,groups:history,mpim:history,im:history&user_scope=channels:history,groups:history,im:history,mpim:history"><img alt="Add to Slack" height="40" width="139" src="https://platform.slack-edge.com/img/add_to_slack.png" srcSet="https://platform.slack-edge.com/img/add_to_slack.png 1x, https://platform.slack-edge.com/img/add_to_slack@2x.png 2x" /></a>
3. In the top right corner, select the workspace to install the OpenHands Slack app.
4. Review permissions and click allow
<AccordionGroup>
<Accordion title="Install Slack App (only for Slack admins/owners)">
**This step is for Slack admins/owners**
1. Make sure you have permissions to install Apps to your workspace.
2. Click the button below to install OpenHands Slack App <a target="_blank" href="https://slack.com/oauth/v2/authorize?client_id=7477886716822.8729519890534&scope=app_mentions:read,chat:write,users:read,channels:history,groups:history,mpim:history,im:history&user_scope=channels:history,groups:history,im:history,mpim:history"><img alt="Add to Slack" height="40" width="139" src="https://platform.slack-edge.com/img/add_to_slack.png" srcSet="https://platform.slack-edge.com/img/add_to_slack.png 1x, https://platform.slack-edge.com/img/add_to_slack@2x.png 2x" /></a>
3. In the top right corner, select the workspace to install the OpenHands Slack app.
4. Review permissions and click allow.
</Accordion>
<Accordion title="Authorize Slack App (for all Slack workspace members)">
**Make sure your Slack workspace admin/owner has installed OpenHands Slack App first**
Every user in the Slack workspace (including admins/owners) must link their Cloud OpenHands account to the OpenHands Slack App. To do this:
1. Visit [integrations settings](https://app.all-hands.dev/settings/integrations) in OpenHands Cloud.
2. Click the button "Install Slack App".
3. In the top right corner, select the workspace to install the OpenHands Slack app.
4. Review permissions and click allow.
Depending on the workspace settings, you may need approval from your Slack admin to authorize the Slack App.
</Accordion>
</AccordionGroup>
## Working With the Slack App
@@ -45,6 +68,6 @@ You can mention a repo name when starting a new conversation in the following fo
2. "All-Hands-AI/OpenHands" (e.g `@openhands in All-Hands-AI/OpenHands ...`)
The repo match is case insensitive. If a repo name match is made, it will kick off the conversation.
-If the repo name partially matches against, multiple repos, you'll be asked to select a repo from the filtered list.
+If the repo name partially matches against multiple repos, you'll be asked to select a repo from the filtered list.
![slack-pro-tip.png](/static/img/slack-pro-tip.png)

@@ -11,10 +11,18 @@ for scripting.
### Running with Python
**Note** - OpenHands requires Python version 3.12 or higher (Python 3.14 is not currently supported)
1. Install OpenHands using pip:
```bash
pip install openhands-ai
```
Or if you prefer not to manage your own Python environment, you can use `uvx`:
```bash
uvx --python 3.12 --from openhands-ai openhands
```
2. Launch an interactive OpenHands conversation from the command line:
@@ -47,19 +55,21 @@ poetry run python -m openhands.cli.main
```bash
docker run -it \
--pull=always \
--e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
+-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik \
-e SANDBOX_USER_ID=$(id -u) \
-e SANDBOX_VOLUMES=$SANDBOX_VOLUMES \
-e LLM_API_KEY=$LLM_API_KEY \
-e LLM_MODEL=$LLM_MODEL \
-v /var/run/docker.sock:/var/run/docker.sock \
--v ~/.openhands-state:/.openhands-state \
+-v ~/.openhands:/.openhands \
--add-host host.docker.internal:host-gateway \
--name openhands-app-$(date +%Y%m%d%H%M%S) \
-docker.all-hands.dev/all-hands-ai/openhands:0.43 \
+docker.all-hands.dev/all-hands-ai/openhands:0.44 \
python -m openhands.cli.main --override-cli-mode true
```
> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
This launches the CLI in Docker, allowing you to interact with OpenHands as described above.
The `-e SANDBOX_USER_ID=$(id -u)` ensures files created by the agent in your workspace have the correct permissions.

@@ -32,19 +32,20 @@ To run OpenHands in Headless mode with Docker:
```bash
docker run -it \
--pull=always \
--e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
+-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik \
-e SANDBOX_USER_ID=$(id -u) \
-e SANDBOX_VOLUMES=$SANDBOX_VOLUMES \
-e LLM_API_KEY=$LLM_API_KEY \
-e LLM_MODEL=$LLM_MODEL \
-e LOG_ALL_EVENTS=true \
-v /var/run/docker.sock:/var/run/docker.sock \
--v ~/.openhands-state:/.openhands-state \
+-v ~/.openhands:/.openhands \
--add-host host.docker.internal:host-gateway \
--name openhands-app-$(date +%Y%m%d%H%M%S) \
-docker.all-hands.dev/all-hands-ai/openhands:0.43 \
+docker.all-hands.dev/all-hands-ai/openhands:0.44 \
python -m openhands.core.main -t "write a bash script that prints hi"
```
> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
The `-e SANDBOX_USER_ID=$(id -u)` is passed to the Docker command to ensure the sandbox user matches the host users
permissions. This prevents the agent from creating root-owned files in the mounted workspace.

@@ -54,25 +54,27 @@ Check [the installation guide](/usage/local-setup) to make sure you have all the
export LMSTUDIO_MODEL_NAME="imported-models/uncategorized/devstralq4_k_m.gguf" # <- Replace this with the model name you copied from LMStudio
export LMSTUDIO_URL="http://host.docker.internal:1234" # <- Replace this with the port from LMStudio
-docker pull docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik
-mkdir -p ~/.openhands-state && echo '{"language":"en","agent":"CodeActAgent","max_iterations":null,"security_analyzer":null,"confirmation_mode":false,"llm_model":"lm_studio/'$LMSTUDIO_MODEL_NAME'","llm_api_key":"dummy","llm_base_url":"'$LMSTUDIO_URL/v1'","remote_runtime_resource_factor":null,"github_token":null,"enable_default_condenser":true,"user_consents_to_analytics":true}' > ~/.openhands-state/settings.json
+mkdir -p ~/.openhands && echo '{"language":"en","agent":"CodeActAgent","max_iterations":null,"security_analyzer":null,"confirmation_mode":false,"llm_model":"lm_studio/'$LMSTUDIO_MODEL_NAME'","llm_api_key":"dummy","llm_base_url":"'$LMSTUDIO_URL/v1'","remote_runtime_resource_factor":null,"github_token":null,"enable_default_condenser":true,"user_consents_to_analytics":true}' > ~/.openhands/settings.json
docker run -it --rm --pull=always \
--e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
+-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik \
-e LOG_ALL_EVENTS=true \
-v /var/run/docker.sock:/var/run/docker.sock \
--v ~/.openhands-state:/.openhands-state \
+-v ~/.openhands:/.openhands \
-p 3000:3000 \
--add-host host.docker.internal:host-gateway \
--name openhands-app \
-docker.all-hands.dev/all-hands-ai/openhands:0.43
+docker.all-hands.dev/all-hands-ai/openhands:0.44
```
> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
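The `echo` one-liner above builds JSON by splicing shell variables between single-quoted fragments, and it breaks if a value contains a quote. Below is a minimal Python sketch that writes the same keys and values with `json.dumps` instead. `write_lmstudio_settings` is a hypothetical helper. Its key set is copied from the command above, not from a documented settings schema.

```python
import json
from pathlib import Path


def write_lmstudio_settings(settings_dir: Path, model_name: str, lmstudio_url: str) -> dict:
    """Write settings.json for an LM Studio model, mirroring the echo command above."""
    settings = {
        'language': 'en',
        'agent': 'CodeActAgent',
        'max_iterations': None,
        'security_analyzer': None,
        'confirmation_mode': False,
        'llm_model': f'lm_studio/{model_name}',
        'llm_api_key': 'dummy',  # placeholder value, as in the command above
        'llm_base_url': f'{lmstudio_url}/v1',
        'remote_runtime_resource_factor': None,
        'github_token': None,
        'enable_default_condenser': True,
        'user_consents_to_analytics': True,
    }
    settings_dir.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    # Compact separators reproduce the echo output's formatting (no spaces).
    (settings_dir / 'settings.json').write_text(json.dumps(settings, separators=(',', ':')))
    return settings
```

For example: `write_lmstudio_settings(Path.home() / '.openhands', os.environ['LMSTUDIO_MODEL_NAME'], os.environ['LMSTUDIO_URL'])`.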
Once your server is running -- you can visit `http://localhost:3000` in your browser to use OpenHands with local Devstral model:
```
Digest: sha256:e72f9baecb458aedb9afc2cd5bc935118d1868719e55d50da73190d3a85c674f
-Status: Image is up to date for docker.all-hands.dev/all-hands-ai/openhands:0.43
+Status: Image is up to date for docker.all-hands.dev/all-hands-ai/openhands:0.44
Starting OpenHands...
Running OpenHands as root
14:22:13 - openhands:INFO: server_config.py:50 - Using config class None
@@ -126,6 +128,18 @@ vllm serve all-hands/openhands-lm-32b-v0.1 \
--enable-prefix-caching
```
### Create an OpenAI-Compatible Endpoint with Ollama
- Install Ollama following [the official documentation](https://ollama.com/download).
- For Ollama configuration, use `ollama/<modelname>` as custom model in web. Api key also can be set to `ollama`.
- Example launch command for Devstral LM 24B:
```bash
OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_KEEP_ALIVE=-1 nohup ollama serve&
#The minimum context size is ~8196, even the system prompt won't fit smaller
ollama pull devstral:latest
```
## Advanced: Run and Configure OpenHands
### Run OpenHands

@@ -67,19 +67,21 @@ A system with a modern processor and a minimum of **4GB RAM** is recommended to
### Start the App
```bash
-docker pull docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik
docker run -it --rm --pull=always \
--e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
+-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik \
-e LOG_ALL_EVENTS=true \
-v /var/run/docker.sock:/var/run/docker.sock \
--v ~/.openhands-state:/.openhands-state \
+-v ~/.openhands:/.openhands \
-p 3000:3000 \
--add-host host.docker.internal:host-gateway \
--name openhands-app \
-docker.all-hands.dev/all-hands-ai/openhands:0.43
+docker.all-hands.dev/all-hands-ai/openhands:0.44
```
> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
You'll find OpenHands running at http://localhost:3000!
### Setup

@@ -31,9 +31,9 @@ On initial prompt, an error is seen with `Permission Denied` or `PermissionError
**Resolution**
-* Check if the `~/.openhands-state` is owned by `root`. If so, you can:
-* Change the directory's ownership: `sudo chown <user>:<user> ~/.openhands-state`.
-* or update permissions on the directory: `sudo chmod 777 ~/.openhands-state`
+* Check if the `~/.openhands` is owned by `root`. If so, you can:
+* Change the directory's ownership: `sudo chown <user>:<user> ~/.openhands`.
+* or update permissions on the directory: `sudo chmod 777 ~/.openhands`
* or delete it if you dont need previous data. OpenHands will recreate it. You'll need to re-enter LLM settings.
* If mounting a local directory, ensure your `WORKSPACE_BASE` has the necessary permissions for the user running
OpenHands.
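The first resolution step checks whether `~/.openhands` is owned by `root`. On Linux or macOS this check can also be done from Python. Below is a minimal sketch. `owned_by_root` is a hypothetical helper. It only reports ownership and leaves the `chown`/`chmod` choice to you.

```python
from pathlib import Path


def owned_by_root(path: Path) -> bool:
    """True if `path` exists and its owner UID is 0 (root). POSIX only."""
    try:
        return path.stat().st_uid == 0
    except FileNotFoundError:
        return False  # a missing directory is recreated by OpenHands
```

For example: `owned_by_root(Path.home() / '.openhands')`.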
@@ -56,13 +56,16 @@ To fix this:
-e SANDBOX_VSCODE_PORT=41234 \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:latest \
-v /var/run/docker.sock:/var/run/docker.sock \
--v ~/.openhands-state:/.openhands-state \
+-v ~/.openhands:/.openhands \
-p 3000:3000 \
-p 41234:41234 \
--add-host host.docker.internal:host-gateway \
--name openhands-app \
docker.all-hands.dev/all-hands-ai/openhands:latest
```
> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
2. Make sure to expose the same port with `-p 41234:41234` in your Docker command.
3. If running with the development workflow, you can set this in your `config.toml` file:
```toml

evaluation/benchmarks/gaia/.gitignore (new file)

@@ -0,0 +1 @@
data/

@@ -6,6 +6,13 @@ This folder contains evaluation harness for evaluating agents on the [GAIA bench
Please follow instruction [here](../../README.md#setup) to setup your local development environment and LLM.
To enable the Tavily MCP Server, you can add the Tavily API key under the `core` section of your `config.toml` file, like below:
```toml
[core]
search_api_key = "tvly-******"
```
## Run the evaluation
We are using the GAIA dataset hosted on [Hugging Face](https://huggingface.co/datasets/gaia-benchmark/GAIA).

@@ -1,4 +1,5 @@
import asyncio
import copy
import functools
import os
import re
@@ -6,6 +7,7 @@ import re
import huggingface_hub
import pandas as pd
from datasets import load_dataset
from pydantic import SecretStr
from evaluation.benchmarks.gaia.scorer import question_scorer
from evaluation.utils.shared import (
@@ -24,6 +26,7 @@ from openhands.core.config import (
OpenHandsConfig,
get_llm_config_arg,
get_parser,
load_from_toml,
)
from openhands.core.config.utils import get_agent_config_arg
from openhands.core.logger import openhands_logger as logger
@@ -41,7 +44,7 @@ AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {
}
AGENT_CLS_TO_INST_SUFFIX = {
-'CodeActAgent': 'When you think you have solved the question, please first send your answer to user through message and then exit.\n'
+'CodeActAgent': 'When you think you have solved the question, please use the finish tool and include your final answer in the message parameter of the finish tool. Your final answer MUST be encapsulated within <solution> and </solution>.\n'
}
@@ -49,7 +52,7 @@ def get_config(
metadata: EvalMetadata,
) -> OpenHandsConfig:
sandbox_config = get_default_sandbox_config_for_eval()
-sandbox_config.base_container_image = 'python:3.12-bookworm'
+sandbox_config.base_container_image = 'nikolaik/python-nodejs:python3.12-nodejs22'
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
@@ -67,6 +70,11 @@ def get_config(
logger.info('Agent config not provided, using default settings')
agent_config = config.get_agent_config(metadata.agent_class)
agent_config.enable_prompt_extensions = False
config_copy = copy.deepcopy(config)
load_from_toml(config_copy)
if config_copy.search_api_key:
config.search_api_key = SecretStr(config_copy.search_api_key)
return config
@@ -134,16 +142,26 @@ def process_instance(
dest_file = None
# Prepare instruction
-instruction = f'{instance["Question"]}\n'
+instruction = """You have one question to answer. It is paramount that you provide a correct answer.
+Give it all you can: I know for a fact that you have access to all the relevant tools to solve it and find the correct answer (the answer does exist). Failure or 'I cannot answer' or 'None found' will not be tolerated, success will be rewarded.
+You must make sure you find the correct answer! You MUST strictly follow the task-specific formatting instructions for your final answer.
+Here is the task:
+{task_question}
+""".format(
+task_question=instance['Question'],
+)
logger.info(f'Instruction: {instruction}')
if dest_file:
instruction += f'\n\nThe mentioned file is provided in the workspace at: {dest_file.split("/")[-1]}'
-instruction += 'IMPORTANT: You should ONLY interact with the environment provided to you AND NEVER ASK FOR HUMAN HELP.\n'
-instruction += 'Please encapsulate your final answer (answer ONLY) within <solution> and </solution>.\n'
+instruction += """IMPORTANT: When seeking information from a website, REFRAIN from arbitrary URL navigation. You should utilize the designated search engine tool with precise keywords to obtain relevant URLs or use the specific website's search interface. DO NOT navigate directly to specific URLs as they may not exist.\n\nFor example: if you want to search for a research paper on Arxiv, either use the search engine tool with specific keywords or navigate to arxiv.org and then use its interface.\n"""
+instruction += 'IMPORTANT: You should NEVER ask for Human Help.\n'
+instruction += 'IMPORTANT: Please encapsulate your final answer (answer ONLY) within <solution> and </solution>. Your answer will be evaluated using string matching approaches so it important that you STRICTLY adhere to the output formatting instructions specified in the task (e.g., alphabetization, sequencing, units, rounding, decimal places, etc.)\n'
instruction += (
'For example: The answer to the question is <solution> 42 </solution>.\n'
)
instruction += "IMPORTANT: Your final answer should be a number OR as few words as possible OR a comma separated list of numbers and/or strings. If you are asked for a number, express it numerically (i.e., with digits rather than words), do not use commas, and do not include units such as $ or percent signs unless specified otherwise. If you are asked for a string, don't use articles, neither abbreviations (e.g. for cities). If you are asked for a comma separated list, apply the above rules depending of whether the element to be put in the list is a number or a string.\n"
# NOTE: You can actually set slightly different instruction for different agents
instruction += AGENT_CLS_TO_INST_SUFFIX.get(metadata.agent_class, '')
logger.info(f'Instruction:\n{instruction}', extra={'msg_type': 'OBSERVATION'})
@@ -175,7 +193,7 @@ def process_instance(
for event in reversed(state.history):
if event.source == 'agent':
if isinstance(event, AgentFinishAction):
-model_answer_raw = event.thought
+model_answer_raw = event.final_thought
break
elif isinstance(event, CmdRunAction):
model_answer_raw = event.thought
@@ -222,6 +240,7 @@ def process_instance(
error=state.last_error if state and state.last_error else None,
test_result=test_result,
)
runtime.close()
return output
@@ -253,6 +272,8 @@ if __name__ == '__main__':
if llm_config is None:
raise ValueError(f'Could not find LLM config: --llm_config {args.llm_config}')
toml_config = OpenHandsConfig()
load_from_toml(toml_config)
metadata = make_metadata(
llm_config=llm_config,
dataset_name='gaia',
@@ -261,7 +282,10 @@ if __name__ == '__main__':
eval_note=args.eval_note,
eval_output_dir=args.eval_output_dir,
data_split=args.data_split,
-details={'gaia-level': args.level},
+details={
+'gaia-level': args.level,
+'mcp-servers': ['tavily'] if toml_config.search_api_key else [],
+},
agent_config=agent_config,
)

@@ -39,7 +39,7 @@ echo "LEVELS: $LEVELS"
COMMAND="poetry run python ./evaluation/benchmarks/gaia/run_infer.py \
--agent-cls $AGENT \
--llm-config $MODEL_CONFIG \
---max-iterations 30 \
+--max-iterations 60 \
--level $LEVELS \
--data-split validation \
--eval-num-workers $NUM_WORKERS \

@@ -116,7 +116,7 @@ def get_token_per_line(code: str):
return identifiers_per_line
-def get_ISM(answer_code: str, model_output_list: list, asnwer_name: str) -> list:
+def get_ISM(answer_code: str, model_output_list: list, answer_name: str) -> list:
"""
Compute the ISM and return an ordered list of scores.
:return:
@@ -126,13 +126,13 @@ def get_ISM(answer_code: str, model_output_list: list, asnwer_name: str) -> list
if '```python' in code:
code = code.replace('```python', '')
code = code.replace('```', '')
-if not re.search(rf'\b{re.escape(asnwer_name)}\b', code) or not is_code_valid(
+if not re.search(rf'\b{re.escape(answer_name)}\b', code) or not is_code_valid(
code
):
score_list.append(0)
continue
-# if asnwer_name not in code:
+# if answer_name not in code:
# score_list.append(0)
# continue
@@ -155,7 +155,7 @@ def get_ISM(answer_code: str, model_output_list: list, asnwer_name: str) -> list
def get_ISM_without_verification(
-answer_code: str, model_output_list: list, asnwer_name: str
+answer_code: str, model_output_list: list, answer_name: str
) -> list:
"""
Compute the ISM and return an ordered list of scores.
@@ -163,11 +163,11 @@ def get_ISM_without_verification(
"""
score_list = []
for code in model_output_list:
-if asnwer_name not in code:
+if answer_name not in code:
score_list.append(0)
continue
-# if asnwer_name not in code:
+# if answer_name not in code:
# score_list.append(0)
# continue
@@ -215,7 +215,7 @@ def longest_common_prefix_with_lengths(list1, list2):
return max_length, len_list1, len_list2
-def get_PM(answer_code: str, model_output_list: list, asnwer_name: str) -> list:
+def get_PM(answer_code: str, model_output_list: list, answer_name: str) -> list:
"""
Compute PM and return an ordered list of scores
:return:
@@ -225,14 +225,14 @@ def get_PM(answer_code: str, model_output_list: list, asnwer_name: str) -> list:
if '```python' in code:
code = code.replace('```python', '')
code = code.replace('```', '')
if not re.search(rf'\b{re.escape(asnwer_name)}\b', code) or not is_code_valid(
if not re.search(rf'\b{re.escape(answer_name)}\b', code) or not is_code_valid(
code
):
# if asnwer_name not in code or is_code_valid(code) == False:
# if answer_name not in code or is_code_valid(code) == False:
score_list.append(0)
continue
# if asnwer_name not in code:
# if answer_name not in code:
# score_list.append(0)
# continue

@@ -31,7 +31,7 @@ const renderRepoConnector = () => {
},
{
Component: () => <div data-testid="git-settings-screen" />,
path: "/settings/git",
path: "/settings/integrations",
},
],
},

@@ -35,13 +35,13 @@ const queryClient = new QueryClient();
const GitSettingsRouterStub = createRoutesStub([
{
Component: GitSettingsScreen,
path: "/settings/github",
path: "/settings/integrations",
},
]);
const renderGitSettingsScreen = () => {
const { rerender, ...rest } = render(
<GitSettingsRouterStub initialEntries={["/settings/github"]} />,
<GitSettingsRouterStub initialEntries={["/settings/integrations"]} />,
{
wrapper: ({ children }) => (
<QueryClientProvider client={queryClient}>
@@ -54,7 +54,7 @@ const renderGitSettingsScreen = () => {
const rerenderGitSettingsScreen = () =>
rerender(
<QueryClientProvider client={queryClient}>
<GitSettingsRouterStub initialEntries={["/settings/github"]} />
<GitSettingsRouterStub initialEntries={["/settings/integrations"]} />
</QueryClientProvider>,
);

@@ -31,7 +31,7 @@ const RouterStub = createRoutesStub([
},
{
Component: () => <div data-testid="git-settings-screen" />,
path: "/settings/git",
path: "/settings/integrations",
},
],
},

@@ -30,7 +30,7 @@ vi.mock("react-i18next", async () => {
useTranslation: () => ({
t: (key: string) => {
const translations: Record<string, string> = {
"SETTINGS$NAV_GIT": "Git",
"SETTINGS$NAV_INTEGRATIONS": "Integrations",
"SETTINGS$NAV_APPLICATION": "Application",
"SETTINGS$NAV_CREDITS": "Credits",
"SETTINGS$NAV_API_KEYS": "API Keys",
@@ -61,7 +61,7 @@ describe("Settings Billing", () => {
},
{
Component: () => <div data-testid="git-settings-screen" />,
path: "/settings/git",
path: "/settings/integrations",
},
{
Component: () => <div data-testid="user-settings-screen" />,

@@ -14,7 +14,7 @@ vi.mock("react-i18next", async () => {
useTranslation: () => ({
t: (key: string) => {
const translations: Record<string, string> = {
SETTINGS$NAV_GIT: "Git",
SETTINGS$NAV_INTEGRATIONS: "Integrations",
SETTINGS$NAV_APPLICATION: "Application",
SETTINGS$NAV_CREDITS: "Credits",
SETTINGS$NAV_API_KEYS: "API Keys",
@@ -49,7 +49,7 @@ describe("Settings Screen", () => {
},
{
Component: () => <div data-testid="git-settings-screen" />,
path: "/settings/git",
path: "/settings/integrations",
},
{
Component: () => <div data-testid="application-settings-screen" />,
@@ -79,7 +79,7 @@ describe("Settings Screen", () => {
};
it("should render the navbar", async () => {
const sectionsToInclude = ["llm", "git", "application", "secrets"];
const sectionsToInclude = ["llm", "integrations", "application", "secrets"];
const sectionsToExclude = ["api keys", "credits"];
const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
// @ts-expect-error - only return app mode
@@ -111,7 +111,7 @@ describe("Settings Screen", () => {
APP_MODE: "saas",
});
const sectionsToInclude = [
"git",
"integrations",
"application",
"credits",
"secrets",

@@ -1,12 +1,12 @@
{
"name": "openhands-frontend",
"version": "0.43.0",
"version": "0.44.0",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "openhands-frontend",
"version": "0.43.0",
"version": "0.44.0",
"dependencies": {
"@heroui/react": "^2.8.0-beta.7",
"@microlink/react-json-view": "^1.26.2",

@@ -1,6 +1,6 @@
{
"name": "openhands-frontend",
"version": "0.43.0",
"version": "0.44.0",
"private": true,
"type": "module",
"engines": {

@@ -111,6 +111,59 @@ class OpenHands {
return data;
}
/**
* Submit conversation feedback with rating
* @param conversationId The conversation ID
* @param rating The rating (1-5)
* @param eventId Optional event ID this feedback corresponds to
* @param reason Optional reason for the rating
* @returns Response from the feedback endpoint
*/
static async submitConversationFeedback(
conversationId: string,
rating: number,
eventId?: number,
reason?: string,
): Promise<{ status: string; message: string }> {
const url = `/feedback/conversation`;
const payload = {
conversation_id: conversationId,
event_id: eventId,
rating,
reason,
metadata: { source: "likert-scale" },
};
const { data } = await openHands.post<{ status: string; message: string }>(
url,
payload,
);
return data;
}
/**
* Check if feedback exists for a specific conversation and event
* @param conversationId The conversation ID
* @param eventId The event ID to check
* @returns Feedback data including existence, rating, and reason
*/
static async checkFeedbackExists(
conversationId: string,
eventId: number,
): Promise<{ exists: boolean; rating?: number; reason?: string }> {
try {
const url = `/feedback/conversation/${conversationId}/${eventId}`;
const { data } = await openHands.get<{
exists: boolean;
rating?: number;
reason?: string;
}>(url);
return data;
} catch (error) {
// Error checking if feedback exists
return { exists: false };
}
}
/**
* Authenticate with GitHub token
* @returns Response with authentication status and user info if successful

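The two new API methods reduce to a request-body builder and a lookup that degrades to `{ exists: false }` on any error. The sketch below restates that logic outside the `OpenHands` class; `FeedbackClient` is a hypothetical stand-in for the axios instance, so no HTTP request is made.

```typescript
// Types mirror the shapes used by submitConversationFeedback and
// checkFeedbackExists; FeedbackClient is a hypothetical stand-in for the
// `openHands` axios instance.
interface FeedbackPayload {
  conversation_id: string;
  event_id?: number;
  rating: number;
  reason?: string;
  metadata: { source: string };
}

interface FeedbackStatus {
  exists: boolean;
  rating?: number;
  reason?: string;
}

interface FeedbackClient {
  get(url: string): Promise<FeedbackStatus>;
}

// Builds the POST body for /feedback/conversation the same way the diff does.
function buildFeedbackPayload(
  conversationId: string,
  rating: number,
  eventId?: number,
  reason?: string,
): FeedbackPayload {
  return {
    conversation_id: conversationId,
    event_id: eventId,
    rating,
    reason,
    metadata: { source: "likert-scale" },
  };
}

// Any failure (network error, 404, ...) is reported as "no feedback yet",
// so the UI falls back to the unsubmitted state instead of throwing.
async function checkFeedbackExists(
  client: FeedbackClient,
  conversationId: string,
  eventId: number,
): Promise<FeedbackStatus> {
  try {
    // `await` is needed here so that a rejected promise is caught below.
    return await client.get(`/feedback/conversation/${conversationId}/${eventId}`);
  } catch {
    return { exists: false };
  }
}
```

The `return await` inside `try` matters: returning the bare promise would let a rejection escape the `catch` block.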
@@ -18,6 +18,7 @@ import { useWsClient } from "#/context/ws-client-provider";
import { Messages } from "./messages";
import { ChatSuggestions } from "./chat-suggestions";
import { ActionSuggestions } from "./action-suggestions";
import { ScrollProvider } from "#/context/scroll-context";
import { ScrollToBottomButton } from "#/components/shared/buttons/scroll-to-bottom-button";
import { LoadingSpinner } from "#/components/shared/loading-spinner";
@@ -28,6 +29,7 @@ import { useOptimisticUserMessage } from "#/hooks/use-optimistic-user-message";
import { useWSErrorMessage } from "#/hooks/use-ws-error-message";
import { ErrorMessageBanner } from "./error-message-banner";
import { shouldRenderEvent } from "./event-content-helpers/should-render-event";
import { useConfig } from "#/hooks/query/use-config";
function getEntryPoint(
hasRepository: boolean | null,
@@ -45,8 +47,15 @@ export function ChatInterface() {
useOptimisticUserMessage();
const { t } = useTranslation();
const scrollRef = React.useRef<HTMLDivElement>(null);
const { scrollDomToBottom, onChatBodyScroll, hitBottom } =
useScrollToBottom(scrollRef);
const {
scrollDomToBottom,
onChatBodyScroll,
hitBottom,
autoScroll,
setAutoScroll,
setHitBottom,
} = useScrollToBottom(scrollRef);
const { data: config } = useConfig();
const { curAgentState } = useSelector((state: RootState) => state.agent);
@@ -126,80 +135,97 @@ export function ChatInterface() {
curAgentState === AgentState.AWAITING_USER_INPUT ||
curAgentState === AgentState.FINISHED;
// Create a ScrollProvider with the scroll hook values
const scrollProviderValue = {
scrollRef,
autoScroll,
setAutoScroll,
scrollDomToBottom,
hitBottom,
setHitBottom,
onChatBodyScroll,
};
return (
<div className="h-full flex flex-col justify-between">
{events.length === 0 && !optimisticUserMessage && (
<ChatSuggestions onSuggestionsClick={setMessageToSend} />
)}
<div
ref={scrollRef}
onScroll={(e) => onChatBodyScroll(e.currentTarget)}
className="scrollbar scrollbar-thin scrollbar-thumb-gray-400 scrollbar-thumb-rounded-full scrollbar-track-gray-800 hover:scrollbar-thumb-gray-300 flex flex-col grow overflow-y-auto overflow-x-hidden px-4 pt-4 gap-2 fast-smooth-scroll"
>
{isLoadingMessages && (
<div className="flex justify-center">
<LoadingSpinner size="small" />
</div>
<ScrollProvider value={scrollProviderValue}>
<div className="h-full flex flex-col justify-between">
{events.length === 0 && !optimisticUserMessage && (
<ChatSuggestions onSuggestionsClick={setMessageToSend} />
)}
{!isLoadingMessages && (
<Messages
messages={events}
isAwaitingUserConfirmation={
curAgentState === AgentState.AWAITING_USER_CONFIRMATION
}
/>
)}
<div
ref={scrollRef}
onScroll={(e) => onChatBodyScroll(e.currentTarget)}
className="scrollbar scrollbar-thin scrollbar-thumb-gray-400 scrollbar-thumb-rounded-full scrollbar-track-gray-800 hover:scrollbar-thumb-gray-300 flex flex-col grow overflow-y-auto overflow-x-hidden px-4 pt-4 gap-2 fast-smooth-scroll"
>
{isLoadingMessages && (
<div className="flex justify-center">
<LoadingSpinner size="small" />
</div>
)}
{isWaitingForUserInput &&
events.length > 0 &&
!optimisticUserMessage && (
<ActionSuggestions
onSuggestionsClick={(value) => handleSendMessage(value, [])}
{!isLoadingMessages && (
<Messages
messages={events}
isAwaitingUserConfirmation={
curAgentState === AgentState.AWAITING_USER_CONFIRMATION
}
/>
)}
</div>
<div className="flex flex-col gap-[6px] px-4 pb-4">
<div className="flex justify-between relative">
<TrajectoryActions
onPositiveFeedback={() =>
onClickShareFeedbackActionButton("positive")
}
onNegativeFeedback={() =>
onClickShareFeedbackActionButton("negative")
}
onExportTrajectory={() => onClickExportTrajectoryButton()}
/>
<div className="absolute left-1/2 transform -translate-x-1/2 bottom-0">
{curAgentState === AgentState.RUNNING && <TypingIndicator />}
</div>
{!hitBottom && <ScrollToBottomButton onClick={scrollDomToBottom} />}
{isWaitingForUserInput &&
events.length > 0 &&
!optimisticUserMessage && (
<ActionSuggestions
onSuggestionsClick={(value) => handleSendMessage(value, [])}
/>
)}
</div>
{errorMessage && <ErrorMessageBanner message={errorMessage} />}
<div className="flex flex-col gap-[6px] px-4 pb-4">
<div className="flex justify-between relative">
{config?.APP_MODE !== "saas" && (
<TrajectoryActions
onPositiveFeedback={() =>
onClickShareFeedbackActionButton("positive")
}
onNegativeFeedback={() =>
onClickShareFeedbackActionButton("negative")
}
onExportTrajectory={() => onClickExportTrajectoryButton()}
/>
)}
<InteractiveChatBox
onSubmit={handleSendMessage}
onStop={handleStop}
isDisabled={
curAgentState === AgentState.LOADING ||
curAgentState === AgentState.AWAITING_USER_CONFIRMATION
}
mode={curAgentState === AgentState.RUNNING ? "stop" : "submit"}
value={messageToSend ?? undefined}
onChange={setMessageToSend}
/>
<div className="absolute left-1/2 transform -translate-x-1/2 bottom-0">
{curAgentState === AgentState.RUNNING && <TypingIndicator />}
</div>
{!hitBottom && <ScrollToBottomButton onClick={scrollDomToBottom} />}
</div>
{errorMessage && <ErrorMessageBanner message={errorMessage} />}
<InteractiveChatBox
onSubmit={handleSendMessage}
onStop={handleStop}
isDisabled={
curAgentState === AgentState.LOADING ||
curAgentState === AgentState.AWAITING_USER_CONFIRMATION
}
mode={curAgentState === AgentState.RUNNING ? "stop" : "submit"}
value={messageToSend ?? undefined}
onChange={setMessageToSend}
/>
</div>
{config?.APP_MODE !== "saas" && (
<FeedbackModal
isOpen={feedbackModalIsOpen}
onClose={() => setFeedbackModalIsOpen(false)}
polarity={feedbackPolarity}
/>
)}
</div>
<FeedbackModal
isOpen={feedbackModalIsOpen}
onClose={() => setFeedbackModalIsOpen(false)}
polarity={feedbackPolarity}
/>
</div>
</ScrollProvider>
);
}

@@ -1,3 +1,4 @@
import React from "react";
import { ConfirmationButtons } from "#/components/shared/buttons/confirmation-buttons";
import { OpenHandsAction } from "#/types/core/actions";
import {
@@ -18,6 +19,10 @@ import { MCPObservationContent } from "./mcp-observation-content";
import { getObservationResult } from "./event-content-helpers/get-observation-result";
import { getEventContent } from "./event-content-helpers/get-event-content";
import { GenericEventMessage } from "./generic-event-message";
import { LikertScale } from "../feedback/likert-scale";
import { useConfig } from "#/hooks/query/use-config";
import { useFeedbackExists } from "#/hooks/query/use-feedback-exists";
const hasThoughtProperty = (
obj: Record<string, unknown>,
@@ -39,6 +44,14 @@ export function EventMessage({
const shouldShowConfirmationButtons =
isLastMessage && event.source === "agent" && isAwaitingUserConfirmation;
const { data: config } = useConfig();
// Use our query hook to check if feedback exists and get rating/reason
const {
data: feedbackData = { exists: false },
isLoading: isCheckingFeedback,
} = useFeedbackExists(isFinishAction(event) ? event.id : undefined);
if (isErrorObservation(event)) {
return (
<ErrorMessage
@@ -55,9 +68,25 @@ export function EventMessage({
return null;
}
const showLikertScale =
config?.APP_MODE === "saas" &&
isFinishAction(event) &&
isLastMessage &&
!isCheckingFeedback;
if (isFinishAction(event)) {
return (
<ChatMessage type="agent" message={getEventContent(event).details} />
<>
<ChatMessage type="agent" message={getEventContent(event).details} />
{showLikertScale && (
<LikertScale
eventId={event.id}
initiallySubmitted={feedbackData.exists}
initialRating={feedbackData.rating}
initialReason={feedbackData.reason}
/>
)}
</>
);
}

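The `showLikertScale` flag in `EventMessage` combines four conditions. Pulled out as a plain function (a hypothetical extraction, with the hook results passed in as arguments), it can be unit-tested without rendering anything:

```typescript
// Hypothetical extraction of EventMessage's `showLikertScale` condition;
// in the component these values come from useConfig and useFeedbackExists.
interface LikertVisibilityInput {
  appMode?: string; // config?.APP_MODE, undefined while the config loads
  isFinishAction: boolean;
  isLastMessage: boolean;
  isCheckingFeedback: boolean;
}

function shouldShowLikertScale(input: LikertVisibilityInput): boolean {
  return (
    input.appMode === "saas" &&
    input.isFinishAction &&
    input.isLastMessage &&
    // Hold the scale back until the "feedback exists" query has settled,
    // so it mounts with any previously saved rating and reason.
    !input.isCheckingFeedback
  );
}
```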
@@ -0,0 +1,248 @@
import React, { useState, useEffect, useContext } from "react";
import { cn } from "#/utils/utils";
import i18n from "#/i18n";
import { useSubmitConversationFeedback } from "#/hooks/mutation/use-submit-conversation-feedback";
import { ScrollContext } from "#/context/scroll-context";
// Global timeout duration in milliseconds
const AUTO_SUBMIT_TIMEOUT = 10000;
interface LikertScaleProps {
eventId?: number;
initiallySubmitted?: boolean;
initialRating?: number;
initialReason?: string;
}
const FEEDBACK_REASONS = [
i18n.t("FEEDBACK$REASON_MISUNDERSTOOD_INSTRUCTION"),
i18n.t("FEEDBACK$REASON_FORGOT_CONTEXT"),
i18n.t("FEEDBACK$REASON_UNNECESSARY_CHANGES"),
i18n.t("FEEDBACK$REASON_OTHER"),
];
export function LikertScale({
eventId,
initiallySubmitted = false,
initialRating,
initialReason,
}: LikertScaleProps) {
const [selectedRating, setSelectedRating] = useState<number | null>(
initialRating || null,
);
const [selectedReason, setSelectedReason] = useState<string | null>(
initialReason || null,
);
const [showReasons, setShowReasons] = useState(false);
const [reasonTimeout, setReasonTimeout] = useState<NodeJS.Timeout | null>(
null,
);
const [isSubmitted, setIsSubmitted] = useState(initiallySubmitted);
const [countdown, setCountdown] = useState<number>(0);
// Get scroll context
const scrollContext = useContext(ScrollContext);
// If scrollContext is undefined, we're not inside a ScrollProvider
const scrollToBottom = scrollContext?.scrollDomToBottom;
const autoScroll = scrollContext?.autoScroll;
// Use our mutation hook
const { mutate: submitConversationFeedback } =
useSubmitConversationFeedback();
// Update isSubmitted if initiallySubmitted changes
useEffect(() => {
setIsSubmitted(initiallySubmitted);
}, [initiallySubmitted]);
// Update selectedRating if initialRating changes
useEffect(() => {
if (initialRating) {
setSelectedRating(initialRating);
}
}, [initialRating]);
// Update selectedReason if initialReason changes
useEffect(() => {
if (initialReason) {
setSelectedReason(initialReason);
}
}, [initialReason]);
// Submit feedback and disable the component
const submitFeedback = (rating: number, reason?: string) => {
submitConversationFeedback(
{
rating,
eventId,
reason,
},
{
onSuccess: () => {
setSelectedReason(reason || null);
setShowReasons(false);
setIsSubmitted(true);
},
},
);
};
// Handle star rating selection
const handleRatingClick = (rating: number) => {
if (isSubmitted) return; // Prevent changes after submission
setSelectedRating(rating);
// Only show reasons if rating is 3 or less (1, 2, or 3 stars)
// For ratings > 3 (4 or 5 stars), submit immediately without showing reasons
if (rating <= 3) {
setShowReasons(true);
setCountdown(Math.ceil(AUTO_SUBMIT_TIMEOUT / 1000));
// Set a timeout to auto-submit if no reason is selected
const timeout = setTimeout(() => {
submitFeedback(rating);
}, AUTO_SUBMIT_TIMEOUT);
setReasonTimeout(timeout);
// Only scroll to bottom if the user is already at the bottom (autoScroll is true)
if (scrollToBottom && autoScroll) {
// Small delay to ensure the reasons are fully rendered
setTimeout(() => {
scrollToBottom();
}, 100);
}
} else {
// For ratings > 3 (4 or 5 stars), submit immediately without showing reasons
setShowReasons(false);
submitFeedback(rating);
}
};
// Handle reason selection
const handleReasonClick = (reason: string) => {
if (selectedRating && reasonTimeout && !isSubmitted) {
clearTimeout(reasonTimeout);
setCountdown(0);
submitFeedback(selectedRating, reason);
}
};
// Countdown effect
useEffect(() => {
if (countdown > 0 && showReasons && !isSubmitted) {
const timer = setTimeout(() => {
setCountdown(countdown - 1);
}, 1000);
return () => clearTimeout(timer);
}
return () => {};
}, [countdown, showReasons, isSubmitted]);
// Clean up timeout on unmount
useEffect(
() => () => {
if (reasonTimeout) {
clearTimeout(reasonTimeout);
}
},
[reasonTimeout],
);
// Scroll to bottom when component mounts, but only if user is already at the bottom
useEffect(() => {
if (scrollToBottom && autoScroll && !isSubmitted) {
// Small delay to ensure the component is fully rendered
setTimeout(() => {
scrollToBottom();
}, 100);
}
}, [scrollToBottom, autoScroll, isSubmitted]);
// Scroll to bottom when reasons are shown, but only if user is already at the bottom
useEffect(() => {
if (scrollToBottom && autoScroll && showReasons) {
// Small delay to ensure the reasons are fully rendered
setTimeout(() => {
scrollToBottom();
}, 100);
}
}, [scrollToBottom, autoScroll, showReasons]);
// Helper function to get button class based on state
const getButtonClass = (rating: number) => {
if (isSubmitted) {
return selectedRating && selectedRating >= rating
? "text-yellow-400 cursor-not-allowed"
: "text-gray-300 opacity-50 cursor-not-allowed";
}
return selectedRating && selectedRating >= rating
? "text-yellow-400"
: "text-gray-300 hover:text-yellow-200";
};
return (
<div className="mt-3 flex flex-col gap-1">
<div className="text-sm text-gray-500 mb-1">
{isSubmitted
? i18n.t("FEEDBACK$THANK_YOU_FOR_FEEDBACK")
: i18n.t("FEEDBACK$RATE_AGENT_PERFORMANCE")}
</div>
<div className="flex flex-col gap-1">
<span className="flex gap-2 items-center flex-wrap">
{[1, 2, 3, 4, 5].map((rating) => (
<button
type="button"
key={rating}
onClick={() => handleRatingClick(rating)}
disabled={isSubmitted}
className={cn("text-xl transition-all", getButtonClass(rating))}
aria-label={`Rate ${rating} stars`}
>
</button>
))}
{/* Show selected reason inline with stars when submitted (only for ratings <= 3) */}
{isSubmitted &&
selectedReason &&
selectedRating &&
selectedRating <= 3 && (
<span className="text-sm text-gray-500 italic">
{selectedReason}
</span>
)}
</span>
</div>
{showReasons && !isSubmitted && (
<div className="mt-1 flex flex-col gap-1">
<div className="text-xs text-gray-500 mb-1">
{i18n.t("FEEDBACK$SELECT_REASON")}
</div>
{countdown > 0 && (
<div className="text-xs text-gray-400 mb-1 italic">
{i18n.t("FEEDBACK$SELECT_REASON_COUNTDOWN", {
countdown,
})}
</div>
)}
<div className="flex flex-col gap-0.5">
{FEEDBACK_REASONS.map((reason) => (
<button
type="button"
key={reason}
onClick={() => handleReasonClick(reason)}
className="text-sm text-left py-1 px-2 rounded hover:bg-gray-700 transition-colors"
>
{reason}
</button>
))}
</div>
</div>
)}
</div>
);
}

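The rating flow in `LikertScale` is easier to reason about as a small state machine. The reducer below is a simplified model, not the component's code: timers are replaced by an explicit `tick` event (one per second of `countdown`), and a submission is assumed to succeed immediately, whereas the component only sets `isSubmitted` in the mutation's `onSuccess` callback.

```typescript
// Simplified, hypothetical model of LikertScale's rating flow, for reasoning
// and testing only. Each "tick" stands for one second of the countdown.
const AUTO_SUBMIT_TIMEOUT = 10000; // ms, same value as the component
const REASON_THRESHOLD = 3; // ratings <= 3 ask for an optional reason

interface LikertState {
  rating: number | null;
  reason: string | null;
  showReasons: boolean;
  countdown: number; // seconds left before auto-submit
  submitted: boolean;
}

type LikertEvent =
  | { type: "rate"; rating: number }
  | { type: "pickReason"; reason: string }
  | { type: "tick" };

const initialLikertState: LikertState = {
  rating: null,
  reason: null,
  showReasons: false,
  countdown: 0,
  submitted: false,
};

function likertReducer(state: LikertState, event: LikertEvent): LikertState {
  // Like the component, ignore all input once feedback has been submitted.
  if (state.submitted) return state;

  switch (event.type) {
    case "rate":
      if (event.rating <= REASON_THRESHOLD) {
        // Low rating: show reasons and (re)start the auto-submit countdown.
        return {
          ...state,
          rating: event.rating,
          showReasons: true,
          countdown: Math.ceil(AUTO_SUBMIT_TIMEOUT / 1000),
        };
      }
      // 4 or 5 stars: submit right away without asking for a reason.
      return { ...state, rating: event.rating, showReasons: false, countdown: 0, submitted: true };
    case "pickReason":
      if (state.rating === null || !state.showReasons) return state;
      return { ...state, reason: event.reason, showReasons: false, countdown: 0, submitted: true };
    case "tick": {
      if (!state.showReasons) return state;
      const countdown = state.countdown - 1;
      // Reaching zero models the auto-submit timeout firing with no reason.
      return countdown > 0
        ? { ...state, countdown }
        : { ...state, countdown: 0, showReasons: false, submitted: true };
    }
    default:
      return state;
  }
}
```

Modelled this way, the invariants become checkable: a 4- or 5-star rating never shows the reason list, and a low rating always ends submitted, either through a reason click or when the countdown runs out.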
@@ -10,7 +10,10 @@ export function ConnectToProviderMessage() {
return (
<div className="flex flex-col gap-4">
<p>{t("HOME$CONNECT_PROVIDER_MESSAGE")}</p>
<Link data-testid="navigate-to-settings-button" to="/settings/git">
<Link
data-testid="navigate-to-settings-button"
to="/settings/integrations"
>
<BrandButton type="button" variant="primary" isDisabled={isLoading}>
{!isLoading && t("SETTINGS$TITLE")}
{isLoading && t("HOME$LOADING")}

@@ -0,0 +1,21 @@
import { useTranslation } from "react-i18next";
import { I18nKey } from "#/i18n/declaration";
import { BrandButton } from "../brand-button";
export function InstallSlackAppAnchor() {
const { t } = useTranslation();
return (
<a
data-testid="install-slack-app-button"
href="https://slack.com/oauth/v2/authorize?client_id=7477886716822.8729519890534&scope=app_mentions:read,chat:write,users:read,channels:history,groups:history,mpim:history,im:history&user_scope=channels:history,groups:history,im:history,mpim:history"
target="_blank"
rel="noreferrer noopener"
className="py-9"
>
<BrandButton type="button" variant="secondary">
{t(I18nKey.SLACK$INSTALL_APP)}
</BrandButton>
</a>
);
}

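The Slack install link is a Slack OAuth v2 authorize URL, where `scope` lists the bot-token scopes and `user_scope` the user-token scopes. A hypothetical helper that assembles the same URL from its parts (client ID and scopes copied from `InstallSlackAppAnchor`) could look like this:

```typescript
// Hypothetical helper; the endpoint, client ID and scopes are taken from
// InstallSlackAppAnchor, not from any OpenHands configuration.
const SLACK_AUTHORIZE_URL = "https://slack.com/oauth/v2/authorize";

function buildSlackInstallUrl(
  clientId: string,
  botScopes: string[],
  userScopes: string[],
): string {
  const url = new URL(SLACK_AUTHORIZE_URL);
  url.searchParams.set("client_id", clientId);
  url.searchParams.set("scope", botScopes.join(",")); // bot token scopes
  url.searchParams.set("user_scope", userScopes.join(",")); // user token scopes
  return url.toString();
}

const installUrl = buildSlackInstallUrl(
  "7477886716822.8729519890534",
  ["app_mentions:read", "chat:write", "users:read", "channels:history", "groups:history", "mpim:history", "im:history"],
  ["channels:history", "groups:history", "im:history", "mpim:history"],
);
```

`URLSearchParams` percent-encodes `:` and `,`, so the resulting string differs from the hand-written `href`, but it decodes to the same three parameters.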
@@ -0,0 +1,42 @@
import React, { createContext, useContext, ReactNode, RefObject } from "react";
import { useScrollToBottom } from "#/hooks/use-scroll-to-bottom";
interface ScrollContextType {
scrollRef: RefObject<HTMLDivElement | null>;
autoScroll: boolean;
setAutoScroll: (value: boolean) => void;
scrollDomToBottom: () => void;
hitBottom: boolean;
setHitBottom: (value: boolean) => void;
onChatBodyScroll: (e: HTMLElement) => void;
}
export const ScrollContext = createContext<ScrollContextType | undefined>(
undefined,
);
interface ScrollProviderProps {
children: ReactNode;
value?: ScrollContextType;
}
export function ScrollProvider({ children, value }: ScrollProviderProps) {
const scrollHook = useScrollToBottom(React.useRef<HTMLDivElement>(null));
// Use provided value or default to the hook
const contextValue = value || scrollHook;
return (
<ScrollContext.Provider value={contextValue}>
{children}
</ScrollContext.Provider>
);
}
export function useScrollContext() {
const context = useContext(ScrollContext);
if (context === undefined) {
throw new Error("useScrollContext must be used within a ScrollProvider");
}
return context;
}

@@ -0,0 +1,39 @@
import { useMutation, useQueryClient } from "@tanstack/react-query";
import { useTranslation } from "react-i18next";
import OpenHands from "#/api/open-hands";
import { useConversationId } from "#/hooks/use-conversation-id";
type SubmitConversationFeedbackArgs = {
rating: number;
eventId?: number;
reason?: string;
};
export const useSubmitConversationFeedback = () => {
const { conversationId } = useConversationId();
const queryClient = useQueryClient();
const { t } = useTranslation();
return useMutation({
mutationFn: ({ rating, eventId, reason }: SubmitConversationFeedbackArgs) =>
OpenHands.submitConversationFeedback(
conversationId,
rating,
eventId,
reason,
),
onSuccess: (_, { eventId }) => {
// Invalidate the feedback existence query to trigger a refetch
if (eventId) {
queryClient.invalidateQueries({
queryKey: ["feedback", "exists", conversationId, eventId],
});
}
},
onError: (error) => {
// Log error but don't show toast - user will just see the UI stay in unsubmitted state
// eslint-disable-next-line no-console
console.error(t("FEEDBACK$FAILED_TO_SUBMIT"), error);
},
});
};

@@ -0,0 +1,24 @@
import { useQuery } from "@tanstack/react-query";
import OpenHands from "#/api/open-hands";
import { useConversationId } from "#/hooks/use-conversation-id";
export interface FeedbackData {
exists: boolean;
rating?: number;
reason?: string;
}
export const useFeedbackExists = (eventId?: number) => {
const { conversationId } = useConversationId();
return useQuery<FeedbackData>({
queryKey: ["feedback", "exists", conversationId, eventId],
queryFn: () => {
if (!eventId) return { exists: false };
return OpenHands.checkFeedbackExists(conversationId, eventId);
},
enabled: !!eventId,
staleTime: 1000 * 60 * 5, // 5 minutes
gcTime: 1000 * 60 * 15, // 15 minutes
});
};

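`useSubmitConversationFeedback` and `useFeedbackExists` stay in sync only because both spell out `["feedback", "exists", conversationId, eventId]` by hand. One common way to keep such keys aligned is a shared key factory; `feedbackKeys` below is a hypothetical refactor, not part of this diff. `keyStartsWith` is a simplified model of the prefix matching that TanStack Query's `invalidateQueries` applies by default (the library's own matcher also compares object parts structurally). The cache timings from `useFeedbackExists` are spelled out as well.

```typescript
// Hypothetical key factory shared by the query and the mutation.
const feedbackKeys = {
  all: ["feedback"] as const,
  exists: (conversationId: string, eventId?: number) =>
    ["feedback", "exists", conversationId, eventId] as const,
};

// Simplified prefix match on array keys (primitive parts only).
function keyStartsWith(key: readonly unknown[], prefix: readonly unknown[]): boolean {
  return prefix.length <= key.length && prefix.every((part, i) => part === key[i]);
}

// Cache timings used by useFeedbackExists.
const FEEDBACK_STALE_TIME_MS = 1000 * 60 * 5; // 300000 ms = 5 minutes
const FEEDBACK_GC_TIME_MS = 1000 * 60 * 15; // 900000 ms = 15 minutes
```

Invalidating with `feedbackKeys.all` would refetch every cached feedback lookup, whereas the mutation in this diff targets the single event that was rated.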
@@ -80,7 +80,7 @@ export enum I18nKey {
ANALYTICS$CONFIRM_PREFERENCES = "ANALYTICS$CONFIRM_PREFERENCES",
SETTINGS$SAVING = "SETTINGS$SAVING",
SETTINGS$SAVE_CHANGES = "SETTINGS$SAVE_CHANGES",
SETTINGS$NAV_GIT = "SETTINGS$NAV_GIT",
SETTINGS$NAV_INTEGRATIONS = "SETTINGS$NAV_INTEGRATIONS",
SETTINGS$NAV_APPLICATION = "SETTINGS$NAV_APPLICATION",
SETTINGS$NAV_CREDITS = "SETTINGS$NAV_CREDITS",
SETTINGS$NAV_SECRETS = "SETTINGS$NAV_SECRETS",
@@ -170,10 +170,10 @@ export enum I18nKey {
GITHUB$TOKEN_LINK_TEXT = "GITHUB$TOKEN_LINK_TEXT",
GITHUB$INSTRUCTIONS_LINK_TEXT = "GITHUB$INSTRUCTIONS_LINK_TEXT",
COMMON$HERE = "COMMON$HERE",
ANALYTICS$ENABLE = "ANALYTICS$ENABLE",
GITHUB$TOKEN_INVALID = "GITHUB$TOKEN_INVALID",
BUTTON$DISCONNECT = "BUTTON$DISCONNECT",
GITHUB$CONFIGURE_REPOS = "GITHUB$CONFIGURE_REPOS",
SLACK$INSTALL_APP = "SLACK$INSTALL_APP",
COMMON$CLICK_FOR_INSTRUCTIONS = "COMMON$CLICK_FOR_INSTRUCTIONS",
LLM$SELECT_MODEL_PLACEHOLDER = "LLM$SELECT_MODEL_PLACEHOLDER",
LLM$MODEL = "LLM$MODEL",
@@ -583,4 +583,13 @@ export enum I18nKey {
SETTINGS$EMAIL_VERIFICATION_RESTRICTION_MESSAGE = "SETTINGS$EMAIL_VERIFICATION_RESTRICTION_MESSAGE",
SETTINGS$RESEND_VERIFICATION = "SETTINGS$RESEND_VERIFICATION",
SETTINGS$FAILED_TO_RESEND_VERIFICATION = "SETTINGS$FAILED_TO_RESEND_VERIFICATION",
FEEDBACK$RATE_AGENT_PERFORMANCE = "FEEDBACK$RATE_AGENT_PERFORMANCE",
FEEDBACK$SELECT_REASON = "FEEDBACK$SELECT_REASON",
FEEDBACK$SELECT_REASON_COUNTDOWN = "FEEDBACK$SELECT_REASON_COUNTDOWN",
FEEDBACK$REASON_MISUNDERSTOOD_INSTRUCTION = "FEEDBACK$REASON_MISUNDERSTOOD_INSTRUCTION",
FEEDBACK$REASON_FORGOT_CONTEXT = "FEEDBACK$REASON_FORGOT_CONTEXT",
FEEDBACK$REASON_UNNECESSARY_CHANGES = "FEEDBACK$REASON_UNNECESSARY_CHANGES",
FEEDBACK$REASON_OTHER = "FEEDBACK$REASON_OTHER",
FEEDBACK$THANK_YOU_FOR_FEEDBACK = "FEEDBACK$THANK_YOU_FOR_FEEDBACK",
FEEDBACK$FAILED_TO_SUBMIT = "FEEDBACK$FAILED_TO_SUBMIT",
}

@@ -1279,21 +1279,21 @@
"de": "Änderungen speichern",
"uk": "Зберегти зміни"
},
"SETTINGS$NAV_GIT": {
"en": "Git",
"ja": "Git",
"zh-CN": "Git",
"zh-TW": "Git",
"ko-KR": "Git",
"no": "Git",
"it": "Git",
"pt": "Git",
"es": "Git",
"ar": "Git",
"fr": "Git",
"tr": "Git",
"de": "Git",
"uk": "Git"
"SETTINGS$NAV_INTEGRATIONS": {
"en": "Integrations",
"ja": "統合",
"zh-CN": "集成",
"zh-TW": "整合",
"ko-KR": "통합",
"no": "Integrasjoner",
"it": "Integrazioni",
"pt": "Integrações",
"es": "Integraciones",
"ar": "التكامل",
"fr": "Intégrations",
"tr": "Entegrasyonlar",
"de": "Integrationen",
"uk": "Інтеграції"
},
"SETTINGS$NAV_APPLICATION": {
"en": "Application",
@@ -2719,22 +2719,6 @@
"de": "Hier",
"uk": "тут"
},
"ANALYTICS$ENABLE": {
"en": "Enable analytics",
"ja": "アナリティクスを有効にする",
"zh-CN": "启用分析",
"zh-TW": "啟用分析功能",
"ko-KR": "분석 활성화",
"no": "Aktiver analyse",
"it": "Abilita analisi",
"pt": "Ativar análise",
"es": "Habilitar análisis",
"ar": "تمكين التحليلات",
"fr": "Activer les analyses",
"tr": "Analitiği etkinleştir",
"de": "Analyse aktivieren",
"uk": "Увімкнути аналітику"
},
"GITHUB$TOKEN_INVALID": {
"en": "Invalid GitHub token",
"ja": "GitHubトークンが無効です",
@@ -2783,6 +2767,22 @@
"de": "GitHub-Repositories konfigurieren",
"uk": "Налаштування репозиторіїв Github"
},
"SLACK$INSTALL_APP": {
"en": "Install OpenHands Slack App",
"ja": "OpenHands Slackアプリをインストール",
"zh-CN": "安装 OpenHands Slack 应用",
"zh-TW": "安裝 OpenHands Slack 應用程式",
"ko-KR": "OpenHands Slack 앱 설치",
"no": "Installer OpenHands Slack-app",
"it": "Installa l'app Slack di OpenHands",
"pt": "Instalar aplicativo Slack do OpenHands",
"es": "Instalar aplicación Slack de OpenHands",
"ar": "تثبيت تطبيق OpenHands Slack",
"fr": "Installer l'application Slack OpenHands",
"tr": "OpenHands Slack uygulamasını yükle",
"de": "OpenHands Slack-App installieren",
"uk": "Встановити додаток OpenHands Slack"
},
"COMMON$CLICK_FOR_INSTRUCTIONS": {
"en": "Click here for instructions",
"ja": "手順はこちらをクリック",
@@ -9326,5 +9326,149 @@
"tr": "Doğrulama e-postası yeniden gönderilemedi",
"de": "Bestätigungs-E-Mail konnte nicht erneut gesendet werden",
"uk": "Не вдалося повторно надіслати лист підтвердження"
},
"FEEDBACK$RATE_AGENT_PERFORMANCE": {
"en": "Rate the agent's performance:",
"ja": "エージェントのパフォーマンスを評価してください:",
"zh-CN": "评价代理的表现:",
"zh-TW": "評價代理的表現:",
"ko-KR": "에이전트의 성능을 평가하세요:",
"no": "Vurder agentens ytelse:",
"it": "Valuta le prestazioni dell'agente:",
"pt": "Avalie o desempenho do agente:",
"es": "Evalúe el rendimiento del agente:",
"ar": "قيم أداء الوكيل:",
"fr": "Évaluez la performance de l'agent :",
"tr": "Ajanın performansını değerlendirin:",
"de": "Bewerten Sie die Leistung des Agenten:",
"uk": "Оцініть продуктивність агента:"
},
"FEEDBACK$SELECT_REASON": {
"en": "Select a reason (optional):",
"ja": "理由を選択してください(任意):",
"zh-CN": "选择原因(可选):",
"zh-TW": "選擇原因(可選):",
"ko-KR": "이유 선택 (선택 사항):",
"no": "Velg en grunn (valgfritt):",
"it": "Seleziona un motivo (opzionale):",
"pt": "Selecione um motivo (opcional):",
"es": "Seleccione un motivo (opcional):",
"ar": "حدد سببًا (اختياري):",
"fr": "Sélectionnez une raison (facultatif) :",
"tr": "Bir neden seçin (isteğe bağlı):",
"de": "Wählen Sie einen Grund (optional):",
"uk": "Виберіть причину (необов'язково):"
},
"FEEDBACK$SELECT_REASON_COUNTDOWN": {
"en": "Auto-submitting in {{countdown}} seconds...",
"ja": "{{countdown}}秒後に自動送信されます...",
"zh-CN": "{{countdown}}秒后自动提交...",
"zh-TW": "{{countdown}}秒後自動提交...",
"ko-KR": "{{countdown}}초 후 자동 제출...",
"no": "Sender automatisk om {{countdown}} sekunder...",
"it": "Invio automatico tra {{countdown}} secondi...",
"pt": "Enviando automaticamente em {{countdown}} segundos...",
"es": "Enviando automáticamente en {{countdown}} segundos...",
"ar": "الإرسال التلقائي خلال {{countdown}} ثانية...",
"fr": "Envoi automatique dans {{countdown}} secondes...",
"tr": "{{countdown}} saniye içinde otomatik gönderilecek...",
"de": "Automatische Übermittlung in {{countdown}} Sekunden...",
"uk": "Автоматична відправка через {{countdown}} секунд..."
},
"FEEDBACK$REASON_MISUNDERSTOOD_INSTRUCTION": {
"en": "The agent misunderstood my instruction",
"ja": "エージェントは私の指示を誤解しました",
"zh-CN": "代理误解了我的指示",
"zh-TW": "代理誤解了我的指示",
"ko-KR": "에이전트가 내 지시를 잘못 이해했습니다",
"no": "Agenten misforsto instruksjonene mine",
"it": "L'agente ha frainteso le mie istruzioni",
"pt": "O agente não entendeu minhas instruções",
"es": "El agente malinterpretó mis instrucciones",
"ar": "أساء الوكيل فهم تعليماتي",
"fr": "L'agent a mal compris mes instructions",
"tr": "Ajan talimatlarımı yanlış anladı",
"de": "Der Agent hat meine Anweisungen missverstanden",
"uk": "Агент неправильно зрозумів мої інструкції"
},
"FEEDBACK$REASON_FORGOT_CONTEXT": {
"en": "The agent forgot about the earlier context",
"ja": "エージェントは以前のコンテキストを忘れました",
"zh-CN": "代理忘记了之前的上下文",
"zh-TW": "代理忘記了之前的上下文",
"ko-KR": "에이전트가 이전 컨텍스트를 잊었습니다",
"no": "Agenten glemte den tidligere konteksten",
"it": "L'agente ha dimenticato il contesto precedente",
"pt": "O agente esqueceu o contexto anterior",
"es": "El agente olvidó el contexto anterior",
"ar": "نسي الوكيل السياق السابق",
"fr": "L'agent a oublié le contexte précédent",
"tr": "Ajan önceki bağlamı unuttu",
"de": "Der Agent hat den früheren Kontext vergessen",
"uk": "Агент забув про попередній контекст"
},
"FEEDBACK$REASON_UNNECESSARY_CHANGES": {
"en": "The agent made unnecessary changes",
"ja": "エージェントは不要な変更を行いました",
"zh-CN": "代理进行了不必要的更改",
"zh-TW": "代理進行了不必要的更改",
"ko-KR": "에이전트가 불필요한 변경을 했습니다",
"no": "Agenten gjorde unødvendige endringer",
"it": "L'agente ha apportato modifiche non necessarie",
"pt": "O agente fez alterações desnecessárias",
"es": "El agente hizo cambios innecesarios",
"ar": "قام الوكيل بتغييرات غير ضرورية",
"fr": "L'agent a apporté des modifications inutiles",
"tr": "Ajan gereksiz değişiklikler yaptı",
"de": "Der Agent hat unnötige Änderungen vorgenommen",
"uk": "Агент зробив непотрібні зміни"
},
"FEEDBACK$REASON_OTHER": {
"en": "Other",
"ja": "その他",
"zh-CN": "其他",
"zh-TW": "其他",
"ko-KR": "기타",
"no": "Annet",
"it": "Altro",
"pt": "Outro",
"es": "Otro",
"ar": "أخرى",
"fr": "Autre",
"tr": "Diğer",
"de": "Andere",
"uk": "Інше"
},
"FEEDBACK$THANK_YOU_FOR_FEEDBACK": {
"en": "Thank you for your feedback! This will help us improve OpenHands going forward.",
"ja": "フィードバックをありがとうございます！これにより、今後OpenHandsを改善していくことができます。",
"zh-CN": "感谢您的反馈！这将帮助我们改进OpenHands。",
"zh-TW": "感謝您的反饋！這將幫助我們改進OpenHands。",
"ko-KR": "피드백 감사합니다! 이를 통해 OpenHands를 개선해 나가겠습니다.",
"no": "Takk for tilbakemeldingen! Dette vil hjelpe oss med å forbedre OpenHands fremover.",
"it": "Grazie per il tuo feedback! Questo ci aiuterà a migliorare OpenHands in futuro.",
"pt": "Obrigado pelo seu feedback! Isso nos ajudará a melhorar o OpenHands no futuro.",
"es": "¡Gracias por su comentario! Esto nos ayudará a mejorar OpenHands en el futuro.",
"ar": "شكرا على ملاحظاتك! سيساعدنا هذا في تحسين OpenHands في المستقبل.",
"fr": "Merci pour votre retour ! Cela nous aidera à améliorer OpenHands à l'avenir.",
"tr": "Geri bildiriminiz için teşekkürler! Bu, OpenHands'i ileride geliştirmemize yardımcı olacak.",
"de": "Vielen Dank für Ihr Feedback! Das hilft uns, OpenHands in Zukunft zu verbessern.",
"uk": "Дякуємо за ваш відгук! Це допоможе нам покращити OpenHands у майбутньому."
},
"FEEDBACK$FAILED_TO_SUBMIT": {
"en": "Failed to submit feedback",
"ja": "フィードバックの送信に失敗しました",
"zh-CN": "提交反馈失败",
"zh-TW": "提交反饋失敗",
"ko-KR": "피드백 제출 실패",
"no": "Kunne ikke sende tilbakemelding",
"it": "Impossibile inviare feedback",
"pt": "Falha ao enviar feedback",
"es": "Error al enviar comentarios",
"ar": "فشل في تقديم التعليقات",
"fr": "Échec de l'envoi des commentaires",
"tr": "Geri bildirim gönderilemedi",
"de": "Feedback konnte nicht gesendet werden",
"uk": "Не вдалося надіслати відгук"
}
}

@@ -13,7 +13,7 @@ export default [
index("routes/llm-settings.tsx"),
route("mcp", "routes/mcp-settings.tsx"),
route("user", "routes/user-settings.tsx"),
route("git", "routes/git-settings.tsx"),
route("integrations", "routes/git-settings.tsx"),
route("app", "routes/app-settings.tsx"),
route("billing", "routes/billing.tsx"),
route("secrets", "routes/secrets-settings.tsx"),

@@ -139,7 +139,7 @@ function AppSettingsScreen() {
defaultIsToggled={!!settings.USER_CONSENTS_TO_ANALYTICS}
onToggle={checkIfAnalyticsSwitchHasChanged}
>
{t(I18nKey.ANALYTICS$ENABLE)}
{t(I18nKey.ANALYTICS$SEND_ANONYMOUS_DATA)}
</SettingsSwitch>
<SettingsSwitch

@@ -7,6 +7,7 @@ import { useLogout } from "#/hooks/mutation/use-logout";
import { GitHubTokenInput } from "#/components/features/settings/git-settings/github-token-input";
import { GitLabTokenInput } from "#/components/features/settings/git-settings/gitlab-token-input";
import { ConfigureGitHubRepositoriesAnchor } from "#/components/features/settings/git-settings/configure-github-repositories-anchor";
import { InstallSlackAppAnchor } from "#/components/features/settings/git-settings/install-slack-app-anchor";
import { I18nKey } from "#/i18n/declaration";
import {
displayErrorToast,
@@ -103,6 +104,10 @@ function GitSettingsScreen() {
<ConfigureGitHubRepositoriesAnchor slug={config.APP_SLUG!} />
)}
{shouldRenderExternalConfigureButtons && !isLoading && (
<InstallSlackAppAnchor />
)}
{!isSaas && (
<GitHubTokenInput
name="github-token-input"

@@ -84,7 +84,11 @@ function SecretsSettingsScreen() {
)}
{shouldRenderConnectToGitButton && (
<Link to="/settings/git" data-testid="connect-git-button" type="button">
<Link
to="/settings/integrations"
data-testid="connect-git-button"
type="button"
>
<BrandButton type="button" variant="secondary">
Connect a Git provider to manage secrets
</BrandButton>

@@ -16,7 +16,7 @@ function SettingsScreen() {
const saasNavItems = [
{ to: "/settings/user", text: t("SETTINGS$NAV_USER") },
{ to: "/settings/git", text: t("SETTINGS$NAV_GIT") },
{ to: "/settings/integrations", text: t("SETTINGS$NAV_INTEGRATIONS") },
{ to: "/settings/app", text: t("SETTINGS$NAV_APPLICATION") },
{ to: "/settings/billing", text: t("SETTINGS$NAV_CREDITS") },
{ to: "/settings/secrets", text: t("SETTINGS$NAV_SECRETS") },
@@ -26,7 +26,7 @@ function SettingsScreen() {
const ossNavItems = [
{ to: "/settings", text: t("SETTINGS$NAV_LLM") },
{ to: "/settings/mcp", text: t("SETTINGS$NAV_MCP") },
{ to: "/settings/git", text: t("SETTINGS$NAV_GIT") },
{ to: "/settings/integrations", text: t("SETTINGS$NAV_INTEGRATIONS") },
{ to: "/settings/app", text: t("SETTINGS$NAV_APPLICATION") },
{ to: "/settings/secrets", text: t("SETTINGS$NAV_SECRETS") },
];

@@ -125,9 +125,9 @@ class BrowsingAgent(Agent):
self.reset()
def reset(self) -> None:
"""Resets the Browsing Agent."""
"""Resets the Browsing Agent's internal state."""
super().reset()
self.cost_accumulator = 0
# Reset agent-specific counters but not LLM metrics
self.error_accumulator = 0
def step(self, state: State) -> Action:

@@ -136,8 +136,9 @@ class CodeActAgent(Agent):
return tools
def reset(self) -> None:
"""Resets the CodeAct Agent."""
"""Resets the CodeAct Agent's internal state."""
super().reset()
# Only clear pending actions, not LLM metrics
self.pending_actions.clear()
def step(self, state: State) -> 'Action':

@@ -119,14 +119,14 @@ class DummyAgent(Agent):
]
def step(self, state: State) -> Action:
if state.iteration >= len(self.steps):
if state.iteration_flag.current_value >= len(self.steps):
return AgentFinishAction()
current_step = self.steps[state.iteration]
current_step = self.steps[state.iteration_flag.current_value]
action = current_step['action']
if state.iteration > 0:
prev_step = self.steps[state.iteration - 1]
if state.iteration_flag.current_value > 0:
prev_step = self.steps[state.iteration_flag.current_value - 1]
if 'observations' in prev_step and prev_step['observations']:
expected_observations = prev_step['observations']

@@ -176,9 +176,9 @@ Note:
self.reset()
def reset(self) -> None:
"""Resets the VisualBrowsingAgent."""
"""Resets the VisualBrowsingAgent's internal state."""
super().reset()
self.cost_accumulator = 0
# Reset agent-specific counters but not LLM metrics
self.error_accumulator = 0
def step(self, state: State) -> Action:

@@ -8,6 +8,7 @@ from prompt_toolkit.formatted_text import HTML
from prompt_toolkit.shortcuts import clear
import openhands.agenthub # noqa F401 (we import this to get the agents registered)
import openhands.cli.suppress_warnings # noqa: F401
from openhands.cli.commands import (
check_folder_security_agreement,
handle_commands,
@@ -273,9 +274,9 @@ async def run_session(
)
)
config.mcp.stdio_servers.extend(openhands_mcp_stdio_servers)
runtime.config.mcp.stdio_servers.extend(openhands_mcp_stdio_servers)
await add_mcp_tools_to_agent(agent, runtime, memory, config)
await add_mcp_tools_to_agent(agent, runtime, memory)
# Clear loading animation
is_loaded.set()

@@ -0,0 +1,10 @@
"""Module to suppress common warnings."""
import warnings
# Suppress pydub warning about ffmpeg/avconv
warnings.filterwarnings(
'ignore',
message="Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work",
category=RuntimeWarning,
)
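
`message` in `warnings.filterwarnings` is a regular expression matched case-insensitively against the start of the warning text, so the literal pydub message above works because it contains no regex metacharacters. A quick standard-library check of that filter is sketched below; `emit_pydub_style_warning` is a hypothetical stand-in for the import that normally triggers the warning.

```python
import warnings

PYDUB_FFMPEG_MESSAGE = (
    "Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work"
)


def emit_pydub_style_warning() -> None:
    # Hypothetical stand-in for the RuntimeWarning pydub emits at import time
    # when neither ffmpeg nor avconv can be found.
    warnings.warn(PYDUB_FFMPEG_MESSAGE, RuntimeWarning)


def recorded_messages() -> list[str]:
    """Return the warning messages that survive the suppression filter."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning by default...
        warnings.simplefilter('always')
        # ...except the pydub one: filterwarnings() inserts at the front of the
        # filter list, so this rule is checked before the 'always' rule.
        warnings.filterwarnings(
            'ignore',
            message=PYDUB_FFMPEG_MESSAGE,
            category=RuntimeWarning,
        )
        emit_pydub_style_warning()
        warnings.warn('unrelated warning', RuntimeWarning)
    return [str(w.message) for w in caught]


print(recorded_messages())  # ['unrelated warning']
```

Only the unrelated warning is recorded, which confirms the filter targets the pydub message alone rather than every `RuntimeWarning`.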

@@ -103,16 +103,10 @@ class Agent(ABC):
pass
def reset(self) -> None:
"""Resets the agent's execution status and clears the history. This method can be used
to prepare the agent for restarting the instruction or cleaning up before destruction.
"""
# TODO clear history
"""Resets the agent's execution status."""
# Only reset the completion status, not the LLM metrics
self._complete = False
if self.llm:
self.llm.reset()
@property
def name(self) -> str:
return self.__class__.__name__

@@ -7,7 +7,6 @@ import time
import traceback
from typing import Callable
import litellm # noqa
from litellm.exceptions import ( # noqa
APIConnectionError,
APIError,
@@ -25,7 +24,8 @@ from litellm.exceptions import ( # noqa
from openhands.controller.agent import Agent
from openhands.controller.replay import ReplayManager
from openhands.controller.state.state import State, TrafficControlState
from openhands.controller.state.state import State
from openhands.controller.state.state_tracker import StateTracker
from openhands.controller.stuck import StuckDetector
from openhands.core.config import AgentConfig, LLMConfig
from openhands.core.exceptions import (
@@ -61,7 +61,6 @@ from openhands.events.action import (
)
from openhands.events.action.agent import CondensationAction, RecallAction
from openhands.events.event import Event
from openhands.events.event_filter import EventFilter
from openhands.events.observation import (
AgentDelegateObservation,
AgentStateChangedObservation,
@@ -69,10 +68,11 @@ from openhands.events.observation import (
NullObservation,
Observation,
)
from openhands.events.serialization.event import event_to_trajectory, truncate_content
from openhands.events.serialization.event import truncate_content
from openhands.llm.llm import LLM
from openhands.llm.metrics import Metrics, TokenUsage
from openhands.memory.view import View
from openhands.storage.files import FileStore
# note: RESUME is only available on web GUI
TRAFFIC_CONTROL_REMINDER = (
@@ -101,11 +101,13 @@ class AgentController:
self,
agent: Agent,
event_stream: EventStream,
max_iterations: int,
max_budget_per_task: float | None = None,
iteration_delta: int,
budget_per_task_delta: float | None = None,
agent_to_llm_config: dict[str, LLMConfig] | None = None,
agent_configs: dict[str, AgentConfig] | None = None,
sid: str | None = None,
file_store: FileStore | None = None,
user_id: str | None = None,
confirmation_mode: bool = False,
initial_state: State | None = None,
is_delegate: bool = False,
@@ -132,7 +134,10 @@ class AgentController:
status_callback: Optional callback function to handle status updates.
replay_events: A list of logs to replay.
"""
self.id = sid or event_stream.sid
self.user_id = user_id
self.file_store = file_store
self.agent = agent
self.headless_mode = headless_mode
self.is_delegate = is_delegate
@@ -146,29 +151,22 @@ class AgentController:
EventStreamSubscriber.AGENT_CONTROLLER, self.on_event, self.id
)
# filter out events that are not relevant to the agent
# so they will not be included in the agent history
self.agent_history_filter = EventFilter(
exclude_types=(
NullAction,
NullObservation,
ChangeAgentStateAction,
AgentStateChangedObservation,
),
exclude_hidden=True,
)
self.state_tracker = StateTracker(sid, file_store, user_id)
# state from the previous session, state from a parent agent, or a fresh state
self.set_initial_state(
state=initial_state,
max_iterations=max_iterations,
max_iterations=iteration_delta,
max_budget_per_task=budget_per_task_delta,
confirmation_mode=confirmation_mode,
)
self.max_budget_per_task = max_budget_per_task
self.state = self.state_tracker.state # TODO: share between manager and controller for backward compatibility; we should ideally move all state related logic to the state manager
self.agent_to_llm_config = agent_to_llm_config if agent_to_llm_config else {}
self.agent_configs = agent_configs if agent_configs else {}
self._initial_max_iterations = max_iterations
self._initial_max_budget_per_task = max_budget_per_task
self._initial_max_iterations = iteration_delta
self._initial_max_budget_per_task = budget_per_task_delta
# stuck helper
self._stuck_detector = StuckDetector(self.state)
@@ -181,7 +179,7 @@ class AgentController:
self._add_system_message()
def _add_system_message(self):
for event in self.event_stream.get_events(start_id=self.state.start_id):
for event in self.event_stream.search_events(start_id=self.state.start_id):
if isinstance(event, MessageAction) and event.source == EventSource.USER:
# FIXME: Remove this after 6/1/2025
# Do not try to add a system message if we first run into
@@ -214,26 +212,7 @@ class AgentController:
if set_stop_state:
await self.set_agent_state_to(AgentState.STOPPED)
# we made history, now is the time to rewrite it!
# the final state.history will be used by external scripts like evals, tests, etc.
# history will need to be complete WITH delegates events
# like the regular agent history, it does not include:
# - 'hidden' events, events with hidden=True
# - backend events (the default 'filtered out' types, types in self.filter_out)
start_id = self.state.start_id if self.state.start_id >= 0 else 0
end_id = (
self.state.end_id
if self.state.end_id >= 0
else self.event_stream.get_latest_event_id()
)
self.state.history = list(
self.event_stream.search_events(
start_id=start_id,
end_id=end_id,
reverse=False,
filter=self.agent_history_filter,
)
)
self.state_tracker.close(self.event_stream)
# unsubscribe from the event stream
# only the root parent controller subscribes to the event stream
@@ -257,14 +236,6 @@ class AgentController:
extra_merged = {'session_id': self.id, **extra}
getattr(logger, level)(message, extra=extra_merged, stacklevel=2)
def update_state_before_step(self) -> None:
self.state.iteration += 1
self.state.local_iteration += 1
async def update_state_after_step(self) -> None:
# update metrics especially for cost. Use deepcopy to avoid it being modified by agent._reset()
self.state.local_metrics = copy.deepcopy(self.agent.llm.metrics)
async def _react_to_exception(
self,
e: Exception,
@@ -390,10 +361,17 @@ class AgentController:
# If we have a delegate that is not finished or errored, forward events to it
if self.delegate is not None:
delegate_state = self.delegate.get_agent_state()
if delegate_state not in (
AgentState.FINISHED,
AgentState.ERROR,
AgentState.REJECTED,
if (
delegate_state
not in (
AgentState.FINISHED,
AgentState.ERROR,
AgentState.REJECTED,
)
or 'RuntimeError: Agent reached maximum iteration.'
in self.delegate.state.last_error
or 'RuntimeError: Agent reached maximum budget for conversation'
in self.delegate.state.last_error
):
# Forward the event to delegate and skip parent processing
asyncio.get_event_loop().run_until_complete(
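
The forwarding condition above keeps routing events to a delegate that is still active, or to one that stopped only because it hit an iteration or budget limit, so that a user message can resume it. A minimal sketch of that predicate as a pure function follows; `should_forward_to_delegate` and the trimmed `AgentState` enum with its values are hypothetical stand-ins, and the sketch assumes `last_error` is formatted as `f'{type(e).__name__}: {e}'`, so each prefix must match that text character for character.

```python
from enum import Enum


class AgentState(str, Enum):
    # Hypothetical, trimmed subset of the controller's agent states.
    RUNNING = 'running'
    FINISHED = 'finished'
    ERROR = 'error'
    REJECTED = 'rejected'


TERMINAL_STATES = (AgentState.FINISHED, AgentState.ERROR, AgentState.REJECTED)

# Error prefixes for limits the user may lift by resuming the delegate.
# Assumes last_error is built as f'{type(e).__name__}: {e}'.
RESUMABLE_LIMIT_ERRORS = (
    'RuntimeError: Agent reached maximum iteration.',
    'RuntimeError: Agent reached maximum budget for conversation',
)


def should_forward_to_delegate(state: AgentState, last_error: str) -> bool:
    """Forward events to a delegate that is still active or only hit a limit."""
    if state not in TERMINAL_STATES:
        return True
    return any(marker in last_error for marker in RESUMABLE_LIMIT_ERRORS)
```

Matching on error text is brittle: any change to the wording in the control flags silently breaks the check, which is why the sketch keeps the prefixes in one constant.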
@@ -412,9 +390,7 @@ class AgentController:
if hasattr(event, 'hidden') and event.hidden:
return
# if the event is not filtered out, add it to the history
if self.agent_history_filter.include(event):
self.state.history.append(event)
self.state_tracker.add_history(event)
if isinstance(event, Action):
await self._handle_action(event)
@@ -457,11 +433,9 @@ class AgentController:
elif isinstance(action, AgentFinishAction):
self.state.outputs = action.outputs
self.state.metrics.merge(self.state.local_metrics)
await self.set_agent_state_to(AgentState.FINISHED)
elif isinstance(action, AgentRejectAction):
self.state.outputs = action.outputs
self.state.metrics.merge(self.state.local_metrics)
await self.set_agent_state_to(AgentState.REJECTED)
async def _handle_observation(self, observation: Observation) -> None:
@@ -481,8 +455,10 @@ class AgentController:
log_level, str(observation_to_print), extra={'msg_type': 'OBSERVATION'}
)
# TODO: these metrics come from the draft editor, and they get accumulated into controller's state metrics and the agent's llm metrics
# In the future, we should have a more principled way of sharing metrics across all LLM instances for a given conversation
if observation.llm_metrics is not None:
self.agent.llm.metrics.merge(observation.llm_metrics)
self.state_tracker.merge_metrics(observation.llm_metrics)
# this happens for runnable actions and microagent actions
if self._pending_action and self._pending_action.id == observation.cause:
@@ -496,9 +472,6 @@ class AgentController:
if self.state.agent_state == AgentState.USER_REJECTED:
await self.set_agent_state_to(AgentState.AWAITING_USER_INPUT)
return
elif isinstance(observation, ErrorObservation):
if self.state.agent_state == AgentState.ERROR:
self.state.metrics.merge(self.state.local_metrics)
async def _handle_message_action(self, action: MessageAction) -> None:
"""Handles message actions from the event stream.
@@ -516,22 +489,6 @@ class AgentController:
str(action),
extra={'msg_type': 'ACTION', 'event_source': EventSource.USER},
)
# Extend max iterations when the user sends a message (only in non-headless mode)
if self._initial_max_iterations is not None and not self.headless_mode:
self.state.max_iterations = (
self.state.iteration + self._initial_max_iterations
)
if (
self.state.traffic_control_state == TrafficControlState.THROTTLING
or self.state.traffic_control_state == TrafficControlState.PAUSED
):
self.state.traffic_control_state = TrafficControlState.NORMAL
self.log(
'debug',
f'Extended max iterations to {self.state.max_iterations} after user message',
)
# try to retrieve microagents relevant to the user message
# set pending_action while we search for information
# if this is the first user message for this agent, matters for the microagent info type
first_user_message = self._first_user_message()
@@ -605,36 +562,16 @@ class AgentController:
return
if new_state in (AgentState.STOPPED, AgentState.ERROR):
# sync existing metrics BEFORE resetting the agent
await self.update_state_after_step()
self.state.metrics.merge(self.state.local_metrics)
self._reset()
elif (
new_state == AgentState.RUNNING
and self.state.agent_state == AgentState.PAUSED
# TODO: do we really need both THROTTLING and PAUSED states, or can we clean up one of them completely?
and self.state.traffic_control_state == TrafficControlState.THROTTLING
):
# user intends to interrupt traffic control and let the task resume temporarily
self.state.traffic_control_state = TrafficControlState.PAUSED
# User has chosen to deliberately continue - let's double the max iterations
if (
self.state.iteration is not None
and self.state.max_iterations is not None
and self._initial_max_iterations is not None
and not self.headless_mode
):
if self.state.iteration >= self.state.max_iterations:
self.state.max_iterations += self._initial_max_iterations
if (
self.state.metrics.accumulated_cost is not None
and self.max_budget_per_task is not None
and self._initial_max_budget_per_task is not None
):
if self.state.metrics.accumulated_cost >= self.max_budget_per_task:
self.max_budget_per_task += self._initial_max_budget_per_task
elif self._pending_action is not None and (
# The user is resuming from an error state: check the control limits and expand them if applicable
if (
self.state.agent_state == AgentState.ERROR
and new_state == AgentState.RUNNING
):
self.state_tracker.maybe_increase_control_flags_limits(self.headless_mode)
if self._pending_action is not None and (
new_state in (AgentState.USER_CONFIRMED, AgentState.USER_REJECTED)
):
if hasattr(self._pending_action, 'thought'):
@@ -659,6 +596,10 @@ class AgentController:
EventSource.ENVIRONMENT,
)
# Save state whenever agent state changes to ensure we don't lose state
# in case of crashes or unexpected circumstances
self.save_state()
def get_agent_state(self) -> AgentState:
"""Returns the current state of the agent.
@@ -686,19 +627,27 @@ class AgentController:
agent_cls: type[Agent] = Agent.get_cls(action.agent)
agent_config = self.agent_configs.get(action.agent, self.agent.config)
llm_config = self.agent_to_llm_config.get(action.agent, self.agent.llm.config)
llm = LLM(config=llm_config, retry_listener=self._notify_on_llm_retry)
# Make sure metrics are shared between parent and child for global accumulation
llm = LLM(
config=llm_config,
retry_listener=self.agent.llm.retry_listener,
metrics=self.state.metrics,
)
delegate_agent = agent_cls(llm=llm, config=agent_config)
# Take a snapshot of the current metrics before starting the delegate
state = State(
session_id=self.id.removesuffix('-delegate'),
inputs=action.inputs or {},
local_iteration=0,
iteration=self.state.iteration,
max_iterations=self.state.max_iterations,
iteration_flag=self.state.iteration_flag,
budget_flag=self.state.budget_flag,
delegate_level=self.state.delegate_level + 1,
# global metrics should be shared between parent and child
metrics=self.state.metrics,
# start on top of the stream
start_id=self.event_stream.get_latest_event_id() + 1,
parent_metrics_snapshot=self.state_tracker.get_metrics_snapshot(),
parent_iteration=self.state.iteration_flag.current_value,
)
self.log(
'debug',
@@ -708,10 +657,12 @@ class AgentController:
# Create the delegate with is_delegate=True so it does NOT subscribe directly
self.delegate = AgentController(
sid=self.id + '-delegate',
file_store=self.file_store,
user_id=self.user_id,
agent=delegate_agent,
event_stream=self.event_stream,
max_iterations=self.state.max_iterations,
max_budget_per_task=self.max_budget_per_task,
iteration_delta=self._initial_max_iterations,
budget_per_task_delta=self._initial_max_budget_per_task,
agent_to_llm_config=self.agent_to_llm_config,
agent_configs=self.agent_configs,
initial_state=state,
@@ -730,7 +681,13 @@ class AgentController:
delegate_state = self.delegate.get_agent_state()
# update iteration that is shared across agents
self.state.iteration = self.delegate.state.iteration
self.state.iteration_flag.current_value = (
self.delegate.state.iteration_flag.current_value
)
# Calculate delegate-specific metrics before closing the delegate
delegate_metrics = self.state.get_local_metrics()
logger.info(f'Local metrics for delegate: {delegate_metrics}')
# close the delegate controller before adding new events
asyncio.get_event_loop().run_until_complete(self.delegate.close())
@@ -743,8 +700,12 @@ class AgentController:
# prepare delegate result observation
# TODO: replace this with AI-generated summary (#2395)
# Filter out metrics from the formatted output to avoid clutter
display_outputs = {
k: v for k, v in delegate_outputs.items() if k != 'metrics'
}
formatted_output = ', '.join(
f'{key}: {value}' for key, value in delegate_outputs.items()
f'{key}: {value}' for key, value in display_outputs.items()
)
content = (
f'{self.delegate.agent.name} finishes task with {formatted_output}'
@@ -798,24 +759,16 @@ class AgentController:
self.log(
'debug',
f'LEVEL {self.state.delegate_level} LOCAL STEP {self.state.local_iteration} GLOBAL STEP {self.state.iteration}',
f'LEVEL {self.state.delegate_level} LOCAL STEP {self.state.get_local_step()} GLOBAL STEP {self.state.iteration_flag.current_value}',
extra={'msg_type': 'STEP'},
)
stop_step = False
if self.state.iteration >= self.state.max_iterations:
stop_step = await self._handle_traffic_control(
'iteration', self.state.iteration, self.state.max_iterations
)
if self.max_budget_per_task is not None:
current_cost = self.state.metrics.accumulated_cost
if current_cost > self.max_budget_per_task:
stop_step = await self._handle_traffic_control(
'budget', current_cost, self.max_budget_per_task
)
if stop_step:
logger.warning('Stopping agent due to traffic control')
return
# Ensure the budget control flag is synchronized with the latest metrics.
# In the future, we should centralize the use of one LLM object per conversation.
# This will help us unify the cost of auto-generating titles, running the condenser, etc.
# Because many microservices touch the same LLM cost field, we should sync the budget flag for the controller
# and check that we haven't exceeded the budget BEFORE executing an agent step.
self.state_tracker.sync_budget_flag_with_metrics()
if self._is_stuck():
await self._react_to_exception(
@@ -823,7 +776,13 @@ class AgentController:
)
return
self.update_state_before_step()
try:
self.state_tracker.run_control_flags()
except Exception as e:
logger.warning('Control flag limits hit')
await self._react_to_exception(e)
return
action: Action = NullAction()
if self._replay_manager.should_replay():
@@ -894,60 +853,9 @@ class AgentController:
self.event_stream.add_event(action, action._source) # type: ignore [attr-defined]
await self.update_state_after_step()
log_level = 'info' if LOG_ALL_EVENTS else 'debug'
self.log(log_level, str(action), extra={'msg_type': 'ACTION'})
def _notify_on_llm_retry(self, retries: int, max: int) -> None:
if self.status_callback is not None:
msg_id = 'STATUS$LLM_RETRY'
self.status_callback(
'info', msg_id, f'Retrying LLM request, {retries} / {max}'
)
async def _handle_traffic_control(
self, limit_type: str, current_value: float, max_value: float
) -> bool:
"""Handles agent state after hitting the traffic control limit.
Args:
limit_type (str): The type of limit that was hit.
current_value (float): The current value of the limit.
max_value (float): The maximum value of the limit.
"""
stop_step = False
if self.state.traffic_control_state == TrafficControlState.PAUSED:
self.log(
'debug', 'Hitting traffic control, temporarily resume upon user request'
)
self.state.traffic_control_state = TrafficControlState.NORMAL
else:
self.state.traffic_control_state = TrafficControlState.THROTTLING
# Format values as integers for iterations, keep decimals for budget
if limit_type == 'iteration':
current_str = str(int(current_value))
max_str = str(int(max_value))
else:
current_str = f'{current_value:.2f}'
max_str = f'{max_value:.2f}'
if self.headless_mode:
e = RuntimeError(
f'Agent reached maximum {limit_type} in headless mode. '
f'Current {limit_type}: {current_str}, max {limit_type}: {max_str}'
)
await self._react_to_exception(e)
else:
e = RuntimeError(
f'Agent reached maximum {limit_type}. '
f'Current {limit_type}: {current_str}, max {limit_type}: {max_str}. '
)
# FIXME: this isn't really an exception--we should have a different path
await self._react_to_exception(e)
stop_step = True
return stop_step
@property
def _pending_action(self) -> Action | None:
"""Get the current pending action with time tracking.
@@ -1015,150 +923,26 @@ class AgentController:
self,
state: State | None,
max_iterations: int,
max_budget_per_task: float | None,
confirmation_mode: bool = False,
) -> None:
"""Sets the initial state for the agent, either from the previous session, or from a parent agent, or by creating a new one.
Args:
state: The state to initialize with, or None to create a new state.
max_iterations: The maximum number of iterations allowed for the task.
confirmation_mode: Whether to enable confirmation mode.
"""
# state can come from:
# - the previous session, in which case it has history
# - from a parent agent, in which case it has no history
# - None / a new state
# If state is None, we create a brand new state and still load the event stream so we can restore the history
if state is None:
self.state = State(
session_id=self.id.removesuffix('-delegate'),
inputs={},
max_iterations=max_iterations,
confirmation_mode=confirmation_mode,
)
self.state.start_id = 0
self.log(
'info',
f'AgentController {self.id} - created new state. start_id: {self.state.start_id}',
)
else:
self.state = state
if self.state.start_id <= -1:
self.state.start_id = 0
self.log(
'info',
f'AgentController {self.id} initializing history from event {self.state.start_id}',
)
):
self.state_tracker.set_initial_state(
self.id,
self.agent,
state,
max_iterations,
max_budget_per_task,
confirmation_mode,
)
# Always load from the event stream to avoid losing history
self._init_history()
self.state_tracker._init_history(
self.event_stream,
)
def get_trajectory(self, include_screenshots: bool = False) -> list[dict]:
# state history could be partially hidden/truncated before controller is closed
assert self._closed
return [
event_to_trajectory(event, include_screenshots)
for event in self.state.history
]
def _init_history(self) -> None:
"""Initializes the agent's history from the event stream.
The history is a list of events that:
- Excludes events of types listed in self.filter_out
- Excludes events with hidden=True attribute
- For delegate events (between AgentDelegateAction and AgentDelegateObservation):
- Excludes all events between the action and observation
- Includes the delegate action and observation themselves
"""
# define range of events to fetch
# delegates start with a start_id and initially won't find any events
# otherwise we're restoring a previous session
start_id = self.state.start_id if self.state.start_id >= 0 else 0
end_id = (
self.state.end_id
if self.state.end_id >= 0
else self.event_stream.get_latest_event_id()
)
# sanity check
if start_id > end_id + 1:
self.log(
'warning',
f'start_id {start_id} is greater than end_id + 1 ({end_id + 1}). History will be empty.',
)
self.state.history = []
return
events: list[Event] = []
# Get rest of history
events_to_add = list(
self.event_stream.search_events(
start_id=start_id,
end_id=end_id,
reverse=False,
filter=self.agent_history_filter,
)
)
events.extend(events_to_add)
# Find all delegate action/observation pairs
delegate_ranges: list[tuple[int, int]] = []
delegate_action_ids: list[int] = [] # stack of unmatched delegate action IDs
for event in events:
if isinstance(event, AgentDelegateAction):
delegate_action_ids.append(event.id)
# Note: we can get agent=event.agent and task=event.inputs.get('task','')
# if we need to track these in the future
elif isinstance(event, AgentDelegateObservation):
# Match with most recent unmatched delegate action
if not delegate_action_ids:
self.log(
'warning',
f'Found AgentDelegateObservation without matching action at id={event.id}',
)
continue
action_id = delegate_action_ids.pop()
delegate_ranges.append((action_id, event.id))
# Filter out events between delegate action/observation pairs
if delegate_ranges:
filtered_events: list[Event] = []
current_idx = 0
for start_id, end_id in sorted(delegate_ranges):
# Add events before delegate range
filtered_events.extend(
event for event in events[current_idx:] if event.id < start_id
)
# Add delegate action and observation
filtered_events.extend(
event for event in events if event.id in (start_id, end_id)
)
# Update index to after delegate range
current_idx = next(
(i for i, e in enumerate(events) if e.id > end_id), len(events)
)
# Add any remaining events after last delegate range
filtered_events.extend(events[current_idx:])
self.state.history = filtered_events
else:
self.state.history = events
# make sure history is in sync
self.state.start_id = start_id
return self.state_tracker.get_trajectory(include_screenshots)
def _handle_long_context_error(self) -> None:
# When context window is exceeded, keep roughly half of agent interactions
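
The removed `_init_history` keeps each delegate's `AgentDelegateAction` and `AgentDelegateObservation` but hides everything the delegate did in between. A simplified sketch of that filtering step follows, using a hypothetical minimal `Event` with only an `id` and a `kind` tag in place of the real event classes: observations are paired with the most recent unmatched action via a stack, then events falling strictly inside any pair are dropped.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical minimal event: an id plus a kind tag.
    id: int
    kind: str  # 'delegate_action', 'delegate_observation', or anything else


def filter_delegate_events(events: list[Event]) -> list[Event]:
    """Hide events that happened inside a delegation, keeping its boundaries."""
    ranges: list[tuple[int, int]] = []
    open_actions: list[int] = []  # stack of unmatched delegate action ids
    for event in events:
        if event.kind == 'delegate_action':
            open_actions.append(event.id)
        elif event.kind == 'delegate_observation' and open_actions:
            # Pair with the most recent unmatched delegate action.
            ranges.append((open_actions.pop(), event.id))

    def inside_delegation(event_id: int) -> bool:
        # Strictly between an action and its observation.
        return any(start < event_id < end for start, end in ranges)

    return [e for e in events if not inside_delegation(e.id)]
```

Unlike the original loop over `sorted(delegate_ranges)`, this version also hides nested delegate pairs that sit inside an outer delegation instead of re-adding them, and an observation with no matching action is kept as an ordinary event without the warning the original logs.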
@@ -1359,7 +1143,7 @@ class AgentController:
action: The action to attach metrics to
"""
# Get metrics from agent LLM
agent_metrics = self.agent.llm.metrics
agent_metrics = self.state.metrics
# Get metrics from condenser LLM if it exists
condenser_metrics: TokenUsage | None = None
@@ -1390,10 +1174,10 @@ class AgentController:
# Log the metrics information for debugging
# Get the latest usage directly from the agent's metrics
latest_usage = None
if self.agent.llm.metrics.token_usages:
latest_usage = self.agent.llm.metrics.token_usages[-1]
if self.state.metrics.token_usages:
latest_usage = self.state.metrics.token_usages[-1]
accumulated_usage = self.agent.llm.metrics.accumulated_token_usage
accumulated_usage = self.state.metrics.accumulated_token_usage
self.log(
'debug',
f'Action metrics - accumulated_cost: {metrics.accumulated_cost}, '
@@ -1432,7 +1216,7 @@ class AgentController:
)
def _is_awaiting_observation(self) -> bool:
events = self.event_stream.get_events(reverse=True)
events = self.event_stream.search_events(reverse=True)
for event in events:
if isinstance(event, AgentStateChangedObservation):
result = event.agent_state == AgentState.RUNNING
@@ -1473,7 +1257,7 @@ class AgentController:
self._cached_first_user_message = next(
(
e
for e in self.event_stream.get_events(
for e in self.event_stream.search_events(
start_id=self.state.start_id,
)
if isinstance(e, MessageAction) and e.source == EventSource.USER
@@ -1481,3 +1265,6 @@ class AgentController:
None,
)
return self._cached_first_user_message
def save_state(self):
self.state_tracker.save_state()

@@ -0,0 +1,95 @@
from __future__ import annotations
from dataclasses import dataclass
from typing import Generic, TypeVar
T = TypeVar(
'T', int, float
) # Type for the value (int for iterations, float for budget)
@dataclass
class ControlFlag(Generic[T]):
"""Base class for control flags that manage limits and state transitions."""
limit_increase_amount: T
current_value: T
max_value: T
headless_mode: bool = False
_hit_limit: bool = False
def reached_limit(self) -> bool:
"""Check if the limit has been reached.
Returns:
bool: True if the limit has been reached, False otherwise.
"""
raise NotImplementedError
def increase_limit(self, headless_mode: bool) -> None:
"""Expand the limit when needed."""
raise NotImplementedError
def step(self):
"""Advance the flag by one step, raising if the limit has been reached.
Raises:
RuntimeError: If the limit has been reached.
"""
raise NotImplementedError
@dataclass
class IterationControlFlag(ControlFlag[int]):
"""Control flag for managing iteration limits."""
def reached_limit(self) -> bool:
"""Check if the iteration limit has been reached."""
self._hit_limit = self.current_value >= self.max_value
return self._hit_limit
def increase_limit(self, headless_mode: bool) -> None:
"""Expand the iteration limit by adding the initial value."""
if not headless_mode and self._hit_limit:
self.max_value += self.limit_increase_amount
self._hit_limit = False
def step(self):
if self.reached_limit():
raise RuntimeError(
f'Agent reached maximum iteration. '
f'Current iteration: {self.current_value}, max iteration: {self.max_value}'
)
# Increment the current value
self.current_value += 1
@dataclass
class BudgetControlFlag(ControlFlag[float]):
"""Control flag for managing budget limits."""
def reached_limit(self) -> bool:
"""Check if the budget limit has been reached."""
self._hit_limit = self.current_value >= self.max_value
return self._hit_limit
def increase_limit(self, headless_mode: bool) -> None:
"""Expand the budget limit by adding the initial value to the current value."""
if self._hit_limit:
self.max_value = self.current_value + self.limit_increase_amount
self._hit_limit = False
def step(self):
"""Check if we've reached the limit and update state accordingly.
Note: Unlike IterationControlFlag, this doesn't increment the value
as the budget is updated externally.
"""
if self.reached_limit():
current_str = f'{self.current_value:.2f}'
max_str = f'{self.max_value:.2f}'
raise RuntimeError(
f'Agent reached maximum budget for conversation. '
f'Current budget: {current_str}, max budget: {max_str}'
)
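
The iteration flag's lifecycle (step until the limit, raise, then extend only in interactive mode) can be exercised in isolation. This is a trimmed, standalone replica of the `IterationControlFlag` logic from this file, written under the assumption that it behaves the same without the generic `ControlFlag` base; `IterationFlag` is a hypothetical name and nothing is imported from openhands.

```python
from dataclasses import dataclass


@dataclass
class IterationFlag:
    """Standalone replica of IterationControlFlag (illustration only)."""

    limit_increase_amount: int
    current_value: int
    max_value: int
    _hit_limit: bool = False

    def reached_limit(self) -> bool:
        self._hit_limit = self.current_value >= self.max_value
        return self._hit_limit

    def increase_limit(self, headless_mode: bool) -> None:
        # Only interactive (non-headless) sessions get the limit extended.
        if not headless_mode and self._hit_limit:
            self.max_value += self.limit_increase_amount
            self._hit_limit = False

    def step(self) -> None:
        if self.reached_limit():
            raise RuntimeError(
                'Agent reached maximum iteration. '
                f'Current iteration: {self.current_value}, max iteration: {self.max_value}'
            )
        self.current_value += 1


flag = IterationFlag(limit_increase_amount=2, current_value=0, max_value=2)
flag.step()
flag.step()  # current_value is now 2
try:
    flag.step()  # 2 >= 2, so the limit is hit
except RuntimeError as err:
    print(err)

flag.increase_limit(headless_mode=True)  # headless: max_value stays 2
flag.increase_limit(headless_mode=False)  # interactive: max_value becomes 4
flag.step()
print(flag.current_value, flag.max_value)  # 3 4
```

`increase_limit` only acts after `reached_limit()` has set `_hit_limit`, so calling it before the flag trips is a no-op.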


@@ -8,6 +8,10 @@ from enum import Enum
from typing import Any
import openhands
from openhands.controller.state.control_flags import (
BudgetControlFlag,
IterationControlFlag,
)
from openhands.core.logger import openhands_logger as logger
from openhands.core.schema import AgentState
from openhands.events.action import (
@@ -20,7 +24,15 @@ from openhands.memory.view import View
from openhands.storage.files import FileStore
from openhands.storage.locations import get_conversation_agent_state_filename
RESUMABLE_STATES = [
AgentState.RUNNING,
AgentState.PAUSED,
AgentState.AWAITING_USER_INPUT,
AgentState.FINISHED,
]
# NOTE: this is deprecated
class TrafficControlState(str, Enum):
# default state, no rate limiting
NORMAL = 'normal'
@@ -32,14 +44,6 @@ class TrafficControlState(str, Enum):
PAUSED = 'paused'
RESUMABLE_STATES = [
AgentState.RUNNING,
AgentState.PAUSED,
AgentState.AWAITING_USER_INPUT,
AgentState.FINISHED,
]
@dataclass
class State:
"""
@@ -75,35 +79,43 @@ class State:
"""
session_id: str = ''
# global iteration for the current task
iteration: int = 0
# local iteration for the current subtask
local_iteration: int = 0
# max number of iterations for the current task
max_iterations: int = 100
iteration_flag: IterationControlFlag = field(
default_factory=lambda: IterationControlFlag(
limit_increase_amount=100, current_value=0, max_value=100
)
)
budget_flag: BudgetControlFlag | None = None
confirmation_mode: bool = False
history: list[Event] = field(default_factory=list)
inputs: dict = field(default_factory=dict)
outputs: dict = field(default_factory=dict)
agent_state: AgentState = AgentState.LOADING
resume_state: AgentState | None = None
traffic_control_state: TrafficControlState = TrafficControlState.NORMAL
# global metrics for the current task
metrics: Metrics = field(default_factory=Metrics)
# local metrics for the current subtask
local_metrics: Metrics = field(default_factory=Metrics)
# root agent has level 0, and every delegate increases the level by one
delegate_level: int = 0
# start_id and end_id track the range of events in history
start_id: int = -1
end_id: int = -1
delegates: dict[tuple[int, int], tuple[str, str]] = field(default_factory=dict)
# NOTE: this is used by the controller to track parent's metrics snapshot before delegation
parent_metrics_snapshot: Metrics | None = None
parent_iteration: int = 100
# NOTE: This will never be used by the controller, but it can be used by different
# evaluation tasks to store extra data needed to track the progress/state of the task.
extra_data: dict[str, Any] = field(default_factory=dict)
last_error: str = ''
# NOTE: deprecated args, kept here temporarily for backwards compatibility
# Will be removed in 30 days
iteration: int | None = None
local_iteration: int | None = None
max_iterations: int | None = None
traffic_control_state: TrafficControlState | None = None
local_metrics: Metrics | None = None
delegates: dict[tuple[int, int], tuple[str, str]] | None = None
def save_to_session(
self, sid: str, file_store: FileStore, user_id: str | None
) -> None:
@@ -165,6 +177,10 @@ class State:
# first state after restore
state.agent_state = AgentState.LOADING
# We don't need to clean up deprecated fields here
# They will be handled by __getstate__ when the state is saved again
return state
def __getstate__(self) -> dict:
@@ -177,15 +193,52 @@ class State:
state.pop('_history_checksum', None)
state.pop('_view', None)
# Remove deprecated fields before pickling
state.pop('iteration', None)
state.pop('local_iteration', None)
state.pop('max_iterations', None)
state.pop('traffic_control_state', None)
state.pop('local_metrics', None)
state.pop('delegates', None)
return state
def __setstate__(self, state: dict) -> None:
# Check if we're restoring from an older version (before control flags)
is_old_version = 'iteration' in state
# Convert old iteration tracking to new iteration_flag if needed
if is_old_version:
# Create iteration_flag from old values
max_iterations = state.get('max_iterations', 100)
current_iteration = state.get('iteration', 0)
# Add the iteration_flag to the state
state['iteration_flag'] = IterationControlFlag(
limit_increase_amount=max_iterations,
current_value=current_iteration,
max_value=max_iterations,
)
# Update the state
self.__dict__.update(state)
# We keep the deprecated fields for backward compatibility
# They will be removed by __getstate__ when the state is saved again
# make sure we always have the attribute history
if not hasattr(self, 'history'):
self.history = []
# Ensure we have default values for new fields if they're missing
if not hasattr(self, 'iteration_flag'):
self.iteration_flag = IterationControlFlag(
limit_increase_amount=100, current_value=0, max_value=100
)
if not hasattr(self, 'budget_flag'):
self.budget_flag = None
def get_current_user_intent(self) -> tuple[str | None, list[str] | None]:
"""Returns the latest user message and images (if provided) that appear after a FinishAction, or the first message (the task) if nothing was finished yet."""
last_user_message = None
@@ -223,6 +276,17 @@ class State:
],
}
def get_local_step(self):
if not self.parent_iteration:
return self.iteration_flag.current_value
return self.iteration_flag.current_value - self.parent_iteration
def get_local_metrics(self):
if not self.parent_metrics_snapshot:
return self.metrics
return self.metrics.diff(self.parent_metrics_snapshot)
@property
def view(self) -> View:
# Compute a simple checksum from the history to see if we can re-use any
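
The `__setstate__` migration turns a legacy `iteration`/`max_iterations` pair into an `iteration_flag`, and `__getstate__` strips the deprecated keys again on the next save. Pickle calls exactly these two hooks, so the round trip can be sketched by calling them directly. `ToyState` and `IterationFlag` are hypothetical stand-ins reduced to the iteration fields; the real `State` carries many more.

```python
from dataclasses import dataclass


@dataclass
class IterationFlag:
    limit_increase_amount: int
    current_value: int
    max_value: int


class ToyState:
    """Hypothetical stand-in for State, reduced to the iteration fields."""

    def __init__(self) -> None:
        self.iteration_flag = IterationFlag(100, 0, 100)
        self.parent_iteration = 0

    def __getstate__(self) -> dict:
        state = self.__dict__.copy()
        # Deprecated fields are dropped before pickling.
        state.pop('iteration', None)
        state.pop('max_iterations', None)
        return state

    def __setstate__(self, state: dict) -> None:
        # A legacy pickle carries 'iteration' instead of 'iteration_flag'.
        if 'iteration' in state:
            max_iterations = state.get('max_iterations', 100)
            state['iteration_flag'] = IterationFlag(
                limit_increase_amount=max_iterations,
                current_value=state.get('iteration', 0),
                max_value=max_iterations,
            )
        self.__dict__.update(state)
        if not hasattr(self, 'parent_iteration'):
            self.parent_iteration = 0

    def get_local_step(self) -> int:
        # Steps taken since the parent handed off to this agent.
        if not self.parent_iteration:
            return self.iteration_flag.current_value
        return self.iteration_flag.current_value - self.parent_iteration


# Simulate unpickling a state saved before control flags existed.
legacy = ToyState.__new__(ToyState)
legacy.__setstate__({'iteration': 7, 'max_iterations': 50})
print(legacy.iteration_flag)
# IterationFlag(limit_increase_amount=50, current_value=7, max_value=50)

saved = legacy.__getstate__()  # what would be pickled on the next save
print(sorted(saved))  # ['iteration_flag', 'parent_iteration']

legacy.parent_iteration = 5  # pretend a parent handed off at step 5
print(legacy.get_local_step())  # 2
```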


@@ -0,0 +1,290 @@
from openhands.controller.agent import Agent
from openhands.controller.state.control_flags import (
BudgetControlFlag,
IterationControlFlag,
)
from openhands.controller.state.state import State
from openhands.core.logger import openhands_logger as logger
from openhands.events.action.agent import AgentDelegateAction, ChangeAgentStateAction
from openhands.events.action.empty import NullAction
from openhands.events.event import Event
from openhands.events.event_filter import EventFilter
from openhands.events.observation.agent import AgentStateChangedObservation
from openhands.events.observation.delegate import AgentDelegateObservation
from openhands.events.observation.empty import NullObservation
from openhands.events.serialization.event import event_to_trajectory
from openhands.events.stream import EventStream
from openhands.llm.metrics import Metrics
from openhands.storage.files import FileStore
class StateTracker:
"""Manages and synchronizes the state of an agent throughout its lifecycle.
It is responsible for:
1. Maintaining agent state persistence across sessions
2. Managing agent history by filtering and tracking relevant events (previously done in the agent controller)
3. Synchronizing metrics between the controller and LLM components
4. Updating control flags for budget and iteration limits
"""
def __init__(
self, sid: str | None, file_store: FileStore | None, user_id: str | None
):
self.sid = sid
self.file_store = file_store
self.user_id = user_id
# filter out events that are not relevant to the agent
# so they will not be included in the agent history
self.agent_history_filter = EventFilter(
exclude_types=(
NullAction,
NullObservation,
ChangeAgentStateAction,
AgentStateChangedObservation,
),
exclude_hidden=True,
)
def set_initial_state(
self,
id: str,
agent: Agent,
state: State | None,
max_iterations: int,
max_budget_per_task: float | None,
confirmation_mode: bool = False,
) -> None:
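"""Sets the initial state for the agent, either from the previous session, or from a parent agent, or by creating a new one.
Args:
id: The controller ID; a '-delegate' suffix is stripped to form the session ID.
agent: The agent whose LLM metrics are linked to the state metrics.
state: The state to initialize with, or None to create a new state.
max_iterations: The maximum number of iterations allowed for the task.
max_budget_per_task: The maximum budget for the task, or None (or 0) to disable the budget limit.
confirmation_mode: Whether to enable confirmation mode.
"""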
# state can come from:
# - the previous session, in which case it has history
# - from a parent agent, in which case it has no history
# - None / a new state
# If state is None, we create a brand new state and still load the event stream so we can restore the history
if state is None:
self.state = State(
session_id=id.removesuffix('-delegate'),
inputs={},
iteration_flag=IterationControlFlag(
limit_increase_amount=max_iterations,
current_value=0,
max_value=max_iterations,
),
budget_flag=None
if not max_budget_per_task
else BudgetControlFlag(
limit_increase_amount=max_budget_per_task,
current_value=0,
max_value=max_budget_per_task,
),
confirmation_mode=confirmation_mode,
)
self.state.start_id = 0
logger.info(
f'AgentController {id} - created new state. start_id: {self.state.start_id}'
)
else:
self.state = state
if self.state.start_id <= -1:
self.state.start_id = 0
logger.info(
f'AgentController {id} initializing history from event {self.state.start_id}',
)
# Share the state metrics with the agent's LLM metrics
# This ensures that all accumulated metrics are always in sync between controller and llm
agent.llm.metrics = self.state.metrics
def _init_history(self, event_stream: EventStream) -> None:
"""Initializes the agent's history from the event stream.
The history is a list of events that:
- Excludes events whose types are excluded by self.agent_history_filter
- Excludes events with hidden=True attribute
- For delegate events (between AgentDelegateAction and AgentDelegateObservation):
- Excludes all events between the action and observation
- Includes the delegate action and observation themselves
"""
# define range of events to fetch
# delegates start with a start_id and initially won't find any events
# otherwise we're restoring a previous session
start_id = self.state.start_id if self.state.start_id >= 0 else 0
end_id = (
self.state.end_id
if self.state.end_id >= 0
else event_stream.get_latest_event_id()
)
# sanity check
if start_id > end_id + 1:
logger.warning(
f'start_id {start_id} is greater than end_id + 1 ({end_id + 1}). History will be empty.',
)
self.state.history = []
return
events: list[Event] = []
# Get rest of history
events_to_add = list(
event_stream.search_events(
start_id=start_id,
end_id=end_id,
reverse=False,
filter=self.agent_history_filter,
)
)
events.extend(events_to_add)
# Find all delegate action/observation pairs
delegate_ranges: list[tuple[int, int]] = []
delegate_action_ids: list[int] = [] # stack of unmatched delegate action IDs
for event in events:
if isinstance(event, AgentDelegateAction):
delegate_action_ids.append(event.id)
# Note: we can get agent=event.agent and task=event.inputs.get('task','')
# if we need to track these in the future
elif isinstance(event, AgentDelegateObservation):
# Match with most recent unmatched delegate action
if not delegate_action_ids:
logger.warning(
f'Found AgentDelegateObservation without matching action at id={event.id}',
)
continue
action_id = delegate_action_ids.pop()
delegate_ranges.append((action_id, event.id))
# Filter out events between delegate action/observation pairs
if delegate_ranges:
filtered_events: list[Event] = []
current_idx = 0
# Distinct loop names keep the outer start_id intact for the sync below
for range_start, range_end in sorted(delegate_ranges):
# Add events before delegate range
filtered_events.extend(
event for event in events[current_idx:] if event.id < range_start
)
# Add delegate action and observation
filtered_events.extend(
event for event in events if event.id in (range_start, range_end)
)
# Update index to after delegate range
current_idx = next(
(i for i, e in enumerate(events) if e.id > range_end), len(events)
)
# Add any remaining events after last delegate range
filtered_events.extend(events[current_idx:])
self.state.history = filtered_events
else:
self.state.history = events
# make sure history is in sync
self.state.start_id = start_id
def close(self, event_stream: EventStream):
# we made history, now is the time to rewrite it!
# the final state.history will be used by external scripts like evals, tests, etc.
# history will need to be complete WITH delegate events
# like the regular agent history, it does not include:
# - 'hidden' events, events with hidden=True
# - backend events (the types excluded by self.agent_history_filter)
start_id = self.state.start_id if self.state.start_id >= 0 else 0
end_id = (
self.state.end_id
if self.state.end_id >= 0
else event_stream.get_latest_event_id()
)
self.state.history = list(
event_stream.search_events(
start_id=start_id,
end_id=end_id,
reverse=False,
filter=self.agent_history_filter,
)
)
def add_history(self, event: Event):
# if the event is not filtered out, add it to the history
if self.agent_history_filter.include(event):
self.state.history.append(event)
def get_trajectory(self, include_screenshots: bool = False) -> list[dict]:
return [
event_to_trajectory(event, include_screenshots)
for event in self.state.history
]
def maybe_increase_control_flags_limits(self, headless_mode: bool):
# Iteration and budget extensions are independent of each other
# An error will be thrown if any one of the control flags has reached or exceeded its limit
self.state.iteration_flag.increase_limit(headless_mode)
if self.state.budget_flag:
self.state.budget_flag.increase_limit(headless_mode)
def get_metrics_snapshot(self):
"""
Deep copy of metrics
This serves as a snapshot for the parent's metrics at the time a delegate is created
It will be stored and used to compute local metrics for the delegate
(since delegates now accumulate metrics from where their parent left off)
"""
return self.state.metrics.copy()
def save_state(self):
"""
Saves the current state to the persistent store
"""
if self.sid and self.file_store:
self.state.save_to_session(self.sid, self.file_store, self.user_id)
def run_control_flags(self):
"""
Performs one step of the control flags
"""
self.state.iteration_flag.step()
if self.state.budget_flag:
self.state.budget_flag.step()
def sync_budget_flag_with_metrics(self):
"""
Ensures that budget flag is up to date with accumulated costs from llm completions
Budget flag will monitor for when budget is exceeded
"""
if self.state.budget_flag:
self.state.budget_flag.current_value = self.state.metrics.accumulated_cost
def merge_metrics(self, metrics: Metrics):
"""
Merges metrics with the state metrics
NOTE: this should be refactored in the future. We should have services (draft llm, title autocomplete, condenser, etc)
use their own LLMs, but the metrics object should be shared. This way we have one source of truth for accumulated costs from
all services
This would prevent having fragmented stores for metrics, and we don't have the burden of deciding where and how to store them
if we decide to introduce more specialized services that require llm completions
"""
self.state.metrics.merge(metrics)
if self.state.budget_flag:
self.state.budget_flag.current_value = self.state.metrics.accumulated_cost
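
`_init_history` keeps each delegate's `AgentDelegateAction` and `AgentDelegateObservation` but hides every event between them. A minimal sketch of that filtering over a hypothetical `Ev` type, assuming delegate ranges do not nest, and using distinct loop-variable names so no outer `start_id`/`end_id` is overwritten:

```python
from dataclasses import dataclass


@dataclass
class Ev:
    """Hypothetical minimal event: an id plus a kind tag."""

    id: int
    kind: str = 'msg'  # 'delegate_action', 'delegate_obs', or anything else


def filter_delegates(events: list[Ev]) -> list[Ev]:
    # Pair each delegate observation with the most recent unmatched action (a stack).
    ranges: list[tuple[int, int]] = []
    pending: list[int] = []
    for ev in events:
        if ev.kind == 'delegate_action':
            pending.append(ev.id)
        elif ev.kind == 'delegate_obs' and pending:
            ranges.append((pending.pop(), ev.id))

    # Keep everything outside the ranges, plus each range's two endpoints.
    kept: list[Ev] = []
    idx = 0
    for range_start, range_end in sorted(ranges):
        kept.extend(ev for ev in events[idx:] if ev.id < range_start)
        kept.extend(ev for ev in events if ev.id in (range_start, range_end))
        idx = next((i for i, ev in enumerate(events) if ev.id > range_end), len(events))
    kept.extend(events[idx:])
    return kept


events = [
    Ev(0),
    Ev(1, 'delegate_action'),
    Ev(2),  # the delegate's own work, hidden from the parent
    Ev(3),
    Ev(4, 'delegate_obs'),
    Ev(5),
]
print([ev.id for ev in filter_delegates(events)])  # [0, 1, 4, 5]
```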


@@ -744,27 +744,6 @@ def get_parser() -> argparse.ArgumentParser:
type=bool,
default=False,
)
# LLM configuration arguments for local models
parser.add_argument(
'--llm-model',
help='LLM model to use (e.g., "lm_studio/devstral", "openai/gpt-4")',
type=str,
default=None,
)
parser.add_argument(
'--llm-base-url',
help='Base URL for LLM API (required for local models, e.g., "http://localhost:1234/v1")',
type=str,
default=None,
)
parser.add_argument(
'--llm-api-key',
help='API key for LLM (use "dummy" for local models)',
type=str,
default=None,
)
return parser
@@ -842,21 +821,6 @@ def setup_config_from_args(args: argparse.Namespace) -> OpenHandsConfig:
raise ValueError(f'Invalid toml file, cannot read {args.llm_config}')
config.set_llm_config(llm_config)
# Override LLM settings with direct CLI arguments
if args.llm_model or args.llm_base_url or args.llm_api_key:
from pydantic import SecretStr
llm_config = config.get_llm_config()
if args.llm_model:
llm_config.model = args.llm_model
if args.llm_base_url:
llm_config.base_url = args.llm_base_url
if args.llm_api_key:
llm_config.api_key = SecretStr(args.llm_api_key)
config.set_llm_config(llm_config)
# Override default agent if provided
if args.agent_cls:
config.default_agent = args.agent_cls


@@ -5,6 +5,7 @@ from pathlib import Path
from typing import Callable, Protocol
import openhands.agenthub # noqa F401 (we import this to get the agents registered)
import openhands.cli.suppress_warnings # noqa: F401
from openhands.controller.agent import Agent
from openhands.controller.replay import ReplayManager
from openhands.controller.state.state import State
@@ -139,9 +140,9 @@ async def run_controller(
config.mcp_host, config, None
)
)
config.mcp.stdio_servers.extend(openhands_mcp_stdio_servers)
runtime.config.mcp.stdio_servers.extend(openhands_mcp_stdio_servers)
await add_mcp_tools_to_agent(agent, runtime, memory, config)
await add_mcp_tools_to_agent(agent, runtime, memory)
replay_events: list[Event] | None = None
if config.replay_trajectory_path:


@@ -206,8 +206,8 @@ def create_controller(
controller = AgentController(
agent=agent,
max_iterations=config.max_iterations,
max_budget_per_task=config.max_budget_per_task,
iteration_delta=config.max_iterations,
budget_per_task_delta=config.max_budget_per_task,
agent_to_llm_config=config.get_agent_to_llm_config_map(),
event_stream=event_stream,
initial_state=initial_state,


@@ -15,8 +15,8 @@ class AsyncEventStoreWrapper:
loop = asyncio.get_running_loop()
# Create an async generator that yields events
for event in self.event_store.get_events(*self.args, **self.kwargs):
# Run the blocking get_events() in a thread pool
for event in self.event_store.search_events(*self.args, **self.kwargs):
# Run the blocking search_events() in a thread pool
def get_event(e: Event = event) -> Event:
return e
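
If the intent is to keep the blocking `search_events()` iteration itself off the event loop, one option is to advance the iterator inside the executor. A minimal sketch of that variant, which is not the wrapper's actual code; `iterate_in_executor` is a hypothetical helper:

```python
import asyncio
from collections.abc import AsyncIterator, Iterable
from typing import TypeVar

T = TypeVar('T')


async def iterate_in_executor(items: Iterable[T]) -> AsyncIterator[T]:
    """Yield items from a blocking iterable without blocking the event loop."""
    loop = asyncio.get_running_loop()
    iterator = iter(items)
    sentinel = object()
    while True:
        # next() runs in the default thread pool, so a slow read stays off the loop.
        item = await loop.run_in_executor(None, next, iterator, sentinel)
        if item is sentinel:
            break
        yield item


def slow_source():
    # Stand-in for a blocking event source.
    for i in range(3):
        yield i


async def main() -> list[int]:
    return [x async for x in iterate_in_executor(slow_source())]


print(asyncio.run(main()))  # [0, 1, 2]
```

The iterator is only ever advanced by one worker call at a time, so a plain generator is safe to use here.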


@@ -140,7 +140,7 @@ class EventStore(EventStoreABC):
return self.cur_id - 1
def filtered_events_by_source(self, source: EventSource) -> Iterable[Event]:
for event in self.get_events():
for event in self.search_events():
if event.source == source:
yield event


@@ -321,7 +321,7 @@ class ProviderHandler:
async def verify_repo_provider(
self, repository: str, specified_provider: ProviderType | None = None
):
) -> Repository:
if specified_provider:
try:
service = self._get_service(specified_provider)


@@ -773,9 +773,6 @@ class LLM(RetryMixin, DebugMixin):
def __repr__(self) -> str:
return str(self)
def reset(self) -> None:
self.metrics.reset()
def format_messages_for_llm(self, messages: Message | list[Message]) -> list[dict]:
if isinstance(messages, Message):
messages = [messages]


@@ -193,22 +193,6 @@ class Metrics:
'token_usages': [usage.model_dump() for usage in self._token_usages],
}
def reset(self) -> None:
self._accumulated_cost = 0.0
self._costs = []
self._response_latencies = []
self._token_usages = []
# Reset accumulated token usage with a new instance
self._accumulated_token_usage = TokenUsage(
model=self.model_name,
prompt_tokens=0,
completion_tokens=0,
cache_read_tokens=0,
cache_write_tokens=0,
context_window=0,
response_id='',
)
def log(self) -> str:
"""Log the metrics."""
metrics = self.get()
@@ -221,5 +205,58 @@ class Metrics:
"""Create a deep copy of the Metrics object."""
return copy.deepcopy(self)
def diff(self, baseline: 'Metrics') -> 'Metrics':
"""Calculate the difference between current metrics and a baseline.
This is useful for tracking metrics for specific operations like delegates.
Args:
baseline: A metrics object representing the baseline state
Returns:
A new Metrics object containing only the differences since the baseline
"""
result = Metrics(self.model_name)
# Calculate cost difference
result._accumulated_cost = self._accumulated_cost - baseline._accumulated_cost
# Include only costs that were added after the baseline
if baseline._costs:
last_baseline_timestamp = baseline._costs[-1].timestamp
result._costs = [
cost for cost in self._costs if cost.timestamp > last_baseline_timestamp
]
else:
result._costs = self._costs.copy()
# Include only response latencies that were added after the baseline
result._response_latencies = self._response_latencies[
len(baseline._response_latencies) :
]
# Include only token usages that were added after the baseline
result._token_usages = self._token_usages[len(baseline._token_usages) :]
# Calculate accumulated token usage difference
base_usage = baseline.accumulated_token_usage
current_usage = self.accumulated_token_usage
result._accumulated_token_usage = TokenUsage(
model=self.model_name,
prompt_tokens=current_usage.prompt_tokens - base_usage.prompt_tokens,
completion_tokens=current_usage.completion_tokens
- base_usage.completion_tokens,
cache_read_tokens=current_usage.cache_read_tokens
- base_usage.cache_read_tokens,
cache_write_tokens=current_usage.cache_write_tokens
- base_usage.cache_write_tokens,
context_window=current_usage.context_window,
per_turn_token=0,
response_id='',
)
return result
def __repr__(self) -> str:
return f'Metrics({self.get()})'

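`diff()` lets a delegate that shares its parent's `Metrics` object report only its own usage: take a `copy()` when the delegate starts, then diff against it. This sketch reduces the idea to a hypothetical `ToyMetrics` with one running total and one append-only list, mirroring how `diff()` subtracts `_accumulated_cost` and slices `_response_latencies` past the baseline's length:

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class ToyMetrics:
    """Hypothetical reduction of Metrics to a cost total and per-call latencies."""

    accumulated_cost: float = 0.0
    latencies: list[float] = field(default_factory=list)

    def add_call(self, cost: float, latency: float) -> None:
        self.accumulated_cost += cost
        self.latencies.append(latency)

    def copy(self) -> 'ToyMetrics':
        return deepcopy(self)

    def diff(self, baseline: 'ToyMetrics') -> 'ToyMetrics':
        # Totals subtract; append-only lists are sliced past the baseline's length.
        return ToyMetrics(
            accumulated_cost=self.accumulated_cost - baseline.accumulated_cost,
            latencies=self.latencies[len(baseline.latencies):],
        )


shared = ToyMetrics()
shared.add_call(cost=0.25, latency=1.0)  # parent's call
snapshot = shared.copy()  # taken when the delegate starts
shared.add_call(cost=0.5, latency=2.0)  # delegate's call, same shared object
local = shared.diff(snapshot)
print(local.accumulated_cost, local.latencies)  # 0.5 [2.0]
```

Slicing by length assumes the lists are append-only between the snapshot and the diff, which holds as long as nothing resets the shared object in between.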

@@ -10,7 +10,6 @@ from openhands.core.config.mcp_config import (
MCPSHTTPServerConfig,
MCPSSEServerConfig,
)
from openhands.core.config.openhands_config import OpenHandsConfig
from openhands.core.logger import openhands_logger as logger
from openhands.events.action.mcp import MCPAction
from openhands.events.observation.mcp import MCPObservation
@@ -187,9 +186,7 @@ async def call_tool_mcp(mcp_clients: list[MCPClient], action: MCPAction) -> Obse
)
async def add_mcp_tools_to_agent(
agent: 'Agent', runtime: Runtime, memory: 'Memory', app_config: OpenHandsConfig
):
async def add_mcp_tools_to_agent(agent: 'Agent', runtime: Runtime, memory: 'Memory'):
"""
Add MCP tools to an agent.
"""
@@ -208,7 +205,6 @@ async def add_mcp_tools_to_agent(
extra_stdio_servers = []
# Add microagent MCP tools if available
mcp_config: MCPConfig = app_config.mcp
microagent_mcp_configs = memory.get_microagent_mcp_tools()
for mcp_config in microagent_mcp_configs:
if mcp_config.sse_servers:


@@ -1,6 +1,13 @@
from openhands.runtime.base import Runtime
from openhands.runtime.impl.cli.cli_runtime import CLIRuntime
from openhands.runtime.impl.daytona.daytona_runtime import DaytonaRuntime
try:
from openhands.runtime.impl.daytona.daytona_runtime import DaytonaRuntime
_DAYTONA_AVAILABLE = True
except ImportError:
_DAYTONA_AVAILABLE = False
DaytonaRuntime = None # type: ignore
from openhands.runtime.impl.docker.docker_runtime import (
DockerRuntime,
)
@@ -20,7 +27,7 @@ _DEFAULT_RUNTIME_CLASSES: dict[str, type[Runtime]] = {
'modal': ModalRuntime,
'runloop': RunloopRuntime,
'local': LocalRuntime,
'daytona': DaytonaRuntime,
**({'daytona': DaytonaRuntime} if _DAYTONA_AVAILABLE else {}),
'cli': CLIRuntime,
}
@@ -49,7 +56,9 @@ __all__ = [
'ModalRuntime',
'RunloopRuntime',
'DockerRuntime',
'DaytonaRuntime',
'CLIRuntime',
'get_runtime_cls',
]
if _DAYTONA_AVAILABLE:
__all__.append('DaytonaRuntime')
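
The optional Daytona import follows a common pattern: attempt the import, fall back to `None`, and register the class only when the import succeeded. A minimal generalization into a hypothetical `load_optional` helper, where the second module name is deliberately fake to show the fallback path:

```python
import importlib


def load_optional(module_name: str, attr: str):
    """Return module.attr, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # ModuleNotFoundError is a subclass
        return None
    return getattr(module, attr)


# 'collections' always exists; the second module name is deliberately fake.
OrderedDict = load_optional('collections', 'OrderedDict')
FakeRuntime = load_optional('surely_not_installed_pkg_xyz', 'FakeRuntime')

REGISTRY = {
    'ordered': OrderedDict,
    # Only register the optional class when its import succeeded.
    **({'fake': FakeRuntime} if FakeRuntime is not None else {}),
}
print(sorted(REGISTRY))  # ['ordered']
```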


@@ -372,20 +372,6 @@ class Runtime(FileEditRuntimeMixin):
selected_repository: str | None,
selected_branch: str | None,
) -> str:
repository = None
if selected_repository: # Determine provider from repo name
try:
provider_handler = ProviderHandler(
git_provider_tokens or MappingProxyType({})
)
repository = await provider_handler.verify_repo_provider(
selected_repository
)
except AuthenticationError:
raise RuntimeError(
'Git provider authentication issue when cloning repo'
)
if not selected_repository:
# In SaaS mode (indicated by user_id being set), always run git init
# In OSS mode, only run git init if workspace_base is not set
@@ -403,34 +389,9 @@ class Runtime(FileEditRuntimeMixin):
)
return ''
# This satisfies mypy because param is optional, but `verify_repo_provider` guarantees this gets populated
if not repository:
return ''
provider = repository.git_provider
provider_domains = {
ProviderType.GITHUB: 'github.com',
ProviderType.GITLAB: 'gitlab.com',
}
domain = provider_domains[provider]
# If git_provider_tokens is provided, use the host from the token if available
if git_provider_tokens and provider in git_provider_tokens:
domain = git_provider_tokens[provider].host or domain
# Try to use token if available, otherwise use public URL
if git_provider_tokens and provider in git_provider_tokens:
git_token = git_provider_tokens[provider].token
if git_token:
if provider == ProviderType.GITLAB:
remote_repo_url = f'https://oauth2:{git_token.get_secret_value()}@{domain}/{selected_repository}.git'
else:
remote_repo_url = f'https://{git_token.get_secret_value()}@{domain}/{selected_repository}.git'
else:
remote_repo_url = f'https://{domain}/{selected_repository}.git'
else:
remote_repo_url = f'https://{domain}/{selected_repository}.git'
remote_repo_url = await self._get_authenticated_git_url(
selected_repository, git_provider_tokens
)
if not remote_repo_url:
raise ValueError('Missing either Git token or valid repository')
@@ -630,36 +591,52 @@ fi
return loaded_microagents
def _get_authenticated_git_url(self, repo_path: str) -> str:
async def _get_authenticated_git_url(
self, repo_name: str, git_provider_tokens: PROVIDER_TOKEN_TYPE | None
) -> str:
"""Get an authenticated git URL for a repository.
Args:
repo_path: Repository path (e.g., "github.com/acme-co/api")
repo_name: Repository name (owner/repo)
git_provider_tokens: Provider tokens used to authenticate the URL, if any
Returns:
Authenticated git URL if credentials are available, otherwise regular HTTPS URL
"""
remote_url = f'https://{repo_path}.git'
# Determine provider from repo path
provider = None
if 'github.com' in repo_path:
provider = ProviderType.GITHUB
elif 'gitlab.com' in repo_path:
provider = ProviderType.GITLAB
try:
provider_handler = ProviderHandler(
git_provider_tokens or MappingProxyType({})
)
repository = await provider_handler.verify_repo_provider(repo_name)
except AuthenticationError:
raise Exception('Git provider authentication issue when getting remote URL')
# Add authentication if available
if (
provider
and self.git_provider_tokens
and provider in self.git_provider_tokens
):
git_token = self.git_provider_tokens[provider].token
provider = repository.git_provider
repo_name = repository.full_name
provider_domains = {
ProviderType.GITHUB: 'github.com',
ProviderType.GITLAB: 'gitlab.com',
}
domain = provider_domains[provider]
# If git_provider_tokens is provided, use the host from the token if available
if git_provider_tokens and provider in git_provider_tokens:
domain = git_provider_tokens[provider].host or domain
# Try to use token if available, otherwise use public URL
if git_provider_tokens and provider in git_provider_tokens:
git_token = git_provider_tokens[provider].token
if git_token:
if provider == ProviderType.GITLAB:
remote_url = f'https://oauth2:{git_token.get_secret_value()}@{repo_path.replace("gitlab.com/", "")}.git'
remote_url = f'https://oauth2:{git_token.get_secret_value()}@{domain}/{repo_name}.git'
else:
remote_url = f'https://{git_token.get_secret_value()}@{repo_path.replace("github.com/", "")}.git'
remote_url = f'https://{git_token.get_secret_value()}@{domain}/{repo_name}.git'
else:
remote_url = f'https://{domain}/{repo_name}.git'
else:
remote_url = f'https://{domain}/{repo_name}.git'
return remote_url
@@ -685,13 +662,10 @@ fi
return loaded_microagents
# Extract the domain and org/user name
domain = repo_parts[0] if len(repo_parts) > 2 else 'github.com'
org_name = repo_parts[-2]
# Construct the org-level .openhands repo path
org_openhands_repo = f'{domain}/{org_name}/.openhands'
if domain not in org_openhands_repo:
org_openhands_repo = f'github.com/{org_openhands_repo}'
org_openhands_repo = f'{org_name}/.openhands'
self.log(
'info',
@@ -704,9 +678,18 @@ fi
org_repo_dir = self.workspace_root / f'org_openhands_{org_name}'
# Get authenticated URL and do a shallow clone (--depth 1) for efficiency
remote_url = self._get_authenticated_git_url(org_openhands_repo)
clone_cmd = f'git clone --depth 1 {remote_url} {org_repo_dir}'
try:
remote_url = call_async_from_sync(
self._get_authenticated_git_url,
GENERAL_TIMEOUT,
org_openhands_repo,
self.git_provider_tokens,
)
except Exception as e:
raise Exception(str(e))
clone_cmd = (
f'GIT_TERMINAL_PROMPT=0 git clone --depth 1 {remote_url} {org_repo_dir}'
)
action = CmdRunAction(command=clone_cmd)
obs = self.run_action(action)


@@ -6,7 +6,14 @@ from openhands.runtime.impl.action_execution.action_execution_client import (
ActionExecutionClient,
)
from openhands.runtime.impl.cli import CLIRuntime
from openhands.runtime.impl.daytona.daytona_runtime import DaytonaRuntime
try:
from openhands.runtime.impl.daytona.daytona_runtime import DaytonaRuntime
_DAYTONA_AVAILABLE = True
except ImportError:
_DAYTONA_AVAILABLE = False
DaytonaRuntime = None # type: ignore
from openhands.runtime.impl.docker.docker_runtime import DockerRuntime
from openhands.runtime.impl.e2b.e2b_runtime import E2BRuntime
from openhands.runtime.impl.local.local_runtime import LocalRuntime
@@ -17,7 +24,6 @@ from openhands.runtime.impl.runloop.runloop_runtime import RunloopRuntime
__all__ = [
'ActionExecutionClient',
'CLIRuntime',
'DaytonaRuntime',
'DockerRuntime',
'E2BRuntime',
'LocalRuntime',
@@ -25,3 +31,6 @@ __all__ = [
'RemoteRuntime',
'RunloopRuntime',
]
if _DAYTONA_AVAILABLE:
__all__.append('DaytonaRuntime')


@@ -5,6 +5,7 @@ It does not implement browser functionality.
import asyncio
import os
import re
import select
import shutil
import signal
@@ -50,6 +51,7 @@ from openhands.integrations.provider import PROVIDER_TOKEN_TYPE
from openhands.runtime.base import Runtime
from openhands.runtime.plugins import PluginRequirement
from openhands.runtime.runtime_status import RuntimeStatus
from openhands.runtime.utils.bash import SubprocessBashSession
class CLIRuntime(Runtime):
@@ -119,6 +121,13 @@ class CLIRuntime(Runtime):
self.file_editor = OHEditor(workspace_root=self._workspace_path)
self._shell_stream_callback: Callable[[str], None] | None = None
# Initialize bash session
self.bash_session = SubprocessBashSession(
work_dir=self._workspace_path,
username=None,
no_change_timeout_seconds=30,
)
logger.warning(
'Initializing CLIRuntime. WARNING: NO SANDBOX IS USED. '
'This runtime executes commands directly on the local system. '
@@ -138,6 +147,9 @@ class CLIRuntime(Runtime):
if not self.attach_to_existing:
await asyncio.to_thread(self.setup_initial_env)
# Initialize bash session
self.bash_session.initialize()
self._runtime_initialized = True
self.set_runtime_status(RuntimeStatus.RUNTIME_STARTED)
logger.info(f'CLIRuntime initialized with workspace at {self._workspace_path}')
@@ -351,7 +363,7 @@ class CLIRuntime(Runtime):
)
def run(self, action: CmdRunAction) -> Observation:
"""Run a command using subprocess."""
"""Run a command using the bash session."""
if not self._runtime_initialized:
return ErrorObservation(
f'Runtime not initialized for command: {action.command}'
@@ -369,18 +381,36 @@ class CLIRuntime(Runtime):
)
try:
effective_timeout = (
action.timeout
if action.timeout is not None
else self.config.sandbox.timeout
)
# Set effective timeout if not already set
if action.timeout is None:
action.set_hard_timeout(self.config.sandbox.timeout)
logger.debug(
f'Running command in CLIRuntime: "{action.command}" with effective timeout: {effective_timeout}s'
)
return self._execute_shell_command(
action.command, timeout=effective_timeout
f'Running command in CLIRuntime: "{action.command}" with effective timeout: {action.timeout}s'
)
# Use the bash session to execute the command
obs = self.bash_session.execute(action)
# For CLIRuntime, we need to adjust the timeout message format and working directory
if isinstance(obs, CmdOutputObservation):
# Fix timeout message format for CLIRuntime
if obs.metadata.suffix and 'timed out after' in obs.metadata.suffix:
# Extract timeout duration from the suffix
match = re.search(
r'timed out after ([\d.]+) seconds', obs.metadata.suffix
)
if match:
timeout_duration = match.group(1)
obs.metadata.suffix = (
f'[The command timed out after {timeout_duration} seconds.]'
)
# Fix working directory for CLIRuntime
obs.metadata.working_dir = self._workspace_path
return obs
except Exception as e:
logger.error(
f'Error in CLIRuntime.run for command "{action.command}": {str(e)}'
@@ -737,6 +767,10 @@ class CLIRuntime(Runtime):
raise RuntimeError(f'Error creating zip file: {str(e)}')
def close(self) -> None:
# Clean up bash session
if hasattr(self, 'bash_session'):
self.bash_session.close()
self._runtime_initialized = False
super().close()
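`CLIRuntime.run` post-processes the observation returned by `SubprocessBashSession`. When the suffix reports a timeout, it keeps only the duration and replaces the rest with a shorter message. The helper below isolates that step so it can be tested on its own. The name `normalize_timeout_suffix` is hypothetical; the regular expression and output format are the ones used in the diff.

```python
import re

# Matches the duration that SubprocessBashSession writes into its timeout suffix.
_TIMEOUT_RE = re.compile(r'timed out after ([\d.]+) seconds')


def normalize_timeout_suffix(suffix: str) -> str:
    """Collapse a verbose timeout suffix into the short CLIRuntime form.

    Suffixes that do not mention a timeout are returned unchanged.
    """
    if 'timed out after' not in suffix:
        return suffix
    match = _TIMEOUT_RE.search(suffix)
    if match is None:
        return suffix
    return f'[The command timed out after {match.group(1)} seconds.]'
```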


@@ -89,12 +89,14 @@ docker run -it --rm --pull=always \
-e LOG_ALL_EVENTS=true \
-e RUNTIME=daytona \
-e DAYTONA_API_KEY=${DAYTONA_API_KEY} \
-v ~/.openhands-state:/.openhands-state \
-v ~/.openhands:/.openhands \
-p 3000:3000 \
--name openhands-app \
docker.all-hands.dev/all-hands-ai/openhands:${OPENHANDS_VERSION}
```
> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
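The migration in the note above amounts to moving one directory to a new name. Below is a cautious Python sketch. Unlike a plain `mv`, it does nothing when `~/.openhands` already exists, because in that case `mv` would nest the old directory inside the new one. The function name and this guard are assumptions; they are not part of OpenHands.

```python
import shutil
from pathlib import Path


def migrate_state_dir(home: Path) -> bool:
    """Move ~/.openhands-state to ~/.openhands when only the old directory exists.

    Returns True when a move happened, False otherwise.
    """
    old = home / '.openhands-state'
    new = home / '.openhands'
    if not old.is_dir() or new.exists():
        # Nothing to migrate, or the target already exists and must be merged by hand.
        return False
    shutil.move(str(old), str(new))
    return True
```

Call it as `migrate_state_dir(Path.home())` before starting the container.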
#### Windows:
```powershell
docker run -it --rm --pull=always `
@@ -102,12 +104,14 @@ docker run -it --rm --pull=always `
-e LOG_ALL_EVENTS=true `
-e RUNTIME=daytona `
-e DAYTONA_API_KEY=${env:DAYTONA_API_KEY} `
-v ~/.openhands-state:/.openhands-state `
-v ~/.openhands:/.openhands `
-p 3000:3000 `
--name openhands-app `
docker.all-hands.dev/all-hands-ai/openhands:${env:OPENHANDS_VERSION}
```
> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
> **Tip:** If you don't want your sandboxes to default to the EU region, set the `DAYTONA_TARGET` environment variable to `us`.
### Running OpenHands Locally Without Docker


@@ -1,18 +1,18 @@
import json
from typing import Callable
import httpx
import tenacity
from daytona_sdk import (
CreateWorkspaceParams,
from daytona import (
CreateSandboxFromSnapshotParams,
Daytona,
DaytonaConfig,
Sandbox,
SessionExecuteRequest,
Workspace,
)
from openhands.core.config.openhands_config import OpenHandsConfig
from openhands.events.stream import EventStream
from openhands.integrations.provider import PROVIDER_TOKEN_TYPE
from openhands.runtime.impl.action_execution.action_execution_client import (
ActionExecutionClient,
)
@@ -23,11 +23,11 @@ from openhands.runtime.utils.request import RequestHTTPError
from openhands.utils.async_utils import call_sync_from_async
from openhands.utils.tenacity_stop import stop_if_should_exit
WORKSPACE_PREFIX = 'openhands-sandbox-'
OPENHANDS_SID_LABEL = 'OpenHands_SID'
class DaytonaRuntime(ActionExecutionClient):
"""The DaytonaRuntime class is a DockerRuntime that utilizes Daytona workspace as a runtime environment."""
"""The DaytonaRuntime class is a DockerRuntime that utilizes Daytona Sandboxes as runtime environments."""
_sandbox_port: int = 4444
_vscode_port: int = 4445
@@ -42,13 +42,14 @@ class DaytonaRuntime(ActionExecutionClient):
status_callback: Callable | None = None,
attach_to_existing: bool = False,
headless_mode: bool = True,
user_id: str | None = None,
git_provider_tokens: PROVIDER_TOKEN_TYPE | None = None,
):
assert config.daytona_api_key, 'Daytona API key is required'
self.config = config
self.sid = sid
self.workspace_id = WORKSPACE_PREFIX + sid
self.workspace: Workspace | None = None
self.sandbox: Sandbox | None = None
self._vscode_url: str | None = None
daytona_config = DaytonaConfig(
@@ -74,22 +75,28 @@ class DaytonaRuntime(ActionExecutionClient):
status_callback,
attach_to_existing,
headless_mode,
user_id,
git_provider_tokens,
)
def _get_workspace(self) -> Workspace | None:
def _get_sandbox(self) -> Sandbox | None:
try:
workspace = self.daytona.get_current_workspace(self.workspace_id)
self.log(
'info', f'Attached to existing workspace with id: {self.workspace_id}'
)
sandboxes = self.daytona.list({OPENHANDS_SID_LABEL: self.sid})
if len(sandboxes) == 0:
return None
assert len(sandboxes) == 1, 'Multiple sandboxes found for SID'
sandbox = sandboxes[0]
self.log('info', f'Attached to existing sandbox with id: {self.sid}')
except Exception:
self.log(
'warning',
f'Failed to attach to existing workspace with id: {self.workspace_id}',
f'Failed to attach to existing sandbox with id: {self.sid}',
)
workspace = None
sandbox = None
return workspace
return sandbox
def _get_creation_env_vars(self) -> dict[str, str]:
env_vars: dict[str, str] = {
@@ -103,37 +110,28 @@ class DaytonaRuntime(ActionExecutionClient):
return env_vars
def _create_workspace(self) -> Workspace:
workspace_params = CreateWorkspaceParams(
id=self.workspace_id,
def _create_sandbox(self) -> Sandbox:
sandbox_params = CreateSandboxFromSnapshotParams(
language='python',
image=self.config.sandbox.runtime_container_image,
snapshot=self.config.sandbox.runtime_container_image,
public=True,
env_vars=self._get_creation_env_vars(),
labels={OPENHANDS_SID_LABEL: self.sid},
)
workspace = self.daytona.create(workspace_params)
return workspace
return self.daytona.create(sandbox_params)
def _construct_api_url(self, port: int) -> str:
assert self.workspace is not None, 'Workspace is not initialized'
assert self.workspace.instance.info is not None, (
'Workspace info is not available'
)
assert self.workspace.instance.info.provider_metadata is not None, (
'Provider metadata is not available'
)
assert self.sandbox is not None, 'Sandbox is not initialized'
assert self.sandbox.runner_domain is not None, 'Runner domain is not available'
node_domain = json.loads(self.workspace.instance.info.provider_metadata)[
'nodeDomain'
]
return f'https://{port}-{self.workspace.id}.{node_domain}'
return f'https://{port}-{self.sandbox.id}.{self.sandbox.runner_domain}'
@property
def action_execution_server_url(self) -> str:
return self.api_url
def _start_action_execution_server(self) -> None:
assert self.workspace is not None, 'Workspace is not initialized'
assert self.sandbox is not None, 'Sandbox is not initialized'
start_command: list[str] = get_action_execution_server_startup_command(
server_port=self._sandbox_port,
@@ -153,9 +151,9 @@ class DaytonaRuntime(ActionExecutionClient):
)
exec_session_id = 'action-execution-server'
self.workspace.process.create_session(exec_session_id)
self.sandbox.process.create_session(exec_session_id)
exec_command = self.workspace.process.execute_session_command(
exec_command = self.sandbox.process.execute_session_command(
exec_session_id,
SessionExecuteRequest(command=start_command_str, var_async=True),
)
@@ -175,27 +173,27 @@ class DaytonaRuntime(ActionExecutionClient):
should_start_action_execution_server = False
if self.attach_to_existing:
self.workspace = await call_sync_from_async(self._get_workspace)
self.sandbox = await call_sync_from_async(self._get_sandbox)
else:
should_start_action_execution_server = True
if self.workspace is None:
if self.sandbox is None:
self.set_runtime_status(RuntimeStatus.BUILDING_RUNTIME)
self.workspace = await call_sync_from_async(self._create_workspace)
self.log('info', f'Created new workspace with id: {self.workspace_id}')
self.sandbox = await call_sync_from_async(self._create_sandbox)
self.log('info', f'Created a new sandbox with id: {self.sid}')
self.api_url = self._construct_api_url(self._sandbox_port)
state = self.workspace.instance.state
state = self.sandbox.state
if state == 'stopping':
self.log('info', 'Waiting for Daytona workspace to stop...')
await call_sync_from_async(self.workspace.wait_for_workspace_stop)
self.log('info', 'Waiting for the Daytona sandbox to stop...')
await call_sync_from_async(self.sandbox.wait_for_sandbox_stop)
state = 'stopped'
if state == 'stopped':
self.log('info', 'Starting Daytona workspace...')
await call_sync_from_async(self.workspace.start)
self.log('info', 'Starting the Daytona sandbox...')
await call_sync_from_async(self.sandbox.start)
should_start_action_execution_server = True
if should_start_action_execution_server:
@@ -242,8 +240,8 @@ class DaytonaRuntime(ActionExecutionClient):
if self.attach_to_existing:
return
if self.workspace:
self.daytona.remove(self.workspace)
if self.sandbox:
self.sandbox.delete()
@property
def vscode_url(self) -> str | None:
@@ -255,9 +253,9 @@ class DaytonaRuntime(ActionExecutionClient):
'warning', 'Failed to get VSCode token while trying to get VSCode URL'
)
return None
if not self.workspace:
if not self.sandbox:
self.log(
'warning', 'Workspace is not initialized while trying to get VSCode URL'
'warning', 'Sandbox is not initialized while trying to get VSCode URL'
)
return None
self._vscode_url = (
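After this change, `DaytonaRuntime` locates its sandbox by the `OpenHands_SID` label instead of a deterministic workspace id. It builds the API URL from the sandbox id and `runner_domain`. The sketch below reproduces both steps over an in-memory list. `FakeSandbox`, `find_sandbox_for_sid`, and `construct_api_url` are hypothetical; in the diff, the filtering is done by `self.daytona.list({OPENHANDS_SID_LABEL: self.sid})`. The sketch raises instead of using `assert`, because assertions are stripped when Python runs with `-O`.

```python
from dataclasses import dataclass, field
from typing import Optional

OPENHANDS_SID_LABEL = 'OpenHands_SID'


@dataclass
class FakeSandbox:
    # Minimal stand-in for a Daytona sandbox: only the fields used below.
    id: str
    runner_domain: str
    labels: dict[str, str] = field(default_factory=dict)


def find_sandbox_for_sid(sandboxes: list[FakeSandbox], sid: str) -> Optional[FakeSandbox]:
    """Return the single sandbox labelled with this session id, or None."""
    matches = [s for s in sandboxes if s.labels.get(OPENHANDS_SID_LABEL) == sid]
    if not matches:
        return None
    if len(matches) > 1:
        raise RuntimeError(f'Multiple sandboxes found for SID {sid!r}')
    return matches[0]


def construct_api_url(port: int, sandbox: FakeSandbox) -> str:
    # Same shape as in the diff: https://<port>-<sandbox id>.<runner domain>
    return f'https://{port}-{sandbox.id}.{sandbox.runner_domain}'
```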


@@ -17,6 +17,7 @@ from openhands.core.exceptions import (
from openhands.core.logger import DEBUG, DEBUG_RUNTIME
from openhands.core.logger import openhands_logger as logger
from openhands.events import EventStream
from openhands.integrations.provider import PROVIDER_TOKEN_TYPE
from openhands.runtime.builder import DockerRuntimeBuilder
from openhands.runtime.impl.action_execution.action_execution_client import (
ActionExecutionClient,
@@ -86,6 +87,8 @@ class DockerRuntime(ActionExecutionClient):
status_callback: Callable | None = None,
attach_to_existing: bool = False,
headless_mode: bool = True,
user_id: str | None = None,
git_provider_tokens: PROVIDER_TOKEN_TYPE | None = None,
main_module: str = DEFAULT_MAIN_MODULE,
):
if not DockerRuntime._shutdown_listener_id:
@@ -132,6 +135,8 @@ class DockerRuntime(ActionExecutionClient):
status_callback,
attach_to_existing,
headless_mode,
user_id,
git_provider_tokens,
)
# Log runtime_extra_deps after base class initialization so self.sid is available


@@ -12,29 +12,42 @@ from openhands.events.observation import (
Observation,
)
from openhands.events.stream import EventStream
from openhands.runtime.base import Runtime
from openhands.integrations.provider import PROVIDER_TOKEN_TYPE
from openhands.runtime.impl.action_execution.action_execution_client import (
ActionExecutionClient,
)
from openhands.runtime.impl.e2b.filestore import E2BFileStore
from openhands.runtime.impl.e2b.sandbox import E2BSandbox
from openhands.runtime.plugins import PluginRequirement
from openhands.runtime.utils.files import insert_lines, read_lines
class E2BRuntime(Runtime):
class E2BRuntime(ActionExecutionClient):
def __init__(
self,
config: OpenHandsConfig,
event_stream: EventStream,
sid: str = 'default',
plugins: list[PluginRequirement] | None = None,
sandbox: E2BSandbox | None = None,
env_vars: dict[str, str] | None = None,
status_callback: Callable | None = None,
attach_to_existing: bool = False,
headless_mode: bool = True,
user_id: str | None = None,
git_provider_tokens: PROVIDER_TOKEN_TYPE | None = None,
sandbox: E2BSandbox | None = None,
):
super().__init__(
config,
event_stream,
sid,
plugins,
status_callback=status_callback,
env_vars,
status_callback,
attach_to_existing,
headless_mode,
user_id,
git_provider_tokens,
)
if sandbox is None:
self.sandbox = E2BSandbox()


@@ -25,6 +25,7 @@ from openhands.events.observation import (
Observation,
)
from openhands.events.serialization import event_to_dict, observation_from_dict
from openhands.integrations.provider import PROVIDER_TOKEN_TYPE
from openhands.runtime.impl.action_execution.action_execution_client import (
ActionExecutionClient,
)
@@ -145,6 +146,8 @@ class LocalRuntime(ActionExecutionClient):
status_callback: Callable[[str, str, str], None] | None = None,
attach_to_existing: bool = False,
headless_mode: bool = True,
user_id: str | None = None,
git_provider_tokens: PROVIDER_TOKEN_TYPE | None = None,
) -> None:
self.is_windows = sys.platform == 'win32'
if self.is_windows:
@@ -194,6 +197,8 @@ class LocalRuntime(ActionExecutionClient):
status_callback,
attach_to_existing,
headless_mode,
user_id,
git_provider_tokens,
)
# If there is an API key in the environment we use this in requests to the runtime


@@ -9,6 +9,7 @@ import tenacity
from openhands.core.config import OpenHandsConfig
from openhands.events import EventStream
from openhands.integrations.provider import PROVIDER_TOKEN_TYPE
from openhands.runtime.impl.action_execution.action_execution_client import (
ActionExecutionClient,
)
@@ -53,6 +54,8 @@ class ModalRuntime(ActionExecutionClient):
status_callback: Callable | None = None,
attach_to_existing: bool = False,
headless_mode: bool = True,
user_id: str | None = None,
git_provider_tokens: PROVIDER_TOKEN_TYPE | None = None,
):
assert config.modal_api_token_id, 'Modal API token id is required'
assert config.modal_api_token_secret, 'Modal API token secret is required'
@@ -100,6 +103,8 @@ class ModalRuntime(ActionExecutionClient):
status_callback,
attach_to_existing,
headless_mode,
user_id,
git_provider_tokens,
)
async def connect(self):


@@ -140,7 +140,6 @@ class RemoteRuntime(ActionExecutionClient):
)
else:
self.log('info', 'No existing runtime found, starting a new one')
self.set_runtime_status(RuntimeStatus.BUILDING_RUNTIME)
if self.config.sandbox.runtime_container_image is None:
self.log(
'info',
@@ -160,7 +159,6 @@ class RemoteRuntime(ActionExecutionClient):
assert self.runtime_url is not None, (
'Runtime URL is not set. This should never happen.'
)
self.set_runtime_status(RuntimeStatus.STARTING_RUNTIME)
if not self.attach_to_existing:
self.log('info', 'Waiting for runtime to be alive...')
self._wait_until_alive()
@@ -221,6 +219,7 @@ class RemoteRuntime(ActionExecutionClient):
def _build_runtime(self) -> None:
self.log('debug', f'Building RemoteRuntime config:\n{self.config}')
self.set_runtime_status(RuntimeStatus.BUILDING_RUNTIME)
response = self._send_runtime_api_request(
'GET',
f'{self.config.sandbox.remote_runtime_api_url}/registry_prefix',
@@ -265,6 +264,7 @@ class RemoteRuntime(ActionExecutionClient):
def _start_runtime(self) -> None:
# Prepare the request body for the /start endpoint
self.set_runtime_status(RuntimeStatus.STARTING_RUNTIME)
command = self.get_action_execution_server_startup_command()
environment: dict[str, str] = {}
if self.config.debug or os.environ.get('DEBUG', 'false').lower() == 'true':
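The `RemoteRuntime` hunks move `set_runtime_status(BUILDING_RUNTIME)` into `_build_runtime()` and `set_runtime_status(STARTING_RUNTIME)` into `_start_runtime()`. As a result, a status is reported only when that step actually runs; attaching to an existing runtime, for example, reports neither. The class below is a hypothetical, heavily simplified model of that control flow. `has_prebuilt_image` stands in for the `runtime_container_image is None` check.

```python
from enum import Enum


class Status(Enum):
    BUILDING_RUNTIME = 'building'
    STARTING_RUNTIME = 'starting'


class StatusSketch:
    """Hypothetical runtime that reports a status only from the step it describes."""

    def __init__(self, existing: bool, has_prebuilt_image: bool) -> None:
        self.existing = existing
        self.has_prebuilt_image = has_prebuilt_image
        self.statuses: list[Status] = []

    def set_runtime_status(self, status: Status) -> None:
        self.statuses.append(status)

    def _build_runtime(self) -> None:
        self.set_runtime_status(Status.BUILDING_RUNTIME)
        # ... build or fetch the image here ...

    def _start_runtime(self) -> None:
        self.set_runtime_status(Status.STARTING_RUNTIME)
        # ... request the /start endpoint here ...

    def connect(self) -> None:
        if self.existing:
            return  # attaching: nothing is built or started, so nothing is reported
        if not self.has_prebuilt_image:
            self._build_runtime()
        self._start_runtime()
```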


@@ -9,6 +9,7 @@ from runloop_api_client.types.shared_params import LaunchParameters
from openhands.core.config import OpenHandsConfig
from openhands.core.logger import openhands_logger as logger
from openhands.events import EventStream
from openhands.integrations.provider import PROVIDER_TOKEN_TYPE
from openhands.runtime.impl.action_execution.action_execution_client import (
ActionExecutionClient,
)
@@ -36,6 +37,8 @@ class RunloopRuntime(ActionExecutionClient):
status_callback: Callable | None = None,
attach_to_existing: bool = False,
headless_mode: bool = True,
user_id: str | None = None,
git_provider_tokens: PROVIDER_TOKEN_TYPE | None = None,
):
assert config.runloop_api_key is not None, 'Runloop API key is required'
self.devbox: DevboxView | None = None
@@ -53,6 +56,8 @@ class RunloopRuntime(ActionExecutionClient):
status_callback,
attach_to_existing,
headless_mode,
user_id,
git_provider_tokens,
)
# Buffer for container logs
self._vscode_url: str | None = None
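Each runtime constructor above gains `user_id` and `git_provider_tokens` and forwards them to `ActionExecutionClient` positionally. Positional forwarding only works while every subclass lists its arguments in exactly the base-class order. This is why `E2BRuntime` moves its extra `sandbox` parameter after the new ones. The sketch below shows keyword forwarding as an alternative that survives reordering. `BaseClient` and `SketchRuntime` are hypothetical stand-ins with only the relevant parameters.

```python
from typing import Any, Optional


class BaseClient:
    # Stand-in for ActionExecutionClient, reduced to the parameters discussed here.
    def __init__(
        self,
        sid: str,
        headless_mode: bool = True,
        user_id: Optional[str] = None,
        git_provider_tokens: Optional[dict[str, Any]] = None,
    ) -> None:
        self.sid = sid
        self.headless_mode = headless_mode
        self.user_id = user_id
        self.git_provider_tokens = git_provider_tokens


class SketchRuntime(BaseClient):
    def __init__(
        self,
        sid: str,
        headless_mode: bool = True,
        user_id: Optional[str] = None,
        git_provider_tokens: Optional[dict[str, Any]] = None,
    ) -> None:
        # Keyword forwarding keeps working if the base class reorders or adds parameters.
        super().__init__(
            sid,
            headless_mode=headless_mode,
            user_id=user_id,
            git_provider_tokens=git_provider_tokens,
        )
```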


@@ -1,5 +1,6 @@
import os
import re
import subprocess
import time
import traceback
import uuid
@@ -167,6 +168,7 @@ class BashCommandStatus(Enum):
COMPLETED = 'completed'
NO_CHANGE_TIMEOUT = 'no_change_timeout'
HARD_TIMEOUT = 'hard_timeout'
INTERRUPTED = 'interrupted'
def _remove_command_prefix(command_output: str, command: str) -> str:
@@ -654,3 +656,247 @@ class BashSession:
logger.debug(f'SLEEPING for {self.POLL_INTERVAL} seconds for next poll')
time.sleep(self.POLL_INTERVAL)
raise RuntimeError('Bash session was likely interrupted...')
class SubprocessBashSession(BashSession):
"""
A bash session implementation using individual subprocess calls
instead of tmux, while maintaining the same interface as BashSession.
"""
def __init__(
self,
work_dir: str,
username: str | None = None,
no_change_timeout_seconds: int = 30,
max_memory_mb: int | None = None,
allow_multiple_commands: bool = True,
):
# Initialize parent class attributes
self.work_dir = work_dir
self.username = username
self.no_change_timeout_seconds = no_change_timeout_seconds
self.max_memory_mb = max_memory_mb
self.allow_multiple_commands = allow_multiple_commands
self._initialized = False
# Set initial state
self.prev_status: BashCommandStatus | None = None
self.prev_output: str = ''
self._closed: bool = False
self._cwd = os.path.abspath(self.work_dir)
self._current_process: subprocess.Popen | None = None
def initialize(self) -> None:
"""Initialize the bash session."""
logger.debug(
f'Initializing subprocess bash session with work dir: {self.work_dir}'
)
# Set initial state
self._initialized = True
logger.debug(
f'Subprocess bash session initialized with work dir: {self.work_dir}'
)
def close(self) -> None:
"""Clean up the session."""
if self._current_process and self._current_process.poll() is None:
self._current_process.terminate()
try:
self._current_process.wait(timeout=5)
except subprocess.TimeoutExpired:
self._current_process.kill()
self._closed = True
def interrupt(self) -> None:
"""Interrupt the currently running command (Ctrl+C equivalent)."""
if self._current_process and self._current_process.poll() is None:
logger.debug('Interrupting current command')
self._current_process.terminate()
self.prev_status = BashCommandStatus.INTERRUPTED
def get_status(self) -> BashCommandStatus | None:
"""Get the status of the last command."""
return self.prev_status
def is_running(self) -> bool:
"""Check if a command is currently running."""
return (
self._current_process is not None and self._current_process.poll() is None
)
def execute(self, action: CmdRunAction) -> CmdOutputObservation | ErrorObservation:
"""Execute a command in the bash session using subprocess."""
from openhands.events.observation.commands import CmdOutputMetadata
if not self._initialized:
return ErrorObservation(content='Subprocess bash session not initialized')
command = action.command
# Handle interactive input (not supported in subprocess mode)
if action.is_input:
return ErrorObservation(
content=f"Subprocess bash session does not support interactive input. The command '{command}' was not sent to any process."
)
# Handle empty commands
if command == '':
return CmdOutputObservation(
content='ERROR: No command provided.',
command='',
metadata=CmdOutputMetadata(),
)
# Check for multiple commands based on configuration
if not self.allow_multiple_commands:
splited_commands = split_bash_commands(command)
if len(splited_commands) > 1:
return ErrorObservation(
content=(
f'ERROR: Cannot execute multiple commands at once.\n'
f'Please run each command separately OR chain them into a single command via && or ;\n'
f'Provided commands:\n{"\n".join(f"({i + 1}) {cmd}" for i, cmd in enumerate(splited_commands))}'
)
)
start_time = time.time()
try:
# Prepare the command
escaped_command = escape_bash_special_chars(command)
logger.debug(f'EXECUTING COMMAND: {escaped_command!r}')
# Set effective timeout
effective_timeout = action.timeout if action.timeout else 30.0
# Check if this is a background command (ends with &)
is_background = command.strip().endswith('&')
# Execute the command using subprocess
self._current_process = subprocess.Popen(
['bash', '-c', escaped_command],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
text=True,
cwd=self._cwd,
)
try:
if is_background:
# For background commands, wait a short time to see if bash exits quickly
# Background commands should cause bash to return immediately with exit code 0
try:
stdout, stderr = self._current_process.communicate(timeout=0.5)
exit_code = self._current_process.returncode
except subprocess.TimeoutExpired:
# If bash doesn't exit quickly, it means the command is still running
# This shouldn't happen for proper background commands, but handle it
self._current_process.kill()
stdout, stderr = self._current_process.communicate()
exit_code = 0 # Treat as successful background launch
else:
stdout, stderr = self._current_process.communicate(
timeout=effective_timeout
)
exit_code = self._current_process.returncode
# Check if process was interrupted (negative exit codes indicate signals)
if exit_code < 0:
self.prev_status = BashCommandStatus.INTERRUPTED
else:
self.prev_status = BashCommandStatus.COMPLETED
# Combine output and error
combined_output = stdout
if stderr:
combined_output += f'\n{stderr}'
# Update working directory if it's a cd command
if command.strip().startswith('cd '):
try:
# Try to get the new working directory
pwd_process = subprocess.run(
['bash', '-c', f'{escaped_command}; pwd'],
capture_output=True,
text=True,
cwd=self._cwd,
timeout=5,
)
if pwd_process.returncode == 0:
new_cwd = pwd_process.stdout.strip()
if os.path.isdir(new_cwd):
self._cwd = new_cwd
except Exception as e:
logger.debug(f'Failed to update working directory: {e}')
# Create metadata
metadata = CmdOutputMetadata()
metadata.exit_code = exit_code
metadata.working_dir = self._cwd
self.prev_output = ''
return CmdOutputObservation(
content=combined_output.rstrip() if combined_output else '',
command=command,
metadata=metadata,
)
except subprocess.TimeoutExpired:
# Handle timeout
self._current_process.kill()
elapsed_time = time.time() - start_time
# Try to get partial output
try:
stdout, stderr = self._current_process.communicate(timeout=1.0)
partial_output = stdout
if stderr:
partial_output += f'\n{stderr}'
except subprocess.TimeoutExpired:
partial_output = ''
metadata = CmdOutputMetadata()
metadata.suffix = (
f'\n[The command timed out after {elapsed_time:.1f} seconds. '
f'{TIMEOUT_MESSAGE_TEMPLATE}]'
)
self.prev_status = BashCommandStatus.HARD_TIMEOUT
return CmdOutputObservation(
content=partial_output.rstrip() if partial_output else '',
command=command,
metadata=metadata,
)
finally:
# Clear current process reference
self._current_process = None
except Exception as e:
logger.error(f'Error executing command "{command}": {e}')
return ErrorObservation(
content=f'Error executing command "{command}": {str(e)}'
)
def _ready_for_next_command(self) -> None:
"""Reset state for next command."""
pass
def _get_pane_content(self) -> str:
"""Get current output."""
return ''
@property
def cwd(self) -> str:
"""Get current working directory."""
return self._cwd
@property
def initialized(self) -> bool:
"""Check if the session is initialized."""
return self._initialized
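On timeout, `SubprocessBashSession.execute` calls `kill()` on the `bash` process only. If the command forked children (for example `sleep 60; echo done`), those children keep running and keep the pipes open. The follow-up `communicate(timeout=1.0)` can then time out, which discards the partial output. The same pipe inheritance likely affects the background branch: a backgrounded child keeps the pipes open, so the untimed `communicate()` after `kill()` can block until that child exits.

The sketch below avoids the timeout problem by starting the command in its own session and killing the whole process group. It is POSIX-only and merges stderr into stdout; the diff makes neither choice. `run_with_timeout` and `RunResult` are hypothetical names.

```python
import os
import signal
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    output: str
    exit_code: Optional[int]  # None when the command was killed on timeout
    timed_out: bool


def run_with_timeout(command: str, cwd: str, timeout: float) -> RunResult:
    """Run `command` under bash, killing it and all its children on timeout (POSIX only)."""
    proc = subprocess.Popen(
        ['bash', '-c', command],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # one stream, so stdout/stderr stay interleaved
        text=True,
        cwd=cwd,
        start_new_session=True,  # bash leads a new session and process group
    )
    try:
        out, _ = proc.communicate(timeout=timeout)
        return RunResult(out.rstrip(), proc.returncode, False)
    except subprocess.TimeoutExpired:
        try:
            # Signal the whole group so forked children also release the pipe.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # everything already exited
        # A second communicate() returns whatever was buffered before the kill.
        out, _ = proc.communicate()
        return RunResult((out or '').rstrip(), None, True)
```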

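Whenever a command starts with `cd `, the diff tracks the directory change by re-running the whole command with `; pwd` appended. This runs any side effects twice (for example `cd build && make`) and misses directory changes made by commands that do not start with `cd`. The alternative sketch below reads the final directory from the same invocation: it prints a unique marker and then `pwd` after the user's command. `run_and_track_cwd` and the marker format are hypothetical.

```python
import os
import subprocess
import uuid


def run_and_track_cwd(command: str, cwd: str, timeout: float = 30.0) -> tuple[str, str, int]:
    """Run `command` once and return (output, final_cwd, exit_code)."""
    marker = f'__OH_CWD_{uuid.uuid4().hex}__'  # unlikely to appear in real output
    script = (
        f'{command}\n'
        '__oh_ec=$?\n'  # remember the user command's exit status
        f'printf "\\n%s\\n" "{marker}"\n'
        'pwd\n'  # directory after the command, in the same shell
        'exit $__oh_ec\n'
    )
    proc = subprocess.run(
        ['bash', '-c', script],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        cwd=cwd,
        timeout=timeout,
    )
    body, new_cwd = proc.stdout, cwd
    sep = f'\n{marker}\n'
    if sep in body:
        body, _, tail = body.rpartition(sep)
        candidate = tail.strip()
        if os.path.isdir(candidate):
            new_cwd = candidate
    # If the command called `exit`, the marker never prints and cwd is left unchanged.
    return body.rstrip(), new_cwd, proc.returncode
```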

@@ -305,7 +305,6 @@ class FileEditRuntimeMixin(FileEditRuntimeInterface):
return ErrorObservation(error_msg)
content_to_edit = '\n'.join(old_file_lines[start_idx:end_idx])
self.draft_editor_llm.reset()
_edited_content = get_new_file_contents(
self.draft_editor_llm, content_to_edit, action.content
)


@@ -158,7 +158,7 @@ class AgentSession:
# NOTE: this needs to happen before controller is created
# so MCP tools can be included into the SystemMessageAction
if self.runtime and runtime_connected and agent.config.enable_mcp:
await add_mcp_tools_to_agent(agent, self.runtime, self.memory, config)
await add_mcp_tools_to_agent(agent, self.runtime, self.memory)
if replay_json:
initial_message = self._run_replay(
@@ -232,8 +232,7 @@ class AgentSession:
if self.event_stream is not None:
self.event_stream.close()
if self.controller is not None:
end_state = self.controller.get_state()
end_state.save_to_session(self.sid, self.file_store, self.user_id)
self.controller.save_state()
await self.controller.close()
if self.runtime is not None:
EXECUTOR.submit(self.runtime.close)
@@ -366,6 +365,7 @@ class AgentSession:
headless_mode=False,
attach_to_existing=False,
env_vars=env_vars,
git_provider_tokens=git_provider_tokens,
)
# FIXME: this sleep is a terrible hack.
@@ -438,10 +438,12 @@ class AgentSession:
initial_state = self._maybe_restore_state()
controller = AgentController(
sid=self.sid,
user_id=self.user_id,
file_store=self.file_store,
event_stream=self.event_stream,
agent=agent,
max_iterations=int(max_iterations),
max_budget_per_task=max_budget_per_task,
iteration_delta=int(max_iterations),
budget_per_task_delta=max_budget_per_task,
agent_to_llm_config=agent_to_llm_config,
agent_configs=agent_configs,
confirmation_mode=confirmation_mode,


@@ -95,7 +95,7 @@ async def auto_generate_title(
# Find the first user message
first_user_message = None
for event in event_stream.get_events():
for event in event_stream.search_events():
if (
event.source == EventSource.USER
and isinstance(event, MessageAction)
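`auto_generate_title` scans the event stream and stops at the first user `MessageAction`. Because the loop returns as soon as it finds a match, it only needs an iterable. Swapping `get_events()` for `search_events()` is therefore a drop-in change, assuming both yield events oldest-first. In the sketch below, `Event` and `first_user_message` are hypothetical, and a plain `'user'` string replaces `EventSource.USER`. The test shows that a lazy generator is consumed only up to the match.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Event:
    # Hypothetical minimal event: a source and, for messages, some text.
    source: str
    content: Optional[str] = None


def first_user_message(events: Iterable[Event]) -> Optional[str]:
    for event in events:
        if event.source == 'user' and event.content:
            return event.content  # stop at the first match; later events are never read
    return None
```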


@@ -127,5 +127,5 @@ class PromptManager:
None,
)
if latest_user_message:
reminder_text = f'\n\nENVIRONMENT REMINDER: You have {state.max_iterations - state.iteration} turns left to complete the task. When finished reply with <finish></finish>.'
reminder_text = f'\n\nENVIRONMENT REMINDER: You have {state.iteration_flag.max_value - state.iteration_flag.current_value} turns left to complete the task. When finished reply with <finish></finish>.'
latest_user_message.content.append(TextContent(text=reminder_text))
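`PromptManager` now derives the remaining turns from `state.iteration_flag.max_value - state.iteration_flag.current_value` instead of separate `max_iterations` and `iteration` fields. Below is a minimal model of such a flag. `IterationFlag` and `environment_reminder` are hypothetical stand-ins. Clamping the count at zero is a choice made in this sketch; the diff does not clamp.

```python
from dataclasses import dataclass


@dataclass
class IterationFlag:
    """Hypothetical counter with a ceiling, modelled on state.iteration_flag."""

    current_value: int
    max_value: int

    def turns_left(self) -> int:
        # Never report a negative budget once the limit has been passed.
        return max(self.max_value - self.current_value, 0)


def environment_reminder(flag: IterationFlag) -> str:
    return (
        f'\n\nENVIRONMENT REMINDER: You have {flag.turns_left()} turns left '
        'to complete the task. When finished reply with <finish></finish>.'
    )
```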

poetry.lock generated

@@ -1,4 +1,50 @@
# This file is automatically @generated by Poetry 2.1.1 and should not be changed by hand.
# This file is automatically @generated by Poetry 2.1.3 and should not be changed by hand.
[[package]]
name = "aioboto3"
version = "14.3.0"
description = "Async boto3 wrapper"
optional = false
python-versions = "<4.0,>=3.8"
groups = ["main"]
files = [
{file = "aioboto3-14.3.0-py3-none-any.whl", hash = "sha256:aec5de94e9edc1ffbdd58eead38a37f00ddac59a519db749a910c20b7b81bca7"},
{file = "aioboto3-14.3.0.tar.gz", hash = "sha256:1d18f88bb56835c607b62bb6cb907754d717bedde3ddfff6935727cb48a80135"},
]
[package.dependencies]
aiobotocore = {version = "2.22.0", extras = ["boto3"]}
aiofiles = ">=23.2.1"
[package.extras]
chalice = ["chalice (>=1.24.0)"]
s3cse = ["cryptography (>=44.0.1)"]
[[package]]
name = "aiobotocore"
version = "2.22.0"
description = "Async client for aws services using botocore and aiohttp"
optional = false
python-versions = ">=3.8"
groups = ["main"]
files = [
{file = "aiobotocore-2.22.0-py3-none-any.whl", hash = "sha256:b4e6306f79df9d81daff1f9d63189a2dbee4b77ce3ab937304834e35eaaeeccf"},
{file = "aiobotocore-2.22.0.tar.gz", hash = "sha256:11091477266b75c2b5d28421c1f2bc9a87d175d0b8619cb830805e7a113a170b"},
]
[package.dependencies]
aiohttp = ">=3.9.2,<4.0.0"
aioitertools = ">=0.5.1,<1.0.0"
boto3 = {version = ">=1.37.2,<1.37.4", optional = true, markers = "extra == \"boto3\""}
botocore = ">=1.37.2,<1.37.4"
jmespath = ">=0.7.1,<2.0.0"
multidict = ">=6.0.0,<7.0.0"
python-dateutil = ">=2.1,<3.0.0"
wrapt = ">=1.10.10,<2.0.0"
[package.extras]
awscli = ["awscli (>=1.38.2,<1.38.4)"]
boto3 = ["boto3 (>=1.37.2,<1.37.4)"]
[[package]]
name = "aiofiles"
@@ -147,6 +193,22 @@ files = [
[package.dependencies]
aiohttp = "*"
[[package]]
name = "aioitertools"
version = "0.12.0"
description = "itertools and builtins for AsyncIO and mixed iterables"
optional = false
python-versions = ">=3.8"
groups = ["main"]
files = [
{file = "aioitertools-0.12.0-py3-none-any.whl", hash = "sha256:fc1f5fac3d737354de8831cbba3eb04f79dd649d8f3afb4c5b114925e662a796"},
{file = "aioitertools-0.12.0.tar.gz", hash = "sha256:c2a9055b4fbb7705f561b9d86053e8af5d10cc845d22c32008c43490b2d8dd6b"},
]
[package.extras]
dev = ["attribution (==1.8.0)", "black (==24.8.0)", "build (>=1.2)", "coverage (==7.6.1)", "flake8 (==7.1.1)", "flit (==3.9.0)", "mypy (==1.11.2)", "ufmt (==2.7.1)", "usort (==1.0.8.post1)"]
docs = ["sphinx (==8.0.2)", "sphinx-mdinclude (==0.6.2)"]
[[package]]
name = "aiolimiter"
version = "1.2.1"
@@ -400,7 +462,7 @@ description = "LTS Port of Python audioop"
optional = false
python-versions = ">=3.13"
groups = ["main"]
markers = "python_version >= \"3.13\""
markers = "python_version == \"3.13\""
files = [
{file = "audioop_lts-0.2.1-cp313-abi3-macosx_10_13_universal2.whl", hash = "sha256:fd1345ae99e17e6910f47ce7d52673c6a1a70820d78b67de1b7abb3af29c426a"},
{file = "audioop_lts-0.2.1-cp313-abi3-macosx_10_13_x86_64.whl", hash = "sha256:e175350da05d2087e12cea8e72a70a1a8b14a17e92ed2022952a4419689ede5e"},
@@ -581,20 +643,20 @@ files = [
[[package]]
name = "boto3"
version = "1.38.36"
version = "1.37.3"
description = "The AWS SDK for Python"
optional = false
python-versions = ">=3.9"
python-versions = ">=3.8"
groups = ["main"]
files = [
{file = "boto3-1.38.36-py3-none-any.whl", hash = "sha256:34c27d7317cadb62c0e9856e5d5aa0271ef47202d340584831048bc7ac904136"},
{file = "boto3-1.38.36.tar.gz", hash = "sha256:efe0aaa060f8fedd76e5c942055f051aee0432fc722d79d8830a9fd9db83593e"},
{file = "boto3-1.37.3-py3-none-any.whl", hash = "sha256:2063b40af99fd02f6228ff52397b552ff3353831edaf8d25cc04801827ab9794"},
{file = "boto3-1.37.3.tar.gz", hash = "sha256:21f3ce0ef111297e63a6eb998a25197b8c10982970c320d4c6e8db08be2157be"},
]
[package.dependencies]
botocore = ">=1.38.36,<1.39.0"
botocore = ">=1.37.3,<1.38.0"
jmespath = ">=0.7.1,<2.0.0"
s3transfer = ">=0.13.0,<0.14.0"
s3transfer = ">=0.11.0,<0.12.0"
[package.extras]
crt = ["botocore[crt] (>=1.21.0,<2.0a0)"]
@@ -1028,14 +1090,14 @@ xray = ["mypy-boto3-xray (>=1.38.0,<1.39.0)"]
[[package]]
name = "botocore"
version = "1.38.36"
version = "1.37.3"
description = "Low-level, data-driven core of boto 3."
optional = false
python-versions = ">=3.9"
python-versions = ">=3.8"
groups = ["main"]
files = [
{file = "botocore-1.38.36-py3-none-any.whl", hash = "sha256:b6a50b853f6d23af9edfed89a59800c6bc1687a947cdd3492879f7d64e002d30"},
{file = "botocore-1.38.36.tar.gz", hash = "sha256:4a1ced1a4218bdff0ed5b46abb54570d473154ddefafa5d121a8d96e4b76ebc1"},
{file = "botocore-1.37.3-py3-none-any.whl", hash = "sha256:d01bd3bf4c80e61fa88d636ad9f5c9f60a551d71549b481386c6b4efe0bb2b2e"},
{file = "botocore-1.37.3.tar.gz", hash = "sha256:fe8403eb55a88faf9b0f9da6615e5bee7be056d75e17af66c3c8f0a3b0648da4"},
]
[package.dependencies]
@@ -1580,7 +1642,7 @@ files = [
{file = "colorama-0.4.6-py2.py3-none-any.whl", hash = "sha256:4f1d9991f5acc0ca119f9d443620b77f9d6b33703e51011c16baf57afb285fc6"},
{file = "colorama-0.4.6.tar.gz", hash = "sha256:08695f5cb7ed6e0531a20572697297273c47b8cae5a63ffc6d6ed5c201be6e44"},
]
markers = {main = "platform_system == \"Windows\" or sys_platform == \"win32\" or os_name == \"nt\"", dev = "os_name == \"nt\" or sys_platform == \"win32\"", runtime = "sys_platform == \"win32\"", test = "platform_system == \"Windows\" or sys_platform == \"win32\""}
markers = {main = "platform_system == \"Windows\" or os_name == \"nt\" or sys_platform == \"win32\"", dev = "os_name == \"nt\" or sys_platform == \"win32\"", runtime = "sys_platform == \"win32\"", test = "platform_system == \"Windows\" or sys_platform == \"win32\""}
[[package]]
name = "comm"
@@ -1932,16 +1994,48 @@ tests-numpy2 = ["Pillow (>=9.4.0)", "absl-py", "decorator", "elasticsearch (<8.0
torch = ["torch"]
vision = ["Pillow (>=9.4.0)"]
[[package]]
name = "daytona"
version = "0.21.1"
description = "Python SDK for Daytona"
optional = false
python-versions = ">=3.7"
groups = ["main"]
files = [
{file = "daytona-0.21.1-py3-none-any.whl", hash = "sha256:1ce6b352f52ef92e667098b7bdaa60c22ffbfb8e686a8cbd12418bf7698ac834"},
{file = "daytona-0.21.1.tar.gz", hash = "sha256:01d83dd2b627f87e82491fb97f41845768d75c33f0767eaa44f6e8378bd58e60"},
]
[package.dependencies]
aioboto3 = ">=14.0.0,<15.0.0"
aiofiles = ">=24.1.0,<24.2.0"
aiohttp = ">=3.12.0,<4.0.0"
aiohttp_retry = ">=2.9.0,<3.0.0"
boto3 = ">=1.0.0,<2.0.0"
daytona_api_client = ">=0.21.0,<0.22.0"
daytona_api_client_async = ">=0.21.0,<0.22.0"
Deprecated = ">=1.2.18,<2.0.0"
environs = ">=9.5.0,<10.0.0"
httpx = ">=0.28.0,<0.29.0"
marshmallow = ">=3.19.0,<4.0.0"
pydantic = ">=2.4.2,<3.0.0"
python-dateutil = ">=2.8.2,<3.0.0"
toml = ">=0.10.0,<0.11.0"
urllib3 = ">=2.0.7,<3.0.0"
[package.extras]
dev = ["black[jupyter] (>=23.1.0,<24.0.0)", "build (>=1.0.3)", "isort (>=5.10.0,<6.0.0)", "matplotlib (>=3.10.0,<3.11.0)", "nbqa (>=1.9.1,<2.0.0)", "pydoc-markdown (>=4.8.2)", "pylint (>=3.3.4,<4.0.0)", "setuptools (>=68.0.0)", "twine (>=4.0.2)", "unasync (>=0.6.0,<0.7.0)", "wheel (>=0.41.2)"]
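The `daytona` 0.21.1 entry above declares `daytona_api_client = ">=0.21.0,<0.22.0"`, and the entries that follow move `daytona-api-client` and `daytona-api-client-async` from 0.20.1 to 0.21.0 to match. Below is a minimal sketch for checking a locked version against that kind of comma-separated range. `satisfies` and `parse_version` are hypothetical helpers. They only handle plain numeric `X.Y.Z` versions and the `>=`, `<=`, `==`, `>`, `<` operators. This is not Poetry's resolver or a full PEP 440 implementation, and pre-release bounds such as `2.0a.0` are out of scope.

```python
import operator

# Map each supported comparison operator to a function on version tuples.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a plain numeric version like '0.21.0' into (0, 21, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version: str, constraint: str) -> bool:
    """Return True if `version` meets every comma-separated clause in `constraint`."""
    v = parse_version(version)
    for clause in constraint.split(","):
        clause = clause.strip()
        # Check two-character operators before their one-character prefixes.
        for symbol in (">=", "<=", "==", ">", "<"):
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                if not _OPS[symbol](v, bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


api_range = ">=0.21.0,<0.22.0"  # range declared by daytona 0.21.1
print(satisfies("0.21.0", api_range))  # True: the new lock entry fits
print(satisfies("0.20.1", api_range))  # False: the old lock entry does not
```

Comparing versions as integer tuples keeps the sketch short; real tools also normalize trailing zeros and handle pre-release, post-release and local version segments.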
[[package]]
name = "daytona-api-client"
version = "0.20.1"
version = "0.21.0"
description = "Daytona"
optional = false
python-versions = "*"
groups = ["main"]
files = [
{file = "daytona_api_client-0.20.1-py3-none-any.whl", hash = "sha256:4d5023108013365eba76bd0bd4704f30dee54c13e2ac5b62e8c88bcd4af5db92"},
{file = "daytona_api_client-0.20.1.tar.gz", hash = "sha256:ff2061f7e7dc9c935a9087216600be277cb9cf6b8c1eecdfe333ef30d6b208fd"},
{file = "daytona_api_client-0.21.0-py3-none-any.whl", hash = "sha256:a8ff1f0fb397368dbd6ddb224c28d679e599c657eab2ec5821cf0c972a60229a"},
{file = "daytona_api_client-0.21.0.tar.gz", hash = "sha256:92d591c5a1750a827b5850425ce483441609b72b05d35a618d5353fbbba50bca"},
]
[package.dependencies]
@@ -1952,14 +2046,14 @@ urllib3 = ">=1.25.3,<3.0.0"
[[package]]
name = "daytona-api-client-async"
version = "0.20.1"
version = "0.21.0"
description = "Daytona"
optional = false
python-versions = "*"
groups = ["main"]
files = [
{file = "daytona_api_client_async-0.20.1-py3-none-any.whl", hash = "sha256:f24e06e3ab6e554214ed064f1b4c8723356c76c14c69de9a73a6cad60a386127"},
{file = "daytona_api_client_async-0.20.1.tar.gz", hash = "sha256:043045cb173b0b53416c19a9e276124a5c4fe14209f409a8572ef1975240e53f"},
{file = "daytona_api_client_async-0.21.0-py3-none-any.whl", hash = "sha256:f5731963d0dd6c1e207b92bdc7f5b59952d3365444bc9dc8b013d77a4dddf377"},
{file = "daytona_api_client_async-0.21.0.tar.gz", hash = "sha256:08a22c0d1616f82efa8d157d7be6c432554fd43d75560725c4e0cef0228607d6"},
]
[package.dependencies]
@@ -1970,35 +2064,6 @@ python-dateutil = ">=2.8.2"
typing-extensions = ">=4.7.1"
urllib3 = ">=1.25.3,<3.0.0"
[[package]]
name = "daytona-sdk"
version = "0.20.0"
description = "Python SDK for Daytona"
optional = false
python-versions = ">=3.7"
groups = ["main"]
files = [
{file = "daytona_sdk-0.20.0-py3-none-any.whl", hash = "sha256:7919acfff21c072a0ea826a3b250c0d9c5765e58c054d2bd5b91ea76f0df4709"},
{file = "daytona_sdk-0.20.0.tar.gz", hash = "sha256:b5c13b999fcce1e6460974dbbb0dd336d8ca1e96d6a25afe705f476fba4e6f11"},
]
[package.dependencies]
aiofiles = ">=24.1.0,<24.2.0"
aiohttp = ">=3.12.0,<4.0.0"
aiohttp_retry = ">=2.9.0,<3.0.0"
daytona_api_client = ">=0.20.0,<0.21.0"
daytona_api_client_async = ">=0.20.0,<0.21.0"
Deprecated = ">=1.2.18,<2.0.0"
environs = ">=9.5.0,<10.0.0"
httpx = ">=0.28.0,<0.29.0"
marshmallow = ">=3.19.0,<4.0.0"
pydantic = ">=2.4.2,<3.0.0"
python-dateutil = ">=2.8.2,<3.0.0"
urllib3 = ">=2.0.7,<3.0.0"
[package.extras]
dev = ["black[jupyter] (>=23.1.0,<24.0.0)", "build (>=1.0.3)", "isort (>=5.10.0,<6.0.0)", "matplotlib (>=3.10.0,<3.11.0)", "nbqa (>=1.9.1,<2.0.0)", "pydoc-markdown (>=4.8.2)", "pylint (>=3.3.4,<4.0.0)", "setuptools (>=68.0.0)", "twine (>=4.0.2)", "unasync (>=0.6.0,<0.7.0)", "wheel (>=0.41.2)"]
[[package]]
name = "debugpy"
version = "1.8.14"
@@ -2974,8 +3039,8 @@ files = [
google-api-core = {version = ">=1.34.1,<2.0.dev0 || >=2.11.dev0,<3.0.0dev", extras = ["grpc"]}
google-auth = ">=2.14.1,<2.24.0 || >2.24.0,<2.25.0 || >2.25.0,<3.0.0dev"
proto-plus = [
{version = ">=1.22.3,<2.0.0dev"},
{version = ">=1.25.0,<2.0.0dev", markers = "python_version >= \"3.13\""},
{version = ">=1.22.3,<2.0.0dev"},
]
protobuf = ">=3.20.2,<4.21.0 || >4.21.0,<4.21.1 || >4.21.1,<4.21.2 || >4.21.2,<4.21.3 || >4.21.3,<4.21.4 || >4.21.4,<4.21.5 || >4.21.5,<6.0.0dev"
@@ -2997,8 +3062,8 @@ googleapis-common-protos = ">=1.56.2,<2.0.0"
grpcio = {version = ">=1.49.1,<2.0.0", optional = true, markers = "python_version >= \"3.11\" and extra == \"grpc\""}
grpcio-status = {version = ">=1.49.1,<2.0.0", optional = true, markers = "python_version >= \"3.11\" and extra == \"grpc\""}
proto-plus = [
{version = ">=1.22.3,<2.0.0"},
{version = ">=1.25.0,<2.0.0", markers = "python_version >= \"3.13\""},
{version = ">=1.22.3,<2.0.0"},
]
protobuf = ">=3.19.5,<3.20.0 || >3.20.0,<3.20.1 || >3.20.1,<4.21.0 || >4.21.0,<4.21.1 || >4.21.1,<4.21.2 || >4.21.2,<4.21.3 || >4.21.3,<4.21.4 || >4.21.4,<4.21.5 || >4.21.5,<7.0.0"
requests = ">=2.18.0,<3.0.0"
@@ -3216,8 +3281,8 @@ google-api-core = {version = ">=1.34.1,<2.0.dev0 || >=2.11.dev0,<3.0.0", extras
google-auth = ">=2.14.1,<2.24.0 || >2.24.0,<2.25.0 || >2.25.0,<3.0.0"
grpc-google-iam-v1 = ">=0.14.0,<1.0.0"
proto-plus = [
{version = ">=1.22.3,<2.0.0"},
{version = ">=1.25.0,<2.0.0", markers = "python_version >= \"3.13\""},
{version = ">=1.22.3,<2.0.0"},
]
protobuf = ">=3.20.2,<4.21.0 || >4.21.0,<4.21.1 || >4.21.1,<4.21.2 || >4.21.2,<4.21.3 || >4.21.3,<4.21.4 || >4.21.4,<4.21.5 || >4.21.5,<7.0.0"
@@ -6479,8 +6544,8 @@ files = [
[package.dependencies]
googleapis-common-protos = ">=1.52,<2.0"
grpcio = [
{version = ">=1.63.2,<2.0.0", markers = "python_version < \"3.13\""},
{version = ">=1.66.2,<2.0.0", markers = "python_version >= \"3.13\""},
{version = ">=1.63.2,<2.0.0", markers = "python_version < \"3.13\""},
]
opentelemetry-api = ">=1.15,<2.0"
opentelemetry-exporter-otlp-proto-common = "1.34.1"
@@ -8967,21 +9032,21 @@ typing-extensions = ">=4.10,<5"
[[package]]
name = "s3transfer"
version = "0.13.0"
version = "0.11.3"
description = "An Amazon S3 Transfer Manager"
optional = false
python-versions = ">=3.9"
python-versions = ">=3.8"
groups = ["main"]
files = [
{file = "s3transfer-0.13.0-py3-none-any.whl", hash = "sha256:0148ef34d6dd964d0d8cf4311b2b21c474693e57c2e069ec708ce043d2b527be"},
{file = "s3transfer-0.13.0.tar.gz", hash = "sha256:f5e6db74eb7776a37208001113ea7aa97695368242b364d73e91c981ac522177"},
{file = "s3transfer-0.11.3-py3-none-any.whl", hash = "sha256:ca855bdeb885174b5ffa95b9913622459d4ad8e331fc98eb01e6d5eb6a30655d"},
{file = "s3transfer-0.11.3.tar.gz", hash = "sha256:edae4977e3a122445660c7c114bba949f9d191bae3b34a096f18a1c8c354527a"},
]
[package.dependencies]
botocore = ">=1.37.4,<2.0a.0"
botocore = ">=1.36.0,<2.0a.0"
[package.extras]
crt = ["botocore[crt] (>=1.37.4,<2.0a.0)"]
crt = ["botocore[crt] (>=1.36.0,<2.0a.0)"]
[[package]]
name = "sacrebleu"
@@ -9243,7 +9308,6 @@ files = [
{file = "setuptools-80.9.0-py3-none-any.whl", hash = "sha256:062d34222ad13e0cc312a4c02d73f059e86a4acbfbdea8f8f76b28c99f306922"},
{file = "setuptools-80.9.0.tar.gz", hash = "sha256:f36b47402ecde768dbfafc46e8e4207b4360c654f1f3bb84475f0a28628fb19c"},
]
markers = {evaluation = "platform_system == \"Linux\" and platform_machine == \"x86_64\""}
[package.extras]
check = ["pytest-checkdocs (>=2.4)", "pytest-ruff (>=0.2.1) ; sys_platform != \"cygwin\"", "ruff (>=0.8.0) ; sys_platform != \"cygwin\""]
@@ -9486,7 +9550,7 @@ description = "Standard library aifc redistribution. \"dead battery\"."
optional = false
python-versions = "*"
groups = ["main"]
markers = "python_version >= \"3.13\""
markers = "python_version == \"3.13\""
files = [
{file = "standard_aifc-3.13.0-py3-none-any.whl", hash = "sha256:f7ae09cc57de1224a0dd8e3eb8f73830be7c3d0bc485de4c1f82b4a7f645ac66"},
{file = "standard_aifc-3.13.0.tar.gz", hash = "sha256:64e249c7cb4b3daf2fdba4e95721f811bde8bdfc43ad9f936589b7bb2fae2e43"},
@@ -9503,7 +9567,7 @@ description = "Standard library chunk redistribution. \"dead battery\"."
optional = false
python-versions = "*"
groups = ["main"]
markers = "python_version >= \"3.13\""
markers = "python_version == \"3.13\""
files = [
{file = "standard_chunk-3.13.0-py3-none-any.whl", hash = "sha256:17880a26c285189c644bd5bd8f8ed2bdb795d216e3293e6dbe55bbd848e2982c"},
{file = "standard_chunk-3.13.0.tar.gz", hash = "sha256:4ac345d37d7e686d2755e01836b8d98eda0d1a3ee90375e597ae43aaf064d654"},
@@ -11665,4 +11729,4 @@ cffi = ["cffi (>=1.11)"]
[metadata]
lock-version = "2.1"
python-versions = "^3.12,<3.14"
content-hash = "47df4fc76b97147ff31169028edafaf35c1f4e661c7ab74bad48cb0ceea06aba"
content-hash = "df8217d9808a5a1f5886e0328cbeb5032b20c28a677154888bd010f7bc945cb2"

@@ -6,7 +6,7 @@ requires = [
[tool.poetry]
name = "openhands-ai"
version = "0.43.0"
version = "0.44.0"
description = "OpenHands: Code Less, Make More"
authors = [ "OpenHands" ]
license = "MIT"
@@ -80,7 +80,7 @@ bashlex = "^0.18"
# TODO: These are integrations that should probably be optional
redis = ">=5.2,<7.0"
minio = "^7.2.8"
daytona-sdk = "0.20.0"
daytona = "0.21.1"
stripe = ">=11.5,<13.0"
google-cloud-aiplatform = "*"
anthropic = { extras = [ "vertex" ], version = "*" }

tests/__init__.py Normal file

@@ -385,7 +385,6 @@ async def test_add_mcp_tools_from_microagents():
"""Test that add_mcp_tools_to_agent adds tools from microagents."""
# Import ActionExecutionClient for mocking
from openhands.core.config.openhands_config import OpenHandsConfig
from openhands.runtime.impl.action_execution.action_execution_client import (
ActionExecutionClient,
)
@@ -394,10 +393,6 @@ async def test_add_mcp_tools_from_microagents():
mock_agent = MagicMock()
mock_runtime = MagicMock(spec=ActionExecutionClient)
mock_memory = MagicMock()
mock_mcp_config = MCPConfig()
# Create a mock OpenHandsConfig with the MCP config
mock_app_config = OpenHandsConfig(mcp=mock_mcp_config, search_api_key=None)
# Configure the mock memory to return a microagent MCP config
mock_stdio_server = MCPStdioServerConfig(
@@ -425,9 +420,7 @@ async def test_add_mcp_tools_from_microagents():
new=AsyncMock(return_value=[mock_tool]),
):
# Call the function with the OpenHandsConfig instead of MCPConfig
await add_mcp_tools_to_agent(
mock_agent, mock_runtime, mock_memory, mock_app_config
)
await add_mcp_tools_to_agent(mock_agent, mock_runtime, mock_memory)
# Verify that the memory's get_microagent_mcp_tools was called
mock_memory.get_microagent_mcp_tools.assert_called_once()

@@ -1,4 +1,5 @@
import asyncio
import copy
from unittest.mock import ANY, AsyncMock, MagicMock, patch
from uuid import uuid4
@@ -11,7 +12,10 @@ from litellm import (
from openhands.controller.agent import Agent
from openhands.controller.agent_controller import AgentController
from openhands.controller.state.state import State, TrafficControlState
from openhands.controller.state.control_flags import (
BudgetControlFlag,
)
from openhands.controller.state.state import State
from openhands.core.config import OpenHandsConfig
from openhands.core.config.agent_config import AgentConfig
from openhands.core.main import run_controller
@@ -128,7 +132,7 @@ async def test_set_agent_state(mock_agent, mock_event_stream):
controller = AgentController(
agent=mock_agent,
event_stream=mock_event_stream,
max_iterations=10,
iteration_delta=10,
sid='test',
confirmation_mode=False,
headless_mode=True,
@@ -146,7 +150,7 @@ async def test_on_event_message_action(mock_agent, mock_event_stream):
controller = AgentController(
agent=mock_agent,
event_stream=mock_event_stream,
max_iterations=10,
iteration_delta=10,
sid='test',
confirmation_mode=False,
headless_mode=True,
@@ -163,7 +167,7 @@ async def test_on_event_change_agent_state_action(mock_agent, mock_event_stream)
controller = AgentController(
agent=mock_agent,
event_stream=mock_event_stream,
max_iterations=10,
iteration_delta=10,
sid='test',
confirmation_mode=False,
headless_mode=True,
@@ -181,7 +185,7 @@ async def test_react_to_exception(mock_agent, mock_event_stream, mock_status_cal
agent=mock_agent,
event_stream=mock_event_stream,
status_callback=mock_status_callback,
max_iterations=10,
iteration_delta=10,
sid='test',
confirmation_mode=False,
headless_mode=True,
@@ -201,7 +205,7 @@ async def test_react_to_content_policy_violation(
agent=mock_agent,
event_stream=mock_event_stream,
status_callback=mock_status_callback,
max_iterations=10,
iteration_delta=10,
sid='test',
confirmation_mode=False,
headless_mode=True,
@@ -256,6 +260,7 @@ async def test_run_controller_with_fatal_error(
test_event_stream.subscribe(EventStreamSubscriber.RUNTIME, on_event, str(uuid4()))
runtime.event_stream = test_event_stream
runtime.config = copy.deepcopy(config)
def on_event_memory(event: Event):
if isinstance(event, RecallAction):
@@ -287,7 +292,7 @@ async def test_run_controller_with_fatal_error(
)
assert len(error_observations) == 1
error_observation = error_observations[0]
assert state.iteration == 3
assert state.iteration_flag.current_value == 3
assert state.agent_state == AgentState.ERROR
assert state.last_error == 'AgentStuckInLoopError: Agent got stuck in a loop'
assert (
@@ -323,6 +328,7 @@ async def test_run_controller_stop_with_stuck(
test_event_stream.subscribe(EventStreamSubscriber.RUNTIME, on_event, str(uuid4()))
runtime.event_stream = test_event_stream
runtime.config = copy.deepcopy(config)
def on_event_memory(event: Event):
if isinstance(event, RecallAction):
@@ -351,7 +357,7 @@ async def test_run_controller_stop_with_stuck(
for i, event in enumerate(events):
print(f'event {i}: {event_to_dict(event)}')
assert state.iteration == 3
assert state.iteration_flag.current_value == 3
assert len(events) == 12
# check the eventstream have 4 pairs of repeated actions and observations
# With the refactored system message handling, we need to adjust the range
@@ -378,24 +384,19 @@ async def test_run_controller_stop_with_stuck(
@pytest.mark.asyncio
async def test_max_iterations_extension(mock_agent, mock_event_stream):
# Test with headless_mode=False - should extend max_iterations
initial_state = State(max_iterations=10)
controller = AgentController(
agent=mock_agent,
event_stream=mock_event_stream,
max_iterations=10,
iteration_delta=10,
sid='test',
confirmation_mode=False,
headless_mode=False,
initial_state=initial_state,
)
controller.state.agent_state = AgentState.RUNNING
controller.state.iteration = 10
assert controller.state.traffic_control_state == TrafficControlState.NORMAL
controller.state.iteration_flag.current_value = 10
# Trigger throttling by calling _step() when we hit max_iterations
await controller._step()
assert controller.state.traffic_control_state == TrafficControlState.THROTTLING
assert controller.state.agent_state == AgentState.ERROR
# Simulate a new user message
@@ -405,28 +406,24 @@ async def test_max_iterations_extension(mock_agent, mock_event_stream):
# Max iterations should be extended to current iteration + initial max_iterations
assert (
controller.state.max_iterations == 20
controller.state.iteration_flag.max_value == 20
) # Current iteration (10 initial because _step() should not have been executed) + initial max_iterations (10)
assert controller.state.traffic_control_state == TrafficControlState.NORMAL
assert controller.state.agent_state == AgentState.RUNNING
# Close the controller to clean up
await controller.close()
# Test with headless_mode=True - should NOT extend max_iterations
initial_state = State(max_iterations=10)
controller = AgentController(
agent=mock_agent,
event_stream=mock_event_stream,
max_iterations=10,
iteration_delta=10,
sid='test',
confirmation_mode=False,
headless_mode=True,
initial_state=initial_state,
)
controller.state.agent_state = AgentState.RUNNING
controller.state.iteration = 10
assert controller.state.traffic_control_state == TrafficControlState.NORMAL
controller.state.iteration_flag.current_value = 10
# Simulate a new user message
message_action = MessageAction(content='Test message')
@@ -434,64 +431,143 @@ async def test_max_iterations_extension(mock_agent, mock_event_stream):
await send_event_to_controller(controller, message_action)
# Max iterations should NOT be extended in headless mode
assert controller.state.max_iterations == 10 # Original value unchanged
assert controller.state.iteration_flag.max_value == 10 # Original value unchanged
# Trigger throttling by calling _step() when we hit max_iterations
await controller._step()
assert controller.state.traffic_control_state == TrafficControlState.THROTTLING
assert controller.state.agent_state == AgentState.ERROR
await controller.close()
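The assertions in `test_max_iterations_extension` encode a simple rule. Once the iteration counter reaches its cap, a new user message raises the cap to the current iteration plus the original increment (10 + 10 = 20) in interactive mode, while headless mode leaves it at 10 so the next `_step()` errors out. Below is a minimal sketch of that rule. `IterationLimit` and `on_user_message` are hypothetical names, and this is not the `IterationControlFlag` implementation from `openhands.controller.state.control_flags`.

```python
from dataclasses import dataclass


@dataclass
class IterationLimit:
    """Hypothetical stand-in for an iteration control flag (not OpenHands' class)."""

    current_value: int
    max_value: int
    limit_increase_amount: int

    def reached_limit(self) -> bool:
        # The limit counts as hit once the counter has caught up with the cap.
        return self.current_value >= self.max_value

    def on_user_message(self, headless_mode: bool) -> None:
        # Interactive sessions grant another batch of iterations, counted from
        # where the agent stopped; headless runs keep the original cap.
        if headless_mode or not self.reached_limit():
            return
        self.max_value = self.current_value + self.limit_increase_amount


interactive = IterationLimit(current_value=10, max_value=10, limit_increase_amount=10)
interactive.on_user_message(headless_mode=False)
print(interactive.max_value)  # 20

headless = IterationLimit(current_value=10, max_value=10, limit_increase_amount=10)
headless.on_user_message(headless_mode=True)
print(headless.max_value, headless.reached_limit())  # 10 True
```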
@pytest.mark.asyncio
async def test_step_max_budget(mock_agent, mock_event_stream):
    # Metrics are always synced with the budget flag beforehand
metrics = Metrics()
metrics.accumulated_cost = 10.1
budget_flag = BudgetControlFlag(
limit_increase_amount=10, current_value=10.1, max_value=10
)
controller = AgentController(
agent=mock_agent,
event_stream=mock_event_stream,
max_iterations=10,
max_budget_per_task=10,
iteration_delta=10,
budget_per_task_delta=10,
sid='test',
confirmation_mode=False,
headless_mode=False,
initial_state=State(budget_flag=budget_flag, metrics=metrics),
)
controller.state.agent_state = AgentState.RUNNING
controller.state.metrics.accumulated_cost = 10.1
assert controller.state.traffic_control_state == TrafficControlState.NORMAL
await controller._step()
assert controller.state.traffic_control_state == TrafficControlState.THROTTLING
assert controller.state.agent_state == AgentState.ERROR
await controller.close()
@pytest.mark.asyncio
async def test_step_max_budget_headless(mock_agent, mock_event_stream):
    # Metrics are always synced with the budget flag beforehand
metrics = Metrics()
metrics.accumulated_cost = 10.1
budget_flag = BudgetControlFlag(
limit_increase_amount=10, current_value=10.1, max_value=10
)
controller = AgentController(
agent=mock_agent,
event_stream=mock_event_stream,
max_iterations=10,
max_budget_per_task=10,
iteration_delta=10,
budget_per_task_delta=10,
sid='test',
confirmation_mode=False,
headless_mode=True,
initial_state=State(budget_flag=budget_flag, metrics=metrics),
)
controller.state.agent_state = AgentState.RUNNING
controller.state.metrics.accumulated_cost = 10.1
assert controller.state.traffic_control_state == TrafficControlState.NORMAL
await controller._step()
assert controller.state.traffic_control_state == TrafficControlState.THROTTLING
# In headless mode, throttling results in an error
assert controller.state.agent_state == AgentState.ERROR
await controller.close()
@pytest.mark.asyncio
async def test_budget_reset_on_continue(mock_agent, mock_event_stream):
"""Test that when a user continues after hitting the budget limit:
1. Error is thrown when budget cap is exceeded
2. LLM budget does not reset when user continues
3. Budget is extended by adding the initial budget cap to the current accumulated cost
"""
# Create a real Metrics instance shared between controller state and llm
metrics = Metrics()
metrics.accumulated_cost = 6.0
initial_budget = 5.0
initial_state = State(
metrics=metrics,
budget_flag=BudgetControlFlag(
limit_increase_amount=initial_budget,
current_value=6.0,
max_value=initial_budget,
),
)
# Create controller with budget cap
controller = AgentController(
agent=mock_agent,
event_stream=mock_event_stream,
iteration_delta=10,
budget_per_task_delta=initial_budget,
sid='test',
confirmation_mode=False,
headless_mode=False,
initial_state=initial_state,
)
# Set up initial state
controller.state.agent_state = AgentState.RUNNING
# Set up metrics to simulate having spent more than the budget
assert controller.state.budget_flag.current_value == 6.0
assert controller.agent.llm.metrics.accumulated_cost == 6.0
# Trigger budget limit
await controller._step()
# Verify budget limit was hit and error was thrown
assert controller.state.agent_state == AgentState.ERROR
assert 'budget' in controller.state.last_error.lower()
# Now set the agent state to RUNNING (simulating user clicking "continue")
await controller.set_agent_state_to(AgentState.RUNNING)
# Now simulate user sending a message
message_action = MessageAction(content='Please continue')
message_action._source = EventSource.USER
await controller._on_event(message_action)
# Verify budget cap was extended by adding initial budget to current accumulated cost
# accumulated cost (6.0) + initial budget (5.0) = 11.0
assert controller.state.budget_flag.max_value == 11.0
# Verify LLM metrics were NOT reset - they should still be 6.0
assert controller.agent.llm.metrics.accumulated_cost == 6.0
    # The controller state metrics are the same as the LLM metrics
assert controller.state.metrics.accumulated_cost == 6.0
# Verify traffic control state was reset
await controller.close()
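The docstring of `test_budget_reset_on_continue` describes what happens when a user continues after a budget stop. The cap becomes the accumulated cost plus the initial per-task budget (6.0 + 5.0 = 11.0), and the accumulated cost itself is not reset. Below is a minimal sketch of that arithmetic. `BudgetLimit` and `extend` are hypothetical names chosen for illustration, not the `BudgetControlFlag` API, and treating a cost exactly equal to the cap as "reached" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BudgetLimit:
    """Hypothetical stand-in for a per-task budget flag (not OpenHands' class)."""

    current_value: float  # accumulated cost so far
    max_value: float  # current budget cap
    limit_increase_amount: float  # the per-task budget originally configured

    def reached_limit(self) -> bool:
        return self.current_value >= self.max_value

    def extend(self) -> None:
        # Continuing after the cap was hit adds the original budget on top of
        # what has already been spent; the spent amount itself is untouched.
        if self.reached_limit():
            self.max_value = self.current_value + self.limit_increase_amount


budget = BudgetLimit(current_value=6.0, max_value=5.0, limit_increase_amount=5.0)
print(budget.reached_limit())  # True
budget.extend()
print(budget.max_value, budget.current_value)  # 11.0 6.0
```

Extending from the spent amount rather than from the old cap means an overshoot (6.0 against a 5.0 cap) still leaves a full 5.0 of headroom after the user continues.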
@pytest.mark.asyncio
async def test_reset_with_pending_action_no_observation(mock_agent, mock_event_stream):
"""Test reset() when there's a pending action with tool call metadata but no observation."""
controller = AgentController(
agent=mock_agent,
event_stream=mock_event_stream,
max_iterations=10,
iteration_delta=10,
sid='test',
confirmation_mode=False,
headless_mode=True,
@@ -540,7 +616,7 @@ async def test_reset_with_pending_action_existing_observation(
controller = AgentController(
agent=mock_agent,
event_stream=mock_event_stream,
max_iterations=10,
iteration_delta=10,
sid='test',
confirmation_mode=False,
headless_mode=True,
@@ -582,7 +658,7 @@ async def test_reset_without_pending_action(mock_agent, mock_event_stream):
controller = AgentController(
agent=mock_agent,
event_stream=mock_event_stream,
max_iterations=10,
iteration_delta=10,
sid='test',
confirmation_mode=False,
headless_mode=True,
@@ -613,7 +689,7 @@ async def test_reset_with_pending_action_no_metadata(
controller = AgentController(
agent=mock_agent,
event_stream=mock_event_stream,
max_iterations=10,
iteration_delta=10,
sid='test',
confirmation_mode=False,
headless_mode=True,
@@ -662,6 +738,8 @@ async def test_run_controller_max_iterations_has_metrics(
mock_agent.llm.metrics = Metrics()
mock_agent.llm.config = config.get_llm_config()
step_count = 0
def agent_step_fn(state):
print(f'agent_step_fn received state: {state}')
# Mock the cost of the LLM
@@ -669,7 +747,9 @@ async def test_run_controller_max_iterations_has_metrics(
print(
f'mock_agent.llm.metrics.accumulated_cost: {mock_agent.llm.metrics.accumulated_cost}'
)
return CmdRunAction(command='ls')
nonlocal step_count
step_count += 1
return CmdRunAction(command=f'ls {step_count}')
mock_agent.step = agent_step_fn
@@ -685,6 +765,7 @@ async def test_run_controller_max_iterations_has_metrics(
event_stream.subscribe(EventStreamSubscriber.RUNTIME, on_event, str(uuid4()))
runtime.event_stream = event_stream
runtime.config = copy.deepcopy(config)
def on_event_memory(event: Event):
if isinstance(event, RecallAction):
@@ -706,11 +787,13 @@ async def test_run_controller_max_iterations_has_metrics(
fake_user_response_fn=lambda _: 'repeat',
memory=mock_memory,
)
assert state.iteration == 3
state.metrics = mock_agent.llm.metrics
assert state.iteration_flag.current_value == 3
assert state.agent_state == AgentState.ERROR
assert (
state.last_error
== 'RuntimeError: Agent reached maximum iteration in headless mode. Current iteration: 3, max iteration: 3'
== 'RuntimeError: Agent reached maximum iteration. Current iteration: 3, max iteration: 3'
)
error_observations = test_event_stream.get_matching_events(
reverse=True, limit=1, event_types=(AgentStateChangedObservation)
@@ -720,7 +803,7 @@ async def test_run_controller_max_iterations_has_metrics(
assert (
error_observation.reason
== 'RuntimeError: Agent reached maximum iteration in headless mode. Current iteration: 3, max iteration: 3'
== 'RuntimeError: Agent reached maximum iteration. Current iteration: 3, max iteration: 3'
)
assert state.metrics.accumulated_cost == 10.0 * 3, (
@@ -734,12 +817,19 @@ async def test_notify_on_llm_retry(mock_agent, mock_event_stream, mock_status_ca
agent=mock_agent,
event_stream=mock_event_stream,
status_callback=mock_status_callback,
max_iterations=10,
iteration_delta=10,
sid='test',
confirmation_mode=False,
headless_mode=True,
)
controller._notify_on_llm_retry(1, 2)
def notify_on_llm_retry(attempt, max_attempts):
controller.status_callback('info', 'STATUS$LLM_RETRY', ANY)
# Attach the retry listener to the agent's LLM
controller.agent.llm.retry_listener = notify_on_llm_retry
controller.agent.llm.retry_listener(1, 2)
controller.status_callback.assert_called_once_with('info', 'STATUS$LLM_RETRY', ANY)
await controller.close()
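The rewritten `test_notify_on_llm_retry` no longer calls a controller method directly. Instead it assigns a callback to `controller.agent.llm.retry_listener` and invokes it with `(attempt, max_attempts)`. The sketch below shows one way such a hook can be wired into a retry loop. `RetryingCaller`, treating `ConnectionError` as the retryable exception, and calling the listener just before each retry are all assumptions for illustration, not how the OpenHands `LLM` class performs retries.

```python
from typing import Callable, Optional


class RetryingCaller:
    """Hypothetical wrapper showing a retry_listener hook (not the OpenHands LLM class)."""

    def __init__(self, max_attempts: int) -> None:
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self.max_attempts = max_attempts
        # Optional callback receiving (attempt, max_attempts) before each retry.
        self.retry_listener: Optional[Callable[[int, int], None]] = None

    def call(self, fn: Callable[[], str]) -> str:
        for attempt in range(1, self.max_attempts + 1):
            try:
                return fn()
            except ConnectionError:
                if attempt == self.max_attempts:
                    raise  # out of attempts: surface the last error
                if self.retry_listener is not None:
                    self.retry_listener(attempt, self.max_attempts)
        raise AssertionError("unreachable: the loop always returns or raises")


events: list[tuple[int, int]] = []
outcomes = iter([ConnectionError("blip"), "ok"])


def flaky() -> str:
    # Fails once, then succeeds.
    result = next(outcomes)
    if isinstance(result, Exception):
        raise result
    return result


caller = RetryingCaller(max_attempts=2)
caller.retry_listener = lambda attempt, total: events.append((attempt, total))
print(caller.call(flaky), events)  # ok [(1, 2)]
```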
@@ -797,7 +887,9 @@ async def test_context_window_exceeded_error_handling(
test_event_stream.subscribe(
EventStreamSubscriber.MEMORY, on_event_memory, str(uuid4())
)
config = OpenHandsConfig(max_iterations=max_iterations)
mock_runtime.event_stream = test_event_stream
mock_runtime.config = copy.deepcopy(config)
# Now we can run the controller for a fixed number of steps. Since the step
# state is set to error out before then, if this terminates and we have a
@@ -805,7 +897,7 @@ async def test_context_window_exceeded_error_handling(
# handles the truncation correctly.
final_state = await asyncio.wait_for(
run_controller(
config=OpenHandsConfig(max_iterations=max_iterations),
config=config,
initial_user_action=MessageAction(content='INITIAL'),
runtime=mock_runtime,
sid='test',
@@ -941,11 +1033,13 @@ async def test_run_controller_with_context_window_exceeded_with_truncation(
EventStreamSubscriber.MEMORY, on_event_memory, str(uuid4())
)
mock_runtime.event_stream = test_event_stream
config = OpenHandsConfig(max_iterations=5)
mock_runtime.config = copy.deepcopy(config)
try:
state = await asyncio.wait_for(
run_controller(
config=OpenHandsConfig(max_iterations=5),
config=config,
initial_user_action=MessageAction(content='INITIAL'),
runtime=mock_runtime,
sid='test',
@@ -965,11 +1059,11 @@ async def test_run_controller_with_context_window_exceeded_with_truncation(
# Hitting the iteration limit indicates the controller is failing for the
# expected reason
assert state.iteration == 5
assert state.iteration_flag.current_value == 5
assert state.agent_state == AgentState.ERROR
assert (
state.last_error
== 'RuntimeError: Agent reached maximum iteration in headless mode. Current iteration: 5, max iteration: 5'
== 'RuntimeError: Agent reached maximum iteration. Current iteration: 5, max iteration: 5'
)
# Check that the context window exceeded error was raised during the run
@@ -1018,10 +1112,12 @@ async def test_run_controller_with_context_window_exceeded_without_truncation(
EventStreamSubscriber.MEMORY, on_event_memory, str(uuid4())
)
mock_runtime.event_stream = test_event_stream
config = OpenHandsConfig(max_iterations=3)
mock_runtime.config = copy.deepcopy(config)
try:
state = await asyncio.wait_for(
run_controller(
config=OpenHandsConfig(max_iterations=3),
config=config,
initial_user_action=MessageAction(content='INITIAL'),
runtime=mock_runtime,
sid='test',
@@ -1042,7 +1138,7 @@ async def test_run_controller_with_context_window_exceeded_without_truncation(
# Hitting the iteration limit indicates the controller is failing for the
# expected reason
# With the refactored system message handling, the iteration count is different
assert state.iteration == 1
assert state.iteration_flag.current_value == 1
assert state.agent_state == AgentState.ERROR
assert (
state.last_error
@@ -1081,6 +1177,7 @@ async def test_run_controller_with_memory_error(test_event_stream, mock_agent):
runtime = MagicMock(spec=ActionExecutionClient)
runtime.event_stream = event_stream
runtime.config = copy.deepcopy(config)
# Create a real Memory instance
memory = Memory(event_stream=event_stream, sid='test-memory')
@@ -1102,7 +1199,7 @@ async def test_run_controller_with_memory_error(test_event_stream, mock_agent):
memory=memory,
)
assert state.iteration == 0
assert state.iteration_flag.current_value == 0
assert state.agent_state == AgentState.ERROR
assert state.last_error == 'Error: RuntimeError'
@@ -1113,11 +1210,14 @@ async def test_action_metrics_copy(mock_agent):
file_store = InMemoryFileStore({})
event_stream = EventStream(sid='test', file_store=file_store)
# Create agent with metrics
mock_agent.llm = MagicMock(spec=LLM)
metrics = Metrics(model_name='test-model')
metrics.accumulated_cost = 0.05
initial_state = State(metrics=metrics, budget_flag=None)
# Create agent with metrics
mock_agent.llm = MagicMock(spec=LLM)
# Add multiple token usages - we should get the last one in the action
usage1 = TokenUsage(
model='test-model',
@@ -1170,10 +1270,11 @@ async def test_action_metrics_copy(mock_agent):
controller = AgentController(
agent=mock_agent,
event_stream=event_stream,
max_iterations=10,
iteration_delta=10,
sid='test',
confirmation_mode=False,
headless_mode=True,
initial_state=initial_state,
)
# Execute one step
@@ -1240,7 +1341,7 @@ async def test_condenser_metrics_included(mock_agent, test_event_stream):
cache_write_tokens=10,
response_id='agent-accumulated',
)
mock_agent.llm.metrics = agent_metrics
# mock_agent.llm.metrics = agent_metrics
mock_agent.name = 'TestAgent'
# Create condenser with its own metrics
@@ -1279,10 +1380,11 @@ async def test_condenser_metrics_included(mock_agent, test_event_stream):
controller = AgentController(
agent=mock_agent,
event_stream=test_event_stream,
max_iterations=10,
iteration_delta=10,
sid='test',
confirmation_mode=False,
headless_mode=True,
initial_state=State(metrics=agent_metrics, budget_flag=None),
)
# Execute one step
@@ -1337,7 +1439,7 @@ async def test_first_user_message_with_identical_content(test_event_stream, mock
controller = AgentController(
agent=mock_agent,
event_stream=test_event_stream,
max_iterations=10,
iteration_delta=10,
sid='test',
confirmation_mode=False,
headless_mode=True,
@@ -1409,7 +1511,7 @@ async def test_agent_controller_processes_null_observation_with_cause():
controller = AgentController(
agent=mock_agent,
event_stream=event_stream,
max_iterations=10,
iteration_delta=10,
sid='test-session',
)
@@ -1480,7 +1582,7 @@ def test_agent_controller_should_step_with_null_observation_cause_zero(mock_agen
controller = AgentController(
agent=mock_agent,
event_stream=event_stream,
max_iterations=10,
iteration_delta=10,
sid='test-session',
)
@@ -1501,7 +1603,7 @@ def test_agent_controller_should_step_with_null_observation_cause_zero(mock_agen
def test_system_message_in_event_stream(mock_agent, test_event_stream):
"""Test that SystemMessageAction is added to event stream in AgentController."""
_ = AgentController(
agent=mock_agent, event_stream=test_event_stream, max_iterations=10
agent=mock_agent, event_stream=test_event_stream, iteration_delta=10
)
# Get events from the event stream
@@ -1553,7 +1655,7 @@ async def test_openrouter_context_window_exceeded_error(
controller = AgentController(
agent=mock_agent,
event_stream=test_event_stream,
max_iterations=max_iterations,
iteration_delta=max_iterations,
sid='test',
confirmation_mode=False,
headless_mode=True,

@@ -7,6 +7,10 @@ import pytest
from openhands.controller.agent import Agent
from openhands.controller.agent_controller import AgentController
from openhands.controller.state.control_flags import (
BudgetControlFlag,
IterationControlFlag,
)
from openhands.controller.state.state import State
from openhands.core.config import LLMConfig
from openhands.core.config.agent_config import AgentConfig
@@ -18,6 +22,8 @@ from openhands.events.action import (
MessageAction,
)
from openhands.events.action.agent import RecallAction
from openhands.events.action.commands import CmdRunAction
from openhands.events.action.message import SystemMessageAction
from openhands.events.event import Event, RecallType
from openhands.events.observation.agent import RecallObservation
from openhands.events.stream import EventStreamSubscriber
@@ -43,16 +49,14 @@ def mock_parent_agent():
agent.llm = MagicMock(spec=LLM)
agent.llm.metrics = Metrics()
agent.llm.config = LLMConfig()
agent.llm.retry_listener = None # Add retry_listener attribute
agent.config = AgentConfig()
# Add a proper system message mock
from openhands.events.action.message import SystemMessageAction
system_message = SystemMessageAction(content='Test system message')
system_message._source = EventSource.AGENT
system_message._id = -1 # Set invalid ID to avoid the ID check
agent.get_system_message.return_value = system_message
return agent
@@ -64,34 +68,54 @@ def mock_child_agent():
agent.llm = MagicMock(spec=LLM)
agent.llm.metrics = Metrics()
agent.llm.config = LLMConfig()
agent.llm.retry_listener = None # Add retry_listener attribute
agent.config = AgentConfig()
# Add a proper system message mock
from openhands.events.action.message import SystemMessageAction
system_message = SystemMessageAction(content='Test system message')
system_message._source = EventSource.AGENT
system_message._id = -1 # Set invalid ID to avoid the ID check
agent.get_system_message.return_value = system_message
return agent
@pytest.mark.asyncio
async def test_delegation_flow(mock_parent_agent, mock_child_agent, mock_event_stream):
"""
Test that when the parent agent delegates to a child, the parent's delegate
is set, and once the child finishes, the parent is cleaned up properly.
Test that when the parent agent delegates to a child:
1. the parent's delegate is set, and once the child finishes, the parent is cleaned up properly;
2. metrics are accumulated globally (the delegate adds to the parent's metrics);
3. local metrics for the delegate are still accessible.
"""
# Mock the agent class resolution so that AgentController can instantiate mock_child_agent
Agent.get_cls = Mock(return_value=lambda llm, config: mock_child_agent)
step_count = 0
def agent_step_fn(state):
nonlocal step_count
step_count += 1
return CmdRunAction(command=f'ls {step_count}')
mock_child_agent.step = agent_step_fn
parent_metrics = Metrics()
parent_metrics.accumulated_cost = 2
# Create parent controller
parent_state = State(max_iterations=10)
parent_state = State(
inputs={},
metrics=parent_metrics,
budget_flag=BudgetControlFlag(
current_value=2, limit_increase_amount=10, max_value=10
),
iteration_flag=IterationControlFlag(
current_value=1, limit_increase_amount=10, max_value=10
),
)
parent_controller = AgentController(
agent=mock_parent_agent,
event_stream=mock_event_stream,
max_iterations=10,
iteration_delta=1, # Add the required iteration_delta parameter
sid='parent',
confirmation_mode=False,
headless_mode=True,
@@ -132,8 +156,9 @@ async def test_delegation_flow(mock_parent_agent, mock_child_agent, mock_event_s
# Verify that a RecallObservation was added to the event stream
events = list(mock_event_stream.get_events())
# SystemMessageAction, RecallAction, AgentChangeState, AgentDelegateAction, SystemMessageAction (for child)
assert mock_event_stream.get_latest_event_id() == 5
# The exact number of events might vary depending on implementation details
# Just verify that we have at least a few events
assert mock_event_stream.get_latest_event_id() >= 3
# a RecallObservation and an AgentDelegateAction should be in the list
assert any(isinstance(event, RecallObservation) for event in events)
@@ -145,13 +170,33 @@ async def test_delegation_flow(mock_parent_agent, mock_child_agent, mock_event_s
)
# The parent's iteration should have incremented
assert parent_controller.state.iteration == 1, (
assert parent_controller.state.iteration_flag.current_value == 2, (
'Parent iteration should be incremented after step.'
)
# Now simulate that the child increments local iteration and finishes its subtask
delegate_controller = parent_controller.delegate
delegate_controller.state.iteration = 5 # child had some steps
# Take four delegate steps; mock cost per step
for i in range(4):
delegate_controller.state.iteration_flag.step()
delegate_controller.agent.step(delegate_controller.state)
delegate_controller.agent.llm.metrics.add_cost(1.0)
assert (
delegate_controller.state.get_local_step() == 4
) # verify local metrics are accessible via snapshot
assert (
delegate_controller.state.metrics.accumulated_cost
== 6 # Make sure delegate tracks global cost
)
assert (
delegate_controller.state.get_local_metrics().accumulated_cost
== 4 # Delegate spent one dollar per step
)
delegate_controller.state.outputs = {'delegate_result': 'done'}
# The child is done, so we simulate it finishing:
@@ -165,7 +210,7 @@ async def test_delegation_flow(mock_parent_agent, mock_child_agent, mock_event_s
)
# Parent's global iteration is updated from the child
assert parent_controller.state.iteration == 6, (
assert parent_controller.state.iteration_flag.current_value == 7, (
"Parent iteration should be the child's iteration + 1 after child is done."
)
@@ -187,19 +232,24 @@ async def test_delegate_step_different_states(
mock_parent_agent, mock_event_stream, delegate_state
):
"""Ensure that delegate is closed or remains open based on the delegate's state."""
# Create a state with iteration_flag.max_value set to 10
state = State(inputs={})
state.iteration_flag.max_value = 10
controller = AgentController(
agent=mock_parent_agent,
event_stream=mock_event_stream,
max_iterations=10,
iteration_delta=1, # Add the required iteration_delta parameter
sid='test',
confirmation_mode=False,
headless_mode=True,
initial_state=state,
)
mock_delegate = AsyncMock()
controller.delegate = mock_delegate
mock_delegate.state.iteration = 5
mock_delegate.state.iteration_flag = MagicMock()
mock_delegate.state.iteration_flag.current_value = 5
mock_delegate.state.outputs = {'result': 'test'}
mock_delegate.agent.name = 'TestDelegate'
@@ -207,7 +257,7 @@ async def test_delegate_step_different_states(
mock_delegate._step = AsyncMock()
mock_delegate.close = AsyncMock()
def call_on_event_with_new_loop():
async def call_on_event_with_new_loop():
"""
In this thread, create and set a fresh event loop, so that the run_until_complete()
calls inside controller.on_event(...) find a valid loop.
@@ -226,14 +276,135 @@ async def test_delegate_step_different_states(
future = loop.run_in_executor(executor, call_on_event_with_new_loop)
await future
# Give time for the event loop to process events
await asyncio.sleep(0.5)
if delegate_state == AgentState.RUNNING:
assert controller.delegate is not None
assert controller.state.iteration == 0
assert controller.state.iteration_flag.current_value == 0
mock_delegate.close.assert_not_called()
else:
assert controller.delegate is None
assert controller.state.iteration == 5
assert controller.state.iteration_flag.current_value == 5
# The close method is called once in end_delegate
assert mock_delegate.close.call_count == 1
await controller.close()
@pytest.mark.asyncio
async def test_delegate_hits_global_limits(
mock_child_agent, mock_event_stream, mock_parent_agent
):
"""
Global limits from control flags should also apply to delegates.
"""
# Mock the agent class resolution so that AgentController can instantiate mock_child_agent
Agent.get_cls = Mock(return_value=lambda llm, config: mock_child_agent)
parent_metrics = Metrics()
parent_metrics.accumulated_cost = 2
# Create parent controller
parent_state = State(
inputs={},
metrics=parent_metrics,
budget_flag=BudgetControlFlag(
current_value=2, limit_increase_amount=10, max_value=10
),
iteration_flag=IterationControlFlag(
current_value=2, limit_increase_amount=3, max_value=3
),
)
parent_controller = AgentController(
agent=mock_parent_agent,
event_stream=mock_event_stream,
iteration_delta=1, # Add the required iteration_delta parameter
sid='parent',
confirmation_mode=False,
headless_mode=False,
initial_state=parent_state,
)
# Setup Memory to catch RecallActions
mock_memory = MagicMock(spec=Memory)
mock_memory.event_stream = mock_event_stream
def on_event(event: Event):
if isinstance(event, RecallAction):
# create a RecallObservation
microagent_observation = RecallObservation(
recall_type=RecallType.KNOWLEDGE,
content='Found info',
)
microagent_observation._cause = event.id # ignore attr-defined warning
mock_event_stream.add_event(microagent_observation, EventSource.ENVIRONMENT)
mock_memory.on_event = on_event
mock_event_stream.subscribe(
EventStreamSubscriber.MEMORY, mock_memory.on_event, mock_memory
)
# Setup a delegate action from the parent
delegate_action = AgentDelegateAction(agent='ChildAgent', inputs={'test': True})
mock_parent_agent.step.return_value = delegate_action
# Simulate a user message event to cause parent.step() to run
message_action = MessageAction(content='please delegate now')
message_action._source = EventSource.USER
await parent_controller._on_event(message_action)
# Give time for the async step() to execute
await asyncio.sleep(1)
# Verify that a RecallObservation was added to the event stream
events = list(mock_event_stream.get_events())
# The exact number of events might vary depending on implementation details
# Just verify that we have at least a few events
assert mock_event_stream.get_latest_event_id() >= 3
# a RecallObservation and an AgentDelegateAction should be in the list
assert any(isinstance(event, RecallObservation) for event in events)
assert any(isinstance(event, AgentDelegateAction) for event in events)
# Verify that a delegate agent controller is created
assert parent_controller.delegate is not None, (
"Parent's delegate controller was not set."
)
delegate_controller = parent_controller.delegate
await delegate_controller.set_agent_state_to(AgentState.RUNNING)
# Step should hit the shared iteration limit (current 3, max 3)
message_action = MessageAction(content='Test message')
message_action._source = EventSource.USER
await delegate_controller._on_event(message_action)
await asyncio.sleep(0.1)
assert delegate_controller.state.agent_state == AgentState.ERROR
assert (
delegate_controller.state.last_error
== 'RuntimeError: Agent reached maximum iteration. Current iteration: 3, max iteration: 3'
)
await delegate_controller.set_agent_state_to(AgentState.RUNNING)
await asyncio.sleep(0.1)
assert delegate_controller.state.iteration_flag.max_value == 6
assert (
delegate_controller.state.iteration_flag.max_value
== parent_controller.state.iteration_flag.max_value
)
message_action = MessageAction(content='Test message 2')
message_action._source = EventSource.USER
await delegate_controller._on_event(message_action)
await asyncio.sleep(0.1)
assert delegate_controller.state.iteration_flag.current_value == 4
assert (
delegate_controller.state.iteration_flag.current_value
== parent_controller.state.iteration_flag.current_value
)
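
test_delegation_flow expects the delegate to share the parent's global counters (global cost 2 + 4 = 6) while get_local_step() and get_local_metrics() still report the delegate's own 4 steps and 4 units of cost. Sharing the counter is also why test_delegate_hits_global_limits sees the delegate fail at the parent's iteration ceiling. Below is a minimal sketch of one way to get both views, assuming local values are deltas from a snapshot taken when delegation starts (suggested by the "accessible via snapshot" comment, not confirmed by this diff). SharedCounters and DelegateCounters are hypothetical names, not OpenHands classes.

```python
from dataclasses import dataclass


@dataclass
class SharedCounters:
    """Hypothetical global counters shared by a parent and its delegate."""

    iterations: int = 0
    cost: float = 0.0


class DelegateCounters:
    """Hypothetical delegate view: writes go to the shared counters,
    local values are measured from a snapshot taken at delegation time."""

    def __init__(self, shared: SharedCounters) -> None:
        self.shared = shared
        # Snapshot of the global counters when the delegate starts.
        self._start_iterations = shared.iterations
        self._start_cost = shared.cost

    def step(self, cost: float) -> None:
        # Every delegate step advances the global counters directly,
        # so global limits apply to the delegate automatically.
        self.shared.iterations += 1
        self.shared.cost += cost

    def local_steps(self) -> int:
        return self.shared.iterations - self._start_iterations

    def local_cost(self) -> float:
        return self.shared.cost - self._start_cost
```

Starting from the test's numbers (parent at iteration 2 with cost 2) and taking four steps at 1.0 each leaves the shared counters at 6 and 6.0 while the delegate reports 4 local steps and 4.0 local cost.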


@@ -99,13 +99,17 @@ def controller_fixture():
# Ensure get_latest_event_id returns an integer
mock_event_stream.get_latest_event_id.return_value = -1
# Create a state with iteration_flag.max_value set to 10
state = State(inputs={}, session_id='test_sid')
state.iteration_flag.max_value = 10
controller = AgentController(
agent=mock_agent,
event_stream=mock_event_stream,
max_iterations=10,
iteration_delta=1, # Add the required iteration_delta parameter
sid='test_sid',
initial_state=state,
)
controller.state = State(session_id='test_sid')
# Don't mock _first_user_message anymore since we need it to work with history
return controller


@@ -17,6 +17,8 @@ from openhands.runtime.impl.action_execution.action_execution_client import (
from openhands.server.session.agent_session import AgentSession
from openhands.storage.memory import InMemoryFileStore
# We'll use the DeprecatedState class from the main codebase
@pytest.fixture
def mock_agent():
@@ -131,7 +133,7 @@ async def test_agent_session_start_with_no_state(mock_agent):
# Verify set_initial_state was called once with None as state
assert session.controller.set_initial_state_call_count == 1
assert session.controller.test_initial_state is None
assert session.controller.state.max_iterations == 10
assert session.controller.state.iteration_flag.max_value == 10
assert session.controller.agent.name == 'test-agent'
assert session.controller.state.start_id == 0
assert session.controller.state.end_id == -1
@@ -171,7 +173,11 @@ async def test_agent_session_start_with_restored_state(mock_agent):
mock_restored_state = MagicMock(spec=State)
mock_restored_state.start_id = -1
mock_restored_state.end_id = -1
mock_restored_state.max_iterations = 5
# Use iteration_flag instead of max_iterations
mock_restored_state.iteration_flag = MagicMock()
mock_restored_state.iteration_flag.max_value = 5
# Add metrics attribute
mock_restored_state.metrics = MagicMock(spec=Metrics)
# Create a spy on set_initial_state by subclassing AgentController
class SpyAgentController(AgentController):
@@ -219,6 +225,180 @@ async def test_agent_session_start_with_restored_state(mock_agent):
)
assert session.controller.test_initial_state is mock_restored_state
assert session.controller.state is mock_restored_state
assert session.controller.state.max_iterations == 5
assert session.controller.state.iteration_flag.max_value == 5
assert session.controller.state.start_id == 0
assert session.controller.state.end_id == -1
@pytest.mark.asyncio
async def test_metrics_centralization_and_sharing(mock_agent):
"""Test that metrics are centralized and shared between controller and agent."""
# Setup
file_store = InMemoryFileStore({})
session = AgentSession(
sid='test-session',
file_store=file_store,
)
# Create a mock runtime and set it up
mock_runtime = MagicMock(spec=ActionExecutionClient)
# Mock the runtime creation to set up the runtime attribute
async def mock_create_runtime(*args, **kwargs):
session.runtime = mock_runtime
return True
session._create_runtime = AsyncMock(side_effect=mock_create_runtime)
# Create a mock EventStream with no events
mock_event_stream = MagicMock(spec=EventStream)
mock_event_stream.get_events.return_value = []
mock_event_stream.subscribe = MagicMock()
mock_event_stream.get_latest_event_id.return_value = 0
# Inject the mock event stream into the session
session.event_stream = mock_event_stream
# Create a real Memory instance with the mock event stream
memory = Memory(event_stream=mock_event_stream, sid='test-session')
memory.microagents_dir = 'test-dir'
# Patch necessary components
with (
patch(
'openhands.server.session.agent_session.EventStream',
return_value=mock_event_stream,
),
patch(
'openhands.controller.state.state.State.restore_from_session',
side_effect=Exception('No state found'),
),
patch('openhands.server.session.agent_session.Memory', return_value=memory),
):
await session.start(
runtime_name='test-runtime',
config=OpenHandsConfig(),
agent=mock_agent,
max_iterations=10,
)
# Verify that the agent's LLM metrics and controller's state metrics are the same object
assert session.controller.agent.llm.metrics is session.controller.state.metrics
# Add some metrics to the agent's LLM
test_cost = 0.05
session.controller.agent.llm.metrics.add_cost(test_cost)
# Verify that the cost is reflected in the controller's state metrics
assert session.controller.state.metrics.accumulated_cost == test_cost
# Create a test metrics object to simulate an observation with metrics
test_observation_metrics = Metrics()
test_observation_metrics.add_cost(0.1)
# Get the current accumulated cost before merging
current_cost = session.controller.state.metrics.accumulated_cost
# Simulate merging metrics from an observation
session.controller.state_tracker.merge_metrics(test_observation_metrics)
# Verify that the merged metrics are reflected in both agent and controller
assert session.controller.state.metrics.accumulated_cost == current_cost + 0.1
assert (
session.controller.agent.llm.metrics.accumulated_cost == current_cost + 0.1
)
# Reset the agent and verify that metrics are not reset
session.controller.agent.reset()
# Metrics should still be the same after reset
assert session.controller.state.metrics.accumulated_cost == test_cost + 0.1
assert session.controller.agent.llm.metrics.accumulated_cost == test_cost + 0.1
assert session.controller.agent.llm.metrics is session.controller.state.metrics
@pytest.mark.asyncio
async def test_budget_control_flag_syncs_with_metrics(mock_agent):
"""Test that BudgetControlFlag's current value matches the accumulated costs."""
# Setup
file_store = InMemoryFileStore({})
session = AgentSession(
sid='test-session',
file_store=file_store,
)
# Create a mock runtime and set it up
mock_runtime = MagicMock(spec=ActionExecutionClient)
# Mock the runtime creation to set up the runtime attribute
async def mock_create_runtime(*args, **kwargs):
session.runtime = mock_runtime
return True
session._create_runtime = AsyncMock(side_effect=mock_create_runtime)
# Create a mock EventStream with no events
mock_event_stream = MagicMock(spec=EventStream)
mock_event_stream.get_events.return_value = []
mock_event_stream.subscribe = MagicMock()
mock_event_stream.get_latest_event_id.return_value = 0
# Inject the mock event stream into the session
session.event_stream = mock_event_stream
# Create a real Memory instance with the mock event stream
memory = Memory(event_stream=mock_event_stream, sid='test-session')
memory.microagents_dir = 'test-dir'
# Patch necessary components
with (
patch(
'openhands.server.session.agent_session.EventStream',
return_value=mock_event_stream,
),
patch(
'openhands.controller.state.state.State.restore_from_session',
side_effect=Exception('No state found'),
),
patch('openhands.server.session.agent_session.Memory', return_value=memory),
):
# Start the session with a budget limit
await session.start(
runtime_name='test-runtime',
config=OpenHandsConfig(),
agent=mock_agent,
max_iterations=10,
max_budget_per_task=1.0, # Set a budget limit
)
# Verify that the budget control flag was created
assert session.controller.state.budget_flag is not None
assert session.controller.state.budget_flag.max_value == 1.0
assert session.controller.state.budget_flag.current_value == 0.0
# Add some metrics to the agent's LLM
test_cost = 0.05
session.controller.agent.llm.metrics.add_cost(test_cost)
# Verify that the budget control flag's current value is updated
# This happens through the state_tracker.sync_budget_flag_with_metrics method
session.controller.state_tracker.sync_budget_flag_with_metrics()
assert session.controller.state.budget_flag.current_value == test_cost
# Create a test metrics object to simulate an observation with metrics
test_observation_metrics = Metrics()
test_observation_metrics.add_cost(0.1)
# Simulate merging metrics from an observation
session.controller.state_tracker.merge_metrics(test_observation_metrics)
# Verify that the budget control flag's current value is updated to match the new accumulated cost
assert session.controller.state.budget_flag.current_value == test_cost + 0.1
# Reset the agent and verify that metrics and budget flag are not reset
session.controller.agent.reset()
# Budget control flag should still reflect the accumulated cost after reset
assert session.controller.state.budget_flag.current_value == test_cost + 0.1
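
Both session tests rely on two properties. First, the agent's LLM and the controller state hold the same Metrics object, so a cost recorded on one side is visible on the other. Second, the budget flag tracks accumulated_cost. The sketch below illustrates why merge_metrics should mutate the shared object in place rather than rebind it. MetricsSketch, BudgetStub, and StateTrackerSketch are hypothetical stand-ins whose method names mirror the calls in the tests. They are not the OpenHands implementations.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetricsSketch:
    """Hypothetical minimal metrics object: just a running cost total."""

    accumulated_cost: float = 0.0
    costs: List[float] = field(default_factory=list)

    def add_cost(self, value: float) -> None:
        self.costs.append(value)
        self.accumulated_cost += value

    def merge(self, other: 'MetricsSketch') -> None:
        # Fold another object's costs into this one in place.
        for cost in other.costs:
            self.add_cost(cost)


@dataclass
class BudgetStub:
    """Hypothetical budget flag: only the fields the sync needs."""

    max_value: float
    current_value: float = 0.0


class StateTrackerSketch:
    """Hypothetical tracker that owns the one Metrics object everyone shares."""

    def __init__(
        self, metrics: MetricsSketch, budget_flag: Optional[BudgetStub]
    ) -> None:
        self.metrics = metrics
        self.budget_flag = budget_flag

    def sync_budget_flag_with_metrics(self) -> None:
        # The budget flag mirrors the accumulated cost rather than counting itself.
        if self.budget_flag is not None:
            self.budget_flag.current_value = self.metrics.accumulated_cost

    def merge_metrics(self, other: MetricsSketch) -> None:
        # Mutate the shared object (never rebind self.metrics), so the LLM,
        # which holds the same reference, sees the merged cost too.
        self.metrics.merge(other)
        self.sync_budget_flag_with_metrics()
```

If merge_metrics instead assigned a new Metrics to self.metrics, the identity assertion (agent.llm.metrics is state.metrics) would start failing and the two views of cost would drift apart.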


@@ -21,9 +21,6 @@ def test_parser_default_values():
assert args.name == ''
assert not args.no_auto_continue
assert args.selected_repo is None
assert args.llm_model is None
assert args.llm_base_url is None
assert args.llm_api_key is None
def test_parser_custom_values():
@@ -58,12 +55,6 @@ def test_parser_custom_values():
'--no-auto-continue',
'--selected-repo',
'owner/repo',
'--llm-model',
'openai/gpt-4',
'--llm-base-url',
'http://localhost:1234/v1',
'--llm-api-key',
'test-api-key',
]
)
@@ -82,9 +73,6 @@ def test_parser_custom_values():
assert args.no_auto_continue
assert args.version
assert args.selected_repo == 'owner/repo'
assert args.llm_model == 'openai/gpt-4'
assert args.llm_base_url == 'http://localhost:1234/v1'
assert args.llm_api_key == 'test-api-key'
def test_parser_file_overrides_task():
@@ -150,16 +138,13 @@ def test_help_message(capsys):
'--no-auto-continue',
'--selected-repo SELECTED_REPO',
'--override-cli-mode OVERRIDE_CLI_MODE',
'--llm-model LLM_MODEL',
'--llm-base-url LLM_BASE_URL',
'--llm-api-key LLM_API_KEY',
]
for element in expected_elements:
assert element in help_output, f"Expected '{element}' to be in the help message"
option_count = help_output.count(' -')
assert option_count == 23, f'Expected 23 options, found {option_count}'
assert option_count == 20, f'Expected 20 options, found {option_count}'
def test_selected_repo_format():


@@ -43,7 +43,7 @@ async def test_auto_generate_title_with_llm():
) as mock_event_stream_cls:
# Configure the mock event stream to return our test message
mock_event_stream = MagicMock(spec=EventStream)
mock_event_stream.get_events.return_value = [user_message]
mock_event_stream.search_events.return_value = [user_message]
mock_event_stream_cls.return_value = mock_event_stream
# Mock the LLM response
@@ -108,7 +108,7 @@ async def test_auto_generate_title_fallback():
) as mock_event_stream_cls:
# Configure the mock event stream to return our test message
mock_event_stream = MagicMock(spec=EventStream)
mock_event_stream.get_events.return_value = [user_message]
mock_event_stream.search_events.return_value = [user_message]
mock_event_stream_cls.return_value = mock_event_stream
# Mock the LLM to raise an exception
@@ -154,7 +154,7 @@ async def test_auto_generate_title_no_messages():
) as mock_event_stream_cls:
# Configure the mock event stream to return no events
mock_event_stream = MagicMock(spec=EventStream)
mock_event_stream.get_events.return_value = []
mock_event_stream.search_events.return_value = []
mock_event_stream_cls.return_value = mock_event_stream
# Create test settings


@@ -208,9 +208,7 @@ async def test_run_session_without_initial_action(
mock_display_runtime_init.assert_called_once_with('local')
mock_display_animation.assert_called_once()
mock_create_agent.assert_called_once_with(mock_config)
mock_add_mcp_tools.assert_called_once_with(
mock_agent, mock_runtime, mock_memory, mock_config
)
mock_add_mcp_tools.assert_called_once_with(mock_agent, mock_runtime, mock_memory)
mock_create_runtime.assert_called_once()
mock_create_controller.assert_called_once()
mock_create_memory.assert_called_once()


@@ -0,0 +1,139 @@
import pytest
from openhands.controller.state.control_flags import (
BudgetControlFlag,
IterationControlFlag,
)
def test_iteration_control_flag_reaches_limit_and_increases():
flag = IterationControlFlag(limit_increase_amount=5, current_value=5, max_value=5)
# Should be at limit
assert flag.reached_limit() is True
assert flag._hit_limit is True
# Increase limit in non-headless mode
flag.increase_limit(headless_mode=False)
assert flag.max_value == 10 # increased by limit_increase_amount
# After increase, we should no longer be at limit
flag._hit_limit = False # simulate reset
assert flag.reached_limit() is False
def test_iteration_control_flag_does_not_increase_in_headless():
flag = IterationControlFlag(limit_increase_amount=5, current_value=5, max_value=5)
assert flag.reached_limit() is True
assert flag._hit_limit is True
# Should NOT increase max_value in headless mode
flag.increase_limit(headless_mode=True)
assert flag.max_value == 5
def test_iteration_control_flag_step_behavior():
flag = IterationControlFlag(limit_increase_amount=2, current_value=0, max_value=2)
# First step
flag.step()
assert flag.current_value == 1
assert not flag.reached_limit()
# Second step
flag.step()
assert flag.current_value == 2
assert flag.reached_limit()
# Stepping again should raise error
with pytest.raises(RuntimeError, match='Agent reached maximum iteration'):
flag.step()
# ----- BudgetControlFlag Tests -----
def test_budget_control_flag_reaches_limit_and_increases():
flag = BudgetControlFlag(
limit_increase_amount=10.0, current_value=50.0, max_value=50.0
)
# Should be at limit
assert flag.reached_limit() is True
assert flag._hit_limit is True
# Increase budget — allowed only if _hit_limit == True
flag.increase_limit(headless_mode=False)
assert flag.max_value == 60.0 # current_value + limit_increase_amount
# increase_limit() already clears _hit_limit (see the reset test below); clearing it again here is harmless
flag._hit_limit = False
flag.current_value = 55.0
assert flag.reached_limit() is False
def test_budget_control_flag_does_not_increase_if_not_hit_limit():
flag = BudgetControlFlag(
limit_increase_amount=10.0, current_value=40.0, max_value=50.0
)
# Not at limit yet
assert flag.reached_limit() is False
assert flag._hit_limit is False
# Try to increase — should do nothing
old_max_value = flag.max_value
flag.increase_limit(headless_mode=False)
assert flag.max_value == old_max_value
def test_budget_control_flag_increases_even_in_headless():
flag = BudgetControlFlag(
limit_increase_amount=10.0, current_value=50.0, max_value=50.0
)
assert flag.reached_limit() is True
assert flag._hit_limit is True
# Increase limit in headless mode — should still increase since BudgetControlFlag ignores headless param
flag.increase_limit(headless_mode=True)
assert flag.max_value == 60.0
def test_budget_control_flag_step_raises_on_limit():
flag = BudgetControlFlag(
limit_increase_amount=5.0, current_value=55.0, max_value=50.0
)
# Should raise RuntimeError
with pytest.raises(RuntimeError, match='Agent reached maximum budget'):
flag.step()
# After increasing limit, step should not raise
flag.max_value = 60.0
flag._hit_limit = False
flag.step() # Should not raise
def test_budget_control_flag_hit_limit_resets_after_increase():
flag = BudgetControlFlag(
limit_increase_amount=10.0, current_value=50.0, max_value=50.0
)
# Initially should hit limit
assert flag.reached_limit() is True
assert flag._hit_limit is True
# Increase limit
flag.increase_limit(headless_mode=False)
# After increasing, _hit_limit should be reset
assert flag._hit_limit is False
# Should no longer report reaching limit unless value exceeds new max
assert flag.reached_limit() is False
# If we push current_value over new max_value:
flag.current_value = flag.max_value + 1.0
assert flag.reached_limit() is True
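
The assertions in this new test file pin down the control-flag contract:

- reached_limit() records _hit_limit.
- increase_limit() only raises the ceiling after a hit, and for iterations never in headless mode.
- step() raises RuntimeError once the ceiling is reached.

Below is a minimal sketch that satisfies these tests, inferred from the assertions alone. It is not the openhands.controller.state.control_flags source. The class names are hypothetical. Several details are assumptions: the new iteration ceiling is max_value + limit_increase_amount, the new budget ceiling is current_value + limit_increase_amount (following the test comment), and the iteration flag's increase_limit() clears _hit_limit.

```python
from dataclasses import dataclass, field


@dataclass
class SketchControlFlag:
    """Hypothetical base: a counter that stops at a ceiling which can be raised."""

    limit_increase_amount: float
    current_value: float
    max_value: float
    # Recorded by reached_limit(); increase_limit() only acts when it is True.
    _hit_limit: bool = field(default=False, repr=False)

    def reached_limit(self) -> bool:
        self._hit_limit = self.current_value >= self.max_value
        return self._hit_limit


@dataclass
class SketchIterationFlag(SketchControlFlag):
    def increase_limit(self, headless_mode: bool) -> None:
        # Headless runs never get extra iterations.
        if not headless_mode and self._hit_limit:
            self.max_value += self.limit_increase_amount
            self._hit_limit = False  # assumption: cleared here, as for budget

    def step(self) -> None:
        # Refuse to step at the ceiling; otherwise count one iteration.
        if self.reached_limit():
            raise RuntimeError(
                'Agent reached maximum iteration. '
                f'Current iteration: {self.current_value}, max iteration: {self.max_value}'
            )
        self.current_value += 1


@dataclass
class SketchBudgetFlag(SketchControlFlag):
    def increase_limit(self, headless_mode: bool) -> None:
        # headless_mode is accepted but ignored, matching the headless budget test.
        if self._hit_limit:
            self.max_value = self.current_value + self.limit_increase_amount
            self._hit_limit = False

    def step(self) -> None:
        # Cost is synced from metrics elsewhere, so step() only enforces the ceiling.
        if self.reached_limit():
            raise RuntimeError(
                'Agent reached maximum budget. '
                f'Current budget: {self.current_value:.2f}, max budget: {self.max_value:.2f}'
            )
```

The budget flag's step() deliberately does not advance current_value. In the session tests that value is written by sync_budget_flag_with_metrics() from the accumulated cost, so the flag only has to enforce the ceiling.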


@@ -55,7 +55,9 @@ def event_stream(temp_dir):
class TestStuckDetector:
@pytest.fixture
def stuck_detector(self):
state = State(inputs={}, max_iterations=50)
state = State(inputs={})
# Set the iteration flag's max_value to 50 (equivalent to the old max_iterations)
state.iteration_flag.max_value = 50
state.history = [] # Initialize history as an empty list
return StuckDetector(state)


@@ -1,76 +0,0 @@
import asyncio
import pytest
from openhands.controller.agent_controller import AgentController
from openhands.core.schema import AgentState
from openhands.events import EventStream
from openhands.events.action import MessageAction
from openhands.events.event import EventSource
from openhands.llm.metrics import Metrics
class DummyAgent:
def __init__(self):
self.name = 'dummy'
self.llm = type(
'DummyLLM',
(),
{
'metrics': Metrics(),
'config': type('DummyConfig', (), {'max_message_chars': 10000})(),
},
)()
def reset(self):
pass
def get_system_message(self):
# Return a proper SystemMessageAction for the refactored system message handling
from openhands.events.action.message import SystemMessageAction
from openhands.events.event import EventSource
system_message = SystemMessageAction(content='This is a dummy system message')
system_message._source = EventSource.AGENT
system_message._id = -1 # Set invalid ID to avoid the ID check
return system_message
@pytest.mark.asyncio
async def test_iteration_limit_extends_on_user_message():
# Initialize test components
from openhands.storage.memory import InMemoryFileStore
file_store = InMemoryFileStore()
event_stream = EventStream(sid='test', file_store=file_store)
agent = DummyAgent()
initial_max_iterations = 100
controller = AgentController(
agent=agent,
event_stream=event_stream,
max_iterations=initial_max_iterations,
sid='test',
headless_mode=False,
)
# Set initial state
await controller.set_agent_state_to(AgentState.RUNNING)
controller.state.iteration = 90 # Close to the limit
assert controller.state.max_iterations == initial_max_iterations
# Simulate user message
user_message = MessageAction('test message', EventSource.USER)
event_stream.add_event(user_message, EventSource.USER)
await asyncio.sleep(0.1) # Give time for event to be processed
# Verify max_iterations was extended
assert controller.state.max_iterations == 90 + initial_max_iterations
# Simulate more iterations and another user message
controller.state.iteration = 180 # Close to new limit
user_message2 = MessageAction('another message', EventSource.USER)
event_stream.add_event(user_message2, EventSource.USER)
await asyncio.sleep(0.1) # Give time for event to be processed
# Verify max_iterations was extended again
assert controller.state.max_iterations == 180 + initial_max_iterations


@@ -250,28 +250,6 @@ def test_response_latency_tracking(mock_time, mock_litellm_completion):
assert latency_record.latency == 0.0 # Should be lifted to 0 instead of being -1!
def test_llm_reset():
llm = LLM(LLMConfig(model='gpt-4o-mini', api_key='test_key'))
initial_metrics = copy.deepcopy(llm.metrics)
initial_metrics.add_cost(1.0)
initial_metrics.add_response_latency(0.5, 'test-id')
initial_metrics.add_token_usage(10, 5, 3, 2, 1000, 'test-id')
llm.reset()
assert llm.metrics.accumulated_cost != initial_metrics.accumulated_cost
assert llm.metrics.costs != initial_metrics.costs
assert llm.metrics.response_latencies != initial_metrics.response_latencies
assert llm.metrics.token_usages != initial_metrics.token_usages
assert isinstance(llm.metrics, Metrics)
# Check that accumulated token usage is reset
metrics_data = llm.metrics.get()
accumulated_usage = metrics_data['accumulated_token_usage']
assert accumulated_usage['prompt_tokens'] == 0
assert accumulated_usage['completion_tokens'] == 0
assert accumulated_usage['cache_read_tokens'] == 0
assert accumulated_usage['cache_write_tokens'] == 0
@patch('openhands.llm.llm.litellm.get_model_info')
def test_llm_init_with_openrouter_model(mock_get_model_info, default_config):
default_config.model = 'openrouter:gpt-4o-mini'


@@ -111,7 +111,7 @@ async def test_memory_on_event_exception_handling(memory, event_stream, mock_age
)
# Verify that the controller's last error was set
assert state.iteration == 0
assert state.iteration_flag.current_value == 0
assert state.agent_state == AgentState.ERROR
assert state.last_error == 'Error: Exception'
@@ -142,7 +142,7 @@ async def test_memory_on_workspace_context_recall_exception_handling(
)
# Verify that the controller's last error was set
assert state.iteration == 0
assert state.iteration_flag.current_value == 0
assert state.agent_state == AgentState.ERROR
assert state.last_error == 'Error: Exception'


@@ -3,6 +3,7 @@ import shutil
import pytest
from openhands.controller.state.control_flags import IterationControlFlag
from openhands.controller.state.state import State
from openhands.core.message import Message, TextContent
from openhands.events.observation.agent import MicroagentKnowledge
@@ -161,9 +162,11 @@ def test_add_turns_left_reminder(prompt_dir):
manager = PromptManager(prompt_dir=prompt_dir)
# Create a State object with specific iteration values
state = State()
state.iteration = 3
state.max_iterations = 10
state = State(
iteration_flag=IterationControlFlag(
current_value=3, max_value=10, limit_increase_amount=10
)
)
# Create a list of messages with a user message
user_message = Message(role='user', content=[TextContent(text='User content')])


@@ -301,11 +301,11 @@ async def test_clone_or_init_repo_auth_error(temp_dir):
side_effect=AuthenticationError('Auth failed'),
):
# Call the function with a repository
with pytest.raises(RuntimeError) as excinfo:
with pytest.raises(Exception) as excinfo:
await runtime.clone_or_init_repo(None, 'owner/repo', None)
# Verify the error message
assert 'Git provider authentication issue when cloning repo' in str(
assert 'Git provider authentication issue when getting remote URL' in str(
excinfo.value
)


@@ -1,5 +1,9 @@
from openhands.controller.state.state import State
from unittest.mock import patch
from openhands.controller.state.state import State, TrafficControlState
from openhands.core.schema import AgentState
from openhands.events.event import Event
from openhands.llm.metrics import Metrics
from openhands.storage.memory import InMemoryFileStore
@@ -56,3 +60,66 @@ def test_state_view_cache_not_serialized():
# be structurally identical but _not_ the same object.
assert id(restored_view) != id(view)
assert restored_view.events == view.events
def test_restore_older_state_version():
"""Test that we can restore from an older state version (before control flags)."""
# Create a State that sets the deprecated fields (the format before control flags)
state = State(
session_id='test_old_session',
iteration=42,
local_iteration=42,
max_iterations=100,
agent_state=AgentState.RUNNING,
traffic_control_state=TrafficControlState.NORMAL,
metrics=Metrics(),
confirmation_mode=False,
)
def no_op_getstate(self):
return self.__dict__
store = InMemoryFileStore()
with patch.object(State, '__getstate__', no_op_getstate):
state.save_to_session('test_old_session', store, None)
# Now restore it
restored_state = State.restore_from_session('test_old_session', store, None)
# Verify that, on restore, the active fields are populated with the values from the deprecated fields
assert restored_state.session_id == 'test_old_session'
assert restored_state.agent_state == AgentState.LOADING
assert restored_state.resume_state == AgentState.RUNNING
assert restored_state.iteration_flag.current_value == 42
assert restored_state.iteration_flag.max_value == 100
def test_save_without_deprecated_fields():
"""Test that we can save state without deprecated fields"""
# Create a State that still sets the deprecated fields (iteration, max_iterations, ...)
state = State(
session_id='test_old_session',
iteration=42,
local_iteration=42,
max_iterations=100,
agent_state=AgentState.RUNNING,
traffic_control_state=TrafficControlState.NORMAL,
metrics=Metrics(),
confirmation_mode=False,
)
store = InMemoryFileStore()
state.save_to_session('test_state', store, None)
restored_state = State.restore_from_session('test_state', store, None)
# Verify that when we save and restore, the deprecated fields are removed
# but the new fields maintain the correct values
assert restored_state.session_id == 'test_old_session'
assert restored_state.agent_state == AgentState.LOADING
assert restored_state.resume_state == AgentState.RUNNING
assert (
restored_state.iteration_flag.current_value == 0
) # The deprecated attribute was not stored, so it did not override existing values on restore
assert restored_state.iteration_flag.max_value == 100
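test_restore_older_state_version saves a State through a no-op __getstate__ so the stored payload still carries the deprecated iteration and max_iterations fields, then checks that restoring maps them onto iteration_flag. One way to perform such a migration is inside __setstate__, sketched below with hypothetical StateSketch and FlagSketch classes. This is an assumed design for illustration, not necessarily how State.restore_from_session is implemented.

```python
import pickle
from dataclasses import dataclass, field


@dataclass
class FlagSketch:
    """Hypothetical stand-in for IterationControlFlag (fields trimmed)."""

    current_value: int = 0
    max_value: int = 100


@dataclass
class StateSketch:
    """Hypothetical state class mirroring only the fields the tests check."""

    session_id: str = ''
    iteration_flag: FlagSketch = field(default_factory=FlagSketch)

    def __setstate__(self, state: dict) -> None:
        # Drop deprecated fields so they do not survive a round trip.
        iteration = state.pop('iteration', None)
        max_iterations = state.pop('max_iterations', None)
        state.pop('local_iteration', None)
        self.__dict__.update(state)
        # Payloads written before control flags existed have no iteration_flag.
        if 'iteration_flag' not in self.__dict__:
            self.iteration_flag = FlagSketch()
        # Deprecated values, when present, override the defaults.
        if iteration is not None:
            self.iteration_flag.current_value = iteration
        if max_iterations is not None:
            self.iteration_flag.max_value = max_iterations


# Build an "old format" object by writing only the deprecated fields.
old = StateSketch.__new__(StateSketch)
old.__dict__.update(session_id='test_old_session', iteration=42, max_iterations=100)

restored = pickle.loads(pickle.dumps(old))
print(restored.iteration_flag)
# FlagSketch(current_value=42, max_value=100)
```

A state saved in the new format carries no deprecated fields, so restoring it leaves iteration_flag untouched; that matches what test_save_without_deprecated_fields asserts about current_value staying at its default.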


@@ -1,91 +0,0 @@
from unittest.mock import MagicMock
import pytest
from openhands.controller.agent_controller import AgentController
from openhands.core.config import AgentConfig, LLMConfig
from openhands.events import EventStream
from openhands.llm.llm import LLM
from openhands.storage import InMemoryFileStore
@pytest.fixture
def agent_controller():
llm = LLM(config=LLMConfig())
agent = MagicMock()
agent.name = 'test_agent'
agent.llm = llm
agent.config = AgentConfig()
# Add a proper system message mock
from openhands.events import EventSource
from openhands.events.action.message import SystemMessageAction
system_message = SystemMessageAction(content='Test system message')
system_message._source = EventSource.AGENT
system_message._id = -1 # Set invalid ID to avoid the ID check
agent.get_system_message.return_value = system_message
event_stream = EventStream(sid='test', file_store=InMemoryFileStore())
controller = AgentController(
agent=agent,
event_stream=event_stream,
max_iterations=100,
max_budget_per_task=10.0,
sid='test',
headless_mode=False,
)
return controller
@pytest.mark.asyncio
async def test_traffic_control_iteration_message(agent_controller):
"""Test that iteration messages are formatted as integers."""
# Mock _react_to_exception to capture the error
error = None
async def mock_react_to_exception(e):
nonlocal error
error = e
agent_controller._react_to_exception = mock_react_to_exception
await agent_controller._handle_traffic_control('iteration', 200.0, 100.0)
assert error is not None
assert 'Current iteration: 200, max iteration: 100' in str(error)
@pytest.mark.asyncio
async def test_traffic_control_budget_message(agent_controller):
"""Test that budget messages keep decimal points."""
# Mock _react_to_exception to capture the error
error = None
async def mock_react_to_exception(e):
nonlocal error
error = e
agent_controller._react_to_exception = mock_react_to_exception
await agent_controller._handle_traffic_control('budget', 15.75, 10.0)
assert error is not None
assert 'Current budget: 15.75, max budget: 10.00' in str(error)
@pytest.mark.asyncio
async def test_traffic_control_headless_mode(agent_controller):
"""Test that headless mode messages are formatted correctly."""
# Mock _react_to_exception to capture the error
error = None
async def mock_react_to_exception(e):
nonlocal error
error = e
agent_controller._react_to_exception = mock_react_to_exception
agent_controller.headless_mode = True
await agent_controller._handle_traffic_control('iteration', 200.0, 100.0)
assert error is not None
assert 'in headless mode' in str(error)
assert 'Current iteration: 200, max iteration: 100' in str(error)
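The removed tests pinned down how _handle_traffic_control formatted its limit messages: iteration counts printed as integers even when passed as floats, budgets shown with two decimal places, and an extra "in headless mode" note in headless mode. A hypothetical formatter with that behaviour is sketched below; the function name and the leading sentence are assumptions, and only the "Current ..." fragments and "in headless mode" come from the tests.

```python
def format_limit_message(
    limit_type: str, current: float, maximum: float, headless: bool = False
) -> str:
    """Hypothetical formatter mirroring the expectations of the removed tests."""
    if limit_type == 'iteration':
        # Iteration counts may arrive as floats (200.0); show them as integers.
        # int() truncates, which is fine for whole-number inputs.
        detail = f'Current iteration: {int(current)}, max iteration: {int(maximum)}'
    elif limit_type == 'budget':
        # Budgets are money amounts, so always keep two decimal places.
        detail = f'Current budget: {current:.2f}, max budget: {maximum:.2f}'
    else:
        raise ValueError(f'unknown limit type: {limit_type!r}')
    # The leading sentence is an assumption; the tests only check substrings.
    headline = f'Agent reached maximum {limit_type}'
    if headless:
        headline += ' in headless mode'
    return f'{headline}. {detail}'


print(format_limit_message('budget', 15.75, 10.0))
# Agent reached maximum budget. Current budget: 15.75, max budget: 10.00
```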