mirror of
https://github.com/All-Hands-AI/OpenHands.git
synced 2026-04-29 03:00:45 -04:00
Compare commits
11 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| 04112e4a24 | |||
| 7b19ee0d5e | |||
| 4b6f2aeb4d | |||
| 0023eb0982 | |||
| c3ab4b480b | |||
| 35f7efb9d7 | |||
| 14498c5e25 | |||
| cdb9aeb9ba | |||
| 318883e5e0 | |||
| 767b6ce600 | |||
| 3ccc96d794 |
+23
-10
@@ -1,8 +1,10 @@
|
||||
# Development Guide
|
||||
|
||||
This guide is for people working on OpenHands and editing the source code.
|
||||
If you wish to contribute your changes, check out the [CONTRIBUTING.md](https://github.com/All-Hands-AI/OpenHands/blob/main/CONTRIBUTING.md) on how to clone and setup the project
|
||||
initially before moving on. Otherwise, you can clone the OpenHands project directly.
|
||||
If you wish to contribute your changes, check out the
|
||||
[CONTRIBUTING.md](https://github.com/All-Hands-AI/OpenHands/blob/main/CONTRIBUTING.md)
|
||||
on how to clone and setup the project initially before moving on. Otherwise,
|
||||
you can clone the OpenHands project directly.
|
||||
|
||||
## Start the Server for Development
|
||||
|
||||
@@ -19,9 +21,20 @@ initially before moving on. Otherwise, you can clone the OpenHands project direc
|
||||
|
||||
Make sure you have all these dependencies installed before moving on to `make build`.
|
||||
|
||||
#### Dev container
|
||||
|
||||
There is a [dev container](https://containers.dev/) available which provides a
|
||||
pre-configured environment with all the necessary dependencies installed if you
|
||||
are using a [supported editor or tool](https://containers.dev/supporting). For
|
||||
example, if you are using Visual Studio Code (VS Code) with the
|
||||
[Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers)
|
||||
extension installed, you can open the project in a dev container by using the
|
||||
_Dev Container: Reopen in Container_ command from the Command Palette
|
||||
(Ctrl+Shift+P).
|
||||
|
||||
#### Develop without sudo access
|
||||
|
||||
If you want to develop without system admin/sudo access to upgrade/install `Python` and/or `NodeJs`, you can use
|
||||
If you want to develop without system admin/sudo access to upgrade/install `Python` and/or `NodeJs`, you can use
|
||||
`conda` or `mamba` to manage the packages for you:
|
||||
|
||||
```bash
|
||||
@@ -37,7 +50,7 @@ mamba install conda-forge::poetry
|
||||
|
||||
### 2. Build and Setup The Environment
|
||||
|
||||
Begin by building the project which includes setting up the environment and installing dependencies. This step ensures
|
||||
Begin by building the project which includes setting up the environment and installing dependencies. This step ensures
|
||||
that OpenHands is ready to run on your system:
|
||||
|
||||
```bash
|
||||
@@ -54,11 +67,11 @@ To configure the LM of your choice, run:
|
||||
make setup-config
|
||||
```
|
||||
|
||||
This command will prompt you to enter the LLM API key, model name, and other variables ensuring that OpenHands is
|
||||
tailored to your specific needs. Note that the model name will apply only when you run headless. If you use the UI,
|
||||
This command will prompt you to enter the LLM API key, model name, and other variables ensuring that OpenHands is
|
||||
tailored to your specific needs. Note that the model name will apply only when you run headless. If you use the UI,
|
||||
please set the model in the UI.
|
||||
|
||||
Note: If you have previously run OpenHands using the docker command, you may have already set some environmental
|
||||
Note: If you have previously run OpenHands using the docker command, you may have already set some environmental
|
||||
variables in your terminal. The final configurations are set from highest to lowest priority:
|
||||
Environment variables > config.toml variables > default variables
|
||||
|
||||
@@ -77,14 +90,14 @@ make run
|
||||
|
||||
#### Option B: Individual Server Startup
|
||||
|
||||
- **Start the Backend Server:** If you prefer, you can start the backend server independently to focus on
|
||||
- **Start the Backend Server:** If you prefer, you can start the backend server independently to focus on
|
||||
backend-related tasks or configurations.
|
||||
|
||||
```bash
|
||||
make start-backend
|
||||
```
|
||||
|
||||
- **Start the Frontend Server:** Similarly, you can start the frontend server on its own to work on frontend-related
|
||||
- **Start the Frontend Server:** Similarly, you can start the frontend server on its own to work on frontend-related
|
||||
components or interface enhancements.
|
||||
```bash
|
||||
make start-frontend
|
||||
@@ -120,7 +133,7 @@ poetry run pytest ./tests/unit/test_*.py
|
||||
|
||||
### 9. Use existing Docker image
|
||||
|
||||
To reduce build time (e.g., if no changes were made to the client-runtime component), you can use an existing Docker
|
||||
To reduce build time (e.g., if no changes were made to the client-runtime component), you can use an existing Docker
|
||||
container image by setting the SANDBOX_RUNTIME_CONTAINER_IMAGE environment variable to the desired Docker image.
|
||||
|
||||
Example: `export SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.39-nikolaik`
|
||||
|
||||
+19
@@ -0,0 +1,19 @@
|
||||
[ ] Rename `Conversation` in openhands/server to `ServerConversation`
|
||||
[ ] Replace all instances of `sid` in openhands/* to `conversation_id`
|
||||
[ ] Make EventStream take in a `conversation_id` in its constructor.
|
||||
* remove `conversation_id` from all methods on EventStream and use self.conversation_id instead.
|
||||
* fix all callers of EventStream to pass in `conversation_id` in the constructor and remove it from the method calls.
|
||||
[ ] Rename AppConfig to OpenHandsConfig
|
||||
[ ] Create a new class `Conversation` in openhands/core/ that will be the main interface for conversations.
|
||||
* Its constructor will take in a:
|
||||
* conversation_id (string)
|
||||
* Runtime
|
||||
* LLM
|
||||
* EventStream
|
||||
* AgentController
|
||||
* No logic, it's just a dataclass
|
||||
[ ] Add a new OpenHands class to openhands/core/ which will take care of creating Conversations
|
||||
* Constructor is ONLY an OpenHandsConfig
|
||||
* Only one method: `create_conversation()`
|
||||
* This will create a Runtime, LLM, EventStream, and AgentController, and return a Conversation object.
|
||||
* These objects will be created according to the OpenHandsConfig passed in to the constructor.
|
||||
@@ -325,6 +325,15 @@ classpath = "my_package.my_module.MyCustomAgent"
|
||||
# Useful when deploying OpenHands in a remote machine where you need to expose a specific port.
|
||||
#vscode_port = 41234
|
||||
|
||||
# Volume mounts in the format 'host_path:container_path[:mode]'
|
||||
# e.g. '/my/host/dir:/workspace:rw'
|
||||
# Multiple mounts can be specified using commas
|
||||
# e.g. '/path1:/workspace/path1,/path2:/workspace/path2:ro'
|
||||
|
||||
# Configure volumes under the [sandbox] section:
|
||||
# [sandbox]
|
||||
# volumes = "/my/host/dir:/workspace:rw,/path2:/workspace/path2:ro"
|
||||
|
||||
#################################### Security ###################################
|
||||
# Configuration for security features
|
||||
##############################################################################
|
||||
|
||||
@@ -0,0 +1,84 @@
|
||||
# Using OpenHands as a library
|
||||
|
||||
|
||||
## Hello World
|
||||
```python
|
||||
import asyncio
|
||||
from openhands.core.config import OpenHandsConfig, LLMConfig, AgentConfig
|
||||
from openhands.core.setup import run_agent
|
||||
|
||||
async def run_openhands_agent():
|
||||
final_state = await run_agent(
|
||||
config=OpenHandsConfig(
|
||||
llm=LLMConfig(
|
||||
model="claude-sonnet-4-20250514",
|
||||
api_key="your_api_key_here", # Replace with your actual API key
|
||||
),
|
||||
),
|
||||
initial_user_message="Flip a coin",
|
||||
context_message="You build simple programs and run them.",
|
||||
)
|
||||
|
||||
return final_state
|
||||
|
||||
# Run the async function
|
||||
if __name__ == "__main__":
|
||||
final_state = asyncio.run(run_openhands_agent())
|
||||
print("Agent execution completed!")
|
||||
```
|
||||
|
||||
## Using the internals
|
||||
|
||||
```python
|
||||
import asyncio
|
||||
from openhands.controller.agent import Agent
|
||||
from openhands.core.config import OpenHandsConfig, LLMConfig, AgentConfig
|
||||
from openhands.events.action import MessageAction
|
||||
from openhands.llm.llm import LLM
|
||||
from openhands.core.setup import (
|
||||
create_runtime,
|
||||
create_memory,
|
||||
generate_sid,
|
||||
)
|
||||
from openhands.core.main import run_controller
|
||||
|
||||
async def run_openhands_agent():
|
||||
config = OpenHandsConfig(
|
||||
runtime="local",
|
||||
file_store="memory",
|
||||
llm=LLMConfig(
|
||||
model="claude-sonnet-4-20250514", # Choose your preferred model
|
||||
api_key="your_api_key_here", # Replace with your actual API key
|
||||
temperature=0.0, # Set temperature to 0 for deterministic output
|
||||
),
|
||||
agent=AgentConfig(
|
||||
enable_browsing=False,
|
||||
),
|
||||
)
|
||||
|
||||
oh = OpenHands(config=config)
|
||||
|
||||
conversation = oh.create_conversation(
|
||||
conversation_id='hello-world',
|
||||
)
|
||||
await conversation.runtime.connect()
|
||||
|
||||
def on_event(event: Event) -> None:
|
||||
print(f"Event received: {event}")
|
||||
conversation.event_stream.subscribe(EventStreamSubscriber.MAIN, on_event)
|
||||
|
||||
initial_user_action = MessageAction(content="Flip a coin")
|
||||
conversation.event_stream.add_event(initial_user_action, EventSource.USER)
|
||||
|
||||
while conversation.state.agent_state not in end_states:
|
||||
await asyncio.sleep(1)
|
||||
|
||||
await runtime.close()
|
||||
|
||||
return conversation.state
|
||||
|
||||
# Run the async function
|
||||
if __name__ == "__main__":
|
||||
final_state = asyncio.run(run_openhands_agent())
|
||||
print("Agent execution completed!")
|
||||
```
|
||||
@@ -1,5 +1,5 @@
|
||||
{
|
||||
"items": ["python/python"],
|
||||
"items": ["python/python", "python/using-openhands-as-library"],
|
||||
"label": "Backend",
|
||||
"type": "category"
|
||||
}
|
||||
|
||||
@@ -0,0 +1,399 @@
|
||||
# Using OpenHands as a Library
|
||||
|
||||
OpenHands can be used as a Python library in your own applications. This guide will show you how to integrate OpenHands into your Python projects, allowing you to build custom applications that leverage OpenHands' powerful agent capabilities.
|
||||
|
||||
## Installation
|
||||
|
||||
First, install the OpenHands library from PyPI:
|
||||
|
||||
```bash
|
||||
pip install openhands-ai
|
||||
```
|
||||
|
||||
## Basic Usage
|
||||
|
||||
Here's a simple example of how to use OpenHands in your Python code:
|
||||
|
||||
```python
|
||||
import asyncio
|
||||
from openhands.controller.agent import Agent
|
||||
from openhands.core.config import AppConfig, LLMConfig, AgentConfig
|
||||
from openhands.events.action import MessageAction
|
||||
from openhands.llm.llm import LLM
|
||||
from openhands.core.setup import (
|
||||
create_runtime,
|
||||
create_memory,
|
||||
generate_sid,
|
||||
)
|
||||
from openhands.core.main import run_controller
|
||||
|
||||
async def run_openhands_agent():
|
||||
# 1. Create configuration
|
||||
config = AppConfig(
|
||||
runtime="local", # Use local runtime
|
||||
file_store="memory", # Store events in memory
|
||||
)
|
||||
|
||||
# 2. Configure LLM
|
||||
llm_config = LLMConfig(
|
||||
model="claude-sonnet-4-20250514", # Choose your preferred model
|
||||
api_key="your_api_key_here", # Replace with your actual API key
|
||||
temperature=0.0,
|
||||
)
|
||||
config.set_llm_config(llm_config)
|
||||
|
||||
# 3. Configure Agent
|
||||
agent_config = AgentConfig(
|
||||
enable_browsing=False, # Disable browsing for this example
|
||||
)
|
||||
config.set_agent_config(agent_config)
|
||||
|
||||
# 4. Create Agent
|
||||
agent = Agent(
|
||||
llm=LLM(config=llm_config),
|
||||
config=agent_config,
|
||||
)
|
||||
|
||||
# 5. Generate a session ID
|
||||
sid = generate_sid(config)
|
||||
|
||||
# 6. Create Runtime
|
||||
runtime = create_runtime(
|
||||
config=config,
|
||||
sid=sid,
|
||||
headless_mode=True,
|
||||
agent=agent,
|
||||
)
|
||||
|
||||
# 7. Connect to the runtime
|
||||
await runtime.connect()
|
||||
|
||||
# 8. Create Memory
|
||||
memory = create_memory(
|
||||
runtime=runtime,
|
||||
event_stream=runtime.event_stream,
|
||||
sid=sid,
|
||||
)
|
||||
|
||||
# 9. Define the initial task
|
||||
initial_user_action = MessageAction(content="Write a Python function that calculates the factorial of a number")
|
||||
|
||||
# 10. Run the agent
|
||||
final_state = await run_controller(
|
||||
config=config,
|
||||
initial_user_action=initial_user_action,
|
||||
sid=sid,
|
||||
runtime=runtime,
|
||||
agent=agent,
|
||||
memory=memory,
|
||||
headless_mode=True,
|
||||
exit_on_message=True, # Exit when the agent asks for user input
|
||||
)
|
||||
|
||||
# 11. Close the runtime
|
||||
await runtime.close()
|
||||
|
||||
return final_state
|
||||
|
||||
# Run the async function
|
||||
if __name__ == "__main__":
|
||||
final_state = asyncio.run(run_openhands_agent())
|
||||
print("Agent execution completed!")
|
||||
```
|
||||
|
||||
## Components Overview
|
||||
|
||||
### AppConfig
|
||||
|
||||
The `AppConfig` class is the main configuration object for OpenHands. It contains settings for the runtime, agent, LLM, and more.
|
||||
|
||||
```python
|
||||
from openhands.core.config import AppConfig
|
||||
|
||||
config = AppConfig(
|
||||
runtime="local", # Options: "local", "docker", "e2b", "modal", etc.
|
||||
file_store="memory", # Options: "memory", "local", etc.
|
||||
file_store_path="/path/to/store", # Only needed for "local" file_store
|
||||
max_iterations=100, # Maximum number of agent iterations
|
||||
)
|
||||
```
|
||||
|
||||
### LLMConfig
|
||||
|
||||
The `LLMConfig` class configures the language model used by the agent.
|
||||
|
||||
```python
|
||||
from openhands.core.config import LLMConfig
|
||||
|
||||
llm_config = LLMConfig(
|
||||
model="claude-sonnet-4-20250514", # Model name
|
||||
api_key="your_api_key_here", # API key
|
||||
temperature=0.0, # Temperature for generation
|
||||
max_output_tokens=4096, # Maximum tokens in the response
|
||||
)
|
||||
```
|
||||
|
||||
### AgentConfig
|
||||
|
||||
The `AgentConfig` class configures the agent's behavior and available tools.
|
||||
|
||||
```python
|
||||
from openhands.core.config import AgentConfig
|
||||
|
||||
agent_config = AgentConfig(
|
||||
enable_browsing=True, # Enable web browsing
|
||||
enable_cmd=True, # Enable bash commands
|
||||
enable_editor=True, # Enable file editing
|
||||
enable_jupyter=True, # Enable Jupyter notebook
|
||||
enable_think=True, # Enable thinking tool
|
||||
enable_finish=True, # Enable finish tool
|
||||
)
|
||||
```
|
||||
|
||||
### Agent
|
||||
|
||||
The `Agent` class represents the AI agent that will perform tasks.
|
||||
|
||||
```python
|
||||
from openhands.controller.agent import Agent
|
||||
from openhands.llm.llm import LLM
|
||||
|
||||
agent = Agent(
|
||||
llm=LLM(config=llm_config),
|
||||
config=agent_config,
|
||||
)
|
||||
```
|
||||
|
||||
### Runtime
|
||||
|
||||
The runtime is the environment where the agent executes commands and interacts with the system.
|
||||
|
||||
```python
|
||||
from openhands.core.setup import create_runtime
|
||||
|
||||
runtime = create_runtime(
|
||||
config=config,
|
||||
sid=sid,
|
||||
headless_mode=True,
|
||||
agent=agent,
|
||||
)
|
||||
```
|
||||
|
||||
### Memory
|
||||
|
||||
The memory component manages the agent's context and conversation history.
|
||||
|
||||
```python
|
||||
from openhands.core.setup import create_memory
|
||||
|
||||
memory = create_memory(
|
||||
runtime=runtime,
|
||||
event_stream=runtime.event_stream,
|
||||
sid=sid,
|
||||
)
|
||||
```
|
||||
|
||||
## Advanced Usage
|
||||
|
||||
### Custom Sandbox Configuration
|
||||
|
||||
You can customize the sandbox environment by configuring the `SandboxConfig`:
|
||||
|
||||
```python
|
||||
from openhands.core.config import SandboxConfig
|
||||
|
||||
sandbox_config = SandboxConfig(
|
||||
selected_repo="username/repo", # GitHub repository to clone
|
||||
base_image="ubuntu:22.04", # Base Docker image
|
||||
)
|
||||
config.sandbox = sandbox_config
|
||||
```
|
||||
|
||||
### Security Configuration
|
||||
|
||||
Configure security settings using the `SecurityConfig`:
|
||||
|
||||
```python
|
||||
from openhands.core.config import SecurityConfig
|
||||
|
||||
security_config = SecurityConfig(
|
||||
confirmation_mode=False, # Whether to require confirmation for actions
|
||||
security_analyzer="default", # Security analyzer to use
|
||||
)
|
||||
config.security = security_config
|
||||
```
|
||||
|
||||
### Custom Agent Response Handling
|
||||
|
||||
You can provide a custom function to handle agent responses:
|
||||
|
||||
```python
|
||||
def custom_response_handler(state):
|
||||
# Process the agent's state and generate a response
|
||||
return "Continue with your current approach"
|
||||
|
||||
final_state = await run_controller(
|
||||
config=config,
|
||||
initial_user_action=initial_user_action,
|
||||
fake_user_response_fn=custom_response_handler,
|
||||
)
|
||||
```
|
||||
|
||||
## Building a Complete Application
|
||||
|
||||
Here's an example of a more complete application that uses OpenHands to assist with code generation:
|
||||
|
||||
```python
|
||||
import asyncio
|
||||
import os
|
||||
from openhands.controller.agent import Agent
|
||||
from openhands.core.config import AppConfig, LLMConfig, AgentConfig, SandboxConfig
|
||||
from openhands.events.action import MessageAction
|
||||
from openhands.llm.llm import LLM
|
||||
from openhands.core.setup import create_runtime, create_memory, generate_sid
|
||||
from openhands.core.main import run_controller
|
||||
from openhands.events import EventStreamSubscriber
|
||||
from openhands.events.observation import AgentStateChangedObservation
|
||||
from openhands.core.schema import AgentState
|
||||
|
||||
class CodeAssistant:
|
||||
def __init__(self, api_key, model="claude-sonnet-4-20250514"):
|
||||
self.api_key = api_key
|
||||
self.model = model
|
||||
self.config = None
|
||||
self.agent = None
|
||||
self.runtime = None
|
||||
self.memory = None
|
||||
self.sid = None
|
||||
self.event_stream = None
|
||||
|
||||
async def initialize(self):
|
||||
# Create configuration
|
||||
self.config = AppConfig(
|
||||
runtime="docker",
|
||||
file_store="memory",
|
||||
)
|
||||
|
||||
# Configure LLM
|
||||
llm_config = LLMConfig(
|
||||
model=self.model,
|
||||
api_key=self.api_key,
|
||||
temperature=0.0,
|
||||
)
|
||||
self.config.set_llm_config(llm_config)
|
||||
|
||||
# Configure Agent
|
||||
agent_config = AgentConfig(
|
||||
enable_browsing=True,
|
||||
enable_cmd=True,
|
||||
enable_editor=True,
|
||||
enable_jupyter=True,
|
||||
)
|
||||
self.config.set_agent_config(agent_config)
|
||||
|
||||
# Configure Sandbox
|
||||
sandbox_config = SandboxConfig(
|
||||
base_image="ubuntu:22.04",
|
||||
)
|
||||
self.config.sandbox = sandbox_config
|
||||
|
||||
# Create Agent
|
||||
self.agent = Agent(
|
||||
llm=LLM(config=llm_config),
|
||||
config=agent_config,
|
||||
)
|
||||
|
||||
# Generate a session ID
|
||||
self.sid = generate_sid(self.config)
|
||||
|
||||
# Create Runtime
|
||||
self.runtime = create_runtime(
|
||||
config=self.config,
|
||||
sid=self.sid,
|
||||
headless_mode=True,
|
||||
agent=self.agent,
|
||||
)
|
||||
|
||||
# Connect to the runtime
|
||||
await self.runtime.connect()
|
||||
|
||||
# Create Memory
|
||||
self.memory = create_memory(
|
||||
runtime=self.runtime,
|
||||
event_stream=self.runtime.event_stream,
|
||||
sid=self.sid,
|
||||
)
|
||||
|
||||
self.event_stream = self.runtime.event_stream
|
||||
|
||||
async def run_task(self, task_description, callback=None):
|
||||
# Define the initial task
|
||||
initial_user_action = MessageAction(content=task_description)
|
||||
|
||||
# Set up event callback if provided
|
||||
if callback:
|
||||
def on_event(event):
|
||||
if isinstance(event, AgentStateChangedObservation):
|
||||
callback(event)
|
||||
|
||||
self.event_stream.subscribe(
|
||||
EventStreamSubscriber.MAIN,
|
||||
on_event,
|
||||
self.sid
|
||||
)
|
||||
|
||||
# Run the agent
|
||||
final_state = await run_controller(
|
||||
config=self.config,
|
||||
initial_user_action=initial_user_action,
|
||||
sid=self.sid,
|
||||
runtime=self.runtime,
|
||||
agent=self.agent,
|
||||
memory=self.memory,
|
||||
headless_mode=True,
|
||||
exit_on_message=True,
|
||||
)
|
||||
|
||||
return final_state
|
||||
|
||||
async def close(self):
|
||||
if self.runtime:
|
||||
await self.runtime.close()
|
||||
|
||||
# Example usage
|
||||
async def main():
|
||||
# Initialize the code assistant
|
||||
assistant = CodeAssistant(api_key=os.environ.get("ANTHROPIC_API_KEY"))
|
||||
await assistant.initialize()
|
||||
|
||||
# Define a callback to process events
|
||||
def event_callback(event):
|
||||
if isinstance(event, AgentStateChangedObservation):
|
||||
print(f"Agent state changed to: {event.agent_state}")
|
||||
|
||||
# Run a task
|
||||
task = """
|
||||
Create a simple Flask API with the following endpoints:
|
||||
1. GET /users - Returns a list of users
|
||||
2. GET /users/{id} - Returns a specific user
|
||||
3. POST /users - Creates a new user
|
||||
|
||||
Use SQLite as the database and implement proper error handling.
|
||||
"""
|
||||
|
||||
final_state = await assistant.run_task(task, callback=event_callback)
|
||||
|
||||
# Close the assistant
|
||||
await assistant.close()
|
||||
|
||||
print("Task completed!")
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
```
|
||||
|
||||
## Conclusion
|
||||
|
||||
Using OpenHands as a library gives you the flexibility to integrate AI agents into your own applications. You can customize the agent's behavior, runtime environment, and how it interacts with your application.
|
||||
|
||||
For more advanced usage, refer to the OpenHands source code and API documentation. The library is highly customizable and can be adapted to a wide range of use cases.
|
||||
@@ -331,6 +331,8 @@ The agent configuration options are defined in the `[agent]` and `[agent.<agent_
|
||||
|
||||
The sandbox configuration options are defined in the `[sandbox]` section of the `config.toml` file.
|
||||
|
||||
|
||||
|
||||
To use these with the docker command, pass in `-e SANDBOX_<option>`. Example: `-e SANDBOX_TIMEOUT`.
|
||||
|
||||
### Execution
|
||||
|
||||
@@ -2,6 +2,8 @@
|
||||
|
||||
This folder contains the evaluation harness that we built on top of the original [SWE-Bench benchmark](https://www.swebench.com/) ([paper](https://arxiv.org/abs/2310.06770)).
|
||||
|
||||
**UPDATE (5/26/2025): We now support running interactive SWE-Bench evaluation (see the paper [here](https://arxiv.org/abs/2502.13069))! For how to run it, checkout [this README](./SWE-Interact.md).**
|
||||
|
||||
**UPDATE (4/8/2025): We now support running SWT-Bench evaluation! For more details, checkout [the corresponding section](#SWT-Bench-Evaluation).**
|
||||
|
||||
**UPDATE (03/27/2025): We now support SWE-Bench multimodal evaluation! Simply use "princeton-nlp/SWE-bench_Multimodal" as the dataset name in the `run_infer.sh` script to evaluate on multimodal instances.**
|
||||
|
||||
@@ -0,0 +1,92 @@
|
||||
# SWE-Interact Benchmark
|
||||
|
||||
This document explains how to use the [Interactive SWE-Bench](https://arxiv.org/abs/2502.13069) benchmark scripts for running and evaluating interactive software engineering tasks.
|
||||
|
||||
## Setting things up
|
||||
After following the [README](./README.md) to set up the environment, you would need to additionally add LLM configurations for simulated human users. In the original [paper](https://arxiv.org/abs/2502.13069), we use gpt-4o as the simulated human user. You can add the following to your `config.toml` file:
|
||||
|
||||
```toml
|
||||
[llm.fake_user]
|
||||
model="litellm_proxy/gpt-4o-2024-08-06"
|
||||
api_key="<your-api-key>"
|
||||
temperature = 0.0
|
||||
base_url = "https://llm-proxy.eval.all-hands.dev"
|
||||
```
|
||||
|
||||
## Running the Benchmark
|
||||
|
||||
The main script for running the benchmark is `run_infer_interact.sh`. Here's how to use it:
|
||||
|
||||
```bash
|
||||
bash ./evaluation/benchmarks/swe_bench/scripts/run_infer_interact.sh <model_config> <commit_hash> <agent> <eval_limit> <max_iter> <num_workers> <split>
|
||||
```
|
||||
|
||||
### Parameters:
|
||||
|
||||
- `model_config`: Path to the LLM configuration file (e.g., `llm.claude-3-7-sonnet`)
|
||||
- `commit_hash`: Git commit hash to use (e.g., `HEAD`)
|
||||
- `agent`: The agent class to use (e.g., `CodeActAgent`)
|
||||
- `eval_limit`: Number of examples to evaluate (e.g., `500`)
|
||||
- `max_iter`: Maximum number of iterations per task (e.g., `100`)
|
||||
- `num_workers`: Number of parallel workers (e.g., `1`)
|
||||
- `split`: Dataset split to use (e.g., `test`)
|
||||
|
||||
### Example:
|
||||
|
||||
```bash
|
||||
bash ./evaluation/benchmarks/swe_bench/scripts/run_infer_interact.sh llm.claude-3-7-sonnet HEAD CodeActAgent 500 100 1 test
|
||||
```
|
||||
|
||||
### Additional Environment Variables:
|
||||
|
||||
You can customize the behavior using these environment variables:
|
||||
|
||||
- `RUN_WITH_BROWSING`: Enable/disable web browsing (default: false)
|
||||
- `USE_HINT_TEXT`: Enable/disable hint text (default: false)
|
||||
- `EVAL_CONDENSER`: Specify a condenser configuration
|
||||
- `EXP_NAME`: Add a custom experiment name to the output
|
||||
- `N_RUNS`: Number of runs to perform (default: 1)
|
||||
- `SKIP_RUNS`: Comma-separated list of run numbers to skip
|
||||
|
||||
## Evaluating Results
|
||||
|
||||
After running the benchmark, you can evaluate the results using `eval_infer.sh`:
|
||||
|
||||
```bash
|
||||
./evaluation/benchmarks/swe_bench/scripts/eval_infer.sh <output_file> <instance_id> <dataset> <split>
|
||||
```
|
||||
|
||||
### Parameters:
|
||||
|
||||
- `output_file`: Path to the output JSONL file
|
||||
- `instance_id`: The specific instance ID to evaluate
|
||||
- `dataset`: Dataset name (e.g., `cmu-lti/interactive-swe`)
|
||||
- `split`: Dataset split (e.g., `test`)
|
||||
|
||||
### Example:
|
||||
|
||||
```bash
|
||||
./evaluation/benchmarks/swe_bench/scripts/eval_infer.sh evaluation/evaluation_outputs/outputs/cmu-lti__interactive-swe-test/CodeActAgent/claude-3-7-sonnet-20250219_maxiter_100_N_v0.39.0-no-hint-run_1/output.jsonl sphinx-doc__sphinx-8721 cmu-lti/interactive-swe test
|
||||
```
|
||||
|
||||
## Output Structure
|
||||
|
||||
The benchmark outputs are stored in the `evaluation/evaluation_outputs/outputs/` directory with the following structure:
|
||||
|
||||
```
|
||||
evaluation/evaluation_outputs/outputs/
|
||||
└── cmu-lti__interactive-swe-{split}/
|
||||
└── {agent}/
|
||||
└── {model}-{date}_maxiter_{max_iter}_N_{version}-{options}-run_{run_number}/
|
||||
└── output.jsonl
|
||||
```
|
||||
|
||||
Where:
|
||||
- `{split}` is the dataset split (e.g., test)
|
||||
- `{agent}` is the agent class name
|
||||
- `{model}` is the model name
|
||||
- `{date}` is the run date
|
||||
- `{max_iter}` is the maximum iterations
|
||||
- `{version}` is the OpenHands version
|
||||
- `{options}` includes any additional options (e.g., no-hint, with-browsing)
|
||||
- `{run_number}` is the run number
|
||||
+411
@@ -0,0 +1,411 @@
|
||||
import asyncio
|
||||
import json
|
||||
import os
|
||||
|
||||
import pandas as pd
|
||||
from datasets import load_dataset
|
||||
from litellm import completion as litellm_completion
|
||||
|
||||
import openhands.agenthub
|
||||
from evaluation.benchmarks.swe_bench.run_infer import (
|
||||
AgentFinishedCritic,
|
||||
complete_runtime,
|
||||
filter_dataset,
|
||||
get_config,
|
||||
initialize_runtime,
|
||||
)
|
||||
from evaluation.benchmarks.swe_bench.run_infer import (
|
||||
get_instruction as base_get_instruction,
|
||||
)
|
||||
from evaluation.utils.shared import (
|
||||
EvalException,
|
||||
EvalMetadata,
|
||||
EvalOutput,
|
||||
make_metadata,
|
||||
prepare_dataset,
|
||||
reset_logger_for_multiprocessing,
|
||||
run_evaluation,
|
||||
)
|
||||
from openhands.controller.state.state import State
|
||||
from openhands.core.config import (
|
||||
get_llm_config_arg,
|
||||
get_parser,
|
||||
)
|
||||
from openhands.core.config.condenser_config import NoOpCondenserConfig
|
||||
from openhands.core.config.utils import get_condenser_config_arg
|
||||
from openhands.core.logger import openhands_logger as logger
|
||||
from openhands.core.main import create_runtime, run_controller
|
||||
from openhands.events.action import MessageAction
|
||||
from openhands.events.serialization.event import event_from_dict, event_to_dict
|
||||
from openhands.utils.async_utils import call_async_from_sync
|
||||
|
||||
USE_HINT_TEXT = os.environ.get('USE_HINT_TEXT', 'false').lower() == 'true'
|
||||
USE_INSTANCE_IMAGE = os.environ.get('USE_INSTANCE_IMAGE', 'false').lower() == 'true'
|
||||
RUN_WITH_BROWSING = os.environ.get('RUN_WITH_BROWSING', 'false').lower() == 'false'
|
||||
|
||||
|
||||
class FakeUser:
|
||||
def __init__(self, issue, hints, files):
|
||||
self.system_message = f"""
|
||||
You are a GitHub user reporting an issue. Here are the details of your issue and environment:
|
||||
|
||||
Issue: {issue}
|
||||
|
||||
Hints: {hints}
|
||||
|
||||
Files relative to your current directory: {files}
|
||||
|
||||
Your task is to respond to questions from a coder who is trying to solve your issue. The coder has a summarized version of the issue you have. Follow these rules:
|
||||
1. If the coder asks a question that is directly related to the information in the issue you have, provide that information.
|
||||
2. Always stay in character as a user reporting an issue, not as an AI assistant.
|
||||
3. Keep your responses concise and to the point.
|
||||
4. The coder has limited turns to solve the issue. Do not interact with the coder beyond 3 turns.
|
||||
|
||||
Respond with "I don't have that information" if the question is unrelated or you're unsure.
|
||||
"""
|
||||
self.chat_history = [{'role': 'system', 'content': self.system_message}]
|
||||
self.turns = 0
|
||||
# Get LLM config from config.toml
|
||||
self.llm_config = get_llm_config_arg(
|
||||
'llm.fake_user'
|
||||
) # You can change 'fake_user' to any config name you want
|
||||
|
||||
def generate_reply(self, question):
|
||||
if self.turns > 3:
|
||||
return 'Please continue working on the task. Do NOT ask for more help.'
|
||||
self.chat_history.append({'role': 'user', 'content': question.content})
|
||||
|
||||
response = litellm_completion(
|
||||
model=self.llm_config.model,
|
||||
messages=self.chat_history,
|
||||
api_key=self.llm_config.api_key.get_secret_value(),
|
||||
temperature=self.llm_config.temperature,
|
||||
base_url=self.llm_config.base_url,
|
||||
)
|
||||
|
||||
reply = response.choices[0].message.content
|
||||
self.chat_history.append({'role': 'assistant', 'content': reply})
|
||||
self.turns += 1
|
||||
return reply
|
||||
|
||||
|
||||
# Global variable for fake user
|
||||
fake_user = None
|
||||
|
||||
|
||||
def get_fake_user_response(state: State) -> str:
|
||||
global fake_user
|
||||
if not fake_user:
|
||||
return 'Please continue working on the task.'
|
||||
last_agent_message = state.get_last_agent_message()
|
||||
if last_agent_message:
|
||||
return fake_user.generate_reply(last_agent_message)
|
||||
return 'Please continue working on the task.'
|
||||
|
||||
|
||||
AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {
|
||||
'CodeActAgent': get_fake_user_response,
|
||||
}
|
||||
|
||||
|
||||
def get_instruction(instance: pd.Series, metadata: EvalMetadata) -> MessageAction:
|
||||
instance_copy = instance.copy()
|
||||
instance_copy.problem_statement = f'{instance.problem_statement}\n\nHints:\nThe user has not provided all the necessary details about the issue, and there are some hidden details that are helpful. Please ask the user specific questions using non-code commands to gather the relevant information that the user has to help you solve the issue. Ensure you have all the details you require to solve the issue.'
|
||||
return base_get_instruction(instance_copy, metadata)
|
||||
|
||||
|
||||
def process_instance(
|
||||
instance: pd.Series,
|
||||
metadata: EvalMetadata,
|
||||
reset_logger: bool = True,
|
||||
) -> EvalOutput:
|
||||
config = get_config(instance, metadata)
|
||||
global fake_user
|
||||
original_issue = instance.original_issue
|
||||
issue = str(original_issue)
|
||||
fake_user = FakeUser(issue=issue, hints=instance.hints_text, files=instance.files)
|
||||
|
||||
# Setup the logger properly, so you can run multi-processing to parallelize the evaluation
|
||||
if reset_logger:
|
||||
log_dir = os.path.join(metadata.eval_output_dir, 'infer_logs')
|
||||
reset_logger_for_multiprocessing(logger, instance.instance_id, log_dir)
|
||||
else:
|
||||
logger.info(f'Starting evaluation for instance {instance.instance_id}.')
|
||||
|
||||
runtime = create_runtime(config)
|
||||
call_async_from_sync(runtime.connect)
|
||||
|
||||
try:
|
||||
initialize_runtime(runtime, instance, metadata)
|
||||
|
||||
message_action = get_instruction(instance, metadata)
|
||||
|
||||
# Here's how you can run the agent (similar to the `main` function) and get the final task state
|
||||
state: State | None = asyncio.run(
|
||||
run_controller(
|
||||
config=config,
|
||||
initial_user_action=message_action,
|
||||
runtime=runtime,
|
||||
fake_user_response_fn=AGENT_CLS_TO_FAKE_USER_RESPONSE_FN[
|
||||
metadata.agent_class
|
||||
],
|
||||
)
|
||||
)
|
||||
|
||||
# if fatal error, throw EvalError to trigger re-run
|
||||
if (
|
||||
state
|
||||
and state.last_error
|
||||
and 'fatal error during agent execution' in state.last_error
|
||||
and 'stuck in a loop' not in state.last_error
|
||||
):
|
||||
raise EvalException('Fatal error detected: ' + state.last_error)
|
||||
|
||||
# Get git patch
|
||||
return_val = complete_runtime(runtime, instance)
|
||||
git_patch = return_val['git_patch']
|
||||
logger.info(
|
||||
f'Got git diff for instance {instance.instance_id}:\n--------\n{git_patch}\n--------'
|
||||
)
|
||||
finally:
|
||||
runtime.close()
|
||||
|
||||
# Prepare test result
|
||||
test_result = {
|
||||
'git_patch': git_patch,
|
||||
}
|
||||
|
||||
if state is None:
|
||||
raise ValueError('State should not be None.')
|
||||
|
||||
histories = [event_to_dict(event) for event in state.history]
|
||||
metrics = state.metrics.get() if state.metrics else None
|
||||
|
||||
# Save the output
|
||||
instruction = message_action.content
|
||||
if message_action.image_urls:
|
||||
instruction += (
|
||||
'\n\n<image_urls>' + '\n'.join(message_action.image_urls) + '</image_urls>'
|
||||
)
|
||||
output = EvalOutput(
|
||||
instance_id=instance.instance_id,
|
||||
instruction=instruction,
|
||||
instance=instance.to_dict(),
|
||||
test_result=test_result,
|
||||
metadata=metadata,
|
||||
history=histories,
|
||||
metrics=metrics,
|
||||
error=state.last_error if state and state.last_error else None,
|
||||
)
|
||||
return output
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
parser = get_parser()
|
||||
parser.add_argument(
|
||||
'--dataset',
|
||||
type=str,
|
||||
default='cmu-lti/interactive-swe',
|
||||
help='dataset to evaluate on',
|
||||
)
|
||||
parser.add_argument(
|
||||
'--split',
|
||||
type=str,
|
||||
default='test',
|
||||
help='split to evaluate on',
|
||||
)
|
||||
|
||||
args, _ = parser.parse_known_args()
|
||||
|
||||
# Load dataset from huggingface datasets
|
||||
dataset = load_dataset(args.dataset, split=args.split)
|
||||
swe_bench_tests = filter_dataset(dataset.to_pandas(), 'instance_id')
|
||||
logger.info(
|
||||
f'Loaded dataset {args.dataset} with split {args.split}: {len(swe_bench_tests)} tasks'
|
||||
)
|
||||
llm_config = None
|
||||
if args.llm_config:
|
||||
llm_config = get_llm_config_arg(args.llm_config)
|
||||
llm_config.log_completions = True
|
||||
# modify_params must be False for evaluation purpose, for reproducibility and accurancy of results
|
||||
llm_config.modify_params = False
|
||||
|
||||
if llm_config is None:
|
||||
raise ValueError(f'Could not find LLM config: --llm_config {args.llm_config}')
|
||||
|
||||
# Get condenser config from environment variable
|
||||
condenser_name = os.environ.get('EVAL_CONDENSER')
|
||||
if condenser_name:
|
||||
condenser_config = get_condenser_config_arg(condenser_name)
|
||||
if condenser_config is None:
|
||||
raise ValueError(
|
||||
f'Could not find Condenser config: EVAL_CONDENSER={condenser_name}'
|
||||
)
|
||||
else:
|
||||
# If no specific condenser config is provided via env var, default to NoOpCondenser
|
||||
condenser_config = NoOpCondenserConfig()
|
||||
logger.debug(
|
||||
'No Condenser config provided via EVAL_CONDENSER, using NoOpCondenser.'
|
||||
)
|
||||
|
||||
details = {'mode': 'interact'}
|
||||
_agent_cls = openhands.agenthub.Agent.get_cls(args.agent_cls)
|
||||
|
||||
dataset_descrption = (
|
||||
args.dataset.replace('/', '__') + '-' + args.split.replace('/', '__')
|
||||
)
|
||||
metadata = make_metadata(
|
||||
llm_config,
|
||||
dataset_descrption,
|
||||
args.agent_cls,
|
||||
args.max_iterations,
|
||||
args.eval_note,
|
||||
args.eval_output_dir,
|
||||
details=details,
|
||||
condenser_config=condenser_config,
|
||||
)
|
||||
|
||||
output_file = os.path.join(metadata.eval_output_dir, 'output.jsonl')
|
||||
print(f'### OUTPUT FILE: {output_file} ###')
|
||||
|
||||
# Run evaluation in iterative mode:
|
||||
# If a rollout fails to output AgentFinishAction, we will try again until it succeeds OR total 3 attempts have been made.
|
||||
ITERATIVE_EVAL_MODE = (
|
||||
os.environ.get('ITERATIVE_EVAL_MODE', 'false').lower() == 'true'
|
||||
)
|
||||
ITERATIVE_EVAL_MODE_MAX_ATTEMPTS = int(
|
||||
os.environ.get('ITERATIVE_EVAL_MODE_MAX_ATTEMPTS', '3')
|
||||
)
|
||||
|
||||
if not ITERATIVE_EVAL_MODE:
|
||||
# load the dataset
|
||||
instances = prepare_dataset(swe_bench_tests, output_file, args.eval_n_limit)
|
||||
if len(instances) > 0 and not isinstance(
|
||||
instances['PASS_TO_PASS'][instances['PASS_TO_PASS'].index[0]], str
|
||||
):
|
||||
for col in ['PASS_TO_PASS', 'FAIL_TO_PASS']:
|
||||
instances[col] = instances[col].apply(lambda x: str(x))
|
||||
run_evaluation(
|
||||
instances,
|
||||
metadata,
|
||||
output_file,
|
||||
args.eval_num_workers,
|
||||
process_instance,
|
||||
timeout_seconds=8
|
||||
* 60
|
||||
* 60, # 8 hour PER instance should be more than enough
|
||||
max_retries=5,
|
||||
)
|
||||
else:
|
||||
critic = AgentFinishedCritic()
|
||||
|
||||
def get_cur_output_file_path(attempt: int) -> str:
|
||||
return (
|
||||
f'{output_file.removesuffix(".jsonl")}.critic_attempt_{attempt}.jsonl'
|
||||
)
|
||||
|
||||
eval_ids = None
|
||||
for attempt in range(1, ITERATIVE_EVAL_MODE_MAX_ATTEMPTS + 1):
|
||||
cur_output_file = get_cur_output_file_path(attempt)
|
||||
logger.info(
|
||||
f'Running evaluation with critic {critic.__class__.__name__} for attempt {attempt} of {ITERATIVE_EVAL_MODE_MAX_ATTEMPTS}.'
|
||||
)
|
||||
|
||||
# For deterministic eval, we set temperature to 0.1 for (>1) attempt
|
||||
# so hopefully we get slightly different results
|
||||
if attempt > 1 and metadata.llm_config.temperature == 0:
|
||||
logger.info(
|
||||
f'Detected temperature is 0 for (>1) attempt {attempt}. Setting temperature to 0.1...'
|
||||
)
|
||||
metadata.llm_config.temperature = 0.1
|
||||
|
||||
# Load instances - at first attempt, we evaluate all instances
|
||||
# On subsequent attempts, we only evaluate the instances that failed the previous attempt determined by critic
|
||||
instances = prepare_dataset(
|
||||
swe_bench_tests, cur_output_file, args.eval_n_limit, eval_ids=eval_ids
|
||||
)
|
||||
if len(instances) > 0 and not isinstance(
|
||||
instances['PASS_TO_PASS'][instances['PASS_TO_PASS'].index[0]], str
|
||||
):
|
||||
for col in ['PASS_TO_PASS', 'FAIL_TO_PASS']:
|
||||
instances[col] = instances[col].apply(lambda x: str(x))
|
||||
|
||||
# Run evaluation - but save them to cur_output_file
|
||||
logger.info(
|
||||
f'Evaluating {len(instances)} instances for attempt {attempt}...'
|
||||
)
|
||||
run_evaluation(
|
||||
instances,
|
||||
metadata,
|
||||
cur_output_file,
|
||||
args.eval_num_workers,
|
||||
process_instance,
|
||||
timeout_seconds=8
|
||||
* 60
|
||||
* 60, # 8 hour PER instance should be more than enough
|
||||
max_retries=5,
|
||||
)
|
||||
|
||||
# When eval is done, we update eval_ids to the instances that failed the current attempt
|
||||
instances_failed = []
|
||||
logger.info(
|
||||
f'Use critic {critic.__class__.__name__} to check {len(instances)} instances for attempt {attempt}...'
|
||||
)
|
||||
with open(cur_output_file, 'r') as f:
|
||||
for line in f:
|
||||
instance = json.loads(line)
|
||||
try:
|
||||
history = [
|
||||
event_from_dict(event) for event in instance['history']
|
||||
]
|
||||
critic_result = critic.evaluate(
|
||||
history, instance['test_result'].get('git_patch', '')
|
||||
)
|
||||
if not critic_result.success:
|
||||
instances_failed.append(instance['instance_id'])
|
||||
except Exception as e:
|
||||
logger.error(
|
||||
f'Error loading history for instance {instance["instance_id"]}: {e}'
|
||||
)
|
||||
instances_failed.append(instance['instance_id'])
|
||||
logger.info(
|
||||
f'{len(instances_failed)} instances failed the current attempt {attempt}: {instances_failed}'
|
||||
)
|
||||
eval_ids = instances_failed
|
||||
|
||||
# If no instances failed, we break
|
||||
if len(instances_failed) == 0:
|
||||
break
|
||||
|
||||
# Then we should aggregate the results from all attempts into the original output file
|
||||
# and remove the intermediate files
|
||||
logger.info(
|
||||
'Aggregating results from all attempts into the original output file...'
|
||||
)
|
||||
fout = open(output_file, 'w')
|
||||
added_instance_ids = set()
|
||||
for attempt in reversed(range(1, ITERATIVE_EVAL_MODE_MAX_ATTEMPTS + 1)):
|
||||
cur_output_file = get_cur_output_file_path(attempt)
|
||||
if not os.path.exists(cur_output_file):
|
||||
logger.warning(
|
||||
f'Intermediate output file {cur_output_file} does not exist. Skipping...'
|
||||
)
|
||||
continue
|
||||
|
||||
with open(cur_output_file, 'r') as f:
|
||||
for line in f:
|
||||
instance = json.loads(line)
|
||||
# Also make sure git_patch is not empty - otherwise we fall back to previous attempt (empty patch is worse than anything else)
|
||||
if (
|
||||
instance['instance_id'] not in added_instance_ids
|
||||
and instance['test_result'].get('git_patch', '').strip()
|
||||
):
|
||||
fout.write(line)
|
||||
added_instance_ids.add(instance['instance_id'])
|
||||
logger.info(
|
||||
f'Aggregated instances from {cur_output_file}. Total instances added so far: {len(added_instance_ids)}'
|
||||
)
|
||||
fout.close()
|
||||
logger.info(
|
||||
f'Done! Total {len(added_instance_ids)} instances added to {output_file}'
|
||||
)
|
||||
@@ -0,0 +1,131 @@
|
||||
#!/usr/bin/env bash
|
||||
set -eo pipefail
|
||||
|
||||
source "evaluation/utils/version_control.sh"
|
||||
|
||||
MODEL_CONFIG=$1
|
||||
COMMIT_HASH=$2
|
||||
AGENT=$3
|
||||
EVAL_LIMIT=$4
|
||||
MAX_ITER=$5
|
||||
NUM_WORKERS=$6
|
||||
SPLIT=$8
|
||||
N_RUNS=$9
|
||||
|
||||
|
||||
if [ -z "$NUM_WORKERS" ]; then
|
||||
NUM_WORKERS=1
|
||||
echo "Number of workers not specified, use default $NUM_WORKERS"
|
||||
fi
|
||||
checkout_eval_branch
|
||||
|
||||
if [ -z "$AGENT" ]; then
|
||||
echo "Agent not specified, use default CodeActAgent"
|
||||
AGENT="CodeActAgent"
|
||||
fi
|
||||
|
||||
if [ -z "$MAX_ITER" ]; then
|
||||
echo "MAX_ITER not specified, use default 100"
|
||||
MAX_ITER=100
|
||||
fi
|
||||
|
||||
if [ -z "$RUN_WITH_BROWSING" ]; then
|
||||
echo "RUN_WITH_BROWSING not specified, use default false"
|
||||
RUN_WITH_BROWSING=false
|
||||
fi
|
||||
|
||||
|
||||
if [ -z "$DATASET" ]; then
|
||||
echo "DATASET not specified, use default cmu-lti/interactive-swe"
|
||||
DATASET="cmu-lti/interactive-swe"
|
||||
fi
|
||||
|
||||
if [ -z "$SPLIT" ]; then
|
||||
echo "SPLIT not specified, use default test"
|
||||
SPLIT="test"
|
||||
fi
|
||||
|
||||
if [ -n "$EVAL_CONDENSER" ]; then
|
||||
echo "Using Condenser Config: $EVAL_CONDENSER"
|
||||
else
|
||||
echo "No Condenser Config provided via EVAL_CONDENSER, use default (NoOpCondenser)."
|
||||
fi
|
||||
|
||||
export RUN_WITH_BROWSING=$RUN_WITH_BROWSING
|
||||
echo "RUN_WITH_BROWSING: $RUN_WITH_BROWSING"
|
||||
|
||||
get_openhands_version
|
||||
|
||||
echo "AGENT: $AGENT"
|
||||
echo "OPENHANDS_VERSION: $OPENHANDS_VERSION"
|
||||
echo "MODEL_CONFIG: $MODEL_CONFIG"
|
||||
echo "DATASET: $DATASET"
|
||||
echo "SPLIT: $SPLIT"
|
||||
echo "MAX_ITER: $MAX_ITER"
|
||||
echo "NUM_WORKERS: $NUM_WORKERS"
|
||||
echo "COMMIT_HASH: $COMMIT_HASH"
|
||||
echo "EVAL_CONDENSER: $EVAL_CONDENSER"
|
||||
|
||||
# Default to NOT use Hint
|
||||
if [ -z "$USE_HINT_TEXT" ]; then
|
||||
export USE_HINT_TEXT=false
|
||||
fi
|
||||
echo "USE_HINT_TEXT: $USE_HINT_TEXT"
|
||||
EVAL_NOTE="$OPENHANDS_VERSION"
|
||||
# if not using Hint, add -no-hint to the eval note
|
||||
if [ "$USE_HINT_TEXT" = false ]; then
|
||||
EVAL_NOTE="$EVAL_NOTE-no-hint"
|
||||
fi
|
||||
|
||||
if [ "$RUN_WITH_BROWSING" = true ]; then
|
||||
EVAL_NOTE="$EVAL_NOTE-with-browsing"
|
||||
fi
|
||||
|
||||
if [ -n "$EXP_NAME" ]; then
|
||||
EVAL_NOTE="$EVAL_NOTE-$EXP_NAME"
|
||||
fi
|
||||
# Add condenser config to eval note if provided
|
||||
if [ -n "$EVAL_CONDENSER" ]; then
|
||||
EVAL_NOTE="${EVAL_NOTE}-${EVAL_CONDENSER}"
|
||||
fi
|
||||
|
||||
function run_eval() {
|
||||
local eval_note="${1}"
|
||||
COMMAND="poetry run python evaluation/benchmarks/swe_bench/run_infer_interact.py \
|
||||
--agent-cls $AGENT \
|
||||
--llm-config $MODEL_CONFIG \
|
||||
--max-iterations $MAX_ITER \
|
||||
--eval-num-workers $NUM_WORKERS \
|
||||
--eval-note $eval_note \
|
||||
--dataset $DATASET \
|
||||
--split $SPLIT"
|
||||
|
||||
if [ -n "$EVAL_LIMIT" ]; then
|
||||
echo "EVAL_LIMIT: $EVAL_LIMIT"
|
||||
COMMAND="$COMMAND --eval-n-limit $EVAL_LIMIT"
|
||||
fi
|
||||
|
||||
# Run the command
|
||||
eval $COMMAND
|
||||
}
|
||||
|
||||
unset SANDBOX_ENV_GITHUB_TOKEN # prevent the agent from using the github token to push
|
||||
if [ -z "$N_RUNS" ]; then
|
||||
N_RUNS=1
|
||||
echo "N_RUNS not specified, use default $N_RUNS"
|
||||
fi
|
||||
|
||||
# Skip runs if the run number is in the SKIP_RUNS list
|
||||
# read from env variable SKIP_RUNS as a comma separated list of run numbers
|
||||
SKIP_RUNS=(${SKIP_RUNS//,/ })
|
||||
for i in $(seq 1 $N_RUNS); do
|
||||
if [[ " ${SKIP_RUNS[@]} " =~ " $i " ]]; then
|
||||
echo "Skipping run $i"
|
||||
continue
|
||||
fi
|
||||
current_eval_note="$EVAL_NOTE-run_$i"
|
||||
echo "EVAL_NOTE: $current_eval_note"
|
||||
run_eval $current_eval_note
|
||||
done
|
||||
|
||||
checkout_original_branch
|
||||
@@ -1,6 +1,11 @@
|
||||
import { OpenHandsAction } from "#/types/core/actions";
|
||||
import { OpenHandsEventType } from "#/types/core/base";
|
||||
import { isOpenHandsAction, isOpenHandsObservation } from "#/types/core/guards";
|
||||
import {
|
||||
isCommandAction,
|
||||
isCommandObservation,
|
||||
isOpenHandsAction,
|
||||
isOpenHandsObservation,
|
||||
} from "#/types/core/guards";
|
||||
import { OpenHandsObservation } from "#/types/core/observations";
|
||||
|
||||
const COMMON_NO_RENDER_LIST: OpenHandsEventType[] = [
|
||||
@@ -15,11 +20,21 @@ export const shouldRenderEvent = (
|
||||
event: OpenHandsAction | OpenHandsObservation,
|
||||
) => {
|
||||
if (isOpenHandsAction(event)) {
|
||||
if (isCommandAction(event) && event.source === "user") {
|
||||
// For user commands, we always hide them from the chat interface
|
||||
return false;
|
||||
}
|
||||
|
||||
const noRenderList = COMMON_NO_RENDER_LIST.concat(ACTION_NO_RENDER_LIST);
|
||||
return !noRenderList.includes(event.action);
|
||||
}
|
||||
|
||||
if (isOpenHandsObservation(event)) {
|
||||
if (isCommandObservation(event) && event.source === "user") {
|
||||
// For user commands, we always hide them from the chat interface
|
||||
return false;
|
||||
}
|
||||
|
||||
return !COMMON_NO_RENDER_LIST.includes(event.observation);
|
||||
}
|
||||
|
||||
|
||||
@@ -2,32 +2,10 @@ import React from "react";
|
||||
import { OpenHandsAction } from "#/types/core/actions";
|
||||
import { OpenHandsObservation } from "#/types/core/observations";
|
||||
import { isOpenHandsAction, isOpenHandsObservation } from "#/types/core/guards";
|
||||
import { OpenHandsEventType } from "#/types/core/base";
|
||||
import { EventMessage } from "./event-message";
|
||||
import { ChatMessage } from "./chat-message";
|
||||
import { useOptimisticUserMessage } from "#/hooks/use-optimistic-user-message";
|
||||
|
||||
const COMMON_NO_RENDER_LIST: OpenHandsEventType[] = [
|
||||
"system",
|
||||
"agent_state_changed",
|
||||
"change_agent_state",
|
||||
];
|
||||
|
||||
const ACTION_NO_RENDER_LIST: OpenHandsEventType[] = ["recall"];
|
||||
|
||||
const shouldRenderEvent = (event: OpenHandsAction | OpenHandsObservation) => {
|
||||
if (isOpenHandsAction(event)) {
|
||||
const noRenderList = COMMON_NO_RENDER_LIST.concat(ACTION_NO_RENDER_LIST);
|
||||
return !noRenderList.includes(event.action);
|
||||
}
|
||||
|
||||
if (isOpenHandsObservation(event)) {
|
||||
return !COMMON_NO_RENDER_LIST.includes(event.observation);
|
||||
}
|
||||
|
||||
return true;
|
||||
};
|
||||
|
||||
interface MessagesProps {
|
||||
messages: (OpenHandsAction | OpenHandsObservation)[];
|
||||
isAwaitingUserConfirmation: boolean;
|
||||
@@ -54,7 +32,7 @@ export const Messages: React.FC<MessagesProps> = React.memo(
|
||||
|
||||
return (
|
||||
<>
|
||||
{messages.filter(shouldRenderEvent).map((message, index) => (
|
||||
{messages.map((message, index) => (
|
||||
<EventMessage
|
||||
key={index}
|
||||
event={message}
|
||||
|
||||
@@ -1,21 +1,14 @@
|
||||
import { useQueries, useQuery } from "@tanstack/react-query";
|
||||
import axios from "axios";
|
||||
import React from "react";
|
||||
import { useSelector } from "react-redux";
|
||||
import OpenHands from "#/api/open-hands";
|
||||
import { RUNTIME_INACTIVE_STATES } from "#/types/agent-state";
|
||||
import { RootState } from "#/store";
|
||||
import { useConversationId } from "#/hooks/use-conversation-id";
|
||||
import { useActiveConversation } from "./use-active-conversation";
|
||||
import { useRuntimeIsReady } from "#/hooks/use-runtime-is-ready";
|
||||
|
||||
export const useActiveHost = () => {
|
||||
const { curAgentState } = useSelector((state: RootState) => state.agent);
|
||||
const [activeHost, setActiveHost] = React.useState<string | null>(null);
|
||||
const { conversationId } = useConversationId();
|
||||
const { data: conversation } = useActiveConversation();
|
||||
const enabled =
|
||||
conversation?.status === "RUNNING" &&
|
||||
RUNTIME_INACTIVE_STATES.includes(curAgentState);
|
||||
const runtimeIsReady = useRuntimeIsReady();
|
||||
|
||||
const { data } = useQuery({
|
||||
queryKey: [conversationId, "hosts"],
|
||||
@@ -23,7 +16,7 @@ export const useActiveHost = () => {
|
||||
const hosts = await OpenHands.getWebHosts(conversationId);
|
||||
return { hosts };
|
||||
},
|
||||
enabled,
|
||||
enabled: runtimeIsReady && !!conversationId,
|
||||
initialData: { hosts: [] },
|
||||
meta: {
|
||||
disableToast: true,
|
||||
|
||||
@@ -1,23 +1,15 @@
|
||||
import { useQuery } from "@tanstack/react-query";
|
||||
import React from "react";
|
||||
import { useSelector } from "react-redux";
|
||||
import OpenHands from "#/api/open-hands";
|
||||
import { useConversationId } from "#/hooks/use-conversation-id";
|
||||
import { GitChange } from "#/api/open-hands.types";
|
||||
import { RootState } from "#/store";
|
||||
import { RUNTIME_INACTIVE_STATES } from "#/types/agent-state";
|
||||
import { useActiveConversation } from "./use-active-conversation";
|
||||
import { useRuntimeIsReady } from "#/hooks/use-runtime-is-ready";
|
||||
|
||||
export const useGetGitChanges = () => {
|
||||
const { conversationId } = useConversationId();
|
||||
const { data: conversation } = useActiveConversation();
|
||||
const [orderedChanges, setOrderedChanges] = React.useState<GitChange[]>([]);
|
||||
const previousDataRef = React.useRef<GitChange[]>(null);
|
||||
|
||||
const { curAgentState } = useSelector((state: RootState) => state.agent);
|
||||
const enabled =
|
||||
conversation?.status === "RUNNING" &&
|
||||
RUNTIME_INACTIVE_STATES.includes(curAgentState);
|
||||
const runtimeIsReady = useRuntimeIsReady();
|
||||
|
||||
const result = useQuery({
|
||||
queryKey: ["file_changes", conversationId],
|
||||
@@ -25,7 +17,7 @@ export const useGetGitChanges = () => {
|
||||
retry: false,
|
||||
staleTime: 1000 * 60 * 5, // 5 minutes
|
||||
gcTime: 1000 * 60 * 15, // 15 minutes
|
||||
enabled,
|
||||
enabled: runtimeIsReady && !!conversationId,
|
||||
meta: {
|
||||
disableToast: true,
|
||||
},
|
||||
|
||||
@@ -1,13 +1,10 @@
|
||||
import { useQuery } from "@tanstack/react-query";
|
||||
import { useTranslation } from "react-i18next";
|
||||
import { useSelector } from "react-redux";
|
||||
import OpenHands from "#/api/open-hands";
|
||||
import { useConversationId } from "#/hooks/use-conversation-id";
|
||||
import { I18nKey } from "#/i18n/declaration";
|
||||
import { RootState } from "#/store";
|
||||
import { RUNTIME_INACTIVE_STATES } from "#/types/agent-state";
|
||||
import { transformVSCodeUrl } from "#/utils/vscode-url-helper";
|
||||
import { useActiveConversation } from "./use-active-conversation";
|
||||
import { useRuntimeIsReady } from "#/hooks/use-runtime-is-ready";
|
||||
|
||||
// Define the return type for the VS Code URL query
|
||||
interface VSCodeUrlResult {
|
||||
@@ -18,11 +15,7 @@ interface VSCodeUrlResult {
|
||||
export const useVSCodeUrl = () => {
|
||||
const { t } = useTranslation();
|
||||
const { conversationId } = useConversationId();
|
||||
const { data: conversation } = useActiveConversation();
|
||||
const { curAgentState } = useSelector((state: RootState) => state.agent);
|
||||
const enabled =
|
||||
conversation?.status === "RUNNING" &&
|
||||
RUNTIME_INACTIVE_STATES.includes(curAgentState);
|
||||
const runtimeIsReady = useRuntimeIsReady();
|
||||
|
||||
return useQuery<VSCodeUrlResult>({
|
||||
queryKey: ["vscode_url", conversationId],
|
||||
@@ -40,7 +33,7 @@ export const useVSCodeUrl = () => {
|
||||
error: t(I18nKey.VSCODE$URL_NOT_AVAILABLE),
|
||||
};
|
||||
},
|
||||
enabled,
|
||||
enabled: runtimeIsReady && !!conversationId,
|
||||
refetchOnMount: true,
|
||||
retry: 3,
|
||||
});
|
||||
|
||||
@@ -0,0 +1,19 @@
|
||||
import { useSelector } from "react-redux";
|
||||
import { RootState } from "#/store";
|
||||
import { RUNTIME_INACTIVE_STATES } from "#/types/agent-state";
|
||||
import { useActiveConversation } from "./query/use-active-conversation";
|
||||
|
||||
/**
|
||||
* Hook to determine if the runtime is ready for operations
|
||||
*
|
||||
* @returns boolean indicating if the runtime is ready
|
||||
*/
|
||||
export const useRuntimeIsReady = (): boolean => {
|
||||
const { data: conversation } = useActiveConversation();
|
||||
const { curAgentState } = useSelector((state: RootState) => state.agent);
|
||||
|
||||
return (
|
||||
conversation?.status === "RUNNING" &&
|
||||
!RUNTIME_INACTIVE_STATES.includes(curAgentState)
|
||||
);
|
||||
};
|
||||
@@ -20,7 +20,7 @@ export interface SystemMessageAction extends OpenHandsActionEvent<"system"> {
|
||||
}
|
||||
|
||||
export interface CommandAction extends OpenHandsActionEvent<"run"> {
|
||||
source: "agent";
|
||||
source: "agent" | "user";
|
||||
args: {
|
||||
command: string;
|
||||
security_risk: ActionSecurityRisk;
|
||||
|
||||
@@ -4,6 +4,7 @@ import {
|
||||
AssistantMessageAction,
|
||||
OpenHandsAction,
|
||||
SystemMessageAction,
|
||||
CommandAction,
|
||||
} from "./actions";
|
||||
import {
|
||||
AgentStateChangeObservation,
|
||||
@@ -41,6 +42,10 @@ export const isErrorObservation = (
|
||||
): event is ErrorObservation =>
|
||||
isOpenHandsObservation(event) && event.observation === "error";
|
||||
|
||||
export const isCommandAction = (
|
||||
event: OpenHandsParsedEvent,
|
||||
): event is CommandAction => isOpenHandsAction(event) && event.action === "run";
|
||||
|
||||
export const isAgentStateChangeObservation = (
|
||||
event: OpenHandsParsedEvent,
|
||||
): event is AgentStateChangeObservation =>
|
||||
|
||||
@@ -11,7 +11,7 @@ export interface AgentStateChangeObservation
|
||||
}
|
||||
|
||||
export interface CommandObservation extends OpenHandsObservationEvent<"run"> {
|
||||
source: "agent";
|
||||
source: "agent" | "user";
|
||||
extras: {
|
||||
command: string;
|
||||
hidden?: boolean;
|
||||
|
||||
@@ -582,7 +582,7 @@ def _extract_and_validate_params(
|
||||
found_params = set()
|
||||
for param_match in param_matches:
|
||||
param_name = param_match.group(1)
|
||||
param_value = param_match.group(2).strip()
|
||||
param_value = param_match.group(2)
|
||||
|
||||
# Validate parameter is allowed
|
||||
if allowed_params and param_name not in allowed_params:
|
||||
|
||||
@@ -1013,12 +1013,12 @@ if __name__ == '__main__':
|
||||
|
||||
if not os.path.exists(full_path):
|
||||
# if user just removed a folder, prevent server error 500 in UI
|
||||
return []
|
||||
return JSONResponse(content=[])
|
||||
|
||||
try:
|
||||
# Check if the directory exists
|
||||
if not os.path.exists(full_path) or not os.path.isdir(full_path):
|
||||
return []
|
||||
return JSONResponse(content=[])
|
||||
|
||||
entries = os.listdir(full_path)
|
||||
|
||||
@@ -1047,11 +1047,11 @@ if __name__ == '__main__':
|
||||
|
||||
# Combine sorted directories and files
|
||||
sorted_entries = directories + files
|
||||
return sorted_entries
|
||||
return JSONResponse(content=sorted_entries)
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f'Error listing files: {e}')
|
||||
return []
|
||||
return JSONResponse(content=[])
|
||||
|
||||
logger.debug(f'Starting action execution API on port {args.port}')
|
||||
run(app, host='0.0.0.0', port=args.port)
|
||||
|
||||
@@ -48,6 +48,34 @@ def create_provider_tokens_object(
|
||||
return MappingProxyType(provider_information)
|
||||
|
||||
|
||||
async def setup_init_convo_settings(
|
||||
user_id: str | None, providers_set: list[ProviderType]
|
||||
) -> ConversationInitData:
|
||||
settings_store = await SettingsStoreImpl.get_instance(config, user_id)
|
||||
settings = await settings_store.load()
|
||||
|
||||
secrets_store = await SecretsStoreImpl.get_instance(config, user_id)
|
||||
user_secrets: UserSecrets | None = await secrets_store.load()
|
||||
|
||||
if not settings:
|
||||
raise ConnectionRefusedError(
|
||||
'Settings not found', {'msg_id': 'CONFIGURATION$SETTINGS_NOT_FOUND'}
|
||||
)
|
||||
|
||||
session_init_args: dict = {}
|
||||
session_init_args = {**settings.__dict__, **session_init_args}
|
||||
|
||||
git_provider_tokens = create_provider_tokens_object(providers_set)
|
||||
if server_config.app_mode != AppMode.SAAS and user_secrets:
|
||||
git_provider_tokens = user_secrets.provider_tokens
|
||||
|
||||
session_init_args['git_provider_tokens'] = git_provider_tokens
|
||||
if user_secrets:
|
||||
session_init_args['custom_secrets'] = user_secrets.custom_secrets
|
||||
|
||||
return ConversationInitData(**session_init_args)
|
||||
|
||||
|
||||
@sio.event
|
||||
async def connect(connection_id: str, environ: dict) -> None:
|
||||
try:
|
||||
@@ -85,30 +113,7 @@ async def connect(connection_id: str, environ: dict) -> None:
|
||||
conversation_id, cookies_str, authorization_header
|
||||
)
|
||||
|
||||
settings_store = await SettingsStoreImpl.get_instance(config, user_id)
|
||||
settings = await settings_store.load()
|
||||
|
||||
secrets_store = await SecretsStoreImpl.get_instance(config, user_id)
|
||||
user_secrets: UserSecrets | None = await secrets_store.load()
|
||||
|
||||
if not settings:
|
||||
raise ConnectionRefusedError(
|
||||
'Settings not found', {'msg_id': 'CONFIGURATION$SETTINGS_NOT_FOUND'}
|
||||
)
|
||||
session_init_args: dict = {}
|
||||
if settings:
|
||||
session_init_args = {**settings.__dict__, **session_init_args}
|
||||
|
||||
git_provider_tokens = create_provider_tokens_object(providers_set)
|
||||
if server_config.app_mode != AppMode.SAAS and user_secrets:
|
||||
git_provider_tokens = user_secrets.provider_tokens
|
||||
|
||||
session_init_args['git_provider_tokens'] = git_provider_tokens
|
||||
if user_secrets:
|
||||
session_init_args['custom_secrets'] = user_secrets.custom_secrets
|
||||
|
||||
conversation_init_data = ConversationInitData(**session_init_args)
|
||||
|
||||
conversation_init_data = await setup_init_convo_settings(user_id, providers_set)
|
||||
agent_loop_info = await conversation_manager.join_conversation(
|
||||
conversation_id,
|
||||
connection_id,
|
||||
|
||||
@@ -188,10 +188,7 @@ async def load_custom_secrets_names(
|
||||
) -> GETCustomSecrets | JSONResponse:
|
||||
try:
|
||||
if not user_secrets:
|
||||
return JSONResponse(
|
||||
status_code=status.HTTP_404_NOT_FOUND,
|
||||
content={'error': 'User secrets not found'},
|
||||
)
|
||||
return GETCustomSecrets(custom_secrets=[])
|
||||
|
||||
custom_secrets: list[CustomSecretWithoutValueModel] = []
|
||||
if user_secrets.custom_secrets:
|
||||
@@ -220,31 +217,30 @@ async def create_custom_secret(
|
||||
) -> JSONResponse:
|
||||
try:
|
||||
existing_secrets = await secrets_store.load()
|
||||
if existing_secrets:
|
||||
custom_secrets = dict(existing_secrets.custom_secrets)
|
||||
custom_secrets = dict(existing_secrets.custom_secrets) if existing_secrets else {}
|
||||
|
||||
secret_name = incoming_secret.name
|
||||
secret_value = incoming_secret.value
|
||||
secret_description = incoming_secret.description
|
||||
secret_name = incoming_secret.name
|
||||
secret_value = incoming_secret.value
|
||||
secret_description = incoming_secret.description
|
||||
|
||||
if secret_name in custom_secrets:
|
||||
return JSONResponse(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
content={'message': f'Secret {secret_name} already exists'},
|
||||
)
|
||||
|
||||
custom_secrets[secret_name] = CustomSecret(
|
||||
secret=secret_value,
|
||||
description=secret_description or '',
|
||||
if secret_name in custom_secrets:
|
||||
return JSONResponse(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
content={'message': f'Secret {secret_name} already exists'},
|
||||
)
|
||||
|
||||
# Create a new UserSecrets that preserves provider tokens
|
||||
updated_user_secrets = UserSecrets(
|
||||
custom_secrets=custom_secrets,
|
||||
provider_tokens=existing_secrets.provider_tokens,
|
||||
)
|
||||
custom_secrets[secret_name] = CustomSecret(
|
||||
secret=secret_value,
|
||||
description=secret_description or '',
|
||||
)
|
||||
|
||||
await secrets_store.store(updated_user_secrets)
|
||||
# Create a new UserSecrets that preserves provider tokens
|
||||
updated_user_secrets = UserSecrets(
|
||||
custom_secrets=custom_secrets,
|
||||
provider_tokens=existing_secrets.provider_tokens if existing_secrets else {},
|
||||
)
|
||||
|
||||
await secrets_store.store(updated_user_secrets)
|
||||
|
||||
return JSONResponse(
|
||||
status_code=status.HTTP_201_CREATED,
|
||||
|
||||
@@ -683,6 +683,29 @@ def test_agent_config_condenser_with_no_enabled():
|
||||
assert isinstance(agent_config.condenser, NoOpCondenserConfig)
|
||||
|
||||
|
||||
def test_sandbox_volumes_toml(default_config, temp_toml_file):
|
||||
"""Test that volumes configuration under [sandbox] works correctly."""
|
||||
with open(temp_toml_file, 'w', encoding='utf-8') as toml_file:
|
||||
toml_file.write("""
|
||||
[sandbox]
|
||||
volumes = "/home/user/mydir:/workspace:rw,/data:/data:ro"
|
||||
timeout = 1
|
||||
""")
|
||||
|
||||
load_from_toml(default_config, temp_toml_file)
|
||||
finalize_config(default_config)
|
||||
|
||||
# Check that sandbox.volumes is set correctly
|
||||
assert (
|
||||
default_config.sandbox.volumes
|
||||
== '/home/user/mydir:/workspace:rw,/data:/data:ro'
|
||||
)
|
||||
assert default_config.workspace_mount_path == '/home/user/mydir'
|
||||
assert default_config.workspace_mount_path_in_sandbox == '/workspace'
|
||||
assert default_config.workspace_base == '/home/user/mydir'
|
||||
assert default_config.sandbox.timeout == 1
|
||||
|
||||
|
||||
def test_condenser_config_from_toml_basic(default_config, temp_toml_file):
|
||||
"""Test loading basic condenser configuration from TOML."""
|
||||
with open(temp_toml_file, 'w', encoding='utf-8') as toml_file:
|
||||
|
||||
@@ -652,6 +652,34 @@ NON_FNCALL_RESPONSE_MESSAGE = {
|
||||
<parameter=command>view</parameter>
|
||||
<parameter=path>/test/file.py</parameter>
|
||||
<parameter=view_range>[1, 10]</parameter>
|
||||
</function>""",
|
||||
),
|
||||
# Test case with indented code block to verify indentation is preserved
|
||||
(
|
||||
[
|
||||
{
|
||||
'index': 1,
|
||||
'function': {
|
||||
'arguments': '{"command": "str_replace", "path": "/test/file.py", "old_str": "def example():\\n pass", "new_str": "def example():\\n # This is indented\\n print(\\"hello\\")\\n return True"}',
|
||||
'name': 'str_replace_editor',
|
||||
},
|
||||
'id': 'test_id',
|
||||
'type': 'function',
|
||||
}
|
||||
],
|
||||
"""<function=str_replace_editor>
|
||||
<parameter=command>str_replace</parameter>
|
||||
<parameter=path>/test/file.py</parameter>
|
||||
<parameter=old_str>
|
||||
def example():
|
||||
pass
|
||||
</parameter>
|
||||
<parameter=new_str>
|
||||
def example():
|
||||
# This is indented
|
||||
print("hello")
|
||||
return True
|
||||
</parameter>
|
||||
</function>""",
|
||||
),
|
||||
],
|
||||
|
||||
@@ -138,6 +138,39 @@ async def test_add_custom_secret(test_client, file_secrets_store):
|
||||
)
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_create_custom_secret_with_no_existing_secrets(
|
||||
test_client, file_secrets_store
|
||||
):
|
||||
"""Test creating a custom secret when there are no existing secrets at all."""
|
||||
|
||||
# Don't store any initial settings - this simulates a completely new user
|
||||
# or a situation where the secrets store is empty
|
||||
|
||||
# Make the POST request to add a custom secret
|
||||
add_secret_data = {
|
||||
'name': 'NEW_API_KEY',
|
||||
'value': 'new-api-key-value',
|
||||
'description': 'Test API Key',
|
||||
}
|
||||
response = test_client.post('/api/secrets', json=add_secret_data)
|
||||
assert response.status_code == 201
|
||||
|
||||
# Verify that the settings were stored with the new secret
|
||||
stored_settings = await file_secrets_store.load()
|
||||
|
||||
# Check that the secret was added
|
||||
assert 'NEW_API_KEY' in stored_settings.custom_secrets
|
||||
assert (
|
||||
stored_settings.custom_secrets['NEW_API_KEY'].secret.get_secret_value()
|
||||
== 'new-api-key-value'
|
||||
)
|
||||
assert stored_settings.custom_secrets['NEW_API_KEY'].description == 'Test API Key'
|
||||
|
||||
# Check that provider_tokens is an empty dict, not None
|
||||
assert stored_settings.provider_tokens == {}
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_update_existing_custom_secret(test_client, file_secrets_store):
|
||||
"""Test updating an existing custom secret's name and description (cannot change value once set)."""
|
||||
|
||||
Reference in New Issue
Block a user