Compare commits

11 Commits

Author SHA1 Message Date
Robert Brennan 04112e4a24 outline refactor 2025-05-27 21:56:24 -04:00
openhands 7b19ee0d5e Add documentation for using OpenHands as a library 2025-05-28 01:06:57 +00:00
Kent Johnson 4b6f2aeb4d docs: Mention dev container in Development.md (#8726) 2025-05-27 18:29:05 -04:00
Rohit Malhotra 0023eb0982 (Hotfix): Handle cases where user secrets store doesn't exist (#8745)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-05-27 18:26:36 -04:00
Robert Brennan c3ab4b480b Fix TypeError in list_files endpoint while preserving router_error_log functionality (#8744)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-05-27 18:25:07 -04:00
Xingyao Wang 35f7efb9d7 Fix: Remove strip() from parameter value extraction to preserve indentation (#8739)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-05-27 20:24:00 +00:00
Xuhui Zhou 14498c5e25 Feature/swe run interact (#8714)
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
2025-05-27 19:35:21 +00:00
sp.wack cdb9aeb9ba fix(frontend): Don't show terminal commands in chat interface that are from the user (#8729) 2025-05-27 18:59:32 +00:00
Robert Brennan 318883e5e0 Fix VS Code tab and other runtime-dependent features showing null (#8734)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-05-27 18:41:31 +00:00
Rohit Malhotra 767b6ce600 [Refactor]: separate args setup logic for restarting conversations (#8679)
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2025-05-27 13:16:33 -04:00
Xingyao Wang 3ccc96d794 Fix(docs): volumes configuration under [sandbox] in config.toml (#8724)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-05-28 00:30:07 +08:00
27 changed files with 1363 additions and 121 deletions
+23 -10
@@ -1,8 +1,10 @@
# Development Guide
This guide is for people working on OpenHands and editing the source code.
If you wish to contribute your changes, check out the [CONTRIBUTING.md](https://github.com/All-Hands-AI/OpenHands/blob/main/CONTRIBUTING.md) on how to clone and setup the project
initially before moving on. Otherwise, you can clone the OpenHands project directly.
If you wish to contribute your changes, check out the
[CONTRIBUTING.md](https://github.com/All-Hands-AI/OpenHands/blob/main/CONTRIBUTING.md)
on how to clone and set up the project initially before moving on. Otherwise,
you can clone the OpenHands project directly.
## Start the Server for Development
@@ -19,9 +21,20 @@ initially before moving on. Otherwise, you can clone the OpenHands project direc
Make sure you have all these dependencies installed before moving on to `make build`.
#### Dev container
There is a [dev container](https://containers.dev/) available which provides a
pre-configured environment with all the necessary dependencies installed if you
are using a [supported editor or tool](https://containers.dev/supporting). For
example, if you are using Visual Studio Code (VS Code) with the
[Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers)
extension installed, you can open the project in a dev container by using the
_Dev Containers: Reopen in Container_ command from the Command Palette
(Ctrl+Shift+P).
#### Develop without sudo access
If you want to develop without system admin/sudo access to upgrade/install `Python` and/or `NodeJs`, you can use
If you want to develop without system admin/sudo access to upgrade/install `Python` and/or `NodeJs`, you can use
`conda` or `mamba` to manage the packages for you:
```bash
@@ -37,7 +50,7 @@ mamba install conda-forge::poetry
### 2. Build and Setup The Environment
Begin by building the project which includes setting up the environment and installing dependencies. This step ensures
Begin by building the project which includes setting up the environment and installing dependencies. This step ensures
that OpenHands is ready to run on your system:
```bash
@@ -54,11 +67,11 @@ To configure the LM of your choice, run:
make setup-config
```
This command will prompt you to enter the LLM API key, model name, and other variables ensuring that OpenHands is
tailored to your specific needs. Note that the model name will apply only when you run headless. If you use the UI,
This command will prompt you to enter the LLM API key, model name, and other variables ensuring that OpenHands is
tailored to your specific needs. Note that the model name will apply only when you run headless. If you use the UI,
please set the model in the UI.
Note: If you have previously run OpenHands using the docker command, you may have already set some environmental
Note: If you have previously run OpenHands using the docker command, you may have already set some environmental
variables in your terminal. The final configurations are set from highest to lowest priority:
Environment variables > config.toml variables > default variables
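The priority order above can be illustrated with a minimal sketch. The function name `resolve_setting`, the upper-casing of the setting name, and the plain dictionaries are assumptions made for illustration; this is not the loader OpenHands actually uses.

```python
import os


def resolve_setting(name, toml_values, defaults, environ=None):
    """Return the value for `name`: env var first, then config.toml, then default.

    Hypothetical helper: the mapping from a setting name to an environment
    variable (simple upper-casing) is an assumption for this sketch.
    """
    environ = os.environ if environ is None else environ
    env_key = name.upper()
    if env_key in environ:
        return environ[env_key]
    if name in toml_values:
        return toml_values[name]
    return defaults[name]


toml_values = {"llm_model": "model-from-toml"}
defaults = {"llm_model": "default-model"}

# An exported environment variable wins over config.toml and the default.
print(resolve_setting("llm_model", toml_values, defaults,
                      environ={"LLM_MODEL": "model-from-env"}))
```

This is why a variable left over from an earlier docker run can silently override what `make setup-config` wrote to `config.toml`.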
@@ -77,14 +90,14 @@ make run
#### Option B: Individual Server Startup
- **Start the Backend Server:** If you prefer, you can start the backend server independently to focus on
- **Start the Backend Server:** If you prefer, you can start the backend server independently to focus on
backend-related tasks or configurations.
```bash
make start-backend
```
- **Start the Frontend Server:** Similarly, you can start the frontend server on its own to work on frontend-related
- **Start the Frontend Server:** Similarly, you can start the frontend server on its own to work on frontend-related
components or interface enhancements.
```bash
make start-frontend
@@ -120,7 +133,7 @@ poetry run pytest ./tests/unit/test_*.py
### 9. Use existing Docker image
To reduce build time (e.g., if no changes were made to the client-runtime component), you can use an existing Docker
To reduce build time (e.g., if no changes were made to the client-runtime component), you can use an existing Docker
container image by setting the SANDBOX_RUNTIME_CONTAINER_IMAGE environment variable to the desired Docker image.
Example: `export SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.39-nikolaik`
+19
@@ -0,0 +1,19 @@
[ ] Rename `Conversation` in openhands/server to `ServerConversation`
[ ] Replace all instances of `sid` in openhands/* to `conversation_id`
[ ] Make EventStream take in a `conversation_id` in its constructor.
  * remove `conversation_id` from all methods on EventStream and use self.conversation_id instead.
  * fix all callers of EventStream to pass in `conversation_id` in the constructor and remove it from the method calls.
[ ] Rename AppConfig to OpenHandsConfig
[ ] Create a new class `Conversation` in openhands/core/ that will be the main interface for conversations.
  * Its constructor will take in a:
    * conversation_id (string)
    * Runtime
    * LLM
    * EventStream
    * AgentController
  * No logic, it's just a dataclass
[ ] Add a new OpenHands class to openhands/core/ which will take care of creating Conversations
  * Constructor is ONLY an OpenHandsConfig
  * Only one method: `create_conversation()`
    * This will create a Runtime, LLM, EventStream, and AgentController, and return a Conversation object.
    * These objects will be created according to the OpenHandsConfig passed in to the constructor.
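
The last two items of the outline could take roughly the following shape. This is only a sketch: `Runtime`, `LLM`, `EventStream`, `AgentController` and `OpenHandsConfig` are empty stand-ins defined here so the example runs on its own, and passing `conversation_id` to `create_conversation()` is an assumption borrowed from the library usage draft rather than something the outline specifies.

```python
from dataclasses import dataclass


# Stand-in classes so the sketch is self-contained; the real ones live in OpenHands.
class Runtime: ...
class LLM: ...
class AgentController: ...


class EventStream:
    def __init__(self, conversation_id: str):
        # Per the outline, the stream owns its conversation_id.
        self.conversation_id = conversation_id


@dataclass
class OpenHandsConfig:
    runtime: str = "local"  # hypothetical field, for illustration only


@dataclass
class Conversation:
    """No logic, just a container for the objects that make up a conversation."""
    conversation_id: str
    runtime: Runtime
    llm: LLM
    event_stream: EventStream
    agent_controller: AgentController


class OpenHands:
    def __init__(self, config: OpenHandsConfig):
        self.config = config

    def create_conversation(self, conversation_id: str) -> Conversation:
        # A real implementation would build these from self.config.
        return Conversation(
            conversation_id=conversation_id,
            runtime=Runtime(),
            llm=LLM(),
            event_stream=EventStream(conversation_id),
            agent_controller=AgentController(),
        )


conversation = OpenHands(OpenHandsConfig()).create_conversation("hello-world")
print(conversation.conversation_id)
```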
+9
@@ -325,6 +325,15 @@ classpath = "my_package.my_module.MyCustomAgent"
# Useful when deploying OpenHands in a remote machine where you need to expose a specific port.
#vscode_port = 41234
# Volume mounts in the format 'host_path:container_path[:mode]'
# e.g. '/my/host/dir:/workspace:rw'
# Multiple mounts can be specified using commas
# e.g. '/path1:/workspace/path1,/path2:/workspace/path2:ro'
# Configure volumes under the [sandbox] section:
# [sandbox]
# volumes = "/my/host/dir:/workspace:rw,/path2:/workspace/path2:ro"
#################################### Security ###################################
# Configuration for security features
##############################################################################
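
For readers who want to see how a `volumes` string in the documented `host_path:container_path[:mode]` format breaks down, here is a small parsing sketch. The `VolumeMount` type and `parse_volumes` function are hypothetical, the default mode of `rw` when none is given is an assumption, and Windows-style paths containing a drive colon are not handled.

```python
from typing import NamedTuple


class VolumeMount(NamedTuple):
    host_path: str
    container_path: str
    mode: str


def parse_volumes(spec: str, default_mode: str = "rw") -> list[VolumeMount]:
    """Split a comma-separated volumes string into individual mounts."""
    mounts = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        parts = entry.split(":")
        if len(parts) == 2:
            host, container = parts
            mode = default_mode  # assumed default when the mode is omitted
        elif len(parts) == 3:
            host, container, mode = parts
        else:
            raise ValueError(f"Invalid volume spec: {entry!r}")
        mounts.append(VolumeMount(host, container, mode))
    return mounts


for mount in parse_volumes("/path1:/workspace/path1,/path2:/workspace/path2:ro"):
    print(mount)
```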
+84
@@ -0,0 +1,84 @@
# Using OpenHands as a library
## Hello World
```python
import asyncio

from openhands.core.config import OpenHandsConfig, LLMConfig, AgentConfig
from openhands.core.setup import run_agent


async def run_openhands_agent():
    final_state = await run_agent(
        config=OpenHandsConfig(
            llm=LLMConfig(
                model="claude-sonnet-4-20250514",
                api_key="your_api_key_here",  # Replace with your actual API key
            ),
        ),
        initial_user_message="Flip a coin",
        context_message="You build simple programs and run them.",
    )
    return final_state


# Run the async function
if __name__ == "__main__":
    final_state = asyncio.run(run_openhands_agent())
    print("Agent execution completed!")
```
## Using the internals
```python
import asyncio

from openhands.core.config import OpenHandsConfig, LLMConfig, AgentConfig
from openhands.core.schema import AgentState
from openhands.events import Event, EventSource, EventStreamSubscriber
from openhands.events.action import MessageAction

# `OpenHands` is the factory class proposed in the refactor outline (planned for
# openhands/core/); this import path is an assumption until that class lands.
from openhands.core import OpenHands

# Assumed set of terminal agent states; adjust to the states your application treats as final.
end_states = [AgentState.FINISHED, AgentState.ERROR, AgentState.STOPPED]


async def run_openhands_agent():
    config = OpenHandsConfig(
        runtime="local",
        file_store="memory",
        llm=LLMConfig(
            model="claude-sonnet-4-20250514",  # Choose your preferred model
            api_key="your_api_key_here",  # Replace with your actual API key
            temperature=0.0,  # Set temperature to 0 for deterministic output
        ),
        agent=AgentConfig(
            enable_browsing=False,
        ),
    )
    oh = OpenHands(config=config)
    conversation = oh.create_conversation(
        conversation_id='hello-world',
    )
    await conversation.runtime.connect()

    def on_event(event: Event) -> None:
        print(f"Event received: {event}")

    conversation.event_stream.subscribe(EventStreamSubscriber.MAIN, on_event)

    initial_user_action = MessageAction(content="Flip a coin")
    conversation.event_stream.add_event(initial_user_action, EventSource.USER)

    # Poll until the agent reaches a terminal state
    while conversation.state.agent_state not in end_states:
        await asyncio.sleep(1)

    await conversation.runtime.close()
    return conversation.state


# Run the async function
if __name__ == "__main__":
    final_state = asyncio.run(run_openhands_agent())
    print("Agent execution completed!")
```
+1 -1
@@ -1,5 +1,5 @@
{
"items": ["python/python"],
"items": ["python/python", "python/using-openhands-as-library"],
"label": "Backend",
"type": "category"
}
@@ -0,0 +1,399 @@
# Using OpenHands as a Library
OpenHands can be used as a Python library in your own applications. This guide will show you how to integrate OpenHands into your Python projects, allowing you to build custom applications that leverage OpenHands' powerful agent capabilities.
## Installation
First, install the OpenHands library from PyPI:
```bash
pip install openhands-ai
```
## Basic Usage
Here's a simple example of how to use OpenHands in your Python code:
```python
import asyncio

from openhands.controller.agent import Agent
from openhands.core.config import AppConfig, LLMConfig, AgentConfig
from openhands.events.action import MessageAction
from openhands.llm.llm import LLM
from openhands.core.setup import (
    create_runtime,
    create_memory,
    generate_sid,
)
from openhands.core.main import run_controller


async def run_openhands_agent():
    # 1. Create configuration
    config = AppConfig(
        runtime="local",  # Use local runtime
        file_store="memory",  # Store events in memory
    )

    # 2. Configure LLM
    llm_config = LLMConfig(
        model="claude-sonnet-4-20250514",  # Choose your preferred model
        api_key="your_api_key_here",  # Replace with your actual API key
        temperature=0.0,
    )
    config.set_llm_config(llm_config)

    # 3. Configure Agent
    agent_config = AgentConfig(
        enable_browsing=False,  # Disable browsing for this example
    )
    config.set_agent_config(agent_config)

    # 4. Create Agent
    agent = Agent(
        llm=LLM(config=llm_config),
        config=agent_config,
    )

    # 5. Generate a session ID
    sid = generate_sid(config)

    # 6. Create Runtime
    runtime = create_runtime(
        config=config,
        sid=sid,
        headless_mode=True,
        agent=agent,
    )

    # 7. Connect to the runtime
    await runtime.connect()

    # 8. Create Memory
    memory = create_memory(
        runtime=runtime,
        event_stream=runtime.event_stream,
        sid=sid,
    )

    # 9. Define the initial task
    initial_user_action = MessageAction(content="Write a Python function that calculates the factorial of a number")

    # 10. Run the agent
    final_state = await run_controller(
        config=config,
        initial_user_action=initial_user_action,
        sid=sid,
        runtime=runtime,
        agent=agent,
        memory=memory,
        headless_mode=True,
        exit_on_message=True,  # Exit when the agent asks for user input
    )

    # 11. Close the runtime
    await runtime.close()
    return final_state


# Run the async function
if __name__ == "__main__":
    final_state = asyncio.run(run_openhands_agent())
    print("Agent execution completed!")
```
## Components Overview
### AppConfig
The `AppConfig` class is the main configuration object for OpenHands. It contains settings for the runtime, agent, LLM, and more.
```python
from openhands.core.config import AppConfig
config = AppConfig(
    runtime="local",  # Options: "local", "docker", "e2b", "modal", etc.
    file_store="memory",  # Options: "memory", "local", etc.
    file_store_path="/path/to/store",  # Only needed for "local" file_store
    max_iterations=100,  # Maximum number of agent iterations
)
```
### LLMConfig
The `LLMConfig` class configures the language model used by the agent.
```python
from openhands.core.config import LLMConfig
llm_config = LLMConfig(
    model="claude-sonnet-4-20250514",  # Model name
    api_key="your_api_key_here",  # API key
    temperature=0.0,  # Temperature for generation
    max_output_tokens=4096,  # Maximum tokens in the response
)
```
### AgentConfig
The `AgentConfig` class configures the agent's behavior and available tools.
```python
from openhands.core.config import AgentConfig
agent_config = AgentConfig(
    enable_browsing=True,  # Enable web browsing
    enable_cmd=True,  # Enable bash commands
    enable_editor=True,  # Enable file editing
    enable_jupyter=True,  # Enable Jupyter notebook
    enable_think=True,  # Enable thinking tool
    enable_finish=True,  # Enable finish tool
)
```
### Agent
The `Agent` class represents the AI agent that will perform tasks.
```python
from openhands.controller.agent import Agent
from openhands.llm.llm import LLM
agent = Agent(
llm=LLM(config=llm_config),
config=agent_config,
)
```
### Runtime
The runtime is the environment where the agent executes commands and interacts with the system.
```python
from openhands.core.setup import create_runtime
runtime = create_runtime(
config=config,
sid=sid,
headless_mode=True,
agent=agent,
)
```
### Memory
The memory component manages the agent's context and conversation history.
```python
from openhands.core.setup import create_memory
memory = create_memory(
runtime=runtime,
event_stream=runtime.event_stream,
sid=sid,
)
```
## Advanced Usage
### Custom Sandbox Configuration
You can customize the sandbox environment by configuring the `SandboxConfig`:
```python
from openhands.core.config import SandboxConfig
sandbox_config = SandboxConfig(
    selected_repo="username/repo",  # GitHub repository to clone
    base_container_image="ubuntu:22.04",  # Base Docker image
)
config.sandbox = sandbox_config
```
### Security Configuration
Configure security settings using the `SecurityConfig`:
```python
from openhands.core.config import SecurityConfig
security_config = SecurityConfig(
confirmation_mode=False, # Whether to require confirmation for actions
security_analyzer="default", # Security analyzer to use
)
config.security = security_config
```
### Custom Agent Response Handling
You can provide a custom function to handle agent responses:
```python
def custom_response_handler(state):
    # Process the agent's state and generate a response
    return "Continue with your current approach"


# Call this from inside an async function
final_state = await run_controller(
    config=config,
    initial_user_action=initial_user_action,
    fake_user_response_fn=custom_response_handler,
)
```
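
A response handler does not have to be a plain function. As a sketch, a small callable object can keep state between calls, for example to stop offering help after a fixed number of turns. `TurnLimitedResponder` is a hypothetical name, and the example assumes only that the handler is called with the agent state and returns a string; it ignores the state so that it runs on its own.

```python
from dataclasses import dataclass


@dataclass
class TurnLimitedResponder:
    """Callable handler that answers a limited number of times, then tells the agent to carry on."""
    max_turns: int = 3
    turns: int = 0

    def __call__(self, state) -> str:
        if self.turns >= self.max_turns:
            return "Please continue working on the task. Do NOT ask for more help."
        self.turns += 1
        return "Continue with your current approach"


responder = TurnLimitedResponder(max_turns=2)
for _ in range(3):
    print(responder(state=None))
```

An instance can then be passed wherever a handler function is expected, e.g. `fake_user_response_fn=TurnLimitedResponder()`.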
## Building a Complete Application
Here's an example of a more complete application that uses OpenHands to assist with code generation:
```python
import asyncio
import os

from openhands.controller.agent import Agent
from openhands.core.config import AppConfig, LLMConfig, AgentConfig, SandboxConfig
from openhands.events.action import MessageAction
from openhands.llm.llm import LLM
from openhands.core.setup import create_runtime, create_memory, generate_sid
from openhands.core.main import run_controller
from openhands.events import EventStreamSubscriber
from openhands.events.observation import AgentStateChangedObservation
from openhands.core.schema import AgentState


class CodeAssistant:
    def __init__(self, api_key, model="claude-sonnet-4-20250514"):
        self.api_key = api_key
        self.model = model
        self.config = None
        self.agent = None
        self.runtime = None
        self.memory = None
        self.sid = None
        self.event_stream = None

    async def initialize(self):
        # Create configuration
        self.config = AppConfig(
            runtime="docker",
            file_store="memory",
        )

        # Configure LLM
        llm_config = LLMConfig(
            model=self.model,
            api_key=self.api_key,
            temperature=0.0,
        )
        self.config.set_llm_config(llm_config)

        # Configure Agent
        agent_config = AgentConfig(
            enable_browsing=True,
            enable_cmd=True,
            enable_editor=True,
            enable_jupyter=True,
        )
        self.config.set_agent_config(agent_config)

        # Configure Sandbox
        sandbox_config = SandboxConfig(
            base_container_image="ubuntu:22.04",
        )
        self.config.sandbox = sandbox_config

        # Create Agent
        self.agent = Agent(
            llm=LLM(config=llm_config),
            config=agent_config,
        )

        # Generate a session ID
        self.sid = generate_sid(self.config)

        # Create Runtime
        self.runtime = create_runtime(
            config=self.config,
            sid=self.sid,
            headless_mode=True,
            agent=self.agent,
        )

        # Connect to the runtime
        await self.runtime.connect()

        # Create Memory
        self.memory = create_memory(
            runtime=self.runtime,
            event_stream=self.runtime.event_stream,
            sid=self.sid,
        )
        self.event_stream = self.runtime.event_stream

    async def run_task(self, task_description, callback=None):
        # Define the initial task
        initial_user_action = MessageAction(content=task_description)

        # Set up event callback if provided
        if callback:
            def on_event(event):
                if isinstance(event, AgentStateChangedObservation):
                    callback(event)

            self.event_stream.subscribe(
                EventStreamSubscriber.MAIN,
                on_event,
                self.sid
            )

        # Run the agent
        final_state = await run_controller(
            config=self.config,
            initial_user_action=initial_user_action,
            sid=self.sid,
            runtime=self.runtime,
            agent=self.agent,
            memory=self.memory,
            headless_mode=True,
            exit_on_message=True,
        )
        return final_state

    async def close(self):
        if self.runtime:
            await self.runtime.close()


# Example usage
async def main():
    # Initialize the code assistant
    assistant = CodeAssistant(api_key=os.environ.get("ANTHROPIC_API_KEY"))
    await assistant.initialize()

    # Define a callback to process events
    def event_callback(event):
        if isinstance(event, AgentStateChangedObservation):
            print(f"Agent state changed to: {event.agent_state}")

    # Run a task
    task = """
    Create a simple Flask API with the following endpoints:
    1. GET /users - Returns a list of users
    2. GET /users/{id} - Returns a specific user
    3. POST /users - Creates a new user
    Use SQLite as the database and implement proper error handling.
    """
    final_state = await assistant.run_task(task, callback=event_callback)

    # Close the assistant
    await assistant.close()
    print("Task completed!")


if __name__ == "__main__":
    asyncio.run(main())
```
## Conclusion
Using OpenHands as a library gives you the flexibility to integrate AI agents into your own applications. You can customize the agent's behavior, runtime environment, and how it interacts with your application.
For more advanced usage, refer to the OpenHands source code and API documentation. The library is highly customizable and can be adapted to a wide range of use cases.
@@ -331,6 +331,8 @@ The agent configuration options are defined in the `[agent]` and `[agent.<agent_
The sandbox configuration options are defined in the `[sandbox]` section of the `config.toml` file.
To use these with the docker command, pass in `-e SANDBOX_<option>`. Example: `-e SANDBOX_TIMEOUT`.
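
As a rough illustration of the naming rule, the sketch below turns `[sandbox]` option names into `-e` arguments for a docker command line. The helper `sandbox_env_args` is hypothetical, and passing the value inline as `NAME=value` is one of the forms `docker run -e` accepts; the documentation above only shows the bare `-e SANDBOX_TIMEOUT` form.

```python
def sandbox_env_args(options: dict[str, object]) -> list[str]:
    """Map [sandbox] option names to `-e SANDBOX_<OPTION>=<value>` arguments."""
    args: list[str] = []
    for name, value in options.items():
        args += ["-e", f"SANDBOX_{name.upper()}={value}"]
    return args


print(sandbox_env_args({"timeout": 120}))
```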
### Execution
@@ -2,6 +2,8 @@
This folder contains the evaluation harness that we built on top of the original [SWE-Bench benchmark](https://www.swebench.com/) ([paper](https://arxiv.org/abs/2310.06770)).
**UPDATE (5/26/2025): We now support running interactive SWE-Bench evaluation (see the paper [here](https://arxiv.org/abs/2502.13069))! For how to run it, check out [this README](./SWE-Interact.md).**
**UPDATE (4/8/2025): We now support running SWT-Bench evaluation! For more details, check out [the corresponding section](#SWT-Bench-Evaluation).**
**UPDATE (03/27/2025): We now support SWE-Bench multimodal evaluation! Simply use "princeton-nlp/SWE-bench_Multimodal" as the dataset name in the `run_infer.sh` script to evaluate on multimodal instances.**
@@ -0,0 +1,92 @@
# SWE-Interact Benchmark
This document explains how to use the [Interactive SWE-Bench](https://arxiv.org/abs/2502.13069) benchmark scripts for running and evaluating interactive software engineering tasks.
## Setting things up
After following the [README](./README.md) to set up the environment, you also need to add an LLM configuration for the simulated human user. In the original [paper](https://arxiv.org/abs/2502.13069), we use gpt-4o as the simulated human user. You can add the following to your `config.toml` file:
```toml
[llm.fake_user]
model="litellm_proxy/gpt-4o-2024-08-06"
api_key="<your-api-key>"
temperature = 0.0
base_url = "https://llm-proxy.eval.all-hands.dev"
```
## Running the Benchmark
The main script for running the benchmark is `run_infer_interact.sh`. Here's how to use it:
```bash
bash ./evaluation/benchmarks/swe_bench/scripts/run_infer_interact.sh <model_config> <commit_hash> <agent> <eval_limit> <max_iter> <num_workers> <split>
```
### Parameters:
- `model_config`: Name of the LLM config group in `config.toml` (e.g., `llm.claude-3-7-sonnet`)
- `commit_hash`: Git commit hash to use (e.g., `HEAD`)
- `agent`: The agent class to use (e.g., `CodeActAgent`)
- `eval_limit`: Number of examples to evaluate (e.g., `500`)
- `max_iter`: Maximum number of iterations per task (e.g., `100`)
- `num_workers`: Number of parallel workers (e.g., `1`)
- `split`: Dataset split to use (e.g., `test`)
### Example:
```bash
bash ./evaluation/benchmarks/swe_bench/scripts/run_infer_interact.sh llm.claude-3-7-sonnet HEAD CodeActAgent 500 100 1 test
```
### Additional Environment Variables:
You can customize the behavior using these environment variables:
- `RUN_WITH_BROWSING`: Enable/disable web browsing (default: false)
- `USE_HINT_TEXT`: Enable/disable hint text (default: false)
- `EVAL_CONDENSER`: Specify a condenser configuration
- `EXP_NAME`: Add a custom experiment name to the output
- `N_RUNS`: Number of runs to perform (default: 1)
- `SKIP_RUNS`: Comma-separated list of run numbers to skip
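
The boolean flags follow the usual "the string `true` means on" convention. The sketch below shows that convention together with one possible way to read a comma-separated list such as `SKIP_RUNS`. Both helper names are hypothetical, and the actual handling of `SKIP_RUNS` in the shell script may differ.

```python
import os


def env_flag(name, default=False, environ=None):
    """Return True only when the variable is set to 'true' (case-insensitive)."""
    environ = os.environ if environ is None else environ
    return environ.get(name, str(default)).lower() == "true"


def parse_skip_runs(value):
    """Turn a comma-separated string such as '1,3' into a set of run numbers."""
    return {int(part) for part in value.split(",") if part.strip()}


env = {"RUN_WITH_BROWSING": "true", "SKIP_RUNS": "1,3"}
print(env_flag("RUN_WITH_BROWSING", environ=env))
print(env_flag("USE_HINT_TEXT", environ=env))
print(sorted(parse_skip_runs(env["SKIP_RUNS"])))
```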
## Evaluating Results
After running the benchmark, you can evaluate the results using `eval_infer.sh`:
```bash
./evaluation/benchmarks/swe_bench/scripts/eval_infer.sh <output_file> <instance_id> <dataset> <split>
```
### Parameters:
- `output_file`: Path to the output JSONL file
- `instance_id`: The specific instance ID to evaluate
- `dataset`: Dataset name (e.g., `cmu-lti/interactive-swe`)
- `split`: Dataset split (e.g., `test`)
### Example:
```bash
./evaluation/benchmarks/swe_bench/scripts/eval_infer.sh evaluation/evaluation_outputs/outputs/cmu-lti__interactive-swe-test/CodeActAgent/claude-3-7-sonnet-20250219_maxiter_100_N_v0.39.0-no-hint-run_1/output.jsonl sphinx-doc__sphinx-8721 cmu-lti/interactive-swe test
```
## Output Structure
The benchmark outputs are stored in the `evaluation/evaluation_outputs/outputs/` directory with the following structure:
```
evaluation/evaluation_outputs/outputs/
└── cmu-lti__interactive-swe-{split}/
└── {agent}/
└── {model}-{date}_maxiter_{max_iter}_N_{version}-{options}-run_{run_number}/
└── output.jsonl
```
Where:
- `{split}` is the dataset split (e.g., test)
- `{agent}` is the agent class name
- `{model}` is the model name
- `{date}` is the run date
- `{max_iter}` is the maximum iterations
- `{version}` is the OpenHands version
- `{options}` includes any additional options (e.g., no-hint, with-browsing)
- `{run_number}` is the run number
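
To locate a particular `output.jsonl` programmatically, the directory template above can be filled in as follows. This is a sketch with a hypothetical helper name; the component values mirror the evaluation example earlier in this document.

```python
from pathlib import PurePosixPath


def output_path(split, agent, model, date, max_iter, version, options, run_number):
    """Build the output.jsonl path from the documented directory template."""
    run_dir = f"{model}-{date}_maxiter_{max_iter}_N_{version}-{options}-run_{run_number}"
    return (
        PurePosixPath("evaluation/evaluation_outputs/outputs")
        / f"cmu-lti__interactive-swe-{split}"
        / agent
        / run_dir
        / "output.jsonl"
    )


print(output_path(
    split="test",
    agent="CodeActAgent",
    model="claude-3-7-sonnet",
    date="20250219",
    max_iter=100,
    version="v0.39.0",
    options="no-hint",
    run_number=1,
))
```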
+411
@@ -0,0 +1,411 @@
import asyncio
import json
import os
import pandas as pd
from datasets import load_dataset
from litellm import completion as litellm_completion
import openhands.agenthub
from evaluation.benchmarks.swe_bench.run_infer import (
AgentFinishedCritic,
complete_runtime,
filter_dataset,
get_config,
initialize_runtime,
)
from evaluation.benchmarks.swe_bench.run_infer import (
get_instruction as base_get_instruction,
)
from evaluation.utils.shared import (
EvalException,
EvalMetadata,
EvalOutput,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
run_evaluation,
)
from openhands.controller.state.state import State
from openhands.core.config import (
get_llm_config_arg,
get_parser,
)
from openhands.core.config.condenser_config import NoOpCondenserConfig
from openhands.core.config.utils import get_condenser_config_arg
from openhands.core.logger import openhands_logger as logger
from openhands.core.main import create_runtime, run_controller
from openhands.events.action import MessageAction
from openhands.events.serialization.event import event_from_dict, event_to_dict
from openhands.utils.async_utils import call_async_from_sync
USE_HINT_TEXT = os.environ.get('USE_HINT_TEXT', 'false').lower() == 'true'
USE_INSTANCE_IMAGE = os.environ.get('USE_INSTANCE_IMAGE', 'false').lower() == 'true'
RUN_WITH_BROWSING = os.environ.get('RUN_WITH_BROWSING', 'false').lower() == 'true'
class FakeUser:
def __init__(self, issue, hints, files):
self.system_message = f"""
You are a GitHub user reporting an issue. Here are the details of your issue and environment:
Issue: {issue}
Hints: {hints}
Files relative to your current directory: {files}
Your task is to respond to questions from a coder who is trying to solve your issue. The coder has a summarized version of the issue you have. Follow these rules:
1. If the coder asks a question that is directly related to the information in the issue you have, provide that information.
2. Always stay in character as a user reporting an issue, not as an AI assistant.
3. Keep your responses concise and to the point.
4. The coder has limited turns to solve the issue. Do not interact with the coder beyond 3 turns.
Respond with "I don't have that information" if the question is unrelated or you're unsure.
"""
self.chat_history = [{'role': 'system', 'content': self.system_message}]
self.turns = 0
# Get LLM config from config.toml
self.llm_config = get_llm_config_arg(
'llm.fake_user'
) # You can change 'fake_user' to any config name you want
def generate_reply(self, question):
if self.turns > 3:
return 'Please continue working on the task. Do NOT ask for more help.'
self.chat_history.append({'role': 'user', 'content': question.content})
response = litellm_completion(
model=self.llm_config.model,
messages=self.chat_history,
api_key=self.llm_config.api_key.get_secret_value(),
temperature=self.llm_config.temperature,
base_url=self.llm_config.base_url,
)
reply = response.choices[0].message.content
self.chat_history.append({'role': 'assistant', 'content': reply})
self.turns += 1
return reply
# Global variable for fake user
fake_user = None
def get_fake_user_response(state: State) -> str:
global fake_user
if not fake_user:
return 'Please continue working on the task.'
last_agent_message = state.get_last_agent_message()
if last_agent_message:
return fake_user.generate_reply(last_agent_message)
return 'Please continue working on the task.'
AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {
'CodeActAgent': get_fake_user_response,
}
def get_instruction(instance: pd.Series, metadata: EvalMetadata) -> MessageAction:
instance_copy = instance.copy()
instance_copy.problem_statement = f'{instance.problem_statement}\n\nHints:\nThe user has not provided all the necessary details about the issue, and there are some hidden details that are helpful. Please ask the user specific questions using non-code commands to gather the relevant information that the user has to help you solve the issue. Ensure you have all the details you require to solve the issue.'
return base_get_instruction(instance_copy, metadata)
def process_instance(
instance: pd.Series,
metadata: EvalMetadata,
reset_logger: bool = True,
) -> EvalOutput:
config = get_config(instance, metadata)
global fake_user
original_issue = instance.original_issue
issue = str(original_issue)
fake_user = FakeUser(issue=issue, hints=instance.hints_text, files=instance.files)
# Setup the logger properly, so you can run multi-processing to parallelize the evaluation
if reset_logger:
log_dir = os.path.join(metadata.eval_output_dir, 'infer_logs')
reset_logger_for_multiprocessing(logger, instance.instance_id, log_dir)
else:
logger.info(f'Starting evaluation for instance {instance.instance_id}.')
runtime = create_runtime(config)
call_async_from_sync(runtime.connect)
try:
initialize_runtime(runtime, instance, metadata)
message_action = get_instruction(instance, metadata)
# Here's how you can run the agent (similar to the `main` function) and get the final task state
state: State | None = asyncio.run(
run_controller(
config=config,
initial_user_action=message_action,
runtime=runtime,
fake_user_response_fn=AGENT_CLS_TO_FAKE_USER_RESPONSE_FN[
metadata.agent_class
],
)
)
# if fatal error, raise EvalException to trigger re-run
if (
state
and state.last_error
and 'fatal error during agent execution' in state.last_error
and 'stuck in a loop' not in state.last_error
):
raise EvalException('Fatal error detected: ' + state.last_error)
# Get git patch
return_val = complete_runtime(runtime, instance)
git_patch = return_val['git_patch']
logger.info(
f'Got git diff for instance {instance.instance_id}:\n--------\n{git_patch}\n--------'
)
finally:
runtime.close()
# Prepare test result
test_result = {
'git_patch': git_patch,
}
if state is None:
raise ValueError('State should not be None.')
histories = [event_to_dict(event) for event in state.history]
metrics = state.metrics.get() if state.metrics else None
# Save the output
instruction = message_action.content
if message_action.image_urls:
instruction += (
'\n\n<image_urls>' + '\n'.join(message_action.image_urls) + '</image_urls>'
)
output = EvalOutput(
instance_id=instance.instance_id,
instruction=instruction,
instance=instance.to_dict(),
test_result=test_result,
metadata=metadata,
history=histories,
metrics=metrics,
error=state.last_error if state and state.last_error else None,
)
return output
if __name__ == '__main__':
parser = get_parser()
parser.add_argument(
'--dataset',
type=str,
default='cmu-lti/interactive-swe',
help='dataset to evaluate on',
)
parser.add_argument(
'--split',
type=str,
default='test',
help='split to evaluate on',
)
args, _ = parser.parse_known_args()
# Load dataset from huggingface datasets
dataset = load_dataset(args.dataset, split=args.split)
swe_bench_tests = filter_dataset(dataset.to_pandas(), 'instance_id')
logger.info(
f'Loaded dataset {args.dataset} with split {args.split}: {len(swe_bench_tests)} tasks'
)
llm_config = None
if args.llm_config:
llm_config = get_llm_config_arg(args.llm_config)
llm_config.log_completions = True
# modify_params must be False for evaluation purpose, for reproducibility and accuracy of results
llm_config.modify_params = False
if llm_config is None:
raise ValueError(f'Could not find LLM config: --llm_config {args.llm_config}')
# Get condenser config from environment variable
condenser_name = os.environ.get('EVAL_CONDENSER')
if condenser_name:
condenser_config = get_condenser_config_arg(condenser_name)
if condenser_config is None:
raise ValueError(
f'Could not find Condenser config: EVAL_CONDENSER={condenser_name}'
)
else:
# If no specific condenser config is provided via env var, default to NoOpCondenser
condenser_config = NoOpCondenserConfig()
logger.debug(
'No Condenser config provided via EVAL_CONDENSER, using NoOpCondenser.'
)
details = {'mode': 'interact'}
_agent_cls = openhands.agenthub.Agent.get_cls(args.agent_cls)
dataset_description = (
args.dataset.replace('/', '__') + '-' + args.split.replace('/', '__')
)
metadata = make_metadata(
llm_config,
dataset_description,
args.agent_cls,
args.max_iterations,
args.eval_note,
args.eval_output_dir,
details=details,
condenser_config=condenser_config,
)
output_file = os.path.join(metadata.eval_output_dir, 'output.jsonl')
print(f'### OUTPUT FILE: {output_file} ###')
# Run evaluation in iterative mode:
# If a rollout fails to output AgentFinishAction, retry until it succeeds or ITERATIVE_EVAL_MODE_MAX_ATTEMPTS (default 3) attempts have been made.
ITERATIVE_EVAL_MODE = (
os.environ.get('ITERATIVE_EVAL_MODE', 'false').lower() == 'true'
)
ITERATIVE_EVAL_MODE_MAX_ATTEMPTS = int(
os.environ.get('ITERATIVE_EVAL_MODE_MAX_ATTEMPTS', '3')
)
if not ITERATIVE_EVAL_MODE:
# load the dataset
instances = prepare_dataset(swe_bench_tests, output_file, args.eval_n_limit)
if len(instances) > 0 and not isinstance(
instances['PASS_TO_PASS'][instances['PASS_TO_PASS'].index[0]], str
):
for col in ['PASS_TO_PASS', 'FAIL_TO_PASS']:
instances[col] = instances[col].apply(lambda x: str(x))
run_evaluation(
instances,
metadata,
output_file,
args.eval_num_workers,
process_instance,
timeout_seconds=8
* 60
* 60, # 8 hours per instance should be more than enough
max_retries=5,
)
else:
critic = AgentFinishedCritic()
def get_cur_output_file_path(attempt: int) -> str:
return (
f'{output_file.removesuffix(".jsonl")}.critic_attempt_{attempt}.jsonl'
)
eval_ids = None
for attempt in range(1, ITERATIVE_EVAL_MODE_MAX_ATTEMPTS + 1):
cur_output_file = get_cur_output_file_path(attempt)
logger.info(
f'Running evaluation with critic {critic.__class__.__name__} for attempt {attempt} of {ITERATIVE_EVAL_MODE_MAX_ATTEMPTS}.'
)
# With a deterministic setup (temperature 0), a retry would likely reproduce the same result,
# so from the second attempt onward we raise the temperature to 0.1 to get slightly different results
if attempt > 1 and metadata.llm_config.temperature == 0:
logger.info(
f'Detected temperature is 0 for (>1) attempt {attempt}. Setting temperature to 0.1...'
)
metadata.llm_config.temperature = 0.1
# Load instances - at first attempt, we evaluate all instances
# On subsequent attempts, we only evaluate the instances that failed the previous attempt determined by critic
instances = prepare_dataset(
swe_bench_tests, cur_output_file, args.eval_n_limit, eval_ids=eval_ids
)
if len(instances) > 0 and not isinstance(
instances['PASS_TO_PASS'][instances['PASS_TO_PASS'].index[0]], str
):
for col in ['PASS_TO_PASS', 'FAIL_TO_PASS']:
instances[col] = instances[col].apply(lambda x: str(x))
# Run evaluation - but save them to cur_output_file
logger.info(
f'Evaluating {len(instances)} instances for attempt {attempt}...'
)
run_evaluation(
instances,
metadata,
cur_output_file,
args.eval_num_workers,
process_instance,
timeout_seconds=8
* 60
* 60, # 8 hours per instance should be more than enough
max_retries=5,
)
# When eval is done, we update eval_ids to the instances that failed the current attempt
instances_failed = []
logger.info(
f'Use critic {critic.__class__.__name__} to check {len(instances)} instances for attempt {attempt}...'
)
with open(cur_output_file, 'r') as f:
for line in f:
instance = json.loads(line)
try:
history = [
event_from_dict(event) for event in instance['history']
]
critic_result = critic.evaluate(
history, instance['test_result'].get('git_patch', '')
)
if not critic_result.success:
instances_failed.append(instance['instance_id'])
except Exception as e:
logger.error(
f'Error loading history for instance {instance["instance_id"]}: {e}'
)
instances_failed.append(instance['instance_id'])
logger.info(
f'{len(instances_failed)} instances failed the current attempt {attempt}: {instances_failed}'
)
eval_ids = instances_failed
# If no instances failed, we break
if len(instances_failed) == 0:
break
# Then we should aggregate the results from all attempts into the original output file
# and remove the intermediate files
logger.info(
'Aggregating results from all attempts into the original output file...'
)
fout = open(output_file, 'w')
added_instance_ids = set()
for attempt in reversed(range(1, ITERATIVE_EVAL_MODE_MAX_ATTEMPTS + 1)):
cur_output_file = get_cur_output_file_path(attempt)
if not os.path.exists(cur_output_file):
logger.warning(
f'Intermediate output file {cur_output_file} does not exist. Skipping...'
)
continue
with open(cur_output_file, 'r') as f:
for line in f:
instance = json.loads(line)
# Also make sure git_patch is not empty - otherwise we fall back to previous attempt (empty patch is worse than anything else)
if (
instance['instance_id'] not in added_instance_ids
and instance['test_result'].get('git_patch', '').strip()
):
fout.write(line)
added_instance_ids.add(instance['instance_id'])
logger.info(
f'Aggregated instances from {cur_output_file}. Total instances added so far: {len(added_instance_ids)}'
)
fout.close()
logger.info(
f'Done! Total {len(added_instance_ids)} instances added to {output_file}'
)
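The aggregation step above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the script's own code: `aggregate_attempts` is a hypothetical helper, and each attempt is modeled as an in-memory list of records instead of a `.jsonl` file. It walks the attempts from latest to earliest and keeps, per instance, the first record whose `git_patch` is non-empty.

```python
def aggregate_attempts(attempts):
    """Pick one record per instance, preferring later attempts.

    `attempts` is a list (attempt 1 first) of lists of records shaped like
    {'instance_id': str, 'test_result': {'git_patch': str}}.
    Hypothetical helper, for illustration only.
    """
    chosen = {}
    for records in reversed(attempts):
        for record in records:
            instance_id = record['instance_id']
            patch = record['test_result'].get('git_patch', '')
            # Skip empty patches so an earlier non-empty attempt can still win.
            if instance_id not in chosen and patch.strip():
                chosen[instance_id] = record
    return chosen


attempts = [
    [  # attempt 1: every instance is evaluated
        {'instance_id': 'a', 'test_result': {'git_patch': 'diff-a1'}},
        {'instance_id': 'b', 'test_result': {'git_patch': 'diff-b1'}},
    ],
    [  # attempt 2: only 'b' was retried, and it produced an empty patch
        {'instance_id': 'b', 'test_result': {'git_patch': '  '}},
    ],
]
result = aggregate_attempts(attempts)
```

Under this rule, an instance whose patch is empty in every attempt does not appear in the aggregated output at all.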
@@ -0,0 +1,131 @@
#!/usr/bin/env bash
set -eo pipefail
source "evaluation/utils/version_control.sh"
MODEL_CONFIG=$1
COMMIT_HASH=$2
AGENT=$3
EVAL_LIMIT=$4
MAX_ITER=$5
NUM_WORKERS=$6
SPLIT=$8
N_RUNS=$9
if [ -z "$NUM_WORKERS" ]; then
NUM_WORKERS=1
echo "Number of workers not specified, use default $NUM_WORKERS"
fi
checkout_eval_branch
if [ -z "$AGENT" ]; then
echo "Agent not specified, use default CodeActAgent"
AGENT="CodeActAgent"
fi
if [ -z "$MAX_ITER" ]; then
echo "MAX_ITER not specified, use default 100"
MAX_ITER=100
fi
if [ -z "$RUN_WITH_BROWSING" ]; then
echo "RUN_WITH_BROWSING not specified, use default false"
RUN_WITH_BROWSING=false
fi
if [ -z "$DATASET" ]; then
echo "DATASET not specified, use default cmu-lti/interactive-swe"
DATASET="cmu-lti/interactive-swe"
fi
if [ -z "$SPLIT" ]; then
echo "SPLIT not specified, use default test"
SPLIT="test"
fi
if [ -n "$EVAL_CONDENSER" ]; then
echo "Using Condenser Config: $EVAL_CONDENSER"
else
echo "No Condenser Config provided via EVAL_CONDENSER, use default (NoOpCondenser)."
fi
export RUN_WITH_BROWSING=$RUN_WITH_BROWSING
echo "RUN_WITH_BROWSING: $RUN_WITH_BROWSING"
get_openhands_version
echo "AGENT: $AGENT"
echo "OPENHANDS_VERSION: $OPENHANDS_VERSION"
echo "MODEL_CONFIG: $MODEL_CONFIG"
echo "DATASET: $DATASET"
echo "SPLIT: $SPLIT"
echo "MAX_ITER: $MAX_ITER"
echo "NUM_WORKERS: $NUM_WORKERS"
echo "COMMIT_HASH: $COMMIT_HASH"
echo "EVAL_CONDENSER: $EVAL_CONDENSER"
# Default to not using the hint text
if [ -z "$USE_HINT_TEXT" ]; then
export USE_HINT_TEXT=false
fi
echo "USE_HINT_TEXT: $USE_HINT_TEXT"
EVAL_NOTE="$OPENHANDS_VERSION"
# if not using Hint, add -no-hint to the eval note
if [ "$USE_HINT_TEXT" = false ]; then
EVAL_NOTE="$EVAL_NOTE-no-hint"
fi
if [ "$RUN_WITH_BROWSING" = true ]; then
EVAL_NOTE="$EVAL_NOTE-with-browsing"
fi
if [ -n "$EXP_NAME" ]; then
EVAL_NOTE="$EVAL_NOTE-$EXP_NAME"
fi
# Add condenser config to eval note if provided
if [ -n "$EVAL_CONDENSER" ]; then
EVAL_NOTE="${EVAL_NOTE}-${EVAL_CONDENSER}"
fi
function run_eval() {
local eval_note="${1}"
COMMAND="poetry run python evaluation/benchmarks/swe_bench/run_infer_interact.py \
--agent-cls $AGENT \
--llm-config $MODEL_CONFIG \
--max-iterations $MAX_ITER \
--eval-num-workers $NUM_WORKERS \
--eval-note $eval_note \
--dataset $DATASET \
--split $SPLIT"
if [ -n "$EVAL_LIMIT" ]; then
echo "EVAL_LIMIT: $EVAL_LIMIT"
COMMAND="$COMMAND --eval-n-limit $EVAL_LIMIT"
fi
# Run the command
eval $COMMAND
}
unset SANDBOX_ENV_GITHUB_TOKEN # prevent the agent from using the github token to push
if [ -z "$N_RUNS" ]; then
N_RUNS=1
echo "N_RUNS not specified, use default $N_RUNS"
fi
# Skip runs if the run number is in the SKIP_RUNS list
# read from env variable SKIP_RUNS as a comma separated list of run numbers
SKIP_RUNS=(${SKIP_RUNS//,/ })
for i in $(seq 1 $N_RUNS); do
if [[ " ${SKIP_RUNS[@]} " =~ " $i " ]]; then
echo "Skipping run $i"
continue
fi
current_eval_note="$EVAL_NOTE-run_$i"
echo "EVAL_NOTE: $current_eval_note"
run_eval $current_eval_note
done
checkout_original_branch
@@ -1,6 +1,11 @@
import { OpenHandsAction } from "#/types/core/actions";
import { OpenHandsEventType } from "#/types/core/base";
import { isOpenHandsAction, isOpenHandsObservation } from "#/types/core/guards";
import {
isCommandAction,
isCommandObservation,
isOpenHandsAction,
isOpenHandsObservation,
} from "#/types/core/guards";
import { OpenHandsObservation } from "#/types/core/observations";
const COMMON_NO_RENDER_LIST: OpenHandsEventType[] = [
@@ -15,11 +20,21 @@ export const shouldRenderEvent = (
event: OpenHandsAction | OpenHandsObservation,
) => {
if (isOpenHandsAction(event)) {
if (isCommandAction(event) && event.source === "user") {
// For user commands, we always hide them from the chat interface
return false;
}
const noRenderList = COMMON_NO_RENDER_LIST.concat(ACTION_NO_RENDER_LIST);
return !noRenderList.includes(event.action);
}
if (isOpenHandsObservation(event)) {
if (isCommandObservation(event) && event.source === "user") {
// For user commands, we always hide them from the chat interface
return false;
}
return !COMMON_NO_RENDER_LIST.includes(event.observation);
}
@@ -2,32 +2,10 @@ import React from "react";
import { OpenHandsAction } from "#/types/core/actions";
import { OpenHandsObservation } from "#/types/core/observations";
import { isOpenHandsAction, isOpenHandsObservation } from "#/types/core/guards";
import { OpenHandsEventType } from "#/types/core/base";
import { EventMessage } from "./event-message";
import { ChatMessage } from "./chat-message";
import { useOptimisticUserMessage } from "#/hooks/use-optimistic-user-message";
const COMMON_NO_RENDER_LIST: OpenHandsEventType[] = [
"system",
"agent_state_changed",
"change_agent_state",
];
const ACTION_NO_RENDER_LIST: OpenHandsEventType[] = ["recall"];
const shouldRenderEvent = (event: OpenHandsAction | OpenHandsObservation) => {
if (isOpenHandsAction(event)) {
const noRenderList = COMMON_NO_RENDER_LIST.concat(ACTION_NO_RENDER_LIST);
return !noRenderList.includes(event.action);
}
if (isOpenHandsObservation(event)) {
return !COMMON_NO_RENDER_LIST.includes(event.observation);
}
return true;
};
interface MessagesProps {
messages: (OpenHandsAction | OpenHandsObservation)[];
isAwaitingUserConfirmation: boolean;
@@ -54,7 +32,7 @@ export const Messages: React.FC<MessagesProps> = React.memo(
return (
<>
{messages.filter(shouldRenderEvent).map((message, index) => (
{messages.map((message, index) => (
<EventMessage
key={index}
event={message}
+3 -10
@@ -1,21 +1,14 @@
import { useQueries, useQuery } from "@tanstack/react-query";
import axios from "axios";
import React from "react";
import { useSelector } from "react-redux";
import OpenHands from "#/api/open-hands";
import { RUNTIME_INACTIVE_STATES } from "#/types/agent-state";
import { RootState } from "#/store";
import { useConversationId } from "#/hooks/use-conversation-id";
import { useActiveConversation } from "./use-active-conversation";
import { useRuntimeIsReady } from "#/hooks/use-runtime-is-ready";
export const useActiveHost = () => {
const { curAgentState } = useSelector((state: RootState) => state.agent);
const [activeHost, setActiveHost] = React.useState<string | null>(null);
const { conversationId } = useConversationId();
const { data: conversation } = useActiveConversation();
const enabled =
conversation?.status === "RUNNING" &&
RUNTIME_INACTIVE_STATES.includes(curAgentState);
const runtimeIsReady = useRuntimeIsReady();
const { data } = useQuery({
queryKey: [conversationId, "hosts"],
@@ -23,7 +16,7 @@ export const useActiveHost = () => {
const hosts = await OpenHands.getWebHosts(conversationId);
return { hosts };
},
enabled,
enabled: runtimeIsReady && !!conversationId,
initialData: { hosts: [] },
meta: {
disableToast: true,
@@ -1,23 +1,15 @@
import { useQuery } from "@tanstack/react-query";
import React from "react";
import { useSelector } from "react-redux";
import OpenHands from "#/api/open-hands";
import { useConversationId } from "#/hooks/use-conversation-id";
import { GitChange } from "#/api/open-hands.types";
import { RootState } from "#/store";
import { RUNTIME_INACTIVE_STATES } from "#/types/agent-state";
import { useActiveConversation } from "./use-active-conversation";
import { useRuntimeIsReady } from "#/hooks/use-runtime-is-ready";
export const useGetGitChanges = () => {
const { conversationId } = useConversationId();
const { data: conversation } = useActiveConversation();
const [orderedChanges, setOrderedChanges] = React.useState<GitChange[]>([]);
const previousDataRef = React.useRef<GitChange[]>(null);
const { curAgentState } = useSelector((state: RootState) => state.agent);
const enabled =
conversation?.status === "RUNNING" &&
RUNTIME_INACTIVE_STATES.includes(curAgentState);
const runtimeIsReady = useRuntimeIsReady();
const result = useQuery({
queryKey: ["file_changes", conversationId],
@@ -25,7 +17,7 @@ export const useGetGitChanges = () => {
retry: false,
staleTime: 1000 * 60 * 5, // 5 minutes
gcTime: 1000 * 60 * 15, // 15 minutes
enabled,
enabled: runtimeIsReady && !!conversationId,
meta: {
disableToast: true,
},
+3 -10
@@ -1,13 +1,10 @@
import { useQuery } from "@tanstack/react-query";
import { useTranslation } from "react-i18next";
import { useSelector } from "react-redux";
import OpenHands from "#/api/open-hands";
import { useConversationId } from "#/hooks/use-conversation-id";
import { I18nKey } from "#/i18n/declaration";
import { RootState } from "#/store";
import { RUNTIME_INACTIVE_STATES } from "#/types/agent-state";
import { transformVSCodeUrl } from "#/utils/vscode-url-helper";
import { useActiveConversation } from "./use-active-conversation";
import { useRuntimeIsReady } from "#/hooks/use-runtime-is-ready";
// Define the return type for the VS Code URL query
interface VSCodeUrlResult {
@@ -18,11 +15,7 @@ interface VSCodeUrlResult {
export const useVSCodeUrl = () => {
const { t } = useTranslation();
const { conversationId } = useConversationId();
const { data: conversation } = useActiveConversation();
const { curAgentState } = useSelector((state: RootState) => state.agent);
const enabled =
conversation?.status === "RUNNING" &&
RUNTIME_INACTIVE_STATES.includes(curAgentState);
const runtimeIsReady = useRuntimeIsReady();
return useQuery<VSCodeUrlResult>({
queryKey: ["vscode_url", conversationId],
@@ -40,7 +33,7 @@ export const useVSCodeUrl = () => {
error: t(I18nKey.VSCODE$URL_NOT_AVAILABLE),
};
},
enabled,
enabled: runtimeIsReady && !!conversationId,
refetchOnMount: true,
retry: 3,
});
@@ -0,0 +1,19 @@
import { useSelector } from "react-redux";
import { RootState } from "#/store";
import { RUNTIME_INACTIVE_STATES } from "#/types/agent-state";
import { useActiveConversation } from "./query/use-active-conversation";
/**
* Hook to determine if the runtime is ready for operations
*
* @returns boolean indicating if the runtime is ready
*/
export const useRuntimeIsReady = (): boolean => {
const { data: conversation } = useActiveConversation();
const { curAgentState } = useSelector((state: RootState) => state.agent);
return (
conversation?.status === "RUNNING" &&
!RUNTIME_INACTIVE_STATES.includes(curAgentState)
);
};
+1 -1
@@ -20,7 +20,7 @@ export interface SystemMessageAction extends OpenHandsActionEvent<"system"> {
}
export interface CommandAction extends OpenHandsActionEvent<"run"> {
source: "agent";
source: "agent" | "user";
args: {
command: string;
security_risk: ActionSecurityRisk;
+5
@@ -4,6 +4,7 @@ import {
AssistantMessageAction,
OpenHandsAction,
SystemMessageAction,
CommandAction,
} from "./actions";
import {
AgentStateChangeObservation,
@@ -41,6 +42,10 @@ export const isErrorObservation = (
): event is ErrorObservation =>
isOpenHandsObservation(event) && event.observation === "error";
export const isCommandAction = (
event: OpenHandsParsedEvent,
): event is CommandAction => isOpenHandsAction(event) && event.action === "run";
export const isAgentStateChangeObservation = (
event: OpenHandsParsedEvent,
): event is AgentStateChangeObservation =>
+1 -1
@@ -11,7 +11,7 @@ export interface AgentStateChangeObservation
}
export interface CommandObservation extends OpenHandsObservationEvent<"run"> {
source: "agent";
source: "agent" | "user";
extras: {
command: string;
hidden?: boolean;
+1 -1
@@ -582,7 +582,7 @@ def _extract_and_validate_params(
found_params = set()
for param_match in param_matches:
param_name = param_match.group(1)
param_value = param_match.group(2).strip()
param_value = param_match.group(2)
# Validate parameter is allowed
if allowed_params and param_name not in allowed_params:
+4 -4
@@ -1013,12 +1013,12 @@ if __name__ == '__main__':
if not os.path.exists(full_path):
# if user just removed a folder, prevent server error 500 in UI
return []
return JSONResponse(content=[])
try:
# Check if the directory exists
if not os.path.exists(full_path) or not os.path.isdir(full_path):
return []
return JSONResponse(content=[])
entries = os.listdir(full_path)
@@ -1047,11 +1047,11 @@ if __name__ == '__main__':
# Combine sorted directories and files
sorted_entries = directories + files
return sorted_entries
return JSONResponse(content=sorted_entries)
except Exception as e:
logger.error(f'Error listing files: {e}')
return []
return JSONResponse(content=[])
logger.debug(f'Starting action execution API on port {args.port}')
run(app, host='0.0.0.0', port=args.port)
+29 -24
@@ -48,6 +48,34 @@ def create_provider_tokens_object(
return MappingProxyType(provider_information)
async def setup_init_convo_settings(
user_id: str | None, providers_set: list[ProviderType]
) -> ConversationInitData:
settings_store = await SettingsStoreImpl.get_instance(config, user_id)
settings = await settings_store.load()
secrets_store = await SecretsStoreImpl.get_instance(config, user_id)
user_secrets: UserSecrets | None = await secrets_store.load()
if not settings:
raise ConnectionRefusedError(
'Settings not found', {'msg_id': 'CONFIGURATION$SETTINGS_NOT_FOUND'}
)
session_init_args: dict = {}
session_init_args = {**settings.__dict__, **session_init_args}
git_provider_tokens = create_provider_tokens_object(providers_set)
if server_config.app_mode != AppMode.SAAS and user_secrets:
git_provider_tokens = user_secrets.provider_tokens
session_init_args['git_provider_tokens'] = git_provider_tokens
if user_secrets:
session_init_args['custom_secrets'] = user_secrets.custom_secrets
return ConversationInitData(**session_init_args)
@sio.event
async def connect(connection_id: str, environ: dict) -> None:
try:
@@ -85,30 +113,7 @@ async def connect(connection_id: str, environ: dict) -> None:
conversation_id, cookies_str, authorization_header
)
settings_store = await SettingsStoreImpl.get_instance(config, user_id)
settings = await settings_store.load()
secrets_store = await SecretsStoreImpl.get_instance(config, user_id)
user_secrets: UserSecrets | None = await secrets_store.load()
if not settings:
raise ConnectionRefusedError(
'Settings not found', {'msg_id': 'CONFIGURATION$SETTINGS_NOT_FOUND'}
)
session_init_args: dict = {}
if settings:
session_init_args = {**settings.__dict__, **session_init_args}
git_provider_tokens = create_provider_tokens_object(providers_set)
if server_config.app_mode != AppMode.SAAS and user_secrets:
git_provider_tokens = user_secrets.provider_tokens
session_init_args['git_provider_tokens'] = git_provider_tokens
if user_secrets:
session_init_args['custom_secrets'] = user_secrets.custom_secrets
conversation_init_data = ConversationInitData(**session_init_args)
conversation_init_data = await setup_init_convo_settings(user_id, providers_set)
agent_loop_info = await conversation_manager.join_conversation(
conversation_id,
connection_id,
+20 -24
@@ -188,10 +188,7 @@ async def load_custom_secrets_names(
) -> GETCustomSecrets | JSONResponse:
try:
if not user_secrets:
return JSONResponse(
status_code=status.HTTP_404_NOT_FOUND,
content={'error': 'User secrets not found'},
)
return GETCustomSecrets(custom_secrets=[])
custom_secrets: list[CustomSecretWithoutValueModel] = []
if user_secrets.custom_secrets:
@@ -220,31 +217,30 @@ async def create_custom_secret(
) -> JSONResponse:
try:
existing_secrets = await secrets_store.load()
if existing_secrets:
custom_secrets = dict(existing_secrets.custom_secrets)
custom_secrets = dict(existing_secrets.custom_secrets) if existing_secrets else {}
secret_name = incoming_secret.name
secret_value = incoming_secret.value
secret_description = incoming_secret.description
secret_name = incoming_secret.name
secret_value = incoming_secret.value
secret_description = incoming_secret.description
if secret_name in custom_secrets:
return JSONResponse(
status_code=status.HTTP_400_BAD_REQUEST,
content={'message': f'Secret {secret_name} already exists'},
)
custom_secrets[secret_name] = CustomSecret(
secret=secret_value,
description=secret_description or '',
if secret_name in custom_secrets:
return JSONResponse(
status_code=status.HTTP_400_BAD_REQUEST,
content={'message': f'Secret {secret_name} already exists'},
)
# Create a new UserSecrets that preserves provider tokens
updated_user_secrets = UserSecrets(
custom_secrets=custom_secrets,
provider_tokens=existing_secrets.provider_tokens,
)
custom_secrets[secret_name] = CustomSecret(
secret=secret_value,
description=secret_description or '',
)
await secrets_store.store(updated_user_secrets)
# Create a new UserSecrets that preserves provider tokens
updated_user_secrets = UserSecrets(
custom_secrets=custom_secrets,
provider_tokens=existing_secrets.provider_tokens if existing_secrets else {},
)
await secrets_store.store(updated_user_secrets)
return JSONResponse(
status_code=status.HTTP_201_CREATED,
+23
@@ -683,6 +683,29 @@ def test_agent_config_condenser_with_no_enabled():
assert isinstance(agent_config.condenser, NoOpCondenserConfig)
def test_sandbox_volumes_toml(default_config, temp_toml_file):
"""Test that volumes configuration under [sandbox] works correctly."""
with open(temp_toml_file, 'w', encoding='utf-8') as toml_file:
toml_file.write("""
[sandbox]
volumes = "/home/user/mydir:/workspace:rw,/data:/data:ro"
timeout = 1
""")
load_from_toml(default_config, temp_toml_file)
finalize_config(default_config)
# Check that sandbox.volumes is set correctly
assert (
default_config.sandbox.volumes
== '/home/user/mydir:/workspace:rw,/data:/data:ro'
)
assert default_config.workspace_mount_path == '/home/user/mydir'
assert default_config.workspace_mount_path_in_sandbox == '/workspace'
assert default_config.workspace_base == '/home/user/mydir'
assert default_config.sandbox.timeout == 1
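The string format exercised by this test can be illustrated with a small parser. This is only a sketch: `parse_volumes` is a hypothetical helper, not OpenHands' loader, and both the default mode of `rw` and the rule that the entry mounted at `/workspace` supplies the workspace path are assumptions made to match the assertions above.

```python
def parse_volumes(spec):
    """Split comma-separated 'host:container[:mode]' entries into tuples.

    Hypothetical helper for illustration. It does not handle host paths
    that themselves contain ':' (for example Windows drive letters).
    """
    mounts = []
    for entry in spec.split(','):
        parts = entry.strip().split(':')
        if len(parts) < 2:
            raise ValueError(f'Invalid volume entry: {entry!r}')
        host, container = parts[0], parts[1]
        # Assumption: a missing mode means read-write.
        mode = parts[2] if len(parts) > 2 else 'rw'
        mounts.append((host, container, mode))
    return mounts


mounts = parse_volumes('/home/user/mydir:/workspace:rw,/data:/data:ro')
# Assumption: the entry mounted at /workspace defines the workspace path.
workspace_host = next(host for host, container, _ in mounts if container == '/workspace')
```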
def test_condenser_config_from_toml_basic(default_config, temp_toml_file):
"""Test loading basic condenser configuration from TOML."""
with open(temp_toml_file, 'w', encoding='utf-8') as toml_file:
+28
@@ -652,6 +652,34 @@ NON_FNCALL_RESPONSE_MESSAGE = {
<parameter=command>view</parameter>
<parameter=path>/test/file.py</parameter>
<parameter=view_range>[1, 10]</parameter>
</function>""",
),
# Test case with indented code block to verify indentation is preserved
(
[
{
'index': 1,
'function': {
'arguments': '{"command": "str_replace", "path": "/test/file.py", "old_str": "def example():\\n pass", "new_str": "def example():\\n # This is indented\\n print(\\"hello\\")\\n return True"}',
'name': 'str_replace_editor',
},
'id': 'test_id',
'type': 'function',
}
],
"""<function=str_replace_editor>
<parameter=command>str_replace</parameter>
<parameter=path>/test/file.py</parameter>
<parameter=old_str>
def example():
pass
</parameter>
<parameter=new_str>
def example():
# This is indented
print("hello")
return True
</parameter>
</function>""",
),
],
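The effect of dropping `.strip()` from parameter extraction can be shown on a single value. The regex below is a hypothetical stand-in, since the actual pattern used by `_extract_and_validate_params` is not part of this diff; only the contrast between the raw and stripped value is the point.

```python
import re

# Hypothetical pattern for illustration; the project's real regex may differ.
PARAM_RE = re.compile(r'<parameter=(\w+)>(.*?)</parameter>', re.DOTALL)

text = '<parameter=new_str>    return True\n</parameter>'
match = PARAM_RE.search(text)

param_name = match.group(1)
raw_value = match.group(2)               # new behaviour: leading indentation kept
stripped_value = match.group(2).strip()  # old behaviour: indentation lost
```

For an editor tool such as `str_replace`, losing the leading spaces of `old_str` or `new_str` changes what is matched or written, which is why the raw value is kept.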
+33
@@ -138,6 +138,39 @@ async def test_add_custom_secret(test_client, file_secrets_store):
)
@pytest.mark.asyncio
async def test_create_custom_secret_with_no_existing_secrets(
test_client, file_secrets_store
):
"""Test creating a custom secret when there are no existing secrets at all."""
# Don't store any initial settings - this simulates a completely new user
# or a situation where the secrets store is empty
# Make the POST request to add a custom secret
add_secret_data = {
'name': 'NEW_API_KEY',
'value': 'new-api-key-value',
'description': 'Test API Key',
}
response = test_client.post('/api/secrets', json=add_secret_data)
assert response.status_code == 201
# Verify that the settings were stored with the new secret
stored_settings = await file_secrets_store.load()
# Check that the secret was added
assert 'NEW_API_KEY' in stored_settings.custom_secrets
assert (
stored_settings.custom_secrets['NEW_API_KEY'].secret.get_secret_value()
== 'new-api-key-value'
)
assert stored_settings.custom_secrets['NEW_API_KEY'].description == 'Test API Key'
# Check that provider_tokens is an empty dict, not None
assert stored_settings.provider_tokens == {}
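The behaviour this test checks can be sketched without the server. `FakeUserSecrets` and `add_secret` are simplified, hypothetical stand-ins for `UserSecrets` and the `create_custom_secret` handler; secrets are plain strings here, and a duplicate raises `ValueError` where the endpoint returns a 400 response.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeUserSecrets:
    """Simplified stand-in for UserSecrets (illustration only)."""

    custom_secrets: dict = field(default_factory=dict)
    provider_tokens: dict = field(default_factory=dict)


def add_secret(
    existing: Optional[FakeUserSecrets], name: str, value: str
) -> FakeUserSecrets:
    # Fall back to empty containers when nothing has been stored yet.
    custom_secrets = dict(existing.custom_secrets) if existing else {}
    if name in custom_secrets:
        raise ValueError(f'Secret {name} already exists')
    custom_secrets[name] = value
    return FakeUserSecrets(
        custom_secrets=custom_secrets,
        # Preserve provider tokens if there are any, else use an empty dict.
        provider_tokens=existing.provider_tokens if existing else {},
    )


# A brand-new user: no stored secrets at all.
fresh = add_secret(None, 'NEW_API_KEY', 'new-api-key-value')
```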
@pytest.mark.asyncio
async def test_update_existing_custom_secret(test_client, file_secrets_store):
"""Test updating an existing custom secret's name and description (cannot change value once set)."""