Compare commits

18 Commits

Author SHA1 Message Date
openhands 7561518e4c Fix lint errors 2025-05-19 16:55:19 +00:00
openhands 2cd503f033 Fix merge conflicts with main branch 2025-05-19 16:53:24 +00:00
dependabot[bot] 470687f826 chore(deps): bump the mcp-packages group with 2 updates (#8546)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-19 18:37:11 +02:00
tofarr 38b4d93237 Add Session API Key Authentication for Runtime Communication (#8550) 2025-05-19 09:59:22 -06:00
dependabot[bot] 872b97a3c8 chore(deps): bump the version-all group across 1 directory with 20 updates (#8545)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: amanape <83104063+amanape@users.noreply.github.com>
2025-05-19 15:49:53 +00:00
sp.wack 14334040f1 chore(frontend): Refactor chat interface-related event handling (#8403) 2025-05-19 15:15:09 +00:00
sp.wack b244138ec5 fix(frontend): Prevent making too many calls to /git/changes on conversation load (#8579) 2025-05-19 18:57:18 +04:00
Xingyao Wang 4a3d2e6859 Fix #8551: Show images produced in Jupyter Notebook to LLM directly (#8552)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-05-19 14:14:00 +00:00
luolin101 1a3cb16ba6 add Visual SWE-bench benchmark (#7131)
Co-authored-by: tsukimi <yuailun@pku.edu.cn>
Co-authored-by: Ryan H. Tran <descience.thh10@gmail.com>
2025-05-19 12:08:46 +07:00
Xingyao Wang 2ecc39ffcc [eval]: disable MCP for SWE-Bench evaluation (#8574)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Engel Nyst <engel.nyst@gmail.com>
2025-05-19 01:32:46 +00:00
Graham Neubig 0b26174d60 Add documentation microagent (#8563)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-05-18 16:34:15 -04:00
Calvin Smith b0005d4e09 Limit size of browser events (#8559)
Co-authored-by: Calvin Smith <calvin@all-hands.dev>
2025-05-18 11:35:09 -06:00
Graham Neubig 2dc7b37fe8 Fix flaky TestLocalFileStore tests (#8569)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-05-18 06:37:28 -04:00
openhands 2d434ad49f Rename RuntimeInfo to ContextualInfo 2025-05-17 22:31:31 +00:00
openhands 81c0253d53 Rename memory.set_runtime_info to memory.set_contextual_info 2025-05-17 22:27:48 +00:00
openhands 9cdde313d8 Add context_message parameter to conversation creation endpoint 2025-05-17 22:22:15 +00:00
Carlos Freund 27c18f5bdd build(makefile) Develop in OpenhandsCloud (#7440)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2025-05-17 14:40:46 +00:00
Graham Neubig 5077fea5c7 Fix: Run setup.sh script in GitHub resolver (#8548)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-05-17 09:52:34 -04:00
104 changed files with 3255 additions and 2758 deletions
+33
@@ -0,0 +1,33 @@
---
name: documentation
type: knowledge
version: 1.0.0
agent: CodeActAgent
triggers:
- documentation
- docs
- document
---
# Documentation Guidelines
All documentation must be grounded in fact: do not make anything up without proper evidence. When you have finished writing documentation, tell the user which reference sources (web pages, source code, or other documentation) you relied on for each new fact in the documentation. If you cannot cite a source for a statement, do not include it in the pull request.
## Best Practices for Documentation
1. **Be Factual**: Only include information that can be verified from reliable sources.
2. **Cite Sources**: Always reference the source of information (code, web pages, official documentation).
3. **Be Clear and Concise**: Use simple language and avoid unnecessary jargon.
4. **Use Examples**: Include practical examples to illustrate concepts.
5. **Structure Properly**: Use headings, lists, and code blocks to organize information.
6. **Keep Updated**: Ensure documentation reflects the current state of the code or system.
## Documentation Process
1. Research and gather information from reliable sources
2. Draft documentation based on verified facts
3. Review for accuracy and completeness
4. Include references for all factual statements
5. Submit only when all information is properly sourced
Remember: If you cannot verify a piece of information, it's better to exclude it than to include potentially incorrect information.
+11 -2
@@ -5,6 +5,7 @@ SHELL=/usr/bin/env bash
BACKEND_HOST ?= "127.0.0.1"
BACKEND_PORT = 3000
BACKEND_HOST_PORT = "$(BACKEND_HOST):$(BACKEND_PORT)"
FRONTEND_HOST ?= "127.0.0.1"
FRONTEND_PORT = 3001
DEFAULT_WORKSPACE_DIR = "./workspace"
DEFAULT_MODEL = "gpt-4o"
@@ -288,6 +289,15 @@ setup-config-prompts:
	@read -p "Enter your LLM base URL [mostly used for local LLMs, leave blank if not needed - example: http://localhost:5001/v1/]: " llm_base_url; \
	 if [[ ! -z "$$llm_base_url" ]]; then echo "base_url=\"$$llm_base_url\"" >> $(CONFIG_FILE).tmp; fi

setup-config-basic:
	@printf '%s\n' \
	   '[core]' \
	   'workspace_base="./workspace"' \
	   > config.toml
	@echo "$(GREEN)config.toml created.$(RESET)"

openhands-cloud-run:
	@$(MAKE) run BACKEND_HOST="0.0.0.0" BACKEND_PORT="12000" FRONTEND_HOST="0.0.0.0" FRONTEND_PORT="12001"

# Develop in container
docker-dev:
@@ -322,5 +332,4 @@ help:
@echo " $(GREEN)help$(RESET) - Display this help message, providing information on available targets."
# Phony targets
.PHONY: build check-dependencies check-python check-npm check-docker check-poetry install-python-dependencies install-frontend-dependencies install-pre-commit-hooks lint start-backend start-frontend run run-wsl setup-config setup-config-prompts help
.PHONY: docker-dev docker-run
.PHONY: build check-dependencies check-system check-python check-npm check-nodejs check-docker check-poetry install-python-dependencies install-frontend-dependencies install-pre-commit-hooks lint-backend lint-frontend lint test-frontend test build-frontend start-backend start-frontend _run_setup run run-wsl setup-config setup-config-prompts setup-config-basic openhands-cloud-run docker-dev docker-run clean help
-249
@@ -1,249 +0,0 @@
# Agent Mode Toggle Design Document
## Overview
This document outlines the design for implementing a toggle switch between "Read-only mode" and "Execute mode" in the OpenHands application. This feature will allow users to switch between a restricted ReadOnlyAgent that can only explore and analyze code, and the fully capable CodeActAgent that can modify code and execute commands.
## Motivation
Users often want to explore a codebase and discuss implementation details with the agent before making any changes. The ability to switch between read-only and execute modes provides several benefits:
1. **Safety**: Users can ensure no changes are made during the exploration phase
2. **Clarity**: Clear indication of the agent's current capabilities
3. **Control**: Users decide when to transition from planning to execution
4. **Workflow**: Supports a natural workflow of exploration → planning → implementation
## Architecture
The implementation will leverage the existing agent delegation mechanism in OpenHands. When a user toggles the switch:
1. In **Execute Mode** (default): The application uses the standard CodeActAgent
2. In **Read-only Mode**: The application delegates to a ReadOnlyAgent
### Key Components
#### Frontend
1. **Toggle Switch Component**:
- UI element that shows the current mode and allows switching
- Sends appropriate actions to the event stream when toggled
2. **Agent State Tracking**:
- Redux state to track current agent type and delegation status
- Event listeners to update state based on event stream
3. **Visual Indicators**:
- Mode indicator showing current agent mode
- Visual styling differences between modes
#### Backend
1. **Agent Delegation**:
- Uses existing delegation mechanism to switch to ReadOnlyAgent
- User-initiated FinishAction to end delegation and return to CodeActAgent
2. **Event Stream Integration**:
- AgentDelegateAction to start read-only mode
- AgentFinishAction to end read-only mode
- System messages to indicate mode changes
## Implementation Details
### Frontend Implementation
#### Redux State Extension
```typescript
// Slice state; named distinctly from the existing AgentState enum it refers to
interface AgentSliceState {
  curAgentState: AgentState;
  currentAgentType: string; // Track the agent type
  isDelegated: boolean; // Track if we're in a delegation
  // other existing fields...
}

const initialState: AgentSliceState = {
  curAgentState: AgentState.IDLE,
  currentAgentType: "CodeActAgent", // Default agent type
  isDelegated: false,
  // other initial values...
};
```
#### Action Generators
```typescript
export const generateDelegateToReadOnlyAction = () => ({
action: ActionType.DELEGATE,
args: {
agent: "ReadOnlyAgent",
inputs: {
task: "Continue the conversation in READ-ONLY MODE. You can explore and analyze code but cannot make changes."
},
thought: "Switching to read-only mode at user's request"
}
});
export const generateFinishDelegationAction = () => ({
action: ActionType.FINISH,
args: {
message: "Switching back to EXECUTE MODE. You now have full capabilities to modify code and execute commands.",
task_completed: "true",
outputs: {
mode_switch: true
}
}
});
```
#### Toggle Switch Component
```tsx
function AgentModeToggle() {
const { t } = useTranslation();
const dispatch = useDispatch();
const { send } = useWsClient();
// Get agent type from Redux
const { currentAgentType, isDelegated } = useSelector((state: RootState) => state.agent);
// Compute if we're in read-only mode
const isReadOnly = currentAgentType === "ReadOnlyAgent";
const handleToggle = () => {
if (isReadOnly) {
// Currently in read-only mode, switch back to execute mode
send(generateFinishDelegationAction());
} else {
// Currently in execute mode, switch to read-only mode
send(generateDelegateToReadOnlyAction());
}
};
return (
<div className="flex items-center gap-2">
<span className="text-sm font-medium">
{isReadOnly ? "Read-Only Mode" : "Execute Mode"}
</span>
<Switch
checked={isReadOnly}
onChange={handleToggle}
className={`${isReadOnly ? 'bg-amber-600' : 'bg-blue-600'} relative inline-flex h-6 w-11 items-center rounded-full`}
>
<span className="sr-only">Toggle agent mode</span>
<span
className={`${isReadOnly ? 'translate-x-6' : 'translate-x-1'} inline-block h-4 w-4 transform rounded-full bg-white transition`}
/>
</Switch>
</div>
);
}
```
#### Event Listener for State Updates
```typescript
function handleEvent(event) {
// Handle agent delegation events
if (event.action === ActionType.DELEGATE) {
// A delegation is starting
dispatch(setDelegationState(true));
dispatch(setAgentType(event.args.agent));
}
// Handle agent delegate observation (delegation ended)
else if (event.observation === "delegate") {
// Delegation has ended, returning to parent agent
dispatch(setDelegationState(false));
dispatch(setAgentType("CodeActAgent")); // Reset to default agent
}
// Handle other events...
}
```
### Backend Considerations
The backend implementation will leverage the existing agent delegation mechanism:
1. When the user toggles to read-only mode:
- An AgentDelegateAction is sent to the event stream
- The AgentController creates a ReadOnlyAgent delegate
- All subsequent events are handled by the delegate
2. When the user toggles back to execute mode:
- An AgentFinishAction is sent to the event stream
- The delegate agent finishes its task
- The parent AgentController resumes normal operation
No backend code changes are required as we're using the existing delegation mechanism.
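
The delegation lifecycle described above can be modeled as a small state transition function. The following is a minimal sketch, not the actual frontend or backend code: the `AgentModeState` class and `reduce_event` function are hypothetical names, and the event field values (`"delegate"` for both the action and the observation) are assumptions based on the snippets in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentModeState:
    """Hypothetical mirror of the two fields tracked in the agent slice."""
    current_agent_type: str = "CodeActAgent"
    is_delegated: bool = False


def reduce_event(state: AgentModeState, event: dict) -> AgentModeState:
    """Return the next state for a single event-stream event."""
    # A delegate action starts a delegation to the named agent.
    if event.get("action") == "delegate":
        return AgentModeState(event["args"]["agent"], True)
    # A delegate observation means the delegation ended; control
    # returns to the default parent agent.
    if event.get("observation") == "delegate":
        return AgentModeState("CodeActAgent", False)
    # Any other event leaves the mode unchanged.
    return state


state = AgentModeState()
state = reduce_event(state, {"action": "delegate", "args": {"agent": "ReadOnlyAgent"}})
is_read_only = state.current_agent_type == "ReadOnlyAgent"
```

Keeping the transition pure (state in, state out) makes the two toggle directions easy to test without a running event stream.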
## User Experience
1. **Initial State**: The application starts in Execute Mode with CodeActAgent
2. **Mode Switching**:
- User clicks the toggle switch to enter Read-only Mode
- System message indicates the mode change
- Agent capabilities are restricted to read-only tools
- UI shows visual indicators of the current mode
- User clicks the toggle switch again to return to Execute Mode
- System message indicates the return to full capabilities
3. **Visual Indicators**:
- Toggle switch position (left/right)
- Color coding (amber for read-only, blue for execute)
- Mode label text
- System messages in the conversation
## Future Enhancements
1. **Persistent Mode Preference**: Remember the user's preferred starting mode
2. **Context Preservation**: Improve context retention when switching modes
3. **Custom Tool Sets**: Allow users to customize which tools are available in each mode
4. **Mode-specific Prompts**: Optimize agent prompts for each mode
## Implementation Plan
1. **Frontend Implementation**:
- Add Redux state for agent type tracking ✅
- Create toggle switch component ✅
- Implement event listeners for state updates ✅
- Add visual indicators for current mode ✅
- Add notifications for mode changes ✅
2. **Testing**:
- Test mode switching with various conversation states
- Verify proper tool restrictions in read-only mode
- Test persistence across page refreshes
3. **Documentation**:
- Update user documentation to explain the mode toggle feature
- Add developer documentation for the implementation details ✅
## Implementation Status
The agent mode toggle feature has been implemented with the following components:
1. **Redux State**:
- Added `currentAgentType` and `isDelegated` properties to the agent slice
- Default agent type is set to "CodeActAgent"
2. **Agent Mode Service**:
- Created `agent-mode-service.ts` with action generators for delegation
- Implemented `generateDelegateToReadOnlyAction()` and `generateFinishDelegationAction()`
3. **UI Components**:
- Created `AgentModeToggle` component with toggle switch UI
- Integrated toggle into the agent control bar
- Updated agent status bar to display current mode
- Added color coding (amber for read-only, blue for execute)
4. **Event Handling**:
- Updated `use-handle-ws-events.ts` to process agent delegation events
- Added state updates when delegation starts/ends
- Added notifications to inform users of mode changes
5. **Internationalization**:
- Added translations for all UI elements
- Supported multiple languages through i18n
The implementation is complete and ready for testing. The feature allows users to seamlessly switch between read-only and execute modes during a conversation, with clear visual indicators and notifications of the current mode.
-55
@@ -1,55 +0,0 @@
# Agent Mode Toggle
The Agent Mode Toggle feature allows you to switch between two different agent modes:
1. **Execute Mode** (default): Full capabilities with the CodeActAgent, which can modify code and execute commands
2. **Read-only Mode**: Restricted capabilities with the ReadOnlyAgent, which can only explore and analyze code
## Why Use Different Modes?
- **Safety**: Ensure no changes are made during the exploration phase
- **Clarity**: Clear indication of the agent's current capabilities
- **Control**: Decide when to transition from planning to execution
- **Workflow**: Support a natural workflow of exploration → planning → implementation
## How to Use
1. **Toggle Switch**: Click the toggle switch in the agent control bar to switch between modes
- Blue toggle: Execute Mode (default)
- Amber toggle: Read-only Mode
2. **Mode Indicators**:
- The current mode is displayed in the agent status bar
- System messages indicate when the mode changes
## Available Tools in Each Mode
### Execute Mode (CodeActAgent)
All tools are available, including:
- File editing (`str_replace_editor`)
- Command execution (`execute_bash`)
- Python code execution (`execute_ipython_cell`)
- Web browsing (`browser`, `web_read`)
- Thinking and finishing (`think`, `finish`)
### Read-only Mode (ReadOnlyAgent)
Only non-destructive tools are available:
- File viewing (`view`)
- File searching (`grep`, `glob`)
- Web reading (`web_read`)
- Thinking and finishing (`think`, `finish`)
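
As an illustration only, the two tool lists above can be expressed as a lookup table. This sketch is not how OpenHands registers tools; `MODE_TOOLS`, `is_tool_allowed`, and `shared_tools` are hypothetical names, and the sets contain only the tools named in this document (Execute Mode may expose more).

```python
# Tool names copied from the lists above; the mode keys are hypothetical.
MODE_TOOLS = {
    "execute": {
        "str_replace_editor", "execute_bash", "execute_ipython_cell",
        "browser", "web_read", "think", "finish",
    },
    "read_only": {"view", "grep", "glob", "web_read", "think", "finish"},
}


def is_tool_allowed(mode: str, tool: str) -> bool:
    """Return True if the tool is listed for the given mode."""
    return tool in MODE_TOOLS[mode]


def shared_tools() -> set:
    """Tools listed for both modes."""
    return MODE_TOOLS["execute"] & MODE_TOOLS["read_only"]
```

For example, `is_tool_allowed("read_only", "execute_bash")` is false, while `web_read`, `think`, and `finish` appear in both lists.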
## Best Practices
1. **Start in Read-only Mode** for new codebases to safely explore without making changes
2. **Switch to Execute Mode** when you're ready to implement changes
3. **Return to Read-only Mode** when you want to explore different parts of the codebase
## Technical Details
The agent mode toggle uses OpenHands' agent delegation mechanism:
- When toggling to Read-only Mode, the system delegates to a ReadOnlyAgent
- When toggling back to Execute Mode, the delegation ends and returns to the CodeActAgent
- Context is preserved between mode switches
@@ -261,6 +261,7 @@ def get_config(
enable_jupyter=False,
enable_browsing=RUN_WITH_BROWSING,
enable_llm_editor=False,
enable_mcp=False,
condenser=metadata.condenser_config,
enable_prompt_extensions=False,
)
@@ -0,0 +1,172 @@
# Visual SWE-Bench Evaluation with Docker Image
This folder contains the evaluation harness that we built on top of the original [Visual SWE-Bench benchmark](https://multi-swe-bench.github.io/#/) ([paper](https://arxiv.org/abs/2412.17315)).
The evaluation consists of three steps:
1. Environment setup: [install python environment](../../README.md#development-environment), [configure LLM config](../../README.md#configure-openhands-and-your-llm), and [pull docker](#openhands-visual-swe-bench-instance-level-docker-support).
2. [Run inference](#run-inference-on-visual-swe-bench-instances): Generate an edit patch for each GitHub issue.
3. [Evaluate patches using Visual SWE-Bench docker](#evaluate-generated-patches).
## Setup Environment and LLM Configuration
Please follow the instructions [here](../../README.md#setup) to set up your local development environment and LLM.
## OpenHands Visual SWE-Bench Instance-level Docker Support
OpenHands now supports using the official evaluation docker for both **[inference](#run-inference-on-visual-swe-bench-instances) and [evaluation](#evaluate-generated-patches)**.
This is now the default behavior.
## Run Inference on Visual SWE-Bench Instances
Make sure your Docker daemon is running, and you have ample disk space for the [instance-level docker image](#openhands-visual-swe-bench-instance-level-docker-support).
When the `run_infer.sh` script is started, it will automatically pull the relevant Visual SWE-Bench images. For example, for instance ID `networkx__networkx-6503`, it will try to pull our pre-built Docker image `sweb.eval.x86_64.networkx_s_networkx-6503` from DockerHub. This image is then used to create the OpenHands runtime image in which the agent operates.
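
The mapping from instance ID to image name can be sketched as follows. This is a simplified illustration of the naming rule used in `run_infer.py` in this change; the function name `instance_image_name` is hypothetical, and the registry prefix is passed explicitly because it differs per instance.

```python
def instance_image_name(instance_id: str, prefix: str) -> str:
    """Build the Docker image reference for a benchmark instance."""
    # "__" is replaced with "_s_" to comply with Docker image naming rules.
    name = ("sweb.eval.x86_64." + instance_id).replace("__", "_s_")
    # Normalize the prefix so exactly one "/" separates it from the name.
    return (prefix.rstrip("/") + "/" + name).lower()


image = instance_image_name("networkx__networkx-6503", "docker.io/luolin101/")
print(image)
```
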
```bash
./evaluation/benchmarks/visual_swe_bench/scripts/run_infer.sh [model_config] [git-version] [agent] [eval_limit] [max_iter] [num_workers]
# Example
./evaluation/benchmarks/visual_swe_bench/scripts/run_infer.sh llm.eval_gpt4_1106_preview HEAD CodeActAgent 133 30 1
```
where `model_config` is mandatory, and the rest are optional.
- `model_config`, e.g. `eval_gpt4_1106_preview`, is the config group name for your
LLM settings, as defined in your `config.toml`.
- `git-version`, e.g. `HEAD`, is the git commit hash of the OpenHands version you would
like to evaluate. It could also be a release tag like `0.6.2`.
- `agent`, e.g. `CodeActAgent`, is the name of the agent for benchmarks, defaulting
to `CodeActAgent`.
- `eval_limit`, e.g. `10`, limits the evaluation to the first `eval_limit` instances. By
default, the script evaluates the entire Visual SWE-bench set (133 issues). Note:
in order to use `eval_limit`, you must also set `agent`.
- `max_iter`, e.g. `20`, is the maximum number of iterations for the agent to run. By
default, it is set to 30.
- `num_workers`, e.g. `3`, is the number of parallel workers to run the evaluation. By
default, it is set to 1.
There are also two optional environment variables you can set.
```bash
export USE_HINT_TEXT=true # if you want to use hint text in the evaluation. Defaults to false. Ignore this if you are not sure.
export USE_INSTANCE_IMAGE=true # if you want to use instance-level docker images. Defaults to true.
```
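
For reference, `run_infer.py` reads `USE_HINT_TEXT` as a case-insensitive `true`/`false` string. The helper below is a minimal sketch of that pattern; `env_flag` is a hypothetical name, and treating `USE_INSTANCE_IMAGE` the same way is an assumption.

```python
import os


def env_flag(name: str, default: str = "false") -> bool:
    """Return True only if the variable is set to 'true' (any casing)."""
    return os.environ.get(name, default).lower() == "true"


USE_HINT_TEXT = env_flag("USE_HINT_TEXT")
# Assumed to default to true, per the comment in the shell snippet above.
USE_INSTANCE_IMAGE = env_flag("USE_INSTANCE_IMAGE", default="true")
```

Note that values such as `1` or `yes` are not treated as true under this rule.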
Let's say you'd like to run 10 instances using `llm.eval_gpt4_1106_preview` and CodeActAgent,
then your command would be:
```bash
./evaluation/benchmarks/visual_swe_bench/scripts/run_infer.sh llm.eval_gpt4_1106_preview HEAD CodeActAgent 10
```
### Specify a subset of tasks to run infer
If you would like to specify a list of tasks you'd like to benchmark on, you could
create a `config.toml` under `./evaluation/benchmarks/visual_swe_bench/` folder, and put a list
attribute named `selected_ids`, e.g.
```toml
selected_ids = ['astropy__astropy-13838', 'matplotlib__matplotlib-21617', 'plotly__plotly.py-1966']
```
Then only these tasks (rows whose `instance_id` is in the above list) will be evaluated.
In this case, `eval_limit` option applies to tasks that are in the `selected_ids` list.
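
The interaction between `selected_ids` and `eval_limit` can be sketched like this. It is an illustration under stated assumptions, not the harness code: `select_instances` is a hypothetical helper, and instances are modeled as plain dictionaries rather than dataset rows.

```python
def select_instances(instances, selected_ids=None, eval_limit=None):
    """Filter by selected_ids first, then apply eval_limit to what remains."""
    if selected_ids:
        wanted = set(selected_ids)
        instances = [row for row in instances if row["instance_id"] in wanted]
    if eval_limit is not None:
        # The limit applies to the already-filtered list.
        instances = instances[:eval_limit]
    return instances


rows = [
    {"instance_id": "astropy__astropy-13838"},
    {"instance_id": "networkx__networkx-6503"},
    {"instance_id": "matplotlib__matplotlib-21617"},
]
picked = select_instances(
    rows,
    selected_ids=["astropy__astropy-13838", "matplotlib__matplotlib-21617"],
    eval_limit=1,
)
```
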
After running the inference, you will obtain an `output.jsonl` file (by default it will be saved to `evaluation/evaluation_outputs`).
## Evaluate Generated Patches
### Download Docker Images
**(Recommended for reproducibility)** If you have extra local space (e.g., 200GB), you can pull the instance-level docker images we've prepared by running:
```bash
evaluation/benchmarks/visual_swe_bench/scripts/docker/pull_all_eval_docker.sh instance
```
If you want to save disk space a bit, while speeding up the image pre-build process, you can pull the environment-level docker images:
```bash
evaluation/benchmarks/visual_swe_bench/scripts/docker/pull_all_eval_docker.sh env
```
If you want to evaluate on the full SWE-Bench test set:
```bash
evaluation/benchmarks/visual_swe_bench/scripts/docker/pull_all_eval_docker.sh instance full
```
### Run evaluation
With the `output.jsonl` file, you can run `eval_infer.sh` to evaluate the generated patches and produce a fine-grained report.
**This evaluation is performed using the official dockerized evaluation harness.**
> If you want to evaluate existing results, you should first run this to clone existing outputs
>
>```bash
>git clone https://huggingface.co/spaces/OpenHands/evaluation evaluation/evaluation_outputs
>```
NOTE, you should have already pulled the instance-level OR env-level docker images following [this section](#openhands-visual-swe-bench-instance-level-docker-support).
Then you can run the following:
```bash
./evaluation/benchmarks/visual_swe_bench/scripts/eval_infer.sh $YOUR_OUTPUT_JSONL [instance_id]
# Example
./evaluation/benchmarks/visual_swe_bench/scripts/eval_infer.sh evaluation/evaluation_outputs/outputs/luolin101__Visual-SWE-bench-test/CodeActAgent/gpt-4-1106-preview_maxiter_50_N_v1.0/output.jsonl
```
The script accepts one optional argument:
- `instance_id`: Specify a single instance to evaluate
For example, to evaluate a specific instance:
```bash
./evaluation/benchmarks/visual_swe_bench/scripts/eval_infer.sh $YOUR_OUTPUT_JSONL instance_123
```
> You can also pass in a JSONL with SWE-Bench format to `./evaluation/benchmarks/visual_swe_bench/scripts/eval_infer.sh`, where each line is a JSON of `{"model_patch": "XXX", "model_name_or_path": "YYY", "instance_id": "ZZZ"}`.
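
A file in that format can be produced with a few lines of Python. This is a minimal sketch: `to_jsonl` is a hypothetical helper, and only the three keys named above are checked.

```python
import json

REQUIRED_KEYS = {"model_patch", "model_name_or_path", "instance_id"}


def to_jsonl(predictions) -> str:
    """Serialize predictions to JSONL, one JSON object per line."""
    lines = []
    for pred in predictions:
        missing = REQUIRED_KEYS - pred.keys()
        if missing:
            raise ValueError(f"missing keys: {sorted(missing)}")
        lines.append(json.dumps(pred))
    return "\n".join(lines) + "\n"


text = to_jsonl([
    {"model_patch": "XXX", "model_name_or_path": "YYY", "instance_id": "ZZZ"},
])
```

Validating the keys before writing avoids discovering a malformed line only after the evaluation has started.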
The final results will be saved to `evaluation/evaluation_outputs/outputs/visual_swe_bench/CodeActAgent/gpt-4-1106-preview_maxiter_50_N_v1.0/` with the following files/directory:
- `README.md`: a report listing which instances passed, failed, etc.
- `report.json`: a JSON file that contains keys like `"resolved_ids"` pointing to instance IDs that are resolved by the agent.
- `logs/`: a directory of test logs
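
To get a quick summary from `report.json`, something like the following may help. It is a sketch that assumes only the `"resolved_ids"` key mentioned above; `summarize_report` is a hypothetical helper, and other keys in the report are ignored.

```python
import json


def summarize_report(report_text: str) -> dict:
    """Count and list the resolved instances in a report.json payload."""
    report = json.loads(report_text)
    # A missing key is treated as no resolved instances.
    resolved = report.get("resolved_ids", [])
    return {"resolved_count": len(resolved), "resolved_ids": sorted(resolved)}


summary = summarize_report('{"resolved_ids": ["b__b-2", "a__a-1"]}')
```
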
## Visualize Results
First you need to clone `https://huggingface.co/spaces/OpenHands/evaluation` and add your own running results from openhands into the `outputs` of the cloned repo.
```bash
git clone https://huggingface.co/spaces/OpenHands/evaluation
```
**(optional) setup streamlit environment with conda**:
```bash
cd evaluation
conda create -n streamlit python=3.10
conda activate streamlit
pip install -r requirements.txt
```
**run the visualizer**:
Then, in a separate Python environment with `streamlit` library, you can run the following:
```bash
# Make sure you are inside the cloned `evaluation` repo
conda activate streamlit # if you follow the optional conda env setup above
streamlit run app.py --server.port 8501 --server.address 0.0.0.0
```
Then you can access the SWE-Bench trajectory visualizer at `localhost:8501`.
## Submit your evaluation results
You can start your own fork of [our huggingface evaluation outputs](https://huggingface.co/spaces/OpenHands/evaluation) and submit a PR of your evaluation results following the guide [here](https://huggingface.co/docs/hub/en/repositories-pull-requests-discussions#pull-requests-and-discussions).
@@ -0,0 +1,641 @@
import asyncio
import json
import os
import tempfile
from typing import Any

import pandas as pd
import toml
from datasets import load_dataset

import openhands.agenthub
from evaluation.benchmarks.swe_bench.resource.mapping import (
    get_instance_resource_factor,
)
from evaluation.utils.shared import (
    EvalException,
    EvalMetadata,
    EvalOutput,
    assert_and_raise,
    codeact_user_response,
    get_default_sandbox_config_for_eval,
    get_metrics,
    is_fatal_evaluation_error,
    make_metadata,
    prepare_dataset,
    reset_logger_for_multiprocessing,
    run_evaluation,
    update_llm_config_for_completions_logging,
)
from openhands.controller.state.state import State
from openhands.core.config import (
    AgentConfig,
    AppConfig,
    get_llm_config_arg,
    get_parser,
)
from openhands.core.logger import openhands_logger as logger
from openhands.core.main import create_runtime, run_controller
from openhands.events.action import CmdRunAction, MessageAction
from openhands.events.observation import CmdOutputObservation, ErrorObservation
from openhands.events.serialization.event import event_to_dict
from openhands.runtime.base import Runtime
from openhands.utils.async_utils import call_async_from_sync
from openhands.utils.shutdown_listener import sleep_if_should_continue

USE_HINT_TEXT = os.environ.get('USE_HINT_TEXT', 'false').lower() == 'true'
RUN_WITH_BROWSING = os.environ.get('RUN_WITH_BROWSING', 'false').lower() == 'true'

AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {
    'CodeActAgent': codeact_user_response,
}


def _get_swebench_workspace_dir_name(instance: pd.Series) -> str:
    return f'{instance.repo}__{instance.version}'.replace('/', '__')


def get_instruction(instance: pd.Series, metadata: EvalMetadata):
    workspace_dir_name = _get_swebench_workspace_dir_name(instance)
    # Instruction based on Anthropic's official trajectory
    # https://github.com/eschluntz/swe-bench-experiments/tree/main/evaluation/verified/20241022_tools_claude-3-5-sonnet-updated/trajs
    instruction = (
        '<uploaded_files>\n'
        f'/workspace/{workspace_dir_name}\n'
        '</uploaded_files>\n'
        f"I've uploaded a python code repository in the directory {workspace_dir_name}. Consider the following issue description:\n\n"
        f'<issue_description>\n'
        f'{instance.problem_statement}\n'
        '</issue_description>\n\n'
        'Can you help me implement the necessary changes to the repository so that the requirements specified in the <issue_description> are met?\n'
        "I've already taken care of all changes to any of the test files described in the <issue_description>. This means you DON'T have to modify the testing logic or any of the tests in any way!\n"
        "Also the development Python environment is already set up for you (i.e., all dependencies already installed), so you don't need to install other packages.\n"
        'Your task is to make the minimal changes to non-test files in the /workspace directory to ensure the <issue_description> is satisfied.\n'
        'Follow these steps to resolve the issue:\n'
        '1. As a first step, it might be a good idea to explore the repo to familiarize yourself with its structure.\n'
        '2. Create a script to reproduce the error and execute it with `python <filename.py>` using the BashTool, to confirm the error\n'
        '3. Edit the sourcecode of the repo to resolve the issue\n'
        '4. Rerun your reproduce script and confirm that the error is fixed!\n'
        '5. Think about edgecases, add comprehensive tests for them in your reproduce script, and run them to make sure your fix handles them as well\n'
        f'6. Once you are done with the initial implementation, please carefully re-read the problem description and check the difference between the current code and the base commit {instance["base_commit"]}. Do you think that the issue has been completely and comprehensively solved? Write tests to check the correctness of the solution, specifically focusing on tests that may point out any remaining problems that are not yet solved. Run all of the tests in the repo and check if any of them fail, and if they do fix the code. Repeat this process of carefully reading the problem description and current implementation, testing, and fixing any problems until you are confident that the current implementation is correct. Find and run any tests in the repo that are related to:\n'
        '   - The issue you are fixing\n'
        '   - The files you modified\n'
        '   - The functions you changed\n'
        '   Make sure all these tests pass with your changes.\n'
        "Your thinking should be thorough and so it's fine if it's very long.\n"
    )

    if RUN_WITH_BROWSING:
        instruction += (
            '<IMPORTANT!>\nYou SHOULD NEVER attempt to browse the web. </IMPORTANT!>\n'
        )
    return instruction


# TODO: migrate all swe-bench docker to ghcr.io/openhands
DOCKER_IMAGE_PREFIX = os.environ.get('EVAL_DOCKER_IMAGE_PREFIX', 'docker.io/xingyaoww/')
logger.info(f'Using docker image prefix: {DOCKER_IMAGE_PREFIX}')


def get_instance_docker_image(instance_id: str, official_image: bool = False) -> str:
    image_name = 'sweb.eval.x86_64.' + instance_id
    image_name = image_name.replace(
        '__', '_s_'
    )  # to comply with docker image naming convention
    other_list = [
        'plotly__plotly.py-4083',
        'plotly__plotly.py-2600',
        'plotly__plotly.py-2591',
        'plotly__plotly.py-1966',
        'networkx__networkx-6503',
        'networkx__networkx-6098',
        'networkx__networkx-5616',
        'networkx__networkx-5354',
        'networkx__networkx-5058',
        'networkx__networkx-4378',
        'networkx__networkx-3764',
        'vega__altair-2785',
        'vega__altair-1092',
        'vega__altair-974',
        'vega__altair-830',
        'matplotlib__matplotlib-27754',
        'matplotlib__matplotlib-26926',
        'matplotlib__matplotlib-26788',
        'matplotlib__matplotlib-26586',
        'sympy__sympy-26941',
        'mwaskom__seaborn-3458',
        'mwaskom__seaborn-3454',
    ]
    if instance_id in other_list:
        return ('docker.io/luolin101/'.rstrip('/') + '/' + image_name).lower()
    return (DOCKER_IMAGE_PREFIX.rstrip('/') + '/' + image_name).lower()


def get_config(
    instance: pd.Series,
    metadata: EvalMetadata,
) -> AppConfig:
    # We use a different instance image for each instance of swe-bench eval
    use_official_image = bool(
        'verified' in metadata.dataset.lower() or 'lite' in metadata.dataset.lower()
    )
    base_container_image = get_instance_docker_image(
        instance['instance_id'], use_official_image
    )
    logger.info(
        f'Using instance container image: {base_container_image}. '
        f'Please make sure this image exists. '
        f'Submit an issue on https://github.com/All-Hands-AI/OpenHands if you run into any issues.'
    )

    sandbox_config = get_default_sandbox_config_for_eval()
    sandbox_config.base_container_image = base_container_image
    sandbox_config.enable_auto_lint = True
    sandbox_config.use_host_network = False
    # Add platform to the sandbox config to solve issue 4401
    sandbox_config.platform = 'linux/amd64'
    sandbox_config.remote_runtime_resource_factor = get_instance_resource_factor(
        dataset_name=metadata.dataset,
        instance_id=instance['instance_id'],
    )

    config = AppConfig(
        default_agent=metadata.agent_class,
        run_as_openhands=False,
        max_iterations=metadata.max_iterations,
        runtime=os.environ.get('RUNTIME', 'docker'),
        sandbox=sandbox_config,
        # do not mount workspace
        workspace_base=None,
        workspace_mount_path=None,
    )
    config.set_llm_config(
        update_llm_config_for_completions_logging(
            metadata.llm_config, metadata.eval_output_dir, instance['instance_id']
        )
    )
    agent_config = AgentConfig(
        enable_jupyter=False,
        enable_browsing=RUN_WITH_BROWSING,
        enable_llm_editor=False,
        condenser=metadata.condenser_config,
        enable_prompt_extensions=False,
    )
    config.set_agent_config(agent_config)
    return config
def initialize_runtime(
runtime: Runtime,
instance: pd.Series, # this argument is not required
):
"""Initialize the runtime for the agent.
This function is called before the runtime is used to run the agent.
"""
logger.info('-' * 30)
logger.info('BEGIN Runtime Initialization Fn')
logger.info('-' * 30)
workspace_dir_name = _get_swebench_workspace_dir_name(instance)
obs: CmdOutputObservation
# Set instance id
action = CmdRunAction(
command=f"""echo 'export SWE_INSTANCE_ID={instance['instance_id']}' >> ~/.bashrc && echo 'export PIP_CACHE_DIR=~/.cache/pip' >> ~/.bashrc && echo "alias git='git --no-pager'" >> ~/.bashrc"""
)
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(
obs.exit_code == 0, f'Failed to export SWE_INSTANCE_ID: {str(obs)}'
)
action = CmdRunAction(command="""export USER=$(whoami); echo USER=${USER} """)
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(obs.exit_code == 0, f'Failed to export USER: {str(obs)}')
# inject the init script
script_dir = os.path.dirname(__file__)
# inject the instance info
action = CmdRunAction(command='mkdir -p /swe_util/eval_data/instances')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(
obs.exit_code == 0,
f'Failed to create /swe_util/eval_data/instances: {str(obs)}',
)
swe_instance_json_name = 'swe-bench-instance.json'
with tempfile.TemporaryDirectory() as temp_dir:
# Construct the full path for the desired file name within the temporary directory
temp_file_path = os.path.join(temp_dir, swe_instance_json_name)
# Write to the file with the desired name within the temporary directory
with open(temp_file_path, 'w') as f:
if not isinstance(instance, dict):
json.dump([instance.to_dict()], f)
else:
json.dump([instance], f)
# Copy the file to the desired location
runtime.copy_to(temp_file_path, '/swe_util/eval_data/instances/')
# inject the instance swe entry
runtime.copy_to(
str(os.path.join(script_dir, 'scripts/setup/instance_swe_entry.sh')),
'/swe_util/',
)
action = CmdRunAction(command='cat ~/.bashrc')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(obs.exit_code == 0, f'Failed to cat ~/.bashrc: {str(obs)}')
action = CmdRunAction(command='source ~/.bashrc')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
if isinstance(obs, ErrorObservation):
logger.error(f'Failed to source ~/.bashrc: {str(obs)}')
assert_and_raise(obs.exit_code == 0, f'Failed to source ~/.bashrc: {str(obs)}')
action = CmdRunAction(command='source /swe_util/instance_swe_entry.sh')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(
obs.exit_code == 0,
f'Failed to source /swe_util/instance_swe_entry.sh: {str(obs)}',
)
action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(
obs.exit_code == 0,
f'Failed to cd to /workspace/{workspace_dir_name}: {str(obs)}',
)
action = CmdRunAction(command='git reset --hard')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(obs.exit_code == 0, f'Failed to git reset --hard: {str(obs)}')
action = CmdRunAction(
command='for remote_name in $(git remote); do git remote remove "${remote_name}"; done'
)
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(obs.exit_code == 0, f'Failed to remove git remotes: {str(obs)}')
action = CmdRunAction(command='which python')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(
obs.exit_code == 0 and 'testbed' in obs.content,
f'Expected to find python interpreter from testbed, but got: {str(obs)}',
)
logger.info('-' * 30)
logger.info('END Runtime Initialization Fn')
logger.info('-' * 30)
def complete_runtime(
runtime: Runtime,
instance: pd.Series, # this argument is not required, but it is used to get the workspace_dir_name
) -> dict[str, Any]:
"""Complete the runtime for the agent.
This function is called after the agent has finished running in the runtime.
If you need to do something in the sandbox to get the correctness metric after
the agent has run, modify this function.
"""
logger.info('-' * 30)
logger.info('BEGIN Runtime Completion Fn')
logger.info('-' * 30)
obs: CmdOutputObservation
workspace_dir_name = _get_swebench_workspace_dir_name(instance)
action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
if obs.exit_code == -1:
# The previous command is still running
# We need to kill previous command
logger.info('The previous command is still running, trying to kill it...')
action = CmdRunAction(command='C-c')
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
# Then run the command again
action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(
isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
f'Failed to cd to /workspace/{workspace_dir_name}: {str(obs)}',
)
action = CmdRunAction(command='git config --global core.pager ""')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(
isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
f'Failed to git config --global core.pager "": {str(obs)}',
)
# First check for any git repositories in subdirectories
action = CmdRunAction(command='find . -type d -name .git -not -path "./.git"')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(
isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
f'Failed to find git repositories: {str(obs)}',
)
git_dirs = [p for p in obs.content.strip().split('\n') if p]
if git_dirs:
# Remove all .git directories in subdirectories
for git_dir in git_dirs:
action = CmdRunAction(command=f'rm -rf "{git_dir}"')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(
isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
f'Failed to remove git directory {git_dir}: {str(obs)}',
)
# add all files
action = CmdRunAction(command='git add -A')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(
isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
f'Failed to git add -A: {str(obs)}',
)
n_retries = 0
git_patch = None
while n_retries < 5:
action = CmdRunAction(
command=f'git diff --no-color --cached {instance["base_commit"]}'
)
action.set_hard_timeout(max(300 + 100 * n_retries, 600))
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
n_retries += 1
if isinstance(obs, CmdOutputObservation):
if obs.exit_code == 0:
git_patch = obs.content.strip()
break
else:
logger.info('Failed to get git diff, retrying...')
sleep_if_should_continue(10)
elif isinstance(obs, ErrorObservation):
logger.error(f'Error occurred: {obs.content}. Retrying...')
sleep_if_should_continue(10)
else:
assert_and_raise(False, f'Unexpected observation type: {str(obs)}')
assert_and_raise(git_patch is not None, 'Failed to get git diff (None)')
logger.info('-' * 30)
logger.info('END Runtime Completion Fn')
logger.info('-' * 30)
return {'git_patch': git_patch}
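To see what the timeout expression in the `git diff` retry loop evaluates to, the sketch below tabulates `max(300 + 100 * n_retries, 600)` for the five attempts. Because `max` acts as a 600-second floor, only the last attempt receives a longer timeout. The function name `diff_timeouts` is illustrative and not part of the change.

```python
def diff_timeouts(max_retries: int = 5) -> list:
    """Return the hard timeout (seconds) used on each retry attempt."""
    # Same expression as in the retry loop; n runs from 0 to max_retries - 1.
    return [max(300 + 100 * n, 600) for n in range(max_retries)]


print(diff_timeouts())  # → [600, 600, 600, 600, 700]
```

If a timeout that grows on every attempt is the intent, `min` with an upper cap or a plain `600 + 100 * n_retries` would express that, but this is a design question for the author rather than something the diff settles.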
def process_instance(
instance: pd.Series,
metadata: EvalMetadata,
reset_logger: bool = True,
runtime_failure_count: int = 0,
) -> EvalOutput:
config = get_config(instance, metadata)
# Setup the logger properly, so you can run multi-processing to parallelize the evaluation
if reset_logger:
log_dir = os.path.join(metadata.eval_output_dir, 'infer_logs')
reset_logger_for_multiprocessing(logger, instance.instance_id, log_dir)
else:
logger.info(f'Starting evaluation for instance {instance.instance_id}.')
# Increase resource_factor with increasing attempt_id
if runtime_failure_count > 0:
config.sandbox.remote_runtime_resource_factor = min(
config.sandbox.remote_runtime_resource_factor * (2**runtime_failure_count),
8,
)
logger.warning(
f'This is the {runtime_failure_count + 1}th attempt for instance {instance.instance_id}, setting resource factor to {config.sandbox.remote_runtime_resource_factor}'
)
runtime = create_runtime(config)
call_async_from_sync(runtime.connect)
try:
initialize_runtime(runtime, instance)
instruction = get_instruction(instance, metadata)
# Here's how you can run the agent (similar to the `main` function) and get the final task state
state: State | None = asyncio.run(
run_controller(
config=config,
initial_user_action=MessageAction(content=instruction),
runtime=runtime,
fake_user_response_fn=AGENT_CLS_TO_FAKE_USER_RESPONSE_FN[
metadata.agent_class
],
)
)
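# a missing state cannot be inspected; fail early with a clear error
if state is None:
raise ValueError('State should not be None.')
# if fatal error, throw EvalException to trigger re-run
if is_fatal_evaluation_error(state.last_error):
raise EvalException('Fatal error detected: ' + state.last_error)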
# ======= THIS IS SWE-Bench specific =======
# Get git patch
return_val = complete_runtime(runtime, instance)
git_patch = return_val['git_patch']
logger.info(
f'Got git diff for instance {instance.instance_id}:\n--------\n{git_patch}\n--------'
)
finally:
runtime.close()
# ==========================================
# ======= Attempt to evaluate the agent's edits =======
# we use eval_infer.sh to evaluate the agent's edits, not here
# because the agent may alter the environment / testcases
test_result = {
'git_patch': git_patch,
}
# If you are working on some simpler benchmark that only evaluates the final model output (e.g., in a MessageAction)
# You can simply get the LAST `MessageAction` from the returned `state.history` and parse it for evaluation.
if state is None:
raise ValueError('State should not be None.')
# NOTE: this is NO LONGER the event stream, but an agent history that includes delegate agent's events
histories = [event_to_dict(event) for event in state.history]
metrics = get_metrics(state)
# Save the output
output = EvalOutput(
instance_id=instance.instance_id,
instruction=instruction,
instance=instance.to_dict(), # SWE Bench specific
test_result=test_result,
metadata=metadata,
history=histories,
metrics=metrics,
error=state.last_error if state and state.last_error else None,
)
return output
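The resource-factor escalation at the start of `process_instance` doubles the factor for each prior runtime failure and caps the result at 8. A small sketch of that rule, with `scaled_resource_factor` as a hypothetical helper name:

```python
def scaled_resource_factor(base: int, runtime_failure_count: int, cap: int = 8) -> int:
    """Mirror the escalation rule: base * 2**failures, capped at `cap`."""
    if runtime_failure_count <= 0:
        # No previous failure: the configured factor is used unchanged.
        return base
    return min(base * (2 ** runtime_failure_count), cap)


for failures in range(5):
    print(failures, scaled_resource_factor(1, failures))
```

Starting from a factor of 1, the cap is reached on the third failure, so later retries no longer add resources.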
def filter_dataset(dataset: pd.DataFrame, filter_column: str) -> pd.DataFrame:
file_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'config.toml')
if os.path.exists(file_path):
with open(file_path, 'r') as file:
data = toml.load(file)
if 'selected_ids' in data:
selected_ids = data['selected_ids']
logger.info(
f'Filtering {len(selected_ids)} tasks from "selected_ids"...'
)
subset = dataset[dataset[filter_column].isin(selected_ids)]
logger.info(f'Retained {subset.shape[0]} tasks after filtering')
return subset
# ''.split(',') yields [''], so drop empty entries before checking the length
skip_ids = [i for i in os.environ.get('SKIP_IDS', '').split(',') if i]
if len(skip_ids) > 0:
logger.info(f'Filtering {len(skip_ids)} tasks from "SKIP_IDS"...')
return dataset[~dataset[filter_column].isin(skip_ids)]
return dataset
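A general Python detail that matters for comma-separated variables such as `SKIP_IDS`: splitting an empty string returns a list with one empty element, not an empty list. The sketch below shows the behaviour and one way to parse such a value; `parse_id_list` is a hypothetical helper, not a function in this change.

```python
def parse_id_list(raw: str) -> list:
    """Split a comma-separated string, dropping blanks and stray whitespace."""
    return [item.strip() for item in raw.split(',') if item.strip()]


print(''.split(','))                      # → ['']
print(parse_id_list(''))                  # → []
print(parse_id_list('a__b-1, c__d-2,'))   # → ['a__b-1', 'c__d-2']
```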
# A list of instances that are known to be tricky to infer
# (will cause runtime failure even with resource factor = 8)
SWEGYM_EXCLUDE_IDS = [
'dask__dask-10422',
'pandas-dev__pandas-50548',
'pandas-dev__pandas-53672',
'pandas-dev__pandas-54174',
'pandas-dev__pandas-55518',
'pandas-dev__pandas-58383',
'pydata__xarray-6721',
'pytest-dev__pytest-10081',
'pytest-dev__pytest-7236',
]
if __name__ == '__main__':
parser = get_parser()
parser.add_argument(
'--dataset',
type=str,
default='princeton-nlp/SWE-bench',
help='data set to evaluate on, either full-test or lite-test',
)
parser.add_argument(
'--split',
type=str,
default='test',
help='split to evaluate on',
)
args, _ = parser.parse_known_args()
# NOTE: It is preferable to load datasets from huggingface datasets and perform post-processing
# so we don't need to manage file uploading to OpenHands's repo
dataset = load_dataset(args.dataset, split=args.split)
swe_bench_tests = filter_dataset(dataset.to_pandas(), 'instance_id')
logger.info(
f'Loaded dataset {args.dataset} with split {args.split}: {len(swe_bench_tests)} tasks'
)
if 'SWE-Gym' in args.dataset:
swe_bench_tests = swe_bench_tests[
~swe_bench_tests['instance_id'].isin(SWEGYM_EXCLUDE_IDS)
]
logger.info(
f'{len(swe_bench_tests)} tasks left after excluding SWE-Gym excluded tasks'
)
llm_config = None
if args.llm_config:
llm_config = get_llm_config_arg(args.llm_config)
llm_config.log_completions = True
# modify_params must be False for evaluation purposes, for reproducibility and accuracy of results
llm_config.modify_params = False
if llm_config is None:
raise ValueError(f'Could not find LLM config: --llm_config {args.llm_config}')
details = {}
_agent_cls = openhands.agenthub.Agent.get_cls(args.agent_cls)
dataset_descrption = (
args.dataset.replace('/', '__') + '-' + args.split.replace('/', '__')
)
metadata = make_metadata(
llm_config,
dataset_descrption,
args.agent_cls,
args.max_iterations,
args.eval_note,
args.eval_output_dir,
details=details,
)
output_file = os.path.join(metadata.eval_output_dir, 'output.jsonl')
print(f'### OUTPUT FILE: {output_file} ###')
instances = prepare_dataset(swe_bench_tests, output_file, args.eval_n_limit)
if len(instances) > 0 and not isinstance(
instances['PASS_TO_PASS'][instances['PASS_TO_PASS'].index[0]], str
):
for col in ['PASS_TO_PASS', 'FAIL_TO_PASS']:
instances[col] = instances[col].apply(lambda x: str(x))
run_evaluation(
instances,
metadata,
output_file,
args.eval_num_workers,
process_instance,
timeout_seconds=8 * 60 * 60, # 8 hour PER instance should be more than enough
max_retries=5,
)
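The dataset description passed to `make_metadata` is the dataset name and the split joined by a hyphen, with every slash replaced by a double underscore so the result contains no path separators. A minimal sketch, with `dataset_label` as an illustrative name:

```python
def dataset_label(dataset: str, split: str) -> str:
    """Join dataset and split into a slash-free label."""
    return dataset.replace('/', '__') + '-' + split.replace('/', '__')


print(dataset_label('luolin101/Visual-SWE-bench', 'test'))
# → luolin101__Visual-SWE-bench-test
```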
@@ -0,0 +1,157 @@
xingyaoww/sweb.eval.x86_64.astropy_s_astropy-11693:latest
xingyaoww/sweb.eval.x86_64.astropy_s_astropy-13838:latest
xingyaoww/sweb.eval.x86_64.astropy_s_astropy-14295:latest
xingyaoww/sweb.eval.x86_64.astropy_s_astropy-8292:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-13908:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-13980:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-13983:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-13984:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-14043:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-14623:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-19763:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-20470:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-20518:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-20584:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-20761:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-20826:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-21443:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-21490:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-21550:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-21568:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-21617:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-22865:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-22871:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-22931:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-23047:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-23111:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-23412:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-24088:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-24177:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-24189:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-24570:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-24691:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-24749:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-24768:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-24849:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-24870:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-24971:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-25287:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-25334:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-25340:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-25346:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-25405:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-25499:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-25565:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-25640:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-25667:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-25779:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-26078:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-26466:latest
xingyaoww/sweb.eval.x86_64.mwaskom_s_seaborn-2576:latest
xingyaoww/sweb.eval.x86_64.mwaskom_s_seaborn-2846:latest
xingyaoww/sweb.eval.x86_64.mwaskom_s_seaborn-2979:latest
xingyaoww/sweb.eval.x86_64.mwaskom_s_seaborn-3180:latest
xingyaoww/sweb.eval.x86_64.mwaskom_s_seaborn-3187:latest
xingyaoww/sweb.eval.x86_64.mwaskom_s_seaborn-3202:latest
xingyaoww/sweb.eval.x86_64.mwaskom_s_seaborn-3216:latest
xingyaoww/sweb.eval.x86_64.mwaskom_s_seaborn-3217:latest
xingyaoww/sweb.eval.x86_64.mwaskom_s_seaborn-3276:latest
xingyaoww/sweb.eval.x86_64.mwaskom_s_seaborn-3394:latest
xingyaoww/sweb.eval.x86_64.pydata_s_xarray-4182:latest
xingyaoww/sweb.eval.x86_64.pydata_s_xarray-5682:latest
xingyaoww/sweb.eval.x86_64.pylint-dev_s_pylint-4551:latest
xingyaoww/sweb.eval.x86_64.scikit-learn_s_scikit-learn-13087:latest
xingyaoww/sweb.eval.x86_64.scikit-learn_s_scikit-learn-13618:latest
xingyaoww/sweb.eval.x86_64.scikit-learn_s_scikit-learn-14067:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-10048:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-10097:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-10191:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-10435:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-11266:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-11502:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-7615:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-7757:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-8028:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-8056:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-8075:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-8120:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-8265:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-8278:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-8620:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-8621:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-8638:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-8658:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9229:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9230:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9289:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9320:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9350:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9464:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9673:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9698:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9797:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9982:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9987:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9997:latest
xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9999:latest
xingyaoww/sweb.eval.x86_64.sympy_s_sympy-11787:latest
xingyaoww/sweb.eval.x86_64.sympy_s_sympy-11788:latest
xingyaoww/sweb.eval.x86_64.sympy_s_sympy-13264:latest
xingyaoww/sweb.eval.x86_64.sympy_s_sympy-13840:latest
xingyaoww/sweb.eval.x86_64.sympy_s_sympy-15151:latest
xingyaoww/sweb.eval.x86_64.sympy_s_sympy-15304:latest
xingyaoww/sweb.eval.x86_64.sympy_s_sympy-15625:latest
xingyaoww/sweb.eval.x86_64.sympy_s_sympy-15976:latest
xingyaoww/sweb.eval.x86_64.sympy_s_sympy-16003:latest
xingyaoww/sweb.eval.x86_64.sympy_s_sympy-17067:latest
xingyaoww/sweb.eval.x86_64.sympy_s_sympy-17115:latest
xingyaoww/sweb.eval.x86_64.sympy_s_sympy-18922:latest
xingyaoww/sweb.eval.x86_64.sympy_s_sympy-21769:latest
xingyaoww/sweb.eval.x86_64.sympy_s_sympy-24723:latest
luolin101/sweb.eval.x86_64.plotly_s_plotly.py-4083:latest
luolin101/sweb.eval.x86_64.plotly_s_plotly.py-2600:latest
luolin101/sweb.eval.x86_64.plotly_s_plotly.py-2591:latest
luolin101/sweb.eval.x86_64.plotly_s_plotly.py-1966:latest
luolin101/sweb.eval.x86_64.networkx_s_networkx-6503:latest
luolin101/sweb.eval.x86_64.networkx_s_networkx-6098:latest
luolin101/sweb.eval.x86_64.networkx_s_networkx-5616:latest
luolin101/sweb.eval.x86_64.networkx_s_networkx-5354:latest
luolin101/sweb.eval.x86_64.networkx_s_networkx-5058:latest
luolin101/sweb.eval.x86_64.networkx_s_networkx-4378:latest
luolin101/sweb.eval.x86_64.networkx_s_networkx-3764:latest
luolin101/sweb.eval.x86_64.vega_s_altair-2785:latest
luolin101/sweb.eval.x86_64.vega_s_altair-1092:latest
luolin101/sweb.eval.x86_64.vega_s_altair-974:latest
luolin101/sweb.eval.x86_64.vega_s_altair-830:latest
luolin101/sweb.eval.x86_64.matplotlib_s_matplotlib-27754:latest
luolin101/sweb.eval.x86_64.matplotlib_s_matplotlib-26926:latest
luolin101/sweb.eval.x86_64.matplotlib_s_matplotlib-26788:latest
luolin101/sweb.eval.x86_64.matplotlib_s_matplotlib-26586:latest
luolin101/sweb.eval.x86_64.sympy_s_sympy-26941:latest
luolin101/sweb.eval.x86_64.mwaskom_s_seaborn-3458:latest
luolin101/sweb.eval.x86_64.mwaskom_s_seaborn-3454:latest
xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-25631:latest
xingyaoww/sweb.env.x86_64.428468730904ff6b4232aa:latest
xingyaoww/sweb.env.x86_64.89a9e6df7ab7bcb9e010c8:latest
xingyaoww/sweb.env.x86_64.15374367de368534f261e3:latest
xingyaoww/sweb.env.x86_64.6b007979cf533f0f3016e8:latest
xingyaoww/sweb.env.x86_64.b382c45e0a94d34ef0fc86:latest
xingyaoww/sweb.env.x86_64.7037e8c448a4b8ebfe9b13:latest
xingyaoww/sweb.env.x86_64.31244378a92e3bcce809ac:latest
xingyaoww/sweb.env.x86_64.efa6065ed5bf204410fd53:latest
xingyaoww/sweb.env.x86_64.a0efca7a0fe6719dbf65c2:latest
xingyaoww/sweb.env.x86_64.502d8fc6ebccd881244091:latest
luolin101/sweb.env.x86_64.eb002359cfcbe2edb56088:latest
xingyaoww/sweb.env.x86_64.d905bb51fb68acc5d4221b:latest
xingyaoww/sweb.env.x86_64.aa92880033da20ca313928:latest
luolin101/sweb.env.x86_64.c6d251a05e0af7688b64fd:latest
xingyaoww/sweb.env.x86_64.c795f4b88616b8462021ed:latest
luolin101/sweb.env.x86_64.1e5a06e76ee016d067d77e:latest
luolin101/sweb.env.x86_64.2e03d8e4d4bd373937a9ef:latest
luolin101/sweb.env.x86_64.4c16026920d27ea78f3b7a:latest
luolin101/sweb.env.x86_64.d15120dfdbda9831e9646b:latest
luolin101/sweb.env.x86_64.c581ba273c3275679773dd:latest
luolin101/sweb.env.x86_64.dc800a1bbe275c5de0c4aa:latest
luolin101/sweb.env.x86_64.59bd7d84a0939c7caba7e6:latest
xingyaoww/sweb.env.x86_64.0d80c7dec81ee2f2f513e2:latest
xingyaoww/sweb.base.x86_64:latest
@@ -0,0 +1,62 @@
#!/bin/bash
set -e
LEVEL=$1
# three levels:
# - base, keyword "sweb.base"
# - env, keyword "sweb.env"
# - instance, keyword "sweb.eval"
SET=$2
if [ -z "$LEVEL" ]; then
echo "Usage: $0 <cache_level> <set>"
echo "cache_level: base, env, or instance"
echo "set: lite, full"
exit 1
fi
if [ -z "$SET" ]; then
echo "Usage: $0 <cache_level> <set>"
echo "cache_level: base, env, or instance"
echo "set: lite, full, default is lite"
SET="lite"
fi
if [ "$SET" == "full" ]; then
IMAGE_FILE="$(dirname "$0")/all-visualswebench-full-instance-images.txt"
else
IMAGE_FILE="$(dirname "$0")/all-visualswebench-full-instance-images.txt"
fi
# Define a pattern based on the level
case $LEVEL in
base)
PATTERN="sweb.base"
;;
env)
PATTERN="sweb.base\|sweb.env"
;;
instance)
PATTERN="sweb.base\|sweb.env\|sweb.eval"
;;
*)
echo "Invalid cache level: $LEVEL"
echo "Valid levels are: base, env, instance"
exit 1
;;
esac
echo "Pulling docker images for [$LEVEL] level"
echo "Pattern: $PATTERN"
echo "Image file: $IMAGE_FILE"
# Read each line from the file, filter by pattern, and pull the docker image
grep "$PATTERN" "$IMAGE_FILE" | while IFS= read -r image; do
# replace _s_ to __ in the image name
renamed_image=$(echo "$image" | sed 's|.*/||; s/_s_/__/g')
echo "Pulling $image and tagging it as $renamed_image"
docker pull "$image"
docker tag "$image" "$renamed_image"
done
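The renaming step in the loop above strips the registry namespace and turns `_s_` back into `__`. A Python equivalent of the `sed` expression is handy for checking which local tag a given image will receive; `local_tag` is a hypothetical helper written for illustration.

```python
import re


def local_tag(image: str) -> str:
    """Python equivalent of: sed 's|.*/||; s/_s_/__/g'."""
    # Greedy match removes everything through the last '/'.
    name = re.sub(r'^.*/', '', image)
    # Restore the double underscore used in instance IDs.
    return name.replace('_s_', '__')


print(local_tag('luolin101/sweb.eval.x86_64.vega_s_altair-974:latest'))
# → sweb.eval.x86_64.vega__altair-974:latest
```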
@@ -0,0 +1,141 @@
#!/bin/bash
PROCESS_FILEPATH=$1
if [ -z "$PROCESS_FILEPATH" ]; then
echo "Error: PROCESS_FILEPATH is empty. Usage: ./eval_infer.sh <output_file> [instance_id] [dataset_name] [split]"
exit 1
fi
if [ ! -f $PROCESS_FILEPATH ]; then
echo "Error: $PROCESS_FILEPATH is not a file"
exit 1
fi
# If instance_id is empty, it means we want to eval on the whole $PROCESS_FILEPATH
# otherwise, we want to eval on the instance_id
INSTANCE_ID=$2
DATASET_NAME=${3:-"luolin101/Visual-SWE-bench"}
SPLIT=${4:-"test"}
echo "INSTANCE_ID: $INSTANCE_ID"
echo "DATASET_NAME: $DATASET_NAME"
echo "SPLIT: $SPLIT"
PROCESS_FILEPATH=$(realpath $PROCESS_FILEPATH)
FILE_DIR=$(dirname $PROCESS_FILEPATH)
FILE_NAME=$(basename $PROCESS_FILEPATH)
echo "Evaluating $FILE_NAME @ $FILE_DIR"
# ================================================
# detect whether PROCESS_FILEPATH is in OH format or in SWE-bench format
echo "=============================================================="
echo "Detecting whether PROCESS_FILEPATH is in OH format or in SWE-bench format"
echo "=============================================================="
# SWE-bench format is a JSONL where every line has three fields: model_name_or_path, instance_id, and model_patch
function is_swebench_format() {
# Read the first line of the file
read -r first_line < "$PROCESS_FILEPATH"
# Use jq to check if the first line has the required fields
echo "$first_line" | jq -e '. | has("model_name_or_path") and has("instance_id") and has("model_patch")' > /dev/null
if [ $? -ne 0 ]; then
return 1 # Return 1 if the first line does not have the required fields
fi
return 0 # Return 0 if the first line has the required fields
}
# Call the function with the file path
is_swebench_format "$PROCESS_FILEPATH"
IS_SWEBENCH_FORMAT=$?
# Use the result in an if-else statement
if [ $IS_SWEBENCH_FORMAT -eq 0 ]; then
echo "The file IS in SWE-bench format."
SWEBENCH_FORMAT_JSONL=$PROCESS_FILEPATH
else
echo "The file IS NOT in SWE-bench format."
# ==== Convert OH format to SWE-bench format ====
echo "Merged output file with fine-grained report will be saved to $FILE_DIR"
poetry run python3 evaluation/benchmarks/swe_bench/scripts/eval/convert_oh_output_to_swe_json.py $PROCESS_FILEPATH
# replace .jsonl with .swebench.jsonl in filename
SWEBENCH_FORMAT_JSONL=${PROCESS_FILEPATH/.jsonl/.swebench.jsonl}
echo "SWEBENCH_FORMAT_JSONL: $SWEBENCH_FORMAT_JSONL"
# assert that the file exists
if [ ! -f $SWEBENCH_FORMAT_JSONL ]; then
echo "Error: $SWEBENCH_FORMAT_JSONL does not exist. There is probably an error in the conversion process."
exit 1
fi
SWEBENCH_FORMAT_JSONL=$(realpath $SWEBENCH_FORMAT_JSONL)
fi
# ================================================
echo "=============================================================="
echo "Running SWE-bench evaluation"
echo "=============================================================="
RUN_ID=$(date +"%Y%m%d_%H%M%S")
N_PROCESS=16
if [ -z "$INSTANCE_ID" ]; then
echo "Running SWE-bench evaluation on the whole input file..."
# Defaults to luolin101/Visual-SWE-bench (see DATASET_NAME above)
# change `--dataset_name` and `--split` to alter dataset
poetry run python -m visualswebench.harness.run_evaluation \
--dataset_name "$DATASET_NAME" \
--split "$SPLIT" \
--predictions_path $SWEBENCH_FORMAT_JSONL \
--timeout 1800 \
--cache_level instance \
--max_workers $N_PROCESS \
--run_id $RUN_ID
# get the "model_name_or_path" from the first line of the SWEBENCH_FORMAT_JSONL
MODEL_NAME_OR_PATH=$(jq -r '.model_name_or_path' $SWEBENCH_FORMAT_JSONL | head -n 1)
echo "MODEL_NAME_OR_PATH: $MODEL_NAME_OR_PATH"
RESULT_OUTPUT_DIR=$(dirname $SWEBENCH_FORMAT_JSONL)
echo "RESULT_OUTPUT_DIR: $RESULT_OUTPUT_DIR"
# move the eval results to the target directory
mkdir -p $RESULT_OUTPUT_DIR
# rm eval_outputs directory if it exists
if [ -d $RESULT_OUTPUT_DIR/eval_outputs ]; then
rm -rf $RESULT_OUTPUT_DIR/eval_outputs
fi
mv logs/run_evaluation/$RUN_ID/$MODEL_NAME_OR_PATH $RESULT_OUTPUT_DIR
mv $RESULT_OUTPUT_DIR/$MODEL_NAME_OR_PATH $RESULT_OUTPUT_DIR/eval_outputs
echo "RUN_ID: $RUN_ID" > $RESULT_OUTPUT_DIR/run_id.txt
# move report file
REPORT_PATH=$MODEL_NAME_OR_PATH.$RUN_ID.json
if [ -f $REPORT_PATH ]; then
# check if $RESULT_OUTPUT_DIR/report.json exists
if [ -f $RESULT_OUTPUT_DIR/report.json ]; then
echo "Report file $RESULT_OUTPUT_DIR/report.json already exists. Overwriting..."
if [ -f $RESULT_OUTPUT_DIR/report.json.bak ]; then
rm $RESULT_OUTPUT_DIR/report.json.bak
fi
mv $RESULT_OUTPUT_DIR/report.json $RESULT_OUTPUT_DIR/report.json.bak
fi
mv $REPORT_PATH $RESULT_OUTPUT_DIR/report.json
fi
poetry run python evaluation/benchmarks/swe_bench/scripts/eval/update_output_with_eval.py $PROCESS_FILEPATH
else
echo "Running SWE-bench evaluation on the instance_id: $INSTANCE_ID"
poetry run python -m visualswebench.harness.run_evaluation \
--dataset_name "$DATASET_NAME" \
--split "$SPLIT" \
--predictions_path $SWEBENCH_FORMAT_JSONL \
--timeout 1800 \
--instance_ids $INSTANCE_ID \
--cache_level instance \
--max_workers $N_PROCESS \
--run_id $RUN_ID
fi
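The format detection in this script checks only the first JSONL line for three keys. The same test can be sketched in Python, for example to validate a predictions file before starting a long evaluation. The function below is an illustration and takes the first line as a string; it is not part of the script.

```python
import json

REQUIRED_KEYS = ('model_name_or_path', 'instance_id', 'model_patch')


def is_swebench_format(first_line: str) -> bool:
    """Return True if the line is a JSON object carrying all required keys."""
    try:
        record = json.loads(first_line)
    except json.JSONDecodeError:
        return False
    # jq's has() fails on non-objects, so treat those as "not SWE-bench format".
    return isinstance(record, dict) and all(key in record for key in REQUIRED_KEYS)


sample = '{"model_name_or_path": "m", "instance_id": "a__b-1", "model_patch": ""}'
print(is_swebench_format(sample))                  # → True
print(is_swebench_format('{"instance_id": "x"}'))  # → False
```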
@@ -0,0 +1,117 @@
#!/bin/bash
set -eo pipefail
source "evaluation/utils/version_control.sh"
MODEL_CONFIG=$1
COMMIT_HASH=$2
AGENT=$3
EVAL_LIMIT=$4
MAX_ITER=$5
NUM_WORKERS=$6
DATASET=$7
SPLIT=$8
N_RUNS=$9
if [ -z "$NUM_WORKERS" ]; then
NUM_WORKERS=1
echo "Number of workers not specified, use default $NUM_WORKERS"
fi
checkout_eval_branch
if [ -z "$AGENT" ]; then
echo "Agent not specified, use default CodeActAgent"
AGENT="CodeActAgent"
fi
if [ -z "$MAX_ITER" ]; then
echo "MAX_ITER not specified, use default 100"
MAX_ITER=100
fi
if [ -z "$USE_INSTANCE_IMAGE" ]; then
echo "USE_INSTANCE_IMAGE not specified, use default true"
USE_INSTANCE_IMAGE=true
fi
if [ -z "$RUN_WITH_BROWSING" ]; then
echo "RUN_WITH_BROWSING not specified, use default false"
RUN_WITH_BROWSING=false
fi
if [ -z "$DATASET" ]; then
echo "DATASET not specified, use default luolin101/Visual-SWE-bench"
DATASET="luolin101/Visual-SWE-bench"
fi
if [ -z "$SPLIT" ]; then
echo "SPLIT not specified, use default test"
SPLIT="test"
fi
export USE_INSTANCE_IMAGE=$USE_INSTANCE_IMAGE
echo "USE_INSTANCE_IMAGE: $USE_INSTANCE_IMAGE"
export RUN_WITH_BROWSING=$RUN_WITH_BROWSING
echo "RUN_WITH_BROWSING: $RUN_WITH_BROWSING"
get_openhands_version
echo "AGENT: $AGENT"
echo "OPENHANDS_VERSION: $OPENHANDS_VERSION"
echo "MODEL_CONFIG: $MODEL_CONFIG"
echo "DATASET: $DATASET"
echo "SPLIT: $SPLIT"
# Default to NOT use Hint
if [ -z "$USE_HINT_TEXT" ]; then
export USE_HINT_TEXT=false
fi
echo "USE_HINT_TEXT: $USE_HINT_TEXT"
EVAL_NOTE="$OPENHANDS_VERSION"
# if not using Hint, add -no-hint to the eval note
if [ "$USE_HINT_TEXT" = false ]; then
EVAL_NOTE="$EVAL_NOTE-no-hint"
fi
if [ "$RUN_WITH_BROWSING" = true ]; then
EVAL_NOTE="$EVAL_NOTE-with-browsing"
fi
if [ -n "$EXP_NAME" ]; then
EVAL_NOTE="$EVAL_NOTE-$EXP_NAME"
fi
function run_eval() {
local eval_note=$1
COMMAND="poetry run python evaluation/benchmarks/visual_swe_bench/run_infer.py \
--agent-cls $AGENT \
--llm-config $MODEL_CONFIG \
--max-iterations $MAX_ITER \
--eval-num-workers $NUM_WORKERS \
--eval-note $eval_note \
--dataset $DATASET \
--split $SPLIT"
if [ -n "$EVAL_LIMIT" ]; then
echo "EVAL_LIMIT: $EVAL_LIMIT"
COMMAND="$COMMAND --eval-n-limit $EVAL_LIMIT"
fi
# Run the command
eval $COMMAND
}
unset SANDBOX_ENV_GITHUB_TOKEN # prevent the agent from using the github token to push
if [ -z "$N_RUNS" ]; then
N_RUNS=1
echo "N_RUNS not specified, use default $N_RUNS"
fi
for i in $(seq 1 $N_RUNS); do
current_eval_note="$EVAL_NOTE-run_$i"
echo "EVAL_NOTE: $current_eval_note"
run_eval $current_eval_note
done
checkout_original_branch
@@ -0,0 +1,40 @@
#!/bin/bash
source ~/.bashrc
SWEUTIL_DIR=/swe_util
# FIXME: Cannot read SWE_INSTANCE_ID from the environment variable
# SWE_INSTANCE_ID=django__django-11099
if [ -z "$SWE_INSTANCE_ID" ]; then
echo "Error: SWE_INSTANCE_ID is not set." >&2
exit 1
fi
# Read swe-bench-instance.json and extract the item whose instance_id matches SWE_INSTANCE_ID
item=$(jq --arg INSTANCE_ID "$SWE_INSTANCE_ID" '.[] | select(.instance_id == $INSTANCE_ID)' $SWEUTIL_DIR/eval_data/instances/swe-bench-instance.json)
if [[ -z "$item" ]]; then
echo "No item found for the provided instance ID."
exit 1
fi
WORKSPACE_NAME=$(echo "$item" | jq -r '(.repo | tostring) + "__" + (.version | tostring) | gsub("/"; "__")')
echo "WORKSPACE_NAME: $WORKSPACE_NAME"
# Clear the workspace
if [ -d /workspace ]; then
rm -rf /workspace/*
else
mkdir /workspace
fi
# Copy repo to workspace
if [ -d /workspace/$WORKSPACE_NAME ]; then
rm -rf /workspace/$WORKSPACE_NAME
fi
mkdir -p /workspace
cp -r /testbed /workspace/$WORKSPACE_NAME
# Activate instance-specific environment
. /opt/miniconda3/etc/profile.d/conda.sh
conda activate testbed
@@ -10,11 +10,7 @@ describe("ChatMessage", () => {
expect(screen.getByText("Hello, World!")).toBeInTheDocument();
});
it("should render an assistant message", () => {
render(<ChatMessage type="assistant" message="Hello, World!" />);
expect(screen.getByTestId("assistant-message")).toBeInTheDocument();
expect(screen.getByText("Hello, World!")).toBeInTheDocument();
});
it.todo("should render an assistant message");
it.skip("should support code syntax highlighting", () => {
const code = "```js\nconsole.log('Hello, World!')\n```";
@@ -66,10 +62,7 @@ describe("ChatMessage", () => {
it("should apply correct styles to inline code", () => {
render(
<ChatMessage
type="assistant"
message="Here is some `inline code` text"
/>,
<ChatMessage type="agent" message="Here is some `inline code` text" />,
);
const codeElement = screen.getByText("inline code");
@@ -1,11 +1,9 @@
import { afterEach, beforeAll, describe, expect, it, vi } from "vitest";
import { act, screen, waitFor, within } from "@testing-library/react";
import { screen, waitFor, within } from "@testing-library/react";
import userEvent from "@testing-library/user-event";
import { renderWithProviders } from "test-utils";
import type { Message } from "#/message";
import { addUserMessage } from "#/state/chat-slice";
import { SUGGESTIONS } from "#/utils/suggestions";
import * as ChatSlice from "#/state/chat-slice";
import { WsClientProviderStatus } from "#/context/ws-client-provider";
import { ChatInterface } from "#/components/features/chat/chat-interface";
@@ -42,51 +40,10 @@ describe("Empty state", () => {
vi.clearAllMocks();
});
it("should render suggestions if empty", () => {
const { store } = renderWithProviders(<ChatInterface />, {
preloadedState: {
chat: {
messages: [],
systemMessage: {
content: "",
tools: [],
openhands_version: null,
agent_class: null
}
},
},
});
expect(screen.getByTestId("suggestions")).toBeInTheDocument();
act(() => {
store.dispatch(
addUserMessage({
content: "Hello",
imageUrls: [],
timestamp: new Date().toISOString(),
pending: true,
}),
);
});
expect(screen.queryByTestId("suggestions")).not.toBeInTheDocument();
});
it.todo("should render suggestions if empty");
it("should render the default suggestions", () => {
renderWithProviders(<ChatInterface />, {
preloadedState: {
chat: {
messages: [],
systemMessage: {
content: "",
tools: [],
openhands_version: null,
agent_class: null
}
},
},
});
renderWithProviders(<ChatInterface />);
const suggestions = screen.getByTestId("suggestions");
const repoSuggestions = Object.keys(SUGGESTIONS.repo);
@@ -110,21 +67,8 @@ describe("Empty state", () => {
status: WsClientProviderStatus.CONNECTED,
isLoadingMessages: false,
}));
const addUserMessageSpy = vi.spyOn(ChatSlice, "addUserMessage");
const user = userEvent.setup();
const { store } = renderWithProviders(<ChatInterface />, {
preloadedState: {
chat: {
messages: [],
systemMessage: {
content: "",
tools: [],
openhands_version: null,
agent_class: null
}
},
},
});
renderWithProviders(<ChatInterface />);
const suggestions = screen.getByTestId("suggestions");
const displayedSuggestions = within(suggestions).getAllByRole("button");
@@ -133,9 +77,7 @@ describe("Empty state", () => {
await user.click(displayedSuggestions[0]);
// user message loaded to input
expect(addUserMessageSpy).not.toHaveBeenCalled();
expect(screen.queryByTestId("suggestions")).toBeInTheDocument();
expect(store.getState().chat.messages).toHaveLength(0);
expect(input).toHaveValue(displayedSuggestions[0].textContent);
},
);
@@ -149,19 +91,7 @@ describe("Empty state", () => {
isLoadingMessages: false,
}));
const user = userEvent.setup();
const { rerender } = renderWithProviders(<ChatInterface />, {
preloadedState: {
chat: {
messages: [],
systemMessage: {
content: "",
tools: [],
openhands_version: null,
agent_class: null
}
},
},
});
const { rerender } = renderWithProviders(<ChatInterface />);
const suggestions = screen.getByTestId("suggestions");
const displayedSuggestions = within(suggestions).getAllByRole("button");
@@ -22,7 +22,7 @@ const renderRepoConnector = () => {
path: "/conversations/:conversationId",
},
{
Component: Outlet,
Component: () => <Outlet />,
path: "/settings",
children: [
{
@@ -11,7 +11,7 @@ import { MOCK_TASKS } from "#/mocks/task-suggestions-handlers";
const renderTaskSuggestions = () => {
const RouterStub = createRoutesStub([
{
Component: TaskSuggestions,
Component: () => <TaskSuggestions />,
path: "/",
},
{
@@ -1,92 +1,11 @@
import { render, screen } from "@testing-library/react";
import { describe, it, expect, vi } from "vitest";
import { Messages } from "#/components/features/chat/messages";
import type { Message } from "#/message";
import { renderWithProviders } from "test-utils";
// Mock the useParams hook to provide a conversationId
vi.mock("react-router", async () => {
const actual = await vi.importActual<typeof import("react-router")>("react-router");
return {
...actual,
useParams: () => ({ conversationId: "test-conversation-id" }),
};
});
import { describe, it } from "vitest";
describe("File Operations Messages", () => {
it("should show success indicator for successful file read operation", () => {
const messages: Message[] = [
{
type: "action",
translationID: "read_file_contents",
content: "Successfully read file contents",
success: true,
sender: "assistant",
timestamp: new Date().toISOString(),
},
];
it.todo("should show success indicator for successful file read operation");
renderWithProviders(<Messages messages={messages} isAwaitingUserConfirmation={false} />);
it.todo("should show failure indicator for failed file read operation");
const statusIcon = screen.getByTestId("status-icon");
expect(statusIcon).toBeInTheDocument();
expect(statusIcon.closest("svg")).toHaveClass("fill-success");
});
it.todo("should show success indicator for successful file edit operation");
it("should show failure indicator for failed file read operation", () => {
const messages: Message[] = [
{
type: "action",
translationID: "read_file_contents",
content: "Failed to read file contents",
success: false,
sender: "assistant",
timestamp: new Date().toISOString(),
},
];
renderWithProviders(<Messages messages={messages} isAwaitingUserConfirmation={false} />);
const statusIcon = screen.getByTestId("status-icon");
expect(statusIcon).toBeInTheDocument();
expect(statusIcon.closest("svg")).toHaveClass("fill-danger");
});
it("should show success indicator for successful file edit operation", () => {
const messages: Message[] = [
{
type: "action",
translationID: "edit_file_contents",
content: "Successfully edited file contents",
success: true,
sender: "assistant",
timestamp: new Date().toISOString(),
},
];
renderWithProviders(<Messages messages={messages} isAwaitingUserConfirmation={false} />);
const statusIcon = screen.getByTestId("status-icon");
expect(statusIcon).toBeInTheDocument();
expect(statusIcon.closest("svg")).toHaveClass("fill-success");
});
it("should show failure indicator for failed file edit operation", () => {
const messages: Message[] = [
{
type: "action",
translationID: "edit_file_contents",
content: "Failed to edit file contents",
success: false,
sender: "assistant",
timestamp: new Date().toISOString(),
},
];
renderWithProviders(<Messages messages={messages} isAwaitingUserConfirmation={false} />);
const statusIcon = screen.getByTestId("status-icon");
expect(statusIcon).toBeInTheDocument();
expect(statusIcon.closest("svg")).toHaveClass("fill-danger");
});
it.todo("should show failure indicator for failed file edit operation");
});
@@ -2,7 +2,6 @@ import { describe, it, expect, vi, beforeEach } from "vitest";
import { render, waitFor } from "@testing-library/react";
import React from "react";
import { QueryClient, QueryClientProvider } from "@tanstack/react-query";
import * as ChatSlice from "#/state/chat-slice";
import {
updateStatusWhenErrorMessagePresent,
WsClientProvider,
@@ -11,42 +10,15 @@ import {
describe("Propagate error message", () => {
it("should do nothing when no message was passed from server", () => {
const addErrorMessageSpy = vi.spyOn(ChatSlice, "addErrorMessage");
updateStatusWhenErrorMessagePresent(null);
updateStatusWhenErrorMessagePresent(undefined);
updateStatusWhenErrorMessagePresent({});
updateStatusWhenErrorMessagePresent({ message: null });
expect(addErrorMessageSpy).not.toHaveBeenCalled();
});
it("should display error to user when present", () => {
const message = "We have a problem!";
const addErrorMessageSpy = vi.spyOn(ChatSlice, "addErrorMessage");
updateStatusWhenErrorMessagePresent({ message });
it.todo("should display error to user when present");
expect(addErrorMessageSpy).toHaveBeenCalledWith({
message,
status_update: true,
type: "error",
});
});
it("should display error including translation id when present", () => {
const message = "We have a problem!";
const addErrorMessageSpy = vi.spyOn(ChatSlice, "addErrorMessage");
updateStatusWhenErrorMessagePresent({
message,
data: { msg_id: "..id.." },
});
expect(addErrorMessageSpy).toHaveBeenCalledWith({
message,
id: "..id..",
status_update: true,
type: "error",
});
});
it.todo("should display error including translation id when present");
});
// Create a mock for socket.io-client
+3 -15
@@ -59,11 +59,7 @@ describe("useTerminal", () => {
it("should render", () => {
renderWithProviders(<TestTerminalComponent commands={[]} />, {
preloadedState: {
agent: {
curAgentState: AgentState.RUNNING,
currentAgentType: "CodeActAgent",
isDelegated: false
},
agent: { curAgentState: AgentState.RUNNING },
cmd: { commands: [] },
},
});
@@ -77,11 +73,7 @@ describe("useTerminal", () => {
renderWithProviders(<TestTerminalComponent commands={commands} />, {
preloadedState: {
agent: {
curAgentState: AgentState.RUNNING,
currentAgentType: "CodeActAgent",
isDelegated: false
},
agent: { curAgentState: AgentState.RUNNING },
cmd: { commands },
},
});
@@ -108,11 +100,7 @@ describe("useTerminal", () => {
/>,
{
preloadedState: {
agent: {
curAgentState: AgentState.RUNNING,
currentAgentType: "CodeActAgent",
isDelegated: false
},
agent: { curAgentState: AgentState.RUNNING },
cmd: { commands },
},
},
@@ -22,7 +22,7 @@ const MOCK_GET_SECRETS_RESPONSE: GetSecretsResponse["custom_secrets"] = [
const RouterStub = createRoutesStub([
{
Component: Outlet,
Component: () => <Outlet />,
path: "/settings",
children: [
{
-146
@@ -1,146 +0,0 @@
import { describe, it, expect, vi, beforeEach } from "vitest";
import { handleStatusMessage, handleActionMessage } from "#/services/actions";
import store from "#/store";
import { trackError } from "#/utils/error-handler";
import ActionType from "#/types/action-type";
import { ActionMessage } from "#/types/message";
// Mock dependencies
vi.mock("#/utils/error-handler", () => ({
trackError: vi.fn(),
}));
vi.mock("#/store", () => ({
default: {
dispatch: vi.fn(),
},
}));
describe("Actions Service", () => {
beforeEach(() => {
vi.clearAllMocks();
});
describe("handleStatusMessage", () => {
it("should dispatch info messages to status state", () => {
const message = {
type: "info",
message: "Runtime is not available",
id: "runtime.unavailable",
status_update: true as const,
};
handleStatusMessage(message);
expect(store.dispatch).toHaveBeenCalledWith(expect.objectContaining({
payload: message,
}));
});
it("should log error messages and display them in chat", () => {
const message = {
type: "error",
message: "Runtime connection failed",
id: "runtime.connection.failed",
status_update: true as const,
};
handleStatusMessage(message);
expect(trackError).toHaveBeenCalledWith({
message: "Runtime connection failed",
source: "chat",
metadata: { msgId: "runtime.connection.failed" },
});
expect(store.dispatch).toHaveBeenCalledWith(expect.objectContaining({
payload: message,
}));
});
});
describe("handleActionMessage", () => {
it("should use first-person perspective for task completion messages", () => {
// Test partial completion
const messagePartial: ActionMessage = {
id: 1,
action: ActionType.FINISH,
source: "agent",
message: "",
timestamp: new Date().toISOString(),
args: {
final_thought: "",
task_completed: "partial",
outputs: "",
thought: ""
}
};
// Mock implementation to capture the message
let capturedPartialMessage = "";
(store.dispatch as any).mockImplementation((action: any) => {
if (action.type === "chat/addAssistantMessage" &&
action.payload.includes("believe that the task was **completed partially**")) {
capturedPartialMessage = action.payload;
}
});
handleActionMessage(messagePartial);
expect(capturedPartialMessage).toContain("I believe that the task was **completed partially**");
// Test not completed
const messageNotCompleted: ActionMessage = {
id: 2,
action: ActionType.FINISH,
source: "agent",
message: "",
timestamp: new Date().toISOString(),
args: {
final_thought: "",
task_completed: "false",
outputs: "",
thought: ""
}
};
// Mock implementation to capture the message
let capturedNotCompletedMessage = "";
(store.dispatch as any).mockImplementation((action: any) => {
if (action.type === "chat/addAssistantMessage" &&
action.payload.includes("believe that the task was **not completed**")) {
capturedNotCompletedMessage = action.payload;
}
});
handleActionMessage(messageNotCompleted);
expect(capturedNotCompletedMessage).toContain("I believe that the task was **not completed**");
// Test completed successfully
const messageCompleted: ActionMessage = {
id: 3,
action: ActionType.FINISH,
source: "agent",
message: "",
timestamp: new Date().toISOString(),
args: {
final_thought: "",
task_completed: "true",
outputs: "",
thought: ""
}
};
// Mock implementation to capture the message
let capturedCompletedMessage = "";
(store.dispatch as any).mockImplementation((action: any) => {
if (action.type === "chat/addAssistantMessage" &&
action.payload.includes("believe that the task was **completed successfully**")) {
capturedCompletedMessage = action.payload;
}
});
handleActionMessage(messageCompleted);
expect(capturedCompletedMessage).toContain("I believe that the task was **completed successfully**");
});
});
});
@@ -1,51 +0,0 @@
import { beforeEach, describe, expect, it, vi } from "vitest";
import { handleObservationMessage } from "#/services/observations";
import store from "#/store";
import { ObservationMessage } from "#/types/message";
// Mock dependencies
vi.mock("#/store", () => ({
default: {
dispatch: vi.fn(),
},
}));
describe("Observations Service", () => {
beforeEach(() => {
vi.clearAllMocks();
});
describe("handleObservationMessage", () => {
const createErrorMessage = (): ObservationMessage => ({
id: 14,
timestamp: "2025-04-14T13:37:54.451843",
message: "The action has not been executed.",
cause: 12,
observation: "error",
content: "The action has not been executed.",
extras: {
error_id: "",
metadata: {},
},
});
it("should dispatch error messages exactly once", () => {
const errorMessage = createErrorMessage();
handleObservationMessage(errorMessage);
expect(store.dispatch).toHaveBeenCalledTimes(1);
expect(store.dispatch).toHaveBeenCalledWith({
type: "chat/addAssistantObservation",
payload: expect.objectContaining({
observation: "error",
content: "The action has not been executed.",
source: "user",
extras: {
error_id: "",
},
}),
});
});
});
});
@@ -1,8 +1,4 @@
import { describe, it, expect, vi, beforeEach, afterEach } from "vitest";
import { handleObservationMessage } from "#/services/observations";
import { setScreenshotSrc, setUrl } from "#/state/browser-slice";
import ObservationType from "#/types/observation-type";
import store from "#/store";
import { describe, it, vi, beforeEach, afterEach } from "vitest";
// Mock the store module
vi.mock("#/store", () => ({
@@ -20,43 +16,9 @@ describe("handleObservationMessage", () => {
vi.resetAllMocks();
});
it("updates browser state when receiving a browse observation", () => {
const message = {
id: "test-id",
cause: "test-cause",
observation: ObservationType.BROWSE,
content: "test content",
message: "test message",
extras: {
url: "https://example.com",
screenshot: "base64-screenshot-data",
},
};
it.todo("updates browser state when receiving a browse observation");
handleObservationMessage(message);
// Check that setScreenshotSrc and setUrl were called with the correct values
expect(store.dispatch).toHaveBeenCalledWith(setScreenshotSrc("base64-screenshot-data"));
expect(store.dispatch).toHaveBeenCalledWith(setUrl("https://example.com"));
});
it("updates browser state when receiving a browse_interactive observation", () => {
const message = {
id: "test-id",
cause: "test-cause",
observation: ObservationType.BROWSE_INTERACTIVE,
content: "test content",
message: "test message",
extras: {
url: "https://example.com",
screenshot: "base64-screenshot-data",
},
};
handleObservationMessage(message);
// Check that setScreenshotSrc and setUrl were called with the correct values
expect(store.dispatch).toHaveBeenCalledWith(setScreenshotSrc("base64-screenshot-data"));
expect(store.dispatch).toHaveBeenCalledWith(setUrl("https://example.com"));
});
it.todo(
"updates browser state when receiving a browse_interactive observation",
);
});
+276 -406
File diff suppressed because it is too large
+19 -19
@@ -8,30 +8,30 @@
},
"dependencies": {
"@heroui/react": "2.7.8",
"@microlink/react-json-view": "^1.26.1",
"@microlink/react-json-view": "^1.26.2",
"@monaco-editor/react": "^4.7.0-rc.0",
"@react-router/node": "^7.5.3",
"@react-router/serve": "^7.5.3",
"@react-router/node": "^7.6.0",
"@react-router/serve": "^7.6.0",
"@react-types/shared": "^3.29.0",
"@reduxjs/toolkit": "^2.7.0",
"@reduxjs/toolkit": "^2.8.2",
"@stripe/react-stripe-js": "^3.7.0",
"@stripe/stripe-js": "^7.3.0",
"@tanstack/react-query": "^5.75.4",
"@tanstack/react-query": "^5.76.1",
"@vitejs/plugin-react": "^4.4.0",
"@xterm/addon-fit": "^0.10.0",
"@xterm/xterm": "^5.4.0",
"axios": "^1.9.0",
"clsx": "^2.1.1",
"eslint-config-airbnb-typescript": "^18.0.0",
"framer-motion": "^12.10.0",
"i18next": "^25.1.1",
"framer-motion": "^12.12.1",
"i18next": "^25.1.3",
"i18next-browser-languagedetector": "^8.1.0",
"i18next-http-backend": "^3.0.2",
"isbot": "^5.1.27",
"isbot": "^5.1.28",
"jose": "^6.0.11",
"lucide-react": "^0.507.0",
"lucide-react": "^0.511.0",
"monaco-editor": "^0.52.2",
"posthog-js": "^1.239.1",
"posthog-js": "^1.242.2",
"react": "^19.1.0",
"react-dom": "^19.1.0",
"react-highlight": "^0.15.0",
@@ -40,15 +40,15 @@
"react-icons": "^5.5.0",
"react-markdown": "^10.1.0",
"react-redux": "^9.2.0",
"react-router": "^7.5.3",
"react-router": "^7.6.0",
"react-syntax-highlighter": "^15.6.1",
"react-textarea-autosize": "^8.5.9",
"remark-gfm": "^4.0.1",
"sirv-cli": "^3.0.1",
"socket.io-client": "^4.8.1",
"tailwind-merge": "^3.2.0",
"tailwind-merge": "^3.3.0",
"vite": "^6.3.5",
"web-vitals": "^3.5.2",
"web-vitals": "^5.0.1",
"ws": "^8.18.2"
},
"scripts": {
@@ -83,16 +83,16 @@
"@babel/types": "^7.27.0",
"@mswjs/socket.io-binding": "^0.1.1",
"@playwright/test": "^1.52.0",
"@react-router/dev": "^7.5.3",
"@react-router/dev": "^7.6.0",
"@tailwindcss/typography": "^0.5.16",
"@tanstack/eslint-plugin-query": "^5.74.7",
"@testing-library/dom": "^10.4.0",
"@testing-library/jest-dom": "^6.6.1",
"@testing-library/react": "^16.3.0",
"@testing-library/user-event": "^14.6.1",
"@types/node": "^22.15.12",
"@types/react": "^19.1.3",
"@types/react-dom": "^19.1.3",
"@types/node": "^22.15.18",
"@types/react": "^19.1.4",
"@types/react-dom": "^19.1.5",
"@types/react-highlight": "^0.12.8",
"@types/react-syntax-highlighter": "^15.5.13",
"@types/ws": "^8.18.1",
@@ -104,7 +104,7 @@
"eslint": "^8.57.0",
"eslint-config-airbnb": "^19.0.4",
"eslint-config-airbnb-typescript": "^18.0.0",
"eslint-config-prettier": "^10.1.3",
"eslint-config-prettier": "^10.1.5",
"eslint-plugin-import": "^2.29.1",
"eslint-plugin-jsx-a11y": "^6.10.2",
"eslint-plugin-prettier": "^5.4.0",
@@ -113,7 +113,7 @@
"eslint-plugin-unused-imports": "^4.1.4",
"husky": "^9.1.7",
"jsdom": "^26.1.0",
"lint-staged": "^15.5.2",
"lint-staged": "^16.0.0",
"msw": "^2.6.6",
"postcss": "^8.5.2",
"prettier": "^3.5.3",
@@ -1,4 +1,4 @@
import { useDispatch, useSelector } from "react-redux";
import { useSelector } from "react-redux";
import React from "react";
import posthog from "posthog-js";
import { useParams } from "react-router";
@@ -8,7 +8,6 @@ import { convertImageToBase64 } from "#/utils/convert-image-to-base-64";
import { TrajectoryActions } from "../trajectory/trajectory-actions";
import { createChatMessage } from "#/services/chat-service";
import { InteractiveChatBox } from "./interactive-chat-box";
import { addUserMessage } from "#/state/chat-slice";
import { RootState } from "#/store";
import { AgentState } from "#/types/agent-state";
import { generateAgentStateChangeEvent } from "#/services/agent-state-service";
@@ -25,6 +24,11 @@ import { LoadingSpinner } from "#/components/shared/loading-spinner";
import { useGetTrajectory } from "#/hooks/mutation/use-get-trajectory";
import { downloadTrajectory } from "#/utils/download-trajectory";
import { displayErrorToast } from "#/utils/custom-toast-handlers";
import { useOptimisticUserMessage } from "#/hooks/use-optimistic-user-message";
import { useWSErrorMessage } from "#/hooks/use-ws-error-message";
import i18n from "#/i18n";
import { ErrorMessageBanner } from "./error-message-banner";
import { shouldRenderEvent } from "./event-content-helpers/should-render-event";
function getEntryPoint(
hasRepository: boolean | null,
@@ -36,14 +40,15 @@ function getEntryPoint(
}
export function ChatInterface() {
const { send, isLoadingMessages } = useWsClient();
const dispatch = useDispatch();
const { getErrorMessage } = useWSErrorMessage();
const { send, isLoadingMessages, parsedEvents } = useWsClient();
const { setOptimisticUserMessage, getOptimisticUserMessage } =
useOptimisticUserMessage();
const { t } = useTranslation();
const scrollRef = React.useRef<HTMLDivElement>(null);
const { scrollDomToBottom, onChatBodyScroll, hitBottom } =
useScrollToBottom(scrollRef);
const { messages } = useSelector((state: RootState) => state.chat);
const { curAgentState } = useSelector((state: RootState) => state.agent);
const [feedbackPolarity, setFeedbackPolarity] = React.useState<
@@ -57,8 +62,13 @@ export function ChatInterface() {
const params = useParams();
const { mutate: getTrajectory } = useGetTrajectory();
const optimisticUserMessage = getOptimisticUserMessage();
const errorMessage = getErrorMessage();
const events = parsedEvents.filter(shouldRenderEvent);
const handleSendMessage = async (content: string, files: File[]) => {
if (messages.length === 0) {
if (events.length === 0) {
posthog.capture("initial_query_submitted", {
entry_point: getEntryPoint(
selectedRepository !== null,
@@ -69,7 +79,7 @@ export function ChatInterface() {
});
} else {
posthog.capture("user_message_sent", {
session_message_count: messages.length,
session_message_count: events.length,
current_message_length: content.length,
});
}
@@ -77,9 +87,8 @@ export function ChatInterface() {
const imageUrls = await Promise.all(promises);
const timestamp = new Date().toISOString();
const pending = true;
dispatch(addUserMessage({ content, imageUrls, timestamp, pending }));
send(createChatMessage(content, imageUrls, timestamp));
setOptimisticUserMessage(content);
setMessageToSend(null);
};
@@ -120,7 +129,7 @@ export function ChatInterface() {
return (
<div className="h-full flex flex-col justify-between">
{messages.length === 0 && (
{events.length === 0 && !optimisticUserMessage && (
<ChatSuggestions onSuggestionsClick={setMessageToSend} />
)}
@@ -137,7 +146,7 @@ export function ChatInterface() {
{!isLoadingMessages && (
<Messages
messages={messages}
messages={events}
isAwaitingUserConfirmation={
curAgentState === AgentState.AWAITING_USER_CONFIRMATION
}
@@ -170,6 +179,12 @@ export function ChatInterface() {
{!hitBottom && <ScrollToBottomButton onClick={scrollDomToBottom} />}
</div>
{errorMessage && (
<ErrorMessageBanner
message={i18n.exists(errorMessage) ? t(errorMessage) : errorMessage}
/>
)}
<InteractiveChatBox
onSubmit={handleSendMessage}
onStop={handleStop}
@@ -6,10 +6,11 @@ import { cn } from "#/utils/utils";
import { ul, ol } from "../markdown/list";
import { CopyToClipboardButton } from "#/components/shared/buttons/copy-to-clipboard-button";
import { anchor } from "../markdown/anchor";
import { OpenHandsSourceType } from "#/types/core/base";
import { paragraph } from "../markdown/paragraph";
interface ChatMessageProps {
type: "user" | "assistant";
type: OpenHandsSourceType;
message: string;
}
@@ -49,7 +50,7 @@ export function ChatMessage({
"rounded-xl relative",
"flex flex-col gap-2",
type === "user" && " max-w-[305px] p-4 bg-tertiary self-end",
type === "assistant" && "mt-6 max-w-full bg-transparent",
type === "agent" && "mt-6 max-w-full bg-transparent",
)}
>
<CopyToClipboardButton
@@ -0,0 +1,11 @@
interface ErrorMessageBannerProps {
message: string;
}
export function ErrorMessageBanner({ message }: ErrorMessageBannerProps) {
return (
<div className="w-full rounded-lg p-2 text-black border border-red-800 bg-red-500">
{message}
</div>
);
}
@@ -0,0 +1,56 @@
import React from "react";
import Markdown from "react-markdown";
import remarkGfm from "remark-gfm";
import { useTranslation } from "react-i18next";
import { code } from "../markdown/code";
import { ol, ul } from "../markdown/list";
import ArrowDown from "#/icons/angle-down-solid.svg?react";
import ArrowUp from "#/icons/angle-up-solid.svg?react";
import i18n from "#/i18n";
interface ErrorMessageProps {
errorId?: string;
defaultMessage: string;
}
export function ErrorMessage({ errorId, defaultMessage }: ErrorMessageProps) {
const { t } = useTranslation();
const [showDetails, setShowDetails] = React.useState(false);
const hasValidTranslationId = !!errorId && i18n.exists(errorId);
const errorKey = hasValidTranslationId
? errorId
: "CHAT_INTERFACE$AGENT_ERROR_MESSAGE";
return (
<div className="flex flex-col gap-2 border-l-2 pl-2 my-2 py-2 border-danger text-sm w-full">
<div className="font-bold text-danger">
{t(errorKey)}
<button
type="button"
onClick={() => setShowDetails((prev) => !prev)}
className="cursor-pointer text-left"
>
{showDetails ? (
<ArrowUp className="h-4 w-4 ml-2 inline fill-danger" />
) : (
<ArrowDown className="h-4 w-4 ml-2 inline fill-danger" />
)}
</button>
</div>
{showDetails && (
<Markdown
components={{
code,
ul,
ol,
}}
remarkPlugins={[remarkGfm]}
>
{defaultMessage}
</Markdown>
)}
</div>
);
}
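Reviewer note: the key-selection logic in ErrorMessage above can be read in isolation as the sketch below. It is only an illustration, not the component's code: `resolveErrorKey` and the `exists` callback are hypothetical names standing in for the inline expression and for `i18n.exists`, and the sample key in the usage lines is made up.

```typescript
// Hypothetical stand-in for i18n.exists(); the real lookup comes from "#/i18n".
type KeyExists = (key: string) => boolean;

// Fallback translation key, as used in the ErrorMessage component above.
const FALLBACK_ERROR_KEY = "CHAT_INTERFACE$AGENT_ERROR_MESSAGE";

// Use errorId only when it is non-empty AND known to the translation catalog;
// otherwise fall back to the generic agent error key.
function resolveErrorKey(
  errorId: string | undefined,
  exists: KeyExists,
): string {
  if (errorId && exists(errorId)) {
    return errorId;
  }
  return FALLBACK_ERROR_KEY;
}

// Usage with a made-up catalog containing a single (hypothetical) key.
const catalog = new Set(["EXAMPLE$KNOWN_ERROR"]);
const inCatalog: KeyExists = (key) => catalog.has(key);

console.log(resolveErrorKey("EXAMPLE$KNOWN_ERROR", inCatalog));
console.log(resolveErrorKey("EXAMPLE$MISSING", inCatalog));
console.log(resolveErrorKey(undefined, inCatalog));
```

The practical effect is that an unknown or missing `errorId` never reaches `t()`, so the header always shows a translated string while the raw `defaultMessage` stays behind the details toggle.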
@@ -0,0 +1,125 @@
import { ActionSecurityRisk } from "#/state/security-analyzer-slice";
import {
FileWriteAction,
CommandAction,
IPythonAction,
BrowseAction,
BrowseInteractiveAction,
MCPAction,
ThinkAction,
OpenHandsAction,
FinishAction,
} from "#/types/core/actions";
import { getDefaultEventContent, MAX_CONTENT_LENGTH } from "./shared";
const getRiskText = (risk: ActionSecurityRisk) => {
switch (risk) {
case ActionSecurityRisk.LOW:
return "Low Risk";
case ActionSecurityRisk.MEDIUM:
return "Medium Risk";
case ActionSecurityRisk.HIGH:
return "High Risk";
case ActionSecurityRisk.UNKNOWN:
default:
return "Unknown Risk";
}
};
const getWriteActionContent = (event: FileWriteAction): string => {
let { content } = event.args;
if (content.length > MAX_CONTENT_LENGTH) {
content = `${event.args.content.slice(0, MAX_CONTENT_LENGTH)}...`;
}
return `${event.args.path}\n${content}`;
};
const getRunActionContent = (event: CommandAction): string => {
let content = `Command:\n\`${event.args.command}\``;
if (event.args.confirmation_state === "awaiting_confirmation") {
content += `\n\n${getRiskText(event.args.security_risk)}`;
}
return content;
};
const getIPythonActionContent = (event: IPythonAction): string => {
let content = `\`\`\`\n${event.args.code}\n\`\`\``;
if (event.args.confirmation_state === "awaiting_confirmation") {
content += `\n\n${getRiskText(event.args.security_risk)}`;
}
return content;
};
const getBrowseActionContent = (event: BrowseAction): string =>
`Browsing ${event.args.url}`;
const getBrowseInteractiveActionContent = (event: BrowseInteractiveAction) =>
`**Action:**\n\n\`\`\`python\n${event.args.browser_actions}\n\`\`\``;
const getMcpActionContent = (event: MCPAction): string => {
// Format MCP action with name and arguments
const name = event.args.name || "";
const args = event.args.arguments || {};
let details = `**MCP Tool Call:** ${name}\n\n`;
// Include thought if available
if (event.args.thought) {
details += `\n\n**Thought:**\n${event.args.thought}`;
}
details += `\n\n**Arguments:**\n\`\`\`json\n${JSON.stringify(args, null, 2)}\n\`\`\``;
return details;
};
const getThinkActionContent = (event: ThinkAction): string =>
event.args.thought;
const getFinishActionContent = (event: FinishAction): string => {
let content = event.args.final_thought;
switch (event.args.task_completed) {
case "success":
content +=
"\n\n\nI believe that the task was **completed successfully**.";
break;
case "failure":
content += "\n\n\nI believe that the task was **not completed**.";
break;
case "partial":
default:
content += "\n\n\nI believe that the task was **completed partially**.";
break;
}
return content.trim();
};
const getNoContentActionContent = (): string => "";
export const getActionContent = (event: OpenHandsAction): string => {
switch (event.action) {
case "read":
case "edit":
return getNoContentActionContent();
case "write":
return getWriteActionContent(event);
case "run":
return getRunActionContent(event);
case "run_ipython":
return getIPythonActionContent(event);
case "browse":
return getBrowseActionContent(event);
case "browse_interactive":
return getBrowseInteractiveActionContent(event);
case "call_tool_mcp":
return getMcpActionContent(event);
case "think":
return getThinkActionContent(event);
case "finish":
return getFinishActionContent(event);
default:
return getDefaultEventContent(event);
}
};
@@ -0,0 +1,70 @@
import { Trans } from "react-i18next";
import { OpenHandsAction } from "#/types/core/actions";
import { isOpenHandsAction, isOpenHandsObservation } from "#/types/core/guards";
import { OpenHandsObservation } from "#/types/core/observations";
import { MonoComponent } from "../mono-component";
import { PathComponent } from "../path-component";
import { getActionContent } from "./get-action-content";
import { getObservationContent } from "./get-observation-content";
const hasPathProperty = (
obj: Record<string, unknown>,
): obj is { path: string } => typeof obj.path === "string";
const hasCommandProperty = (
obj: Record<string, unknown>,
): obj is { command: string } => typeof obj.command === "string";
const trimText = (text: string, maxLength: number): string => {
if (!text) return "";
return text.length > maxLength ? `${text.substring(0, maxLength)}...` : text;
};
export const getEventContent = (
event: OpenHandsAction | OpenHandsObservation,
) => {
let title: React.ReactNode = "";
let details: string = "";
if (isOpenHandsAction(event)) {
title = (
<Trans
i18nKey={`ACTION_MESSAGE$${event.action.toUpperCase()}`}
values={{
path: hasPathProperty(event.args) && event.args.path,
command:
hasCommandProperty(event.args) && trimText(event.args.command, 80),
}}
components={{
path: <PathComponent />,
cmd: <MonoComponent />,
}}
/>
);
details = getActionContent(event);
}
if (isOpenHandsObservation(event)) {
title = (
<Trans
i18nKey={`OBSERVATION_MESSAGE$${event.observation.toUpperCase()}`}
values={{
path: hasPathProperty(event.extras) && event.extras.path,
command:
hasCommandProperty(event.extras) &&
trimText(event.extras.command, 80),
}}
components={{
path: <PathComponent />,
cmd: <MonoComponent />,
}}
/>
);
details = getObservationContent(event);
}
return {
title: title || "Unknown event",
details: details ?? "Unknown event",
};
};
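A minimal sketch of how the `command` value passed to `<Trans>` above is derived, assuming plain objects in place of the real event types. `commandLabel` is a hypothetical helper introduced only for illustration; `trimText` and `hasCommandProperty` are copied from the file above.

```typescript
// Copied from the file above: truncate long text and append an ellipsis.
const trimText = (text: string, maxLength: number): string => {
  if (!text) return "";
  return text.length > maxLength ? `${text.substring(0, maxLength)}...` : text;
};

// Copied from the file above: type guard for objects carrying a string command.
const hasCommandProperty = (
  obj: Record<string, unknown>,
): obj is { command: string } => typeof obj.command === "string";

// Hypothetical helper: mirrors the `guard && trimText(...)` expression used
// in the `values` prop. It yields `false` when no command is present.
const commandLabel = (args: Record<string, unknown>): string | false =>
  hasCommandProperty(args) && trimText(args.command, 80);
```

Note that the expression evaluates to `false`, not `undefined`, when the property is missing, so the interpolation value is a boolean in that case.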
@@ -0,0 +1,133 @@
import {
ReadObservation,
CommandObservation,
IPythonObservation,
EditObservation,
BrowseObservation,
OpenHandsObservation,
RecallObservation,
} from "#/types/core/observations";
import { getObservationResult } from "./get-observation-result";
import { getDefaultEventContent, MAX_CONTENT_LENGTH } from "./shared";
const getReadObservationContent = (event: ReadObservation): string =>
`\`\`\`\n${event.content}\n\`\`\``;
const getCommandObservationContent = (
event: CommandObservation | IPythonObservation,
): string => {
let { content } = event;
if (content.length > MAX_CONTENT_LENGTH) {
content = `${content.slice(0, MAX_CONTENT_LENGTH)}...`;
}
return `Output:\n\`\`\`sh\n${content.trim() || "[Command finished execution with no output]"}\n\`\`\``;
};
const getEditObservationContent = (
event: EditObservation,
successMessage: boolean,
): string => {
if (successMessage) {
return `\`\`\`diff\n${event.extras.diff}\n\`\`\``; // Content is already truncated by the ACI
}
return event.content;
};
const getBrowseObservationContent = (event: BrowseObservation) => {
let contentDetails = `**URL:** ${event.extras.url}\n`;
if (event.extras.error) {
contentDetails += `\n\n**Error:**\n${event.extras.error}\n`;
}
contentDetails += `\n\n**Output:**\n${event.content}`;
if (contentDetails.length > MAX_CONTENT_LENGTH) {
contentDetails = `${contentDetails.slice(0, MAX_CONTENT_LENGTH)}...(truncated)`;
}
return contentDetails;
};
const getMcpObservationContent = (event: OpenHandsObservation): string => {
let { content } = event;
if (content.length > MAX_CONTENT_LENGTH) {
content = `${content.slice(0, MAX_CONTENT_LENGTH)}...`;
}
return `**Output:**\n\`\`\`\n${content.trim() || "[MCP Tool finished execution with no output]"}\n\`\`\``;
};
const getRecallObservationContent = (event: RecallObservation): string => {
let content = "";
if (event.extras.recall_type === "workspace_context") {
if (event.extras.repo_name) {
content += `\n\n**Repository:** ${event.extras.repo_name}`;
}
if (event.extras.repo_directory) {
content += `\n\n**Directory:** ${event.extras.repo_directory}`;
}
if (event.extras.date) {
content += `\n\n**Date:** ${event.extras.date}`;
}
if (
event.extras.runtime_hosts &&
Object.keys(event.extras.runtime_hosts).length > 0
) {
content += `\n\n**Available Hosts**`;
for (const [host, port] of Object.entries(event.extras.runtime_hosts)) {
content += `\n\n- ${host} (port ${port})`;
}
}
if (event.extras.repo_instructions) {
content += `\n\n**Repository Instructions:**\n\n${event.extras.repo_instructions}`;
}
if (event.extras.additional_agent_instructions) {
content += `\n\n**Additional Instructions:**\n\n${event.extras.additional_agent_instructions}`;
}
}
// Handle microagent knowledge
if (
event.extras.microagent_knowledge &&
event.extras.microagent_knowledge.length > 0
) {
content += `\n\n**Triggered Microagent Knowledge:**`;
for (const knowledge of event.extras.microagent_knowledge) {
content += `\n\n- **${knowledge.name}** (triggered by keyword: ${knowledge.trigger})\n\n\`\`\`\n${knowledge.content}\n\`\`\``;
}
}
if (
event.extras.custom_secrets_descriptions &&
Object.keys(event.extras.custom_secrets_descriptions).length > 0
) {
content += `\n\n**Custom Secrets**`;
for (const [name, description] of Object.entries(
event.extras.custom_secrets_descriptions,
)) {
content += `\n\n- $${name}: ${description}`;
}
}
return content;
};
export const getObservationContent = (event: OpenHandsObservation): string => {
switch (event.observation) {
case "read":
return getReadObservationContent(event);
case "edit":
return getEditObservationContent(
event,
getObservationResult(event) === "success",
);
case "run_ipython":
case "run":
return getCommandObservationContent(event);
case "browse":
return getBrowseObservationContent(event);
case "mcp":
return getMcpObservationContent(event);
case "recall":
return getRecallObservationContent(event);
default:
return getDefaultEventContent(event);
}
};
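A minimal sketch of the truncate-then-format pattern used by `getCommandObservationContent`, assuming a raw string instead of a full observation object. `formatCommandOutput`, `MAX_LEN`, and `FENCE` are hypothetical names; `MAX_LEN` uses the same value as `MAX_CONTENT_LENGTH` in `shared.ts`.

```typescript
// Same value as MAX_CONTENT_LENGTH in shared.ts (assumed here, not imported).
const MAX_LEN = 1000;

// Three backticks, built programmatically to keep this example readable.
const FENCE = "`".repeat(3);

// Hypothetical stand-in for getCommandObservationContent: truncate first,
// then trim, then fall back to a placeholder when nothing is left.
const formatCommandOutput = (raw: string): string => {
  const content = raw.length > MAX_LEN ? `${raw.slice(0, MAX_LEN)}...` : raw;
  const body = content.trim() || "[Command finished execution with no output]";
  return `Output:\n${FENCE}sh\n${body}\n${FENCE}`;
};
```

Because truncation runs before `trim()`, leading whitespace counts toward the limit, and whitespace-only output is rendered as the placeholder text.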
@@ -0,0 +1,26 @@
import { OpenHandsObservation } from "#/types/core/observations";
export type ObservationResultStatus = "success" | "error" | "timeout";
export const getObservationResult = (event: OpenHandsObservation) => {
const hasContent = event.content.length > 0;
const contentIncludesError = event.content.toLowerCase().includes("error:");
switch (event.observation) {
case "run": {
const exitCode = event.extras.metadata.exit_code;
if (exitCode === -1) return "timeout"; // Command timed out
if (exitCode === 0) return "success"; // Command executed successfully
return "error"; // Command failed
}
case "run_ipython":
case "read":
case "edit":
case "mcp":
if (!hasContent || contentIncludesError) return "error";
return "success"; // Content is valid
default:
return "success";
}
};
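A self-contained sketch of the status classification above, assuming simplified stand-in types. `SketchObservation`, `SketchStatus`, and `classifyObservation` are hypothetical names, and `exitCode` stands in for `extras.metadata.exit_code`.

```typescript
// Hypothetical, simplified stand-ins for the real observation types.
type SketchStatus = "success" | "error" | "timeout";

interface SketchObservation {
  observation: string;
  content: string;
  exitCode?: number; // stands in for extras.metadata.exit_code on "run"
}

// Mirrors the branching of getObservationResult under the assumptions above.
const classifyObservation = (event: SketchObservation): SketchStatus => {
  const hasContent = event.content.length > 0;
  const contentIncludesError = event.content.toLowerCase().includes("error:");

  switch (event.observation) {
    case "run":
      if (event.exitCode === -1) return "timeout"; // command timed out
      if (event.exitCode === 0) return "success"; // command succeeded
      return "error"; // any other exit code
    case "run_ipython":
    case "read":
    case "edit":
    case "mcp":
      // Content-based heuristic: empty output or an "error:" marker fails.
      return !hasContent || contentIncludesError ? "error" : "success";
    default:
      return "success";
  }
};
```

One consequence of the content-based heuristic is that a `read` of a file whose text happens to contain `Error:` is classified as `"error"`, while `run` observations rely on the exit code alone.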
@@ -0,0 +1,8 @@
import { OpenHandsAction } from "#/types/core/actions";
import { OpenHandsObservation } from "#/types/core/observations";
export const MAX_CONTENT_LENGTH = 1000;
export const getDefaultEventContent = (
event: OpenHandsAction | OpenHandsObservation,
): string => `\`\`\`json\n${JSON.stringify(event, null, 2)}\n\`\`\``;
@@ -0,0 +1,27 @@
import { OpenHandsAction } from "#/types/core/actions";
import { OpenHandsEventType } from "#/types/core/base";
import { isOpenHandsAction, isOpenHandsObservation } from "#/types/core/guards";
import { OpenHandsObservation } from "#/types/core/observations";
const COMMON_NO_RENDER_LIST: OpenHandsEventType[] = [
"system",
"agent_state_changed",
"change_agent_state",
];
const ACTION_NO_RENDER_LIST: OpenHandsEventType[] = ["recall"];
export const shouldRenderEvent = (
event: OpenHandsAction | OpenHandsObservation,
) => {
if (isOpenHandsAction(event)) {
const noRenderList = COMMON_NO_RENDER_LIST.concat(ACTION_NO_RENDER_LIST);
return !noRenderList.includes(event.action);
}
if (isOpenHandsObservation(event)) {
return !COMMON_NO_RENDER_LIST.includes(event.observation);
}
return true;
};
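A minimal sketch of the filtering rule above, assuming events are plain objects with either an `action` or an `observation` field. `SketchEvent` and `shouldRender` are hypothetical names; the list contents are taken from the file above.

```typescript
// Hypothetical, simplified event shape: either an action or an observation.
type SketchEvent = { action: string } | { observation: string };

// Event types hidden for both actions and observations.
const COMMON_HIDDEN: string[] = [
  "system",
  "agent_state_changed",
  "change_agent_state",
];

// Event types hidden only when they arrive as actions.
const ACTION_HIDDEN: string[] = ["recall"];

const shouldRender = (event: SketchEvent): boolean => {
  if ("action" in event) {
    return ![...COMMON_HIDDEN, ...ACTION_HIDDEN].includes(event.action);
  }
  return !COMMON_HIDDEN.includes(event.observation);
};
```

Under these lists, a `recall` action is hidden while the matching `recall` observation is still rendered.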
@@ -0,0 +1,123 @@
import { ConfirmationButtons } from "#/components/shared/buttons/confirmation-buttons";
import { I18nKey } from "#/i18n/declaration";
import { OpenHandsAction } from "#/types/core/actions";
import {
isUserMessage,
isErrorObservation,
isAssistantMessage,
isOpenHandsAction,
isOpenHandsObservation,
isFinishAction,
isRejectObservation,
} from "#/types/core/guards";
import { OpenHandsObservation } from "#/types/core/observations";
import { ImageCarousel } from "../images/image-carousel";
import { ChatMessage } from "./chat-message";
import { ErrorMessage } from "./error-message";
import { getObservationResult } from "./event-content-helpers/get-observation-result";
import { getEventContent } from "./event-content-helpers/get-event-content";
import { ExpandableMessage } from "./expandable-message";
import { GenericEventMessage } from "./generic-event-message";
const hasThoughtProperty = (
obj: Record<string, unknown>,
): obj is { thought: string } => "thought" in obj && !!obj.thought;
interface EventMessageProps {
event: OpenHandsAction | OpenHandsObservation;
hasObservationPair: boolean;
isFirstMessageWithResolverTrigger: boolean;
isAwaitingUserConfirmation: boolean;
isLastMessage: boolean;
}
export function EventMessage({
event,
hasObservationPair,
isFirstMessageWithResolverTrigger,
isAwaitingUserConfirmation,
isLastMessage,
}: EventMessageProps) {
const shouldShowConfirmationButtons =
isLastMessage && event.source === "agent" && isAwaitingUserConfirmation;
const isFirstUserMessageWithResolverTrigger =
isFirstMessageWithResolverTrigger && isUserMessage(event);
// Special case: First user message with resolver trigger
if (isFirstUserMessageWithResolverTrigger) {
return (
<div>
<ExpandableMessage
type="action"
message={event.args.content}
id={I18nKey.CHAT$RESOLVER_INSTRUCTIONS}
/>
{event.args.image_urls && event.args.image_urls.length > 0 && (
<ImageCarousel size="small" images={event.args.image_urls} />
)}
</div>
);
}
if (isErrorObservation(event)) {
return (
<ErrorMessage
errorId={event.extras.error_id}
defaultMessage={event.message}
/>
);
}
if (
hasObservationPair &&
isOpenHandsAction(event) &&
hasThoughtProperty(event.args)
) {
return <ChatMessage type="agent" message={event.args.thought} />;
}
if (isFinishAction(event)) {
return (
<ChatMessage type="agent" message={getEventContent(event).details} />
);
}
if (isUserMessage(event) || isAssistantMessage(event)) {
return (
<ChatMessage
type={event.source}
message={isUserMessage(event) ? event.args.content : event.message}
>
{event.args.image_urls && event.args.image_urls.length > 0 && (
<ImageCarousel size="small" images={event.args.image_urls} />
)}
{shouldShowConfirmationButtons && <ConfirmationButtons />}
</ChatMessage>
);
}
if (isRejectObservation(event)) {
return <ChatMessage type="agent" message={event.content} />;
}
return (
<div>
{isOpenHandsAction(event) && hasThoughtProperty(event.args) && (
<ChatMessage type="agent" message={event.args.thought} />
)}
<GenericEventMessage
title={getEventContent(event).title}
details={getEventContent(event).details}
success={
isOpenHandsObservation(event)
? getObservationResult(event)
: undefined
}
/>
{shouldShowConfirmationButtons && <ConfirmationButtons />}
</div>
);
}
@@ -0,0 +1,61 @@
import React from "react";
import Markdown from "react-markdown";
import remarkGfm from "remark-gfm";
import { code } from "../markdown/code";
import { ol, ul } from "../markdown/list";
import ArrowDown from "#/icons/angle-down-solid.svg?react";
import ArrowUp from "#/icons/angle-up-solid.svg?react";
import { SuccessIndicator } from "./success-indicator";
import { ObservationResultStatus } from "./event-content-helpers/get-observation-result";
interface GenericEventMessageProps {
title: React.ReactNode;
details: string;
success?: ObservationResultStatus;
}
export function GenericEventMessage({
title,
details,
success,
}: GenericEventMessageProps) {
const [showDetails, setShowDetails] = React.useState(false);
return (
<div className="flex flex-col gap-2 border-l-2 pl-2 my-2 py-2 border-neutral-300 text-sm w-full">
<div className="flex items-center justify-between font-bold text-neutral-300">
<div>
{title}
{details && (
<button
type="button"
onClick={() => setShowDetails((prev) => !prev)}
className="cursor-pointer text-left"
>
{showDetails ? (
<ArrowUp className="h-4 w-4 ml-2 inline fill-neutral-300" />
) : (
<ArrowDown className="h-4 w-4 ml-2 inline fill-neutral-300" />
)}
</button>
)}
</div>
{success && <SuccessIndicator status={success} />}
</div>
{showDetails && (
<Markdown
components={{
code,
ul,
ol,
}}
remarkPlugins={[remarkGfm]}
>
{details}
</Markdown>
)}
</div>
);
}
@@ -1,80 +1,82 @@
import React from "react";
import type { Message } from "#/message";
import { ChatMessage } from "#/components/features/chat/chat-message";
import { ConfirmationButtons } from "#/components/shared/buttons/confirmation-buttons";
import { ImageCarousel } from "../images/image-carousel";
import { ExpandableMessage } from "./expandable-message";
import { useUserConversation } from "#/hooks/query/use-user-conversation";
import { useConversation } from "#/context/conversation-context";
import { I18nKey } from "#/i18n/declaration";
import { OpenHandsAction } from "#/types/core/actions";
import { OpenHandsObservation } from "#/types/core/observations";
import { isOpenHandsAction, isOpenHandsObservation } from "#/types/core/guards";
import { OpenHandsEventType } from "#/types/core/base";
import { EventMessage } from "./event-message";
import { ChatMessage } from "./chat-message";
import { useOptimisticUserMessage } from "#/hooks/use-optimistic-user-message";
const COMMON_NO_RENDER_LIST: OpenHandsEventType[] = [
"system",
"agent_state_changed",
"change_agent_state",
];
const ACTION_NO_RENDER_LIST: OpenHandsEventType[] = ["recall"];
const shouldRenderEvent = (event: OpenHandsAction | OpenHandsObservation) => {
if (isOpenHandsAction(event)) {
const noRenderList = COMMON_NO_RENDER_LIST.concat(ACTION_NO_RENDER_LIST);
return !noRenderList.includes(event.action);
}
if (isOpenHandsObservation(event)) {
return !COMMON_NO_RENDER_LIST.includes(event.observation);
}
return true;
};
interface MessagesProps {
messages: Message[];
messages: (OpenHandsAction | OpenHandsObservation)[];
isAwaitingUserConfirmation: boolean;
}
export const Messages: React.FC<MessagesProps> = React.memo(
({ messages, isAwaitingUserConfirmation }) => {
const { getOptimisticUserMessage } = useOptimisticUserMessage();
const { conversationId } = useConversation();
const { data: conversation } = useUserConversation(conversationId || null);
const optimisticUserMessage = getOptimisticUserMessage();
// Check if conversation metadata has trigger=resolver
const isResolverTrigger = conversation?.trigger === "resolver";
return messages.map((message, index) => {
const shouldShowConfirmationButtons =
messages.length - 1 === index &&
message.sender === "assistant" &&
isAwaitingUserConfirmation;
const actionHasObservationPair = React.useCallback(
(event: OpenHandsAction | OpenHandsObservation): boolean => {
if (isOpenHandsAction(event)) {
return !!messages.some(
(msg) => isOpenHandsObservation(msg) && msg.cause === event.id,
);
}
const isFirstUserMessageWithResolverTrigger =
index === 0 && message.sender === "user" && isResolverTrigger;
return false;
},
[messages],
);
// Special case: First user message with resolver trigger
if (isFirstUserMessageWithResolverTrigger) {
return (
<div key={index}>
<ExpandableMessage
type="action"
message={message.content}
id={I18nKey.CHAT$RESOLVER_INSTRUCTIONS}
/>
{message.imageUrls && message.imageUrls.length > 0 && (
<ImageCarousel size="small" images={message.imageUrls} />
)}
</div>
);
}
return (
<>
{messages
.filter(shouldRenderEvent)
.map((message, index, renderedMessages) => (
<EventMessage
key={index}
event={message}
hasObservationPair={actionHasObservationPair(message)}
isFirstMessageWithResolverTrigger={index === 0 && isResolverTrigger}
isAwaitingUserConfirmation={isAwaitingUserConfirmation}
isLastMessage={renderedMessages.length - 1 === index}
/>
))}
if (message.type === "error" || message.type === "action") {
return (
<div key={index}>
<ExpandableMessage
type={message.type}
id={message.translationID}
message={message.content}
success={message.success}
observation={message.observation}
action={message.action}
/>
{shouldShowConfirmationButtons && <ConfirmationButtons />}
</div>
);
}
return (
<ChatMessage
key={index}
type={message.sender}
message={message.content}
>
{message.imageUrls && message.imageUrls.length > 0 && (
<ImageCarousel size="small" images={message.imageUrls} />
)}
{shouldShowConfirmationButtons && <ConfirmationButtons />}
</ChatMessage>
);
});
{optimisticUserMessage && (
<ChatMessage type="user" message={optimisticUserMessage} />
)}
</>
);
},
);
@@ -0,0 +1,35 @@
import { FaClock } from "react-icons/fa";
import CheckCircle from "#/icons/check-circle-solid.svg?react";
import XCircle from "#/icons/x-circle-solid.svg?react";
import { ObservationResultStatus } from "./event-content-helpers/get-observation-result";
interface SuccessIndicatorProps {
status: ObservationResultStatus;
}
export function SuccessIndicator({ status }: SuccessIndicatorProps) {
return (
<span className="flex-shrink-0">
{status === "success" && (
<CheckCircle
data-testid="status-icon"
className="h-4 w-4 ml-2 inline fill-success"
/>
)}
{status === "error" && (
<XCircle
data-testid="status-icon"
className="h-4 w-4 ml-2 inline fill-danger"
/>
)}
{status === "timeout" && (
<FaClock
data-testid="status-icon"
className="h-4 w-4 ml-2 inline fill-yellow-500"
/>
)}
</span>
);
}
@@ -9,7 +9,6 @@ import { AgentState } from "#/types/agent-state";
import { useWsClient } from "#/context/ws-client-provider";
import { IGNORE_TASK_STATE_MAP } from "#/ignore-task-state-map.constant";
import { ActionButton } from "#/components/shared/buttons/action-button";
import { AgentModeToggle } from "./agent-mode-toggle";
export function AgentControlBar() {
const { t } = useTranslation();
@@ -24,29 +23,25 @@ export function AgentControlBar() {
return (
<div className="flex justify-between items-center gap-20">
<div className="flex items-center gap-4">
<ActionButton
isDisabled={
curAgentState !== AgentState.RUNNING &&
curAgentState !== AgentState.PAUSED
}
content={
curAgentState === AgentState.PAUSED
? t(I18nKey.AGENT$RESUME_TASK)
: t(I18nKey.AGENT$PAUSE_TASK)
}
action={
curAgentState === AgentState.PAUSED
? AgentState.RUNNING
: AgentState.PAUSED
}
handleAction={handleAction}
>
{curAgentState === AgentState.PAUSED ? <PlayIcon /> : <PauseIcon />}
</ActionButton>
<AgentModeToggle />
</div>
<ActionButton
isDisabled={
curAgentState !== AgentState.RUNNING &&
curAgentState !== AgentState.PAUSED
}
content={
curAgentState === AgentState.PAUSED
? t(I18nKey.AGENT$RESUME_TASK)
: t(I18nKey.AGENT$PAUSE_TASK)
}
action={
curAgentState === AgentState.PAUSED
? AgentState.RUNNING
: AgentState.PAUSED
}
handleAction={handleAction}
>
{curAgentState === AgentState.PAUSED ? <PlayIcon /> : <PauseIcon />}
</ActionButton>
</div>
);
}
@@ -1,72 +0,0 @@
import { useSelector } from "react-redux";
import { useTranslation } from "react-i18next";
import { Switch } from "@heroui/react";
import { useWsClient } from "#/context/ws-client-provider";
import { RootState } from "#/store";
import { cn } from "#/utils/utils";
import {
generateDelegateToReadOnlyAction,
generateFinishDelegationAction,
} from "#/services/agent-mode-service";
import { AgentState } from "#/types/agent-state";
import { I18nKey } from "#/i18n/declaration";
export function AgentModeToggle() {
const { t } = useTranslation();
const { send } = useWsClient();
// Get agent type and state from Redux
const { currentAgentType, curAgentState } = useSelector(
(state: RootState) => state.agent,
);
// Compute if we're in read-only mode
const isReadOnly = currentAgentType === "ReadOnlyAgent";
// Check if toggle is disabled (should be disabled during certain agent states)
const isDisabled = [
AgentState.LOADING,
AgentState.INIT,
AgentState.ERROR,
AgentState.RATE_LIMITED,
].includes(curAgentState);
const handleToggle = () => {
if (isReadOnly) {
// Currently in read-only mode, switch back to execute mode
send(generateFinishDelegationAction());
} else {
// Currently in execute mode, switch to read-only mode
send(generateDelegateToReadOnlyAction());
}
};
return (
<div className="flex items-center gap-2">
<Switch
isDisabled={isDisabled}
name="agent-mode"
isSelected={isReadOnly}
onValueChange={handleToggle}
classNames={{
thumb: cn("bg-white w-3 h-3"),
wrapper: cn(
"border border-[#D4D4D4] bg-white px-[6px] w-12 h-6",
"group-data-[selected=true]:border-transparent",
isReadOnly
? "group-data-[selected=true]:bg-amber-600"
: "group-data-[selected=true]:bg-blue-600",
),
label: "text-[#A3A3A3] text-xs",
}}
>
<span className="sr-only">{t(I18nKey.AGENT$MODE_TOGGLE_LABEL)}</span>
<span className="text-sm font-medium ml-2">
{isReadOnly
? t(I18nKey.AGENT$MODE_READ_ONLY)
: t(I18nKey.AGENT$MODE_EXECUTE)}
</span>
</Switch>
</div>
);
}
@@ -24,9 +24,7 @@ const notificationStates = [
export function AgentStatusBar() {
const { t, i18n } = useTranslation();
const { curAgentState, currentAgentType } = useSelector(
(state: RootState) => state.agent,
);
const { curAgentState } = useSelector((state: RootState) => state.agent);
const { curStatusMessage } = useSelector((state: RootState) => state.status);
const { status } = useWsClient();
const { notify } = useNotification();
@@ -101,10 +99,6 @@ export function AgentStatusBar() {
}
}, [curAgentState, status, notify, t]);
// Determine agent mode badge color
const agentModeBadgeColor =
currentAgentType === "ReadOnlyAgent" ? "bg-amber-600" : "bg-blue-600";
return (
<div className="flex flex-col items-center">
<div className="flex items-center bg-base-secondary px-2 py-1 text-gray-400 rounded-[100px] text-sm gap-[6px]">
@@ -112,15 +106,6 @@ export function AgentStatusBar() {
className={`w-2 h-2 rounded-full animate-pulse ${indicatorColor}`}
/>
<span className="text-sm text-stone-400">{t(statusMessage)}</span>
{/* Agent Mode Badge */}
<div
className={`ml-2 px-2 py-0.5 rounded-full text-xs text-white ${agentModeBadgeColor}`}
>
{currentAgentType === "ReadOnlyAgent"
? t(I18nKey.AGENT$MODE_READ_ONLY)
: t(I18nKey.AGENT$MODE_EXECUTE)}
</div>
</div>
</div>
);
@@ -15,8 +15,9 @@ import { cn } from "#/utils/utils";
import { BaseModal } from "../../shared/modals/base-modal/base-modal";
import { RootState } from "#/store";
import { I18nKey } from "#/i18n/declaration";
import { selectSystemMessage } from "#/state/chat-slice";
import { transformVSCodeUrl } from "#/utils/vscode-url-helper";
import { useWsClient } from "#/context/ws-client-provider";
import { isSystemMessage } from "#/types/core/guards";
interface ConversationCardProps {
onClick?: () => void;
@@ -52,15 +53,17 @@ export function ConversationCard({
conversationId,
}: ConversationCardProps) {
const { t } = useTranslation();
const { parsedEvents } = useWsClient();
const [contextMenuVisible, setContextMenuVisible] = React.useState(false);
const [titleMode, setTitleMode] = React.useState<"view" | "edit">("view");
const [metricsModalVisible, setMetricsModalVisible] = React.useState(false);
const [systemModalVisible, setSystemModalVisible] = React.useState(false);
const inputRef = React.useRef<HTMLInputElement>(null);
const systemMessage = parsedEvents.find(isSystemMessage);
// Subscribe to metrics data from Redux store
const metrics = useSelector((state: RootState) => state.metrics);
const systemMessage = useSelector(selectSystemMessage);
const handleBlur = () => {
if (inputRef.current?.value) {
@@ -365,7 +368,7 @@ export function ConversationCard({
<SystemMessageModal
isOpen={systemModalVisible}
onClose={() => setSystemModalVisible(false)}
systemMessage={systemMessage}
systemMessage={systemMessage ? systemMessage.args : null}
/>
</>
);
@@ -6,6 +6,7 @@ import { cn } from "#/utils/utils";
import { useUserRepositories } from "#/hooks/query/use-user-repositories";
import { TaskIssueNumber } from "./task-issue-number";
import { Provider } from "#/types/settings";
import { useOptimisticUserMessage } from "#/hooks/use-optimistic-user-message";
const getTaskTypeMap = (
t: (key: string) => string,
@@ -21,6 +22,7 @@ interface TaskCardProps {
}
export function TaskCard({ task }: TaskCardProps) {
const { setOptimisticUserMessage } = useOptimisticUserMessage();
const { data: repositories } = useUserRepositories();
const { mutate: createConversation, isPending } = useCreateConversation();
const isCreatingConversation = useIsCreatingConversation();
@@ -38,6 +40,7 @@ export function TaskCard({ task }: TaskCardProps) {
const handleLaunchConversation = () => {
const repo = getRepo(task.repo, task.git_provider);
setOptimisticUserMessage("Addressing task...");
return createConversation({
selectedRepository: repo,
@@ -24,6 +24,10 @@ export function JupyterCellOutput({ lines }: JupyterCellOutputProps) {
{/* display the lines as plaintext or image */}
{lines.map((line, index) => {
if (line.type === "image") {
// Use markdown to display the image
const imageMarkdown = line.url
? `![image](${line.url})`
: line.content;
return (
<div key={index}>
<Markdown
@@ -32,7 +36,7 @@ export function JupyterCellOutput({ lines }: JupyterCellOutputProps) {
}}
urlTransform={(value: string) => value}
>
{line.content}
{imageMarkdown}
</Markdown>
</div>
);
@@ -12,8 +12,8 @@ export function JupyterCell({ cell }: JupyterCellProps) {
const [lines, setLines] = React.useState<JupyterLine[]>([]);
React.useEffect(() => {
setLines(parseCellContent(cell.content));
}, [cell.content]);
setLines(parseCellContent(cell.content, cell.imageUrls));
}, [cell.content, cell.imageUrls]);
if (cell.type === "input") {
return <JupytrerCellInput code={cell.content} />;
+57 -11
@@ -3,7 +3,7 @@ import { io, Socket } from "socket.io-client";
import { useQueryClient } from "@tanstack/react-query";
import EventLogger from "#/utils/event-logger";
import { handleAssistantMessage } from "#/services/actions";
import { showChatError } from "#/utils/error-handler";
import { showChatError, trackError } from "#/utils/error-handler";
import { useRate } from "#/hooks/use-rate";
import { OpenHandsParsedEvent } from "#/types/core";
import {
@@ -11,10 +11,26 @@ import {
CommandAction,
FileEditAction,
FileWriteAction,
OpenHandsAction,
UserMessageAction,
} from "#/types/core/actions";
import { Conversation } from "#/api/open-hands.types";
import { useUserProviders } from "#/hooks/use-user-providers";
import { OpenHandsObservation } from "#/types/core/observations";
import {
isErrorObservation,
isOpenHandsAction,
isOpenHandsObservation,
isUserMessage,
} from "#/types/core/guards";
import { useOptimisticUserMessage } from "#/hooks/use-optimistic-user-message";
import { useWSErrorMessage } from "#/hooks/use-ws-error-message";
const hasValidMessageProperty = (obj: unknown): obj is { message: string } =>
typeof obj === "object" &&
obj !== null &&
"message" in obj &&
typeof obj.message === "string";
const isOpenHandsEvent = (event: unknown): event is OpenHandsParsedEvent =>
typeof event === "object" &&
@@ -35,14 +51,6 @@ const isFileEditAction = (
const isCommandAction = (event: OpenHandsParsedEvent): event is CommandAction =>
"action" in event && event.action === "run";
const isUserMessage = (
event: OpenHandsParsedEvent,
): event is UserMessageAction =>
"source" in event &&
"type" in event &&
event.source === "user" &&
event.type === "message";
const isAssistantMessage = (
event: OpenHandsParsedEvent,
): event is AssistantMessageAction =>
@@ -65,6 +73,7 @@ interface UseWsClient {
status: WsClientProviderStatus;
isLoadingMessages: boolean;
events: Record<string, unknown>[];
parsedEvents: (OpenHandsAction | OpenHandsObservation)[];
send: (event: Record<string, unknown>) => void;
}
@@ -72,6 +81,7 @@ const WsClientContext = React.createContext<UseWsClient>({
status: WsClientProviderStatus.DISCONNECTED,
isLoadingMessages: true,
events: [],
parsedEvents: [],
send: () => {
throw new Error("not connected");
},
@@ -121,12 +131,17 @@ export function WsClientProvider({
conversationId,
children,
}: React.PropsWithChildren<WsClientProviderProps>) {
const { removeOptimisticUserMessage } = useOptimisticUserMessage();
const { setErrorMessage, removeErrorMessage } = useWSErrorMessage();
const queryClient = useQueryClient();
const sioRef = React.useRef<Socket | null>(null);
const [status, setStatus] = React.useState(
WsClientProviderStatus.DISCONNECTED,
);
const [events, setEvents] = React.useState<Record<string, unknown>[]>([]);
const [parsedEvents, setParsedEvents] = React.useState<
(OpenHandsAction | OpenHandsObservation)[]
>([]);
const lastEventRef = React.useRef<Record<string, unknown> | null>(null);
const { providers } = useUserProviders();
@@ -146,6 +161,24 @@ export function WsClientProvider({
function handleMessage(event: Record<string, unknown>) {
if (isOpenHandsEvent(event)) {
if (isOpenHandsAction(event) || isOpenHandsObservation(event)) {
setParsedEvents((prevEvents) => [...prevEvents, event]);
}
if (isErrorObservation(event)) {
trackError({
message: event.message,
source: "chat",
metadata: { msgId: event.id },
});
} else {
removeErrorMessage();
}
if (isUserMessage(event)) {
removeOptimisticUserMessage();
}
if (isMessageAction(event)) {
messageRateHandler.record(new Date().getTime());
}
@@ -156,7 +189,7 @@ export function WsClientProvider({
isFileWriteAction(event) ||
isCommandAction(event)
) {
queryClient.invalidateQueries({
queryClient.removeQueries({
queryKey: ["file_changes", conversationId],
});
@@ -202,11 +235,23 @@ export function WsClientProvider({
sio.io.opts.query = sio.io.opts.query || {};
sio.io.opts.query.latest_event_id = lastEventRef.current?.id;
updateStatusWhenErrorMessagePresent(data);
setErrorMessage(
hasValidMessageProperty(data)
? data.message
: "The WebSocket connection was closed.",
);
}
function handleError(data: unknown) {
setStatus(WsClientProviderStatus.DISCONNECTED);
updateStatusWhenErrorMessagePresent(data);
setErrorMessage(
hasValidMessageProperty(data)
? data.message
: "An unknown error occurred on the WebSocket connection.",
);
}
React.useEffect(() => {
@@ -267,9 +312,10 @@ export function WsClientProvider({
status,
isLoadingMessages: messageRateHandler.isUnderThreshold,
events,
parsedEvents,
send,
}),
[status, messageRateHandler.isUnderThreshold, events],
[status, messageRateHandler.isUnderThreshold, events, parsedEvents],
);
return <WsClientContext value={value}>{children}</WsClientContext>;
@@ -1,47 +0,0 @@
import { useEffect } from "react";
import { useDispatch } from "react-redux";
import { setAgentType, setDelegationState } from "#/state/agent-slice";
import ActionType from "#/types/action-type";
/**
* Hook to handle agent mode changes based on WebSocket events
*/
export function useAgentModeHandler(events: Record<string, unknown>[]) {
const dispatch = useDispatch();
useEffect(() => {
// Process only the latest event
if (events.length === 0) return;
const latestEvent = events[events.length - 1];
// Handle agent delegation events
if (
"action" in latestEvent &&
latestEvent.action === ActionType.DELEGATE &&
"args" in latestEvent &&
typeof latestEvent.args === "object" &&
latestEvent.args !== null &&
"agent" in latestEvent.args
) {
// A delegation is starting
dispatch(setDelegationState(true));
dispatch(setAgentType(latestEvent.args.agent as string));
}
// Handle agent delegate observation (delegation ended)
else if (
"observation" in latestEvent &&
latestEvent.observation === "delegate" &&
"data" in latestEvent &&
typeof latestEvent.data === "object" &&
latestEvent.data !== null &&
"status" in latestEvent.data &&
latestEvent.data.status === "finished"
) {
// Delegation has ended, returning to parent agent
dispatch(setDelegationState(false));
dispatch(setAgentType("CodeActAgent")); // Reset to default agent
}
}, [events, dispatch]);
}
+2 -65
@@ -1,19 +1,8 @@
import React from "react";
import { useDispatch } from "react-redux";
import { useTranslation } from "react-i18next";
import { useWsClient } from "#/context/ws-client-provider";
import { generateAgentStateChangeEvent } from "#/services/agent-state-service";
import { addErrorMessage } from "#/state/chat-slice";
import { AgentState } from "#/types/agent-state";
import { ErrorObservation } from "#/types/core/observations";
import { useEndSession } from "./use-end-session";
import {
displayErrorToast,
displaySuccessToast,
} from "#/utils/custom-toast-handlers";
import { setAgentType, setDelegationState } from "#/state/agent-slice";
import ActionType from "#/types/action-type";
import { I18nKey } from "#/i18n/declaration";
import { displayErrorToast } from "#/utils/custom-toast-handlers";
interface ServerError {
error: boolean | string;
@@ -23,13 +12,8 @@ interface ServerError {
const isServerError = (data: object): data is ServerError => "error" in data;
const isErrorObservation = (data: object): data is ErrorObservation =>
"observation" in data && data.observation === "error";
export const useHandleWSEvents = () => {
const { events, send } = useWsClient();
const dispatch = useDispatch();
const { t } = useTranslation();
React.useEffect(() => {
if (!events.length) {
@@ -58,52 +42,5 @@ export const useHandleWSEvents = () => {
send(generateAgentStateChangeEvent(AgentState.PAUSED));
}
}
if (isErrorObservation(event)) {
dispatch(
addErrorMessage({
id: event.extras?.error_id,
message: event.message,
}),
);
}
// Handle agent mode changes
// Handle agent delegation events
if (
"action" in event &&
event.action === ActionType.DELEGATE &&
"args" in event &&
typeof event.args === "object" &&
event.args !== null &&
"agent" in event.args
) {
// A delegation is starting
const agentType = event.args.agent as string;
dispatch(setDelegationState(true));
dispatch(setAgentType(agentType));
// Show notification
if (agentType === "ReadOnlyAgent") {
displaySuccessToast(t(I18nKey.AGENT$MODE_READ_ONLY));
}
}
// Handle agent delegate observation (delegation ended)
else if (
"observation" in event &&
event.observation === "delegate" &&
"data" in event &&
typeof event.data === "object" &&
event.data !== null &&
"status" in event.data &&
event.data.status === "finished"
) {
// Delegation has ended, returning to parent agent
dispatch(setDelegationState(false));
dispatch(setAgentType("CodeActAgent")); // Reset to default agent
// Show notification
displaySuccessToast(t(I18nKey.AGENT$MODE_EXECUTE));
}
}, [events.length, dispatch, send, t]);
}, [events.length]);
};
@@ -0,0 +1,23 @@
import { useQueryClient } from "@tanstack/react-query";
export const useOptimisticUserMessage = () => {
const queryKey = ["optimistic_user_message"] as const;
const queryClient = useQueryClient();
const setOptimisticUserMessage = (message: string) => {
queryClient.setQueryData<string>(queryKey, message);
};
const getOptimisticUserMessage = () =>
queryClient.getQueryData<string>(queryKey);
const removeOptimisticUserMessage = () => {
queryClient.removeQueries({ queryKey });
};
return {
setOptimisticUserMessage,
getOptimisticUserMessage,
removeOptimisticUserMessage,
};
};
@@ -0,0 +1,22 @@
import { useQueryClient } from "@tanstack/react-query";
export const useWSErrorMessage = () => {
const queryClient = useQueryClient();
const setErrorMessage = (message: string) => {
queryClient.setQueryData<string>(["error_message"], message);
};
const getErrorMessage = () =>
queryClient.getQueryData<string>(["error_message"]);
const removeErrorMessage = () => {
queryClient.removeQueries({ queryKey: ["error_message"] });
};
return {
setErrorMessage,
getErrorMessage,
removeErrorMessage,
};
};
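The two new hooks above (`useOptimisticUserMessage` and `useWSErrorMessage`) appear to use the query cache as a small keyed store for transient UI state: set a value under a fixed key, read it back, and remove it when it is no longer needed. A minimal sketch of that set/get/remove pattern follows. `TinyCache` is a hypothetical in-memory stand-in written for illustration, not the `@tanstack/react-query` API, and comparing keys by their JSON form is an assumption of the sketch.

```typescript
// Hypothetical in-memory stand-in for the query cache; not the @tanstack/react-query API.
class TinyCache {
  private entries = new Map<string, unknown>();

  // Array keys are compared by their JSON form (an assumption of this sketch).
  private hash(key: readonly unknown[]): string {
    return JSON.stringify(key);
  }

  // Store a value under the given key, replacing any previous value.
  set<T>(key: readonly unknown[], value: T): void {
    this.entries.set(this.hash(key), value);
  }

  // Return the stored value, or undefined when the key is absent.
  get<T>(key: readonly unknown[]): T | undefined {
    return this.entries.get(this.hash(key)) as T | undefined;
  }

  // Drop the entry so later reads see undefined again.
  remove(key: readonly unknown[]): void {
    this.entries.delete(this.hash(key));
  }
}

const cache = new TinyCache();
// Hoisting the key into one constant avoids repeating the literal at each call site.
const errorKey = ["error_message"] as const;

cache.set<string>(errorKey, "Connection lost");
const shown = cache.get<string>(errorKey);
cache.remove(errorKey);
const afterRemove = cache.get<string>(errorKey);
```

Keeping the key in a single constant, as `useOptimisticUserMessage` does with `queryKey`, means the setter, getter, and remover cannot drift apart if the key is later renamed.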
-3
@@ -1,8 +1,5 @@
// this file generate by script, don't modify it manually!!!
export enum I18nKey {
AGENT$MODE_READ_ONLY = "AGENT$MODE_READ_ONLY",
AGENT$MODE_EXECUTE = "AGENT$MODE_EXECUTE",
AGENT$MODE_TOGGLE_LABEL = "AGENT$MODE_TOGGLE_LABEL",
SECRETS$SECRET_VALUE_REQUIRED = "SECRETS$SECRET_VALUE_REQUIRED",
SECRETS$ADD_SECRET = "SECRETS$ADD_SECRET",
SECRETS$EDIT_SECRET = "SECRETS$EDIT_SECRET",
+112 -157
@@ -1,49 +1,4 @@
{
"AGENT$MODE_READ_ONLY": {
"en": "Read-Only Mode",
"ja": "読み取り専用モード",
"zh-CN": "只读模式",
"zh-TW": "唯讀模式",
"ko-KR": "읽기 전용 모드",
"no": "Skrivebeskyttet modus",
"it": "Modalità di sola lettura",
"pt": "Modo somente leitura",
"es": "Modo de solo lectura",
"ar": "وضع القراءة فقط",
"fr": "Mode lecture seule",
"tr": "Salt okunur mod",
"de": "Nur-Lese-Modus"
},
"AGENT$MODE_EXECUTE": {
"en": "Execute Mode",
"ja": "実行モード",
"zh-CN": "执行模式",
"zh-TW": "執行模式",
"ko-KR": "실행 모드",
"no": "Utførelsesmodus",
"it": "Modalità di esecuzione",
"pt": "Modo de execução",
"es": "Modo de ejecución",
"ar": "وضع التنفيذ",
"fr": "Mode d'exécution",
"tr": "Yürütme modu",
"de": "Ausführungsmodus"
},
"AGENT$MODE_TOGGLE_LABEL": {
"en": "Toggle agent mode",
"ja": "エージェントモードを切り替える",
"zh-CN": "切换代理模式",
"zh-TW": "切換代理模式",
"ko-KR": "에이전트 모드 전환",
"no": "Bytt agentmodus",
"it": "Cambia modalità agente",
"pt": "Alternar modo do agente",
"es": "Cambiar modo del agente",
"ar": "تبديل وضع الوكيل",
"fr": "Basculer le mode de l'agent",
"tr": "Ajan modunu değiştir",
"de": "Agentenmodus umschalten"
},
"SECRETS$SECRET_VALUE_REQUIRED": {
"en": "Secret value is required",
"ja": "シークレット値は必須です",
@@ -6429,20 +6384,20 @@
"uk": "Завантажити файл"
},
"ACTION_MESSAGE$RUN": {
"en": "Running <cmd>{{action.payload.args.command}}</cmd>",
"zh-CN": "运行 <cmd>{{action.payload.args.command}}</cmd>",
"zh-TW": "執行 <cmd>{{action.payload.args.command}}</cmd>",
"ko-KR": "실행 <cmd>{{action.payload.args.command}}</cmd>",
"ja": "実行 <cmd>{{action.payload.args.command}}</cmd>",
"no": "Kjører <cmd>{{action.payload.args.command}}</cmd>",
"ar": "تشغيل <cmd>{{action.payload.args.command}}</cmd>",
"de": "Führt <cmd>{{action.payload.args.command}}</cmd> aus",
"fr": "Exécution de <cmd>{{action.payload.args.command}}</cmd>",
"it": "Esecuzione di <cmd>{{action.payload.args.command}}</cmd>",
"pt": "Executando <cmd>{{action.payload.args.command}}</cmd>",
"es": "Ejecutando <cmd>{{action.payload.args.command}}</cmd>",
"tr": "<cmd>{{action.payload.args.command}}</cmd> çalıştırılıyor",
"uk": "Виконую <cmd>{{action.payload.args.command}}</cmd>"
"en": "Running <cmd>{{command}}</cmd>",
"zh-CN": "运行 <cmd>{{command}}</cmd>",
"zh-TW": "執行 <cmd>{{command}}</cmd>",
"ko-KR": "실행 <cmd>{{command}}</cmd>",
"ja": "実行 <cmd>{{command}}</cmd>",
"no": "Kjører <cmd>{{command}}</cmd>",
"ar": "تشغيل <cmd>{{command}}</cmd>",
"de": "Führt <cmd>{{command}}</cmd> aus",
"fr": "Exécution de <cmd>{{command}}</cmd>",
"it": "Esecuzione di <cmd>{{command}}</cmd>",
"pt": "Executando <cmd>{{command}}</cmd>",
"es": "Ejecutando <cmd>{{command}}</cmd>",
"tr": "<cmd>{{command}}</cmd> çalıştırılıyor",
"uk": "Виконую <cmd>{{command}}</cmd>"
},
"ACTION_MESSAGE$RUN_IPYTHON": {
"en": "Running a Python command",
@@ -6477,52 +6432,52 @@
"uk": "Викликаю інструмент MCP: {{action.payload.args.name}}"
},
"ACTION_MESSAGE$READ": {
"en": "Reading <path>{{action.payload.args.path}}</path>",
"zh-CN": "读取 <path>{{action.payload.args.path}}</path>",
"zh-TW": "讀取 <path>{{action.payload.args.path}}</path>",
"ko-KR": "읽기 <path>{{action.payload.args.path}}</path>",
"ja": "読み取り <path>{{action.payload.args.path}}</path>",
"no": "Leser <path>{{action.payload.args.path}}</path>",
"ar": "قراءة <path>{{action.payload.args.path}}</path>",
"de": "Liest <path>{{action.payload.args.path}}</path>",
"fr": "Lecture de <path>{{action.payload.args.path}}</path>",
"it": "Lettura di <path>{{action.payload.args.path}}</path>",
"pt": "Lendo <path>{{action.payload.args.path}}</path>",
"es": "Leyendo <path>{{action.payload.args.path}}</path>",
"tr": "<path>{{action.payload.args.path}}</path> okunuyor",
"uk": "Читаю <path>{{action.payload.args.path}}</path>"
"en": "Reading <path>{{path}}</path>",
"zh-CN": "读取 <path>{{path}}</path>",
"zh-TW": "讀取 <path>{{path}}</path>",
"ko-KR": "읽기 <path>{{path}}</path>",
"ja": "読み取り <path>{{path}}</path>",
"no": "Leser <path>{{path}}</path>",
"ar": "قراءة <path>{{path}}</path>",
"de": "Liest <path>{{path}}</path>",
"fr": "Lecture de <path>{{path}}</path>",
"it": "Lettura di <path>{{path}}</path>",
"pt": "Lendo <path>{{path}}</path>",
"es": "Leyendo <path>{{path}}</path>",
"tr": "<path>{{path}}</path> okunuyor",
"uk": "Читаю <path>{{path}}</path>"
},
"ACTION_MESSAGE$EDIT": {
"en": "Editing <path>{{action.payload.args.path}}</path>",
"zh-CN": "编辑 <path>{{action.payload.args.path}}</path>",
"zh-TW": "編輯 <path>{{action.payload.args.path}}</path>",
"ko-KR": "편집 <path>{{action.payload.args.path}}</path>",
"ja": "編集 <path>{{action.payload.args.path}}</path>",
"no": "Redigerer <path>{{action.payload.args.path}}</path>",
"ar": "تحرير <path>{{action.payload.args.path}}</path>",
"de": "Bearbeitet <path>{{action.payload.args.path}}</path>",
"fr": "Modification de <path>{{action.payload.args.path}}</path>",
"it": "Modifica di <path>{{action.payload.args.path}}</path>",
"pt": "Editando <path>{{action.payload.args.path}}</path>",
"es": "Editando <path>{{action.payload.args.path}}</path>",
"tr": "<path>{{action.payload.args.path}}</path> düzenleniyor",
"uk": "Редагую <path>{{action.payload.args.path}}</path>"
"en": "Editing <path>{{path}}</path>",
"zh-CN": "编辑 <path>{{path}}</path>",
"zh-TW": "編輯 <path>{{path}}</path>",
"ko-KR": "편집 <path>{{path}}</path>",
"ja": "編集 <path>{{path}}</path>",
"no": "Redigerer <path>{{path}}</path>",
"ar": "تحرير <path>{{path}}</path>",
"de": "Bearbeitet <path>{{path}}</path>",
"fr": "Modification de <path>{{path}}</path>",
"it": "Modifica di <path>{{path}}</path>",
"pt": "Editando <path>{{path}}</path>",
"es": "Editando <path>{{path}}</path>",
"tr": "<path>{{path}}</path> düzenleniyor",
"uk": "Редагую <path>{{path}}</path>"
},
"ACTION_MESSAGE$WRITE": {
"en": "Writing to <path>{{action.payload.args.path}}</path>",
"zh-CN": "写入 <path>{{action.payload.args.path}}</path>",
"zh-TW": "寫入 <path>{{action.payload.args.path}}</path>",
"ko-KR": "쓰기 <path>{{action.payload.args.path}}</path>",
"ja": "書き込み <path>{{action.payload.args.path}}</path>",
"no": "Skriver til <path>{{action.payload.args.path}}</path>",
"ar": "الكتابة إلى <path>{{action.payload.args.path}}</path>",
"de": "Schreibt in <path>{{action.payload.args.path}}</path>",
"fr": "Écriture dans <path>{{action.payload.args.path}}</path>",
"it": "Scrittura su <path>{{action.payload.args.path}}</path>",
"pt": "Escrevendo em <path>{{action.payload.args.path}}</path>",
"es": "Escribiendo en <path>{{action.payload.args.path}}</path>",
"tr": "<path>{{action.payload.args.path}}</path> dosyasına yazılıyor",
"uk": "Записую в <path>{{action.payload.args.path}}</path>"
"en": "Writing to <path>{{path}}</path>",
"zh-CN": "写入 <path>{{path}}</path>",
"zh-TW": "寫入 <path>{{path}}</path>",
"ko-KR": "쓰기 <path>{{path}}</path>",
"ja": "書き込み <path>{{path}}</path>",
"no": "Skriver til <path>{{path}}</path>",
"ar": "الكتابة إلى <path>{{path}}</path>",
"de": "Schreibt in <path>{{path}}</path>",
"fr": "Écriture dans <path>{{path}}</path>",
"it": "Scrittura su <path>{{path}}</path>",
"pt": "Escrevendo em <path>{{path}}</path>",
"es": "Escribiendo en <path>{{path}}</path>",
"tr": "<path>{{path}}</path> dosyasına yazılıyor",
"uk": "Записую в <path>{{path}}</path>"
},
"ACTION_MESSAGE$BROWSE": {
"en": "Browsing the web",
@@ -6589,20 +6544,20 @@
"uk": "Системне повідомлення"
},
"OBSERVATION_MESSAGE$RUN": {
"en": "Ran <cmd>{{observation.payload.extras.command}}</cmd>",
"zh-CN": "运行 <cmd>{{observation.payload.extras.command}}</cmd>",
"zh-TW": "執行 <cmd>{{observation.payload.extras.command}}</cmd>",
"ko-KR": "실행 <cmd>{{observation.payload.extras.command}}</cmd>",
"ja": "実行 <cmd>{{observation.payload.extras.command}}</cmd>",
"no": "Kjørte <cmd>{{observation.payload.extras.command}}</cmd>",
"ar": "تم تشغيل <cmd>{{observation.payload.extras.command}}</cmd>",
"de": "Führte <cmd>{{observation.payload.extras.command}}</cmd> aus",
"fr": "A exécuté <cmd>{{observation.payload.extras.command}}</cmd>",
"it": "Ha eseguito <cmd>{{observation.payload.extras.command}}</cmd>",
"pt": "Executou <cmd>{{observation.payload.extras.command}}</cmd>",
"es": "Ejecutó <cmd>{{observation.payload.extras.command}}</cmd>",
"tr": "<cmd>{{observation.payload.extras.command}}</cmd> çalıştırıldı",
"uk": "Запустив <cmd>{{observation.payload.extras.command}}</cmd>"
"en": "Ran <cmd>{{command}}</cmd>",
"zh-CN": "运行 <cmd>{{command}}</cmd>",
"zh-TW": "執行 <cmd>{{command}}</cmd>",
"ko-KR": "실행 <cmd>{{command}}</cmd>",
"ja": "実行 <cmd>{{command}}</cmd>",
"no": "Kjørte <cmd>{{command}}</cmd>",
"ar": "تم تشغيل <cmd>{{command}}</cmd>",
"de": "Führte <cmd>{{command}}</cmd> aus",
"fr": "A exécuté <cmd>{{command}}</cmd>",
"it": "Ha eseguito <cmd>{{command}}</cmd>",
"pt": "Executou <cmd>{{command}}</cmd>",
"es": "Ejecutó <cmd>{{command}}</cmd>",
"tr": "<cmd>{{command}}</cmd> çalıştırıldı",
"uk": "Запустив <cmd>{{command}}</cmd>"
},
"OBSERVATION_MESSAGE$RUN_IPYTHON": {
"en": "Ran a Python command",
@@ -6621,52 +6576,52 @@
"uk": "Виконав команду Python"
},
"OBSERVATION_MESSAGE$READ": {
"en": "Read <path>{{observation.payload.extras.path}}</path>",
"zh-CN": "读取 <path>{{observation.payload.extras.path}}</path>",
"zh-TW": "讀取 <path>{{observation.payload.extras.path}}</path>",
"ko-KR": "읽기 <path>{{observation.payload.extras.path}}</path>",
"ja": "読み取り <path>{{observation.payload.extras.path}}</path>",
"no": "Leste <path>{{observation.payload.extras.path}}</path>",
"ar": "تمت قراءة <path>{{observation.payload.extras.path}}</path>",
"de": "Las <path>{{observation.payload.extras.path}}</path>",
"fr": "A lu <path>{{observation.payload.extras.path}}</path>",
"it": "Ha letto <path>{{observation.payload.extras.path}}</path>",
"pt": "Leu <path>{{observation.payload.extras.path}}</path>",
"es": "Leyó <path>{{observation.payload.extras.path}}</path>",
"tr": "<path>{{observation.payload.extras.path}}</path> okundu",
"uk": "Прочитав <path>{{observation.payload.extras.path}}</path>"
"en": "Read <path>{{path}}</path>",
"zh-CN": "读取 <path>{{path}}</path>",
"zh-TW": "讀取 <path>{{path}}</path>",
"ko-KR": "읽기 <path>{{path}}</path>",
"ja": "読み取り <path>{{path}}</path>",
"no": "Leste <path>{{path}}</path>",
"ar": "تمت قراءة <path>{{path}}</path>",
"de": "Las <path>{{path}}</path>",
"fr": "A lu <path>{{path}}</path>",
"it": "Ha letto <path>{{path}}</path>",
"pt": "Leu <path>{{path}}</path>",
"es": "Leyó <path>{{path}}</path>",
"tr": "<path>{{path}}</path> okundu",
"uk": "Прочитав <path>{{path}}</path>"
},
"OBSERVATION_MESSAGE$EDIT": {
"en": "Edited <path>{{observation.payload.extras.path}}</path>",
"zh-CN": "编辑 <path>{{observation.payload.extras.path}}</path>",
"zh-TW": "編輯 <path>{{observation.payload.extras.path}}</path>",
"ko-KR": "편집 <path>{{observation.payload.extras.path}}</path>",
"ja": "編集 <path>{{observation.payload.extras.path}}</path>",
"no": "Redigerte <path>{{observation.payload.extras.path}}</path>",
"ar": "تم تحرير <path>{{observation.payload.extras.path}}</path>",
"de": "Hat <path>{{observation.payload.extras.path}}</path> bearbeitet",
"fr": "A modifié <path>{{observation.payload.extras.path}}</path>",
"it": "Ha modificato <path>{{observation.payload.extras.path}}</path>",
"pt": "Editou <path>{{observation.payload.extras.path}}</path>",
"es": "Editó <path>{{observation.payload.extras.path}}</path>",
"tr": "<path>{{observation.payload.extras.path}}</path> düzenlendi",
"uk": "Відредагував <path>{{observation.payload.extras.path}}</path>"
"en": "Edited <path>{{path}}</path>",
"zh-CN": "编辑 <path>{{path}}</path>",
"zh-TW": "編輯 <path>{{path}}</path>",
"ko-KR": "편집 <path>{{path}}</path>",
"ja": "編集 <path>{{path}}</path>",
"no": "Redigerte <path>{{path}}</path>",
"ar": "تم تحرير <path>{{path}}</path>",
"de": "Hat <path>{{path}}</path> bearbeitet",
"fr": "A modifié <path>{{path}}</path>",
"it": "Ha modificato <path>{{path}}</path>",
"pt": "Editou <path>{{path}}</path>",
"es": "Editó <path>{{path}}</path>",
"tr": "<path>{{path}}</path> düzenlendi",
"uk": "Відредагував <path>{{path}}</path>"
},
"OBSERVATION_MESSAGE$WRITE": {
"en": "Wrote to <path>{{observation.payload.extras.path}}</path>",
"zh-CN": "写入 <path>{{observation.payload.extras.path}}</path>",
"zh-TW": "寫入 <path>{{observation.payload.extras.path}}</path>",
"ko-KR": "쓰기 <path>{{observation.payload.extras.path}}</path>",
"ja": "書き込み <path>{{observation.payload.extras.path}}</path>",
"no": "Skrev til <path>{{observation.payload.extras.path}}</path>",
"ar": "تمت الكتابة إلى <path>{{observation.payload.extras.path}}</path>",
"de": "Hat in <path>{{observation.payload.extras.path}}</path> geschrieben",
"fr": "A écrit dans <path>{{observation.payload.extras.path}}</path>",
"it": "Ha scritto su <path>{{observation.payload.extras.path}}</path>",
"pt": "Escreveu em <path>{{observation.payload.extras.path}}</path>",
"es": "Escribió en <path>{{observation.payload.extras.path}}</path>",
"tr": "<path>{{observation.payload.extras.path}}</path> dosyasına yazıldı",
"uk": "Записав на <path>{{observation.payload.extras.path}}</path>"
"en": "Wrote to <path>{{path}}</path>",
"zh-CN": "写入 <path>{{path}}</path>",
"zh-TW": "寫入 <path>{{path}}</path>",
"ko-KR": "쓰기 <path>{{path}}</path>",
"ja": "書き込み <path>{{path}}</path>",
"no": "Skrev til <path>{{path}}</path>",
"ar": "تمت الكتابة إلى <path>{{path}}</path>",
"de": "Hat in <path>{{path}}</path> geschrieben",
"fr": "A écrit dans <path>{{path}}</path>",
"it": "Ha scritto su <path>{{path}}</path>",
"pt": "Escreveu em <path>{{path}}</path>",
"es": "Escribió en <path>{{path}}</path>",
"tr": "<path>{{path}}</path> dosyasına yazıldı",
"uk": "Записав на <path>{{path}}</path>"
},
"OBSERVATION_MESSAGE$BROWSE": {
"en": "Browsing completed",
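The placeholder change in the translation strings above, from a nested path such as `{{action.payload.args.command}}` to a flat `{{command}}`, suggests that callers now pass the value directly instead of handing over the whole event object. A minimal sketch of the difference, assuming a hypothetical `interpolate` helper written for illustration (it is not the i18next implementation):

```typescript
// Hypothetical stand-in for template interpolation; not the i18next implementation.
type Values = Record<string, unknown>;

// Resolve a dotted path such as "action.payload.args.command" against an object.
function resolvePath(values: Values, path: string): unknown {
  return path.split(".").reduce<unknown>((current, key) => {
    if (typeof current === "object" && current !== null && key in current) {
      return (current as Values)[key];
    }
    return undefined;
  }, values);
}

// Replace each {{path}} placeholder; unresolved placeholders are left untouched.
function interpolate(template: string, values: Values): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match: string, path: string) => {
    const resolved = resolvePath(values, path);
    return resolved === undefined ? match : String(resolved);
  });
}

const oldTemplate = "Running <cmd>{{action.payload.args.command}}</cmd>";
const newTemplate = "Running <cmd>{{command}}</cmd>";

// Old style: the template reaches into the event structure.
const oldResult = interpolate(oldTemplate, {
  action: { payload: { args: { command: "ls -la" } } },
});

// New style: the caller extracts the value and passes it under a flat name.
const newResult = interpolate(newTemplate, { command: "ls -la" });
```

With flat placeholders the translation strings no longer depend on the shape of the event payload, so the same string can serve both action and observation messages as long as the caller supplies `command` or `path`.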
+1 -19
@@ -13,7 +13,6 @@ import {
useConversation,
} from "#/context/conversation-context";
import { Controls } from "#/components/features/controls/controls";
import { clearMessages, addUserMessage } from "#/state/chat-slice";
import { clearTerminal } from "#/state/command-slice";
import { useEffectOnce } from "#/hooks/use-effect-once";
import GlobeIcon from "#/icons/globe.svg?react";
@@ -34,7 +33,6 @@ import Security from "#/components/shared/modals/security/security";
import { useUserConversation } from "#/hooks/query/use-user-conversation";
import { ServedAppLabel } from "#/components/layout/served-app-label";
import { useSettings } from "#/hooks/query/use-settings";
import { clearFiles, clearInitialPrompt } from "#/state/initial-query-slice";
import { RootState } from "#/store";
import { displayErrorToast } from "#/utils/custom-toast-handlers";
import { useDocumentTitleFromState } from "#/hooks/use-document-title-from-state";
@@ -49,9 +47,7 @@ function AppContent() {
const { data: conversation, isFetched } = useUserConversation(
conversationId || null,
);
const { initialPrompt, files } = useSelector(
(state: RootState) => state.initialQuery,
);
const { curAgentState } = useSelector((state: RootState) => state.agent);
const dispatch = useDispatch();
const navigate = useNavigate();
@@ -71,25 +67,11 @@ function AppContent() {
}, [conversation, isFetched]);
React.useEffect(() => {
dispatch(clearMessages());
dispatch(clearTerminal());
dispatch(clearJupyter());
if (conversationId && (initialPrompt || files.length > 0)) {
dispatch(
addUserMessage({
content: initialPrompt || "",
imageUrls: files || [],
timestamp: new Date().toISOString(),
pending: true,
}),
);
dispatch(clearInitialPrompt());
dispatch(clearFiles());
}
}, [conversationId]);
useEffectOnce(() => {
dispatch(clearMessages());
dispatch(clearTerminal());
dispatch(clearJupyter());
});
@@ -4,7 +4,6 @@ import { StatusMessage } from "#/types/message";
import { queryClient } from "#/query-client-config";
import store from "#/store";
import { setCurStatusMessage } from "#/state/status-slice";
import { addErrorMessage } from "#/state/chat-slice";
import { trackError } from "#/utils/error-handler";
// Mock dependencies
@@ -101,9 +100,6 @@ describe("handleStatusMessage", () => {
metadata: { msgId: "ERROR_ID" },
});
// Verify that store.dispatch was called with addErrorMessage
expect(store.dispatch).toHaveBeenCalledWith(addErrorMessage(statusMessage));
// Verify that queryClient.invalidateQueries was not called
expect(queryClient.invalidateQueries).not.toHaveBeenCalled();
});
-122
@@ -1,13 +1,5 @@
import {
addAssistantMessage,
addAssistantAction,
addUserMessage,
addErrorMessage,
} from "#/state/chat-slice";
import { trackError } from "#/utils/error-handler";
import { appendSecurityAnalyzerInput } from "#/state/security-analyzer-slice";
import { setCode, setActiveFilepath } from "#/state/code-slice";
import { appendJupyterInput } from "#/state/jupyter-slice";
import { setCurStatusMessage } from "#/state/status-slice";
import { setMetrics } from "#/state/metrics-slice";
import store from "#/store";
@@ -21,67 +13,6 @@ import { handleObservationMessage } from "./observations";
import { appendInput } from "#/state/command-slice";
import { queryClient } from "#/query-client-config";
const messageActions = {
[ActionType.BROWSE]: (message: ActionMessage) => {
if (!message.args.thought && message.message) {
store.dispatch(addAssistantMessage(message.message));
}
},
[ActionType.BROWSE_INTERACTIVE]: (message: ActionMessage) => {
if (!message.args.thought && message.message) {
store.dispatch(addAssistantMessage(message.message));
}
},
[ActionType.WRITE]: (message: ActionMessage) => {
const { path, content } = message.args;
store.dispatch(setActiveFilepath(path));
store.dispatch(setCode(content));
},
[ActionType.MESSAGE]: (message: ActionMessage) => {
if (message.source === "user") {
store.dispatch(
addUserMessage({
content: message.args.content,
imageUrls:
typeof message.args.image_urls === "string"
? [message.args.image_urls]
: message.args.image_urls,
timestamp: message.timestamp,
pending: false,
}),
);
} else {
store.dispatch(addAssistantMessage(message.args.content));
}
},
[ActionType.RUN_IPYTHON]: (message: ActionMessage) => {
if (message.args.confirmation_state !== "rejected") {
store.dispatch(appendJupyterInput(message.args.code));
}
},
[ActionType.FINISH]: (message: ActionMessage) => {
store.dispatch(addAssistantMessage(message.args.final_thought));
let successPrediction = "";
if (message.args.task_completed === "partial") {
successPrediction =
"I believe that the task was **completed partially**.";
} else if (message.args.task_completed === "false") {
successPrediction = "I believe that the task was **not completed**.";
} else if (message.args.task_completed === "true") {
successPrediction =
"I believe that the task was **completed successfully**.";
}
if (successPrediction) {
// if final_thought is not empty, add a new line before the success prediction
if (message.args.final_thought) {
store.dispatch(addAssistantMessage(`\n${successPrediction}`));
} else {
store.dispatch(addAssistantMessage(successPrediction));
}
}
},
};
export function handleActionMessage(message: ActionMessage) {
if (message.args?.hidden) {
return;
@@ -103,26 +34,6 @@ export function handleActionMessage(message: ActionMessage) {
if ("args" in message && "security_risk" in message.args) {
store.dispatch(appendSecurityAnalyzerInput(message));
}
if (message.source === "agent") {
// Only add thought as a message if it's not a "think" action
if (
message.args &&
message.args.thought &&
message.action !== ActionType.THINK
) {
store.dispatch(addAssistantMessage(message.args.thought));
}
// Need to convert ActionMessage to RejectAction
// @ts-expect-error TODO: fix
store.dispatch(addAssistantAction(message));
}
if (message.action in messageActions) {
const actionFn =
messageActions[message.action as keyof typeof messageActions];
actionFn(message);
}
}
export function handleStatusMessage(message: StatusMessage) {
@@ -146,11 +57,6 @@ export function handleStatusMessage(message: StatusMessage) {
source: "chat",
metadata: { msgId: message.id },
});
store.dispatch(
addErrorMessage({
...message,
}),
);
}
}
@@ -161,33 +67,5 @@ export function handleAssistantMessage(message: Record<string, unknown>) {
handleObservationMessage(message as unknown as ObservationMessage);
} else if (message.status_update) {
handleStatusMessage(message as unknown as StatusMessage);
} else if (message.error) {
// Handle error messages from the server
const errorMessage =
typeof message.message === "string"
? message.message
: String(message.message || "Unknown error");
trackError({
message: errorMessage,
source: "websocket",
metadata: { raw_message: message },
});
store.dispatch(
addErrorMessage({
message: errorMessage,
}),
);
} else {
const errorMsg = "Unknown message type received";
trackError({
message: errorMsg,
source: "chat",
metadata: { raw_message: message },
});
store.dispatch(
addErrorMessage({
message: errorMsg,
}),
);
}
}
@@ -1,24 +0,0 @@
import ActionType from "#/types/action-type";
export const generateDelegateToReadOnlyAction = () => ({
action: ActionType.DELEGATE,
args: {
agent: "ReadOnlyAgent",
inputs: {
task: "Continue the conversation in READ-ONLY MODE. You can explore and analyze code but cannot make changes.",
},
thought: "Switching to read-only mode at user's request",
},
});
export const generateFinishDelegationAction = () => ({
action: ActionType.FINISH,
args: {
message:
"Switching back to EXECUTE MODE. You now have full capabilities to modify code and execute commands.",
task_completed: "true",
outputs: {
mode_switch: true,
},
},
});
+8 -204
@@ -2,14 +2,9 @@ import { setCurrentAgentState } from "#/state/agent-slice";
import { setUrl, setScreenshotSrc } from "#/state/browser-slice";
import store from "#/store";
import { ObservationMessage } from "#/types/message";
import { AgentState } from "#/types/agent-state";
import { appendOutput } from "#/state/command-slice";
import { appendJupyterOutput } from "#/state/jupyter-slice";
import ObservationType from "#/types/observation-type";
import {
addAssistantMessage,
addAssistantObservation,
} from "#/state/chat-slice";
export function handleObservationMessage(message: ObservationMessage) {
switch (message.observation) {
@@ -26,8 +21,14 @@ export function handleObservationMessage(message: ObservationMessage) {
break;
}
case ObservationType.RUN_IPYTHON:
// FIXME: render this as markdown
store.dispatch(appendJupyterOutput(message.content));
store.dispatch(
appendJupyterOutput({
content: message.content,
imageUrls: Array.isArray(message.extras?.image_urls)
? message.extras.image_urls
: undefined,
}),
);
break;
case ObservationType.BROWSE:
case ObservationType.BROWSE_INTERACTIVE:
@@ -42,11 +43,6 @@ export function handleObservationMessage(message: ObservationMessage) {
store.dispatch(setCurrentAgentState(message.extras.agent_state));
break;
case ObservationType.DELEGATE:
// TODO: better UI for delegation result (#2309)
if (message.content) {
store.dispatch(addAssistantMessage(message.content));
}
break;
case ObservationType.READ:
case ObservationType.EDIT:
case ObservationType.THINK:
@@ -56,107 +52,13 @@ export function handleObservationMessage(message: ObservationMessage) {
case ObservationType.MCP:
break; // We don't display the default message for these observations
default:
store.dispatch(addAssistantMessage(message.message));
break;
}
if (!message.extras?.hidden) {
// Convert the message to the appropriate observation type
const { observation } = message;
const baseObservation = {
...message,
source: "agent" as const,
};
switch (observation) {
case "agent_state_changed":
store.dispatch(
addAssistantObservation({
...baseObservation,
observation: "agent_state_changed" as const,
extras: {
agent_state: (message.extras.agent_state as AgentState) || "idle",
},
}),
);
break;
case "recall":
store.dispatch(
addAssistantObservation({
...baseObservation,
observation: "recall" as const,
extras: {
...(message.extras || {}),
recall_type:
(message.extras?.recall_type as
| "workspace_context"
| "knowledge") || "knowledge",
},
}),
);
break;
case "run":
store.dispatch(
addAssistantObservation({
...baseObservation,
observation: "run" as const,
extras: {
command: String(message.extras.command || ""),
metadata: message.extras.metadata,
hidden: Boolean(message.extras.hidden),
},
}),
);
break;
case "read":
store.dispatch(
addAssistantObservation({
...baseObservation,
observation,
extras: {
path: String(message.extras.path || ""),
impl_source: String(message.extras.impl_source || ""),
},
}),
);
break;
case "edit":
store.dispatch(
addAssistantObservation({
...baseObservation,
observation,
extras: {
path: String(message.extras.path || ""),
diff: String(message.extras.diff || ""),
impl_source: String(message.extras.impl_source || ""),
},
}),
);
break;
case "run_ipython":
store.dispatch(
addAssistantObservation({
...baseObservation,
observation: "run_ipython" as const,
extras: {
code: String(message.extras.code || ""),
},
}),
);
break;
case "delegate":
store.dispatch(
addAssistantObservation({
...baseObservation,
observation: "delegate" as const,
extras: {
outputs:
typeof message.extras.outputs === "object"
? (message.extras.outputs as Record<string, unknown>)
: {},
},
}),
);
break;
case "browse":
if (message.extras?.screenshot) {
store.dispatch(setScreenshotSrc(message.extras.screenshot));
@@ -164,45 +66,6 @@ export function handleObservationMessage(message: ObservationMessage) {
if (message.extras?.url) {
store.dispatch(setUrl(message.extras.url));
}
store.dispatch(
addAssistantObservation({
...baseObservation,
observation: "browse" as const,
extras: {
url: String(message.extras.url || ""),
screenshot: String(message.extras.screenshot || ""),
error: Boolean(message.extras.error),
open_page_urls: Array.isArray(message.extras.open_page_urls)
? message.extras.open_page_urls
: [],
active_page_index: Number(message.extras.active_page_index || 0),
dom_object:
typeof message.extras.dom_object === "object"
? (message.extras.dom_object as Record<string, unknown>)
: {},
axtree_object:
typeof message.extras.axtree_object === "object"
? (message.extras.axtree_object as Record<string, unknown>)
: {},
extra_element_properties:
typeof message.extras.extra_element_properties === "object"
? (message.extras.extra_element_properties as Record<
string,
unknown
>)
: {},
last_browser_action: String(
message.extras.last_browser_action || "",
),
last_browser_action_error:
message.extras.last_browser_action_error,
focused_element_bid: String(
message.extras.focused_element_bid || "",
),
},
}),
);
break;
case "browse_interactive":
if (message.extras?.screenshot) {
@@ -211,65 +74,6 @@ export function handleObservationMessage(message: ObservationMessage) {
if (message.extras?.url) {
store.dispatch(setUrl(message.extras.url));
}
store.dispatch(
addAssistantObservation({
...baseObservation,
observation: "browse_interactive" as const,
extras: {
url: String(message.extras.url || ""),
screenshot: String(message.extras.screenshot || ""),
error: Boolean(message.extras.error),
open_page_urls: Array.isArray(message.extras.open_page_urls)
? message.extras.open_page_urls
: [],
active_page_index: Number(message.extras.active_page_index || 0),
dom_object:
typeof message.extras.dom_object === "object"
? (message.extras.dom_object as Record<string, unknown>)
: {},
axtree_object:
typeof message.extras.axtree_object === "object"
? (message.extras.axtree_object as Record<string, unknown>)
: {},
extra_element_properties:
typeof message.extras.extra_element_properties === "object"
? (message.extras.extra_element_properties as Record<
string,
unknown
>)
: {},
last_browser_action: String(
message.extras.last_browser_action || "",
),
last_browser_action_error:
message.extras.last_browser_action_error,
focused_element_bid: String(
message.extras.focused_element_bid || "",
),
},
}),
);
break;
case "error":
store.dispatch(
addAssistantObservation({
...baseObservation,
observation: "error" as const,
source: "user" as const,
extras: {
error_id: message.extras.error_id,
},
}),
);
break;
case "mcp":
store.dispatch(
addAssistantObservation({
...baseObservation,
observation: "mcp" as const,
}),
);
break;
default:
// For any unhandled observation types, just ignore them
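The `RUN_IPYTHON` branch in the hunk above now dispatches an object with `content` and an optional `imageUrls` list, guarding the latter with `Array.isArray`. A minimal sketch of that normalization step in isolation follows. The `JupyterObservationLike` and `JupyterOutputPayload` shapes and the `toJupyterOutput` name are hypothetical, since the real `ObservationMessage` type is not shown in this diff. The sketch also filters out non-string entries, which goes one step beyond what the diff does.

```typescript
// Hypothetical shapes for illustration; the real ObservationMessage type is not shown in this diff.
interface JupyterObservationLike {
  content: string;
  extras?: { image_urls?: unknown };
}

interface JupyterOutputPayload {
  content: string;
  imageUrls?: string[];
}

// Build the payload for the Jupyter output reducer from a raw observation.
function toJupyterOutput(message: JupyterObservationLike): JupyterOutputPayload {
  const raw = message.extras?.image_urls;
  // Accept only arrays; anything else (string, null, missing) becomes undefined.
  // Filtering to string entries is an extra safeguard added by this sketch.
  const imageUrls = Array.isArray(raw)
    ? raw.filter((item): item is string => typeof item === "string")
    : undefined;
  return { content: message.content, imageUrls };
}

const withImages = toJupyterOutput({
  content: "plot rendered",
  extras: { image_urls: ["data:image/png;base64,AAA"] },
});
const withBadExtras = toJupyterOutput({
  content: "no plot",
  extras: { image_urls: "not-an-array" },
});
const withoutExtras = toJupyterOutput({ content: "plain output" });
```

Returning `undefined` rather than an empty array keeps "no images reported" distinguishable from "an empty list was reported", which may matter to whatever renders the output.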
+1 -10
@@ -5,23 +5,14 @@ export const agentSlice = createSlice({
name: "agent",
initialState: {
curAgentState: AgentState.LOADING,
currentAgentType: "CodeActAgent", // Default agent type
isDelegated: false, // Track if we're in a delegation
},
reducers: {
setCurrentAgentState: (state, action) => {
state.curAgentState = action.payload;
},
setAgentType: (state, action) => {
state.currentAgentType = action.payload;
},
setDelegationState: (state, action) => {
state.isDelegated = action.payload;
},
},
});
export const { setCurrentAgentState, setAgentType, setDelegationState } =
agentSlice.actions;
export const { setCurrentAgentState } = agentSlice.actions;
export default agentSlice.reducer;
-380
@@ -1,380 +0,0 @@
import { createSlice, PayloadAction } from "@reduxjs/toolkit";
import type { Message } from "#/message";
import { ActionSecurityRisk } from "#/state/security-analyzer-slice";
import { OpenHandsAction } from "#/types/core/actions";
import { OpenHandsEventType } from "#/types/core/base";
import {
CommandObservation,
IPythonObservation,
OpenHandsObservation,
RecallObservation,
} from "#/types/core/observations";
type SliceState = {
messages: Message[];
systemMessage: {
content: string;
tools: Array<Record<string, unknown>> | null;
openhands_version: string | null;
agent_class: string | null;
} | null;
};
const MAX_CONTENT_LENGTH = 1000;
const HANDLED_ACTIONS: OpenHandsEventType[] = [
"run",
"run_ipython",
"write",
"read",
"browse",
"browse_interactive",
"edit",
"recall",
"think",
"system",
"call_tool_mcp",
"mcp",
];
function getRiskText(risk: ActionSecurityRisk) {
switch (risk) {
case ActionSecurityRisk.LOW:
return "Low Risk";
case ActionSecurityRisk.MEDIUM:
return "Medium Risk";
case ActionSecurityRisk.HIGH:
return "High Risk";
case ActionSecurityRisk.UNKNOWN:
default:
return "Unknown Risk";
}
}
const initialState: SliceState = {
messages: [],
systemMessage: null,
};
export const chatSlice = createSlice({
name: "chat",
initialState,
reducers: {
addUserMessage(
state,
action: PayloadAction<{
content: string;
imageUrls: string[];
timestamp: string;
pending?: boolean;
}>,
) {
const message: Message = {
type: "thought",
sender: "user",
content: action.payload.content,
imageUrls: action.payload.imageUrls,
timestamp: action.payload.timestamp || new Date().toISOString(),
pending: !!action.payload.pending,
};
// Remove any pending messages
let i = state.messages.length;
while (i) {
i -= 1;
const m = state.messages[i] as Message;
if (m.pending) {
state.messages.splice(i, 1);
}
}
state.messages.push(message);
},
addAssistantMessage(state: SliceState, action: PayloadAction<string>) {
const message: Message = {
type: "thought",
sender: "assistant",
content: action.payload,
imageUrls: [],
timestamp: new Date().toISOString(),
pending: false,
};
state.messages.push(message);
},
addAssistantAction(
state: SliceState,
action: PayloadAction<OpenHandsAction>,
) {
const actionID = action.payload.action;
if (!HANDLED_ACTIONS.includes(actionID)) {
return;
}
const translationID = `ACTION_MESSAGE$${actionID.toUpperCase()}`;
let text = "";
if (actionID === "system") {
// Store the system message in the state
state.systemMessage = {
content: action.payload.args.content,
tools: action.payload.args.tools,
openhands_version: action.payload.args.openhands_version,
agent_class: action.payload.args.agent_class,
};
// Don't add a message for system actions
return;
}
if (actionID === "run") {
text = `Command:\n\`${action.payload.args.command}\``;
} else if (actionID === "run_ipython") {
text = `\`\`\`\n${action.payload.args.code}\n\`\`\``;
} else if (actionID === "write") {
let { content } = action.payload.args;
if (content.length > MAX_CONTENT_LENGTH) {
content = `${content.slice(0, MAX_CONTENT_LENGTH)}...`;
}
text = `${action.payload.args.path}\n${content}`;
} else if (actionID === "browse") {
text = `Browsing ${action.payload.args.url}`;
} else if (actionID === "browse_interactive") {
// Include the browser_actions in the content
text = `**Action:**\n\n\`\`\`python\n${action.payload.args.browser_actions}\n\`\`\``;
} else if (actionID === "recall") {
// skip recall actions
return;
} else if (actionID === "call_tool_mcp") {
// Format MCP action with name and arguments
const name = action.payload.args.name || "";
const args = action.payload.args.arguments || {};
text = `**MCP Tool Call:** ${name}\n\n`;
// Include thought if available
if (action.payload.args.thought) {
text += `\n\n**Thought:**\n${action.payload.args.thought}`;
}
text += `\n\n**Arguments:**\n\`\`\`json\n${JSON.stringify(args, null, 2)}\n\`\`\``;
}
if (actionID === "run" || actionID === "run_ipython") {
if (
action.payload.args.confirmation_state === "awaiting_confirmation"
) {
text += `\n\n${getRiskText(action.payload.args.security_risk as unknown as ActionSecurityRisk)}`;
}
} else if (actionID === "think") {
text = action.payload.args.thought;
}
const message: Message = {
type: "action",
sender: "assistant",
translationID,
eventID: action.payload.id,
content: text,
imageUrls: [],
timestamp: new Date().toISOString(),
action,
};
state.messages.push(message);
},
addAssistantObservation(
state: SliceState,
observation: PayloadAction<OpenHandsObservation>,
) {
const observationID = observation.payload.observation;
if (!HANDLED_ACTIONS.includes(observationID)) {
return;
}
// Special handling for RecallObservation - create a new message instead of updating an existing one
if (observationID === "recall") {
const recallObs = observation.payload as RecallObservation;
let content = ``;
// Handle workspace context
if (recallObs.extras.recall_type === "workspace_context") {
if (recallObs.extras.repo_name) {
content += `\n\n**Repository:** ${recallObs.extras.repo_name}`;
}
if (recallObs.extras.repo_directory) {
content += `\n\n**Directory:** ${recallObs.extras.repo_directory}`;
}
if (recallObs.extras.date) {
content += `\n\n**Date:** ${recallObs.extras.date}`;
}
if (
recallObs.extras.runtime_hosts &&
Object.keys(recallObs.extras.runtime_hosts).length > 0
) {
content += `\n\n**Available Hosts**`;
for (const [host, port] of Object.entries(
recallObs.extras.runtime_hosts,
)) {
content += `\n\n- ${host} (port ${port})`;
}
}
if (
recallObs.extras.custom_secrets_descriptions &&
Object.keys(recallObs.extras.custom_secrets_descriptions).length > 0
) {
content += `\n\n**Custom Secrets**`;
for (const [name, description] of Object.entries(
recallObs.extras.custom_secrets_descriptions,
)) {
content += `\n\n- $${name}: ${description}`;
}
}
if (recallObs.extras.repo_instructions) {
content += `\n\n**Repository Instructions:**\n\n${recallObs.extras.repo_instructions}`;
}
if (recallObs.extras.additional_agent_instructions) {
content += `\n\n**Additional Instructions:**\n\n${recallObs.extras.additional_agent_instructions}`;
}
}
// Create a new message for the observation
// Use the correct translation ID format that matches what's in the i18n file
const translationID = `OBSERVATION_MESSAGE$${observationID.toUpperCase()}`;
// Handle microagent knowledge
if (
recallObs.extras.microagent_knowledge &&
recallObs.extras.microagent_knowledge.length > 0
) {
content += `\n\n**Triggered Microagent Knowledge:**`;
for (const knowledge of recallObs.extras.microagent_knowledge) {
content += `\n\n- **${knowledge.name}** (triggered by keyword: ${knowledge.trigger})\n\n\`\`\`\n${knowledge.content}\n\`\`\``;
}
}
const message: Message = {
type: "action",
sender: "assistant",
translationID,
eventID: observation.payload.id,
content,
imageUrls: [],
timestamp: new Date().toISOString(),
success: true,
};
state.messages.push(message);
return; // Skip the normal observation handling below
}
// Normal handling for other observation types
const translationID = `OBSERVATION_MESSAGE$${observationID.toUpperCase()}`;
const causeID = observation.payload.cause;
const causeMessage = state.messages.find(
(message) => message.eventID === causeID,
);
if (!causeMessage) {
return;
}
causeMessage.translationID = translationID;
causeMessage.observation = observation;
// Set success property based on observation type
if (observationID === "run") {
const commandObs = observation.payload as CommandObservation;
// If exit_code is -1, it means the command timed out, so we set success to undefined
// to not show any status indicator
if (commandObs.extras.metadata.exit_code === -1) {
causeMessage.success = undefined;
} else {
causeMessage.success = commandObs.extras.metadata.exit_code === 0;
}
} else if (observationID === "run_ipython") {
// For IPython, we consider it successful if there's no error message
const ipythonObs = observation.payload as IPythonObservation;
causeMessage.success = !ipythonObs.content
.toLowerCase()
.includes("error:");
} else if (observationID === "read" || observationID === "edit") {
// For read/edit operations, we consider it successful if there's content and no error
if (observation.payload.extras.impl_source === "oh_aci") {
causeMessage.success =
observation.payload.content.length > 0 &&
!observation.payload.content.startsWith("ERROR:\n");
} else {
causeMessage.success =
observation.payload.content.length > 0 &&
!observation.payload.content.toLowerCase().includes("error:");
}
}
if (observationID === "run" || observationID === "run_ipython") {
let { content } = observation.payload;
if (content.length > MAX_CONTENT_LENGTH) {
content = `${content.slice(0, MAX_CONTENT_LENGTH)}...`;
}
content = `${causeMessage.content}\n\nOutput:\n\`\`\`\n${content.trim() || "[Command finished execution with no output]"}\n\`\`\``;
causeMessage.content = content; // Observation content includes the action
} else if (observationID === "read") {
causeMessage.content = `\`\`\`\n${observation.payload.content}\n\`\`\``; // Content is already truncated by the ACI
} else if (observationID === "edit") {
if (causeMessage.success) {
causeMessage.content = `\`\`\`diff\n${observation.payload.extras.diff}\n\`\`\``; // Content is already truncated by the ACI
} else {
causeMessage.content = observation.payload.content;
}
} else if (observationID === "browse") {
let content = `**URL:** ${observation.payload.extras.url}\n`;
if (observation.payload.extras.error) {
content += `\n\n**Error:**\n${observation.payload.extras.error}\n`;
}
content += `\n\n**Output:**\n${observation.payload.content}`;
if (content.length > MAX_CONTENT_LENGTH) {
content = `${content.slice(0, MAX_CONTENT_LENGTH)}...(truncated)`;
}
causeMessage.content = content;
} else if (observationID === "mcp") {
// For MCP observations, we want to show the content as formatted output
// similar to how run/run_ipython actions are handled
let { content } = observation.payload;
if (content.length > MAX_CONTENT_LENGTH) {
content = `${content.slice(0, MAX_CONTENT_LENGTH)}...`;
}
content = `${causeMessage.content}\n\n**Output:**\n\`\`\`\n${content.trim() || "[MCP Tool finished execution with no output]"}\n\`\`\``;
causeMessage.content = content; // Observation content includes the action
// Set success based on whether there's an error message
causeMessage.success = !observation.payload.content
.toLowerCase()
.includes("error:");
}
},
addErrorMessage(
state: SliceState,
action: PayloadAction<{ id?: string; message: string }>,
) {
const { id, message } = action.payload;
state.messages.push({
translationID: id,
content: message,
type: "error",
sender: "assistant",
timestamp: new Date().toISOString(),
});
},
clearMessages(state: SliceState) {
state.messages = [];
state.systemMessage = null;
},
},
});
export const {
addUserMessage,
addAssistantMessage,
addAssistantAction,
addAssistantObservation,
addErrorMessage,
clearMessages,
} = chatSlice.actions;
// Selectors
export const selectSystemMessage = (state: { chat: SliceState }) =>
state.chat.systemMessage;
export default chatSlice.reducer;
+6 -1
@@ -3,6 +3,7 @@ import { createSlice } from "@reduxjs/toolkit";
export type Cell = {
content: string;
type: "input" | "output";
imageUrls?: string[];
};
const initialCells: Cell[] = [];
@@ -17,7 +18,11 @@ export const jupyterSlice = createSlice({
state.cells.push({ content: action.payload, type: "input" });
},
appendJupyterOutput: (state, action) => {
state.cells.push({ content: action.payload, type: "output" });
state.cells.push({
content: action.payload.content,
type: "output",
imageUrls: action.payload.imageUrls,
});
},
clearJupyter: (state) => {
state.cells = [];
-2
@@ -1,7 +1,6 @@
import { combineReducers, configureStore } from "@reduxjs/toolkit";
import agentReducer from "./state/agent-slice";
import browserReducer from "./state/browser-slice";
import chatReducer from "./state/chat-slice";
import codeReducer from "./state/code-slice";
import fileStateReducer from "./state/file-state-slice";
import initialQueryReducer from "./state/initial-query-slice";
@@ -15,7 +14,6 @@ export const rootReducer = combineReducers({
fileState: fileStateReducer,
initialQuery: initialQueryReducer,
browser: browserReducer,
chat: chatReducer,
code: codeReducer,
cmd: commandReducer,
agent: agentReducer,
+6 -2
@@ -2,6 +2,7 @@ export type OpenHandsEventType =
| "message"
| "system"
| "agent_state_changed"
| "change_agent_state"
| "run"
| "read"
| "write"
@@ -16,11 +17,14 @@ export type OpenHandsEventType =
| "error"
| "recall"
| "mcp"
| "call_tool_mcp";
| "call_tool_mcp"
| "user_rejected";
export type OpenHandsSourceType = "agent" | "user" | "environment";
interface OpenHandsBaseEvent {
id: number;
source: "agent" | "user";
source: OpenHandsSourceType;
message: string;
timestamp: string; // ISO 8601
}
+59
@@ -0,0 +1,59 @@
import { OpenHandsParsedEvent } from ".";
import {
UserMessageAction,
AssistantMessageAction,
OpenHandsAction,
SystemMessageAction,
} from "./actions";
import {
CommandObservation,
ErrorObservation,
OpenHandsObservation,
} from "./observations";
export const isOpenHandsAction = (
event: OpenHandsParsedEvent,
): event is OpenHandsAction => "action" in event;
export const isOpenHandsObservation = (
event: OpenHandsParsedEvent,
): event is OpenHandsObservation => "observation" in event;
export const isUserMessage = (
event: OpenHandsParsedEvent,
): event is UserMessageAction =>
isOpenHandsAction(event) &&
event.source === "user" &&
event.action === "message";
export const isAssistantMessage = (
event: OpenHandsParsedEvent,
): event is AssistantMessageAction =>
isOpenHandsAction(event) &&
event.source === "agent" &&
(event.action === "message" || event.action === "finish");
export const isErrorObservation = (
event: OpenHandsParsedEvent,
): event is ErrorObservation =>
isOpenHandsObservation(event) && event.observation === "error";
export const isCommandObservation = (
event: OpenHandsParsedEvent,
): event is CommandObservation =>
isOpenHandsObservation(event) && event.observation === "run";
export const isFinishAction = (
event: OpenHandsParsedEvent,
): event is AssistantMessageAction =>
isOpenHandsAction(event) && event.action === "finish";
export const isSystemMessage = (
event: OpenHandsParsedEvent,
): event is SystemMessageAction =>
isOpenHandsAction(event) && event.action === "system";
export const isRejectObservation = (
event: OpenHandsParsedEvent,
): event is OpenHandsObservation =>
isOpenHandsObservation(event) && event.observation === "user_rejected";
+11 -1
@@ -23,6 +23,7 @@ export interface IPythonObservation
source: "agent";
extras: {
code: string;
image_urls?: string[];
};
}
@@ -137,6 +138,14 @@ export interface MCPObservation extends OpenHandsObservationEvent<"mcp"> {
};
}
export interface UserRejectedObservation
extends OpenHandsObservationEvent<"user_rejected"> {
source: "agent";
extras: {
// Add any specific fields for user-rejected observations
};
}
export type OpenHandsObservation =
| AgentStateChangeObservation
| AgentThinkObservation
@@ -150,4 +159,5 @@ export type OpenHandsObservation =
| EditObservation
| ErrorObservation
| RecallObservation
| MCPObservation;
| MCPObservation
| UserRejectedObservation;
+19 -13
@@ -1,26 +1,32 @@
export type JupyterLine = { type: "plaintext" | "image"; content: string };
export type JupyterLine = {
type: "plaintext" | "image";
content: string;
url?: string;
};
const IMAGE_PREFIX = "![image](data:image/png;base64,";
export const parseCellContent = (content: string) => {
export const parseCellContent = (content: string, imageUrls?: string[]) => {
const lines: JupyterLine[] = [];
let currentText = "";
// First, process the text content
for (const line of content.split("\n")) {
if (line.startsWith(IMAGE_PREFIX)) {
if (currentText) {
lines.push({ type: "plaintext", content: currentText });
currentText = ""; // Reset after pushing plaintext
}
lines.push({ type: "image", content: line });
} else {
currentText += `${line}\n`;
}
currentText += `${line}\n`;
}
if (currentText) {
lines.push({ type: "plaintext", content: currentText });
}
// Then, add image lines if we have image URLs
if (imageUrls && imageUrls.length > 0) {
imageUrls.forEach((url) => {
lines.push({
type: "image",
content: `![image](${url})`,
url,
});
});
}
return lines;
};
@@ -37,3 +37,8 @@ Today's date is {{ runtime_info.date }} (UTC).
{% endif %}
</RUNTIME_INFORMATION>
{% endif %}
{% if runtime_info and runtime_info.context_message -%}
<CONTEXT_MESSAGE>
{{ runtime_info.context_message }}
</CONTEXT_MESSAGE>
{% endif %}
+2 -1
@@ -251,7 +251,8 @@ async def run_session(
)
# Add MCP tools to the agent
await add_mcp_tools_to_agent(agent, runtime, memory, config.mcp)
if agent.config.enable_mcp:
await add_mcp_tools_to_agent(agent, runtime, memory, config.mcp)
# Clear loading animation
is_loaded.set()
+2
@@ -28,6 +28,8 @@ class AgentConfig(BaseModel):
"""Whether to enable finish tool"""
enable_prompt_extensions: bool = Field(default=True)
"""Whether to enable prompt extensions"""
enable_mcp: bool = Field(default=True)
"""Whether to enable MCP tools"""
disabled_microagents: list[str] = Field(default_factory=list)
"""A list of microagents to disable (by name, without .py extension, e.g. ["github", "lint"]). Default is an empty list."""
enable_history_truncation: bool = Field(default=True)
+2 -1
@@ -129,7 +129,8 @@ async def run_controller(
)
# Add MCP tools to the agent
await add_mcp_tools_to_agent(agent, runtime, memory, config.mcp)
if agent.config.enable_mcp:
await add_mcp_tools_to_agent(agent, runtime, memory, config.mcp)
replay_events: list[Event] | None = None
if config.replay_trajectory_path:
+1 -1
@@ -154,7 +154,7 @@ def create_memory(
if runtime:
# sets available hosts
memory.set_runtime_info(runtime, {})
memory.set_contextual_info(runtime, {})
# loads microagents from repo/.openhands/microagents
microagents: list[BaseMicroagent] = runtime.get_microagents_from_selected_repo(
+1
@@ -75,6 +75,7 @@ class RecallObservation(Observation):
additional_agent_instructions: str = ''
date: str = ''
custom_secrets_descriptions: dict[str, str] = field(default_factory=dict)
context_message: str | None = None
# knowledge
microagent_knowledge: list[MicroagentKnowledge] = field(default_factory=list)
+5 -1
@@ -170,6 +170,7 @@ class IPythonRunCellObservation(Observation):
code: str
observation: str = ObservationType.RUN_IPYTHON
image_urls: list[str] | None = None
@property
def error(self) -> bool:
@@ -184,4 +185,7 @@ class IPythonRunCellObservation(Observation):
return True # IPython cells are always considered successful
def __str__(self) -> str:
return f'**IPythonRunCellObservation**\n{self.content}'
result = f'**IPythonRunCellObservation**\n{self.content}'
if self.image_urls:
result += f'\nImages: {len(self.image_urls)}'
return result
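The `__str__` change above can be tried in isolation. The following is a minimal sketch, not the OpenHands class: `CellObservation` is a hypothetical stand-in that mirrors only the `content` and `image_urls` fields visible in this diff.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CellObservation:
    """Hypothetical stand-in for IPythonRunCellObservation (illustration only)."""

    content: str
    image_urls: list[str] | None = None

    def __str__(self) -> str:
        # Same shape as the diff: the text first, then an image count if any.
        result = f'**IPythonRunCellObservation**\n{self.content}'
        if self.image_urls:
            result += f'\nImages: {len(self.image_urls)}'
        return result
```

With `image_urls` left as `None` (or set to an empty list) the string is unchanged from the previous behaviour, so existing log output should stay the same for cells that produce no images.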
+13 -5
@@ -41,7 +41,7 @@ from openhands.events.observation.error import ErrorObservation
from openhands.events.observation.mcp import MCPObservation
from openhands.events.observation.observation import Observation
from openhands.events.serialization.event import truncate_content
from openhands.utils.prompt import PromptManager, RepositoryInfo, RuntimeInfo
from openhands.utils.prompt import PromptManager, RepositoryInfo, ContextualInfo
class ConversationMemory:
@@ -360,7 +360,7 @@ class ConversationMemory:
message = Message(role='user', content=[TextContent(text=text)])
elif isinstance(obs, IPythonRunCellObservation):
text = obs.content
# replace base64 images with a placeholder
# Clean up any remaining base64 images in text content
splitted = text.split('\n')
for i, line in enumerate(splitted):
if '![image](data:image/png;base64,' in line:
@@ -369,7 +369,15 @@ class ConversationMemory:
)
text = '\n'.join(splitted)
text = truncate_content(text, max_message_chars)
message = Message(role='user', content=[TextContent(text=text)])
# Create message content with text
content = [TextContent(text=text)]
# Add image URLs if available and vision is active
if vision_is_active and obs.image_urls:
content.append(ImageContent(image_urls=obs.image_urls))
message = Message(role='user', content=content)
elif isinstance(obs, FileEditObservation):
text = truncate_content(str(obs), max_message_chars)
message = Message(role='user', content=[TextContent(text=text)])
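The `IPythonRunCellObservation` branch above attaches images only when vision is active and the observation carries URLs. A rough sketch of that rule follows, assuming simplified stand-ins: `TextPart`, `ImagePart`, and `build_content` are hypothetical names for illustration and are not the `TextContent` / `ImageContent` classes used in OpenHands.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TextPart:
    """Hypothetical stand-in for a text content item."""

    text: str


@dataclass
class ImagePart:
    """Hypothetical stand-in for an image content item."""

    image_urls: list[str]


def build_content(
    text: str, image_urls: list[str] | None, vision_is_active: bool
) -> list[TextPart | ImagePart]:
    # The text part is always present.
    content: list[TextPart | ImagePart] = [TextPart(text=text)]
    # Images are appended only if the model can see them and there are any.
    if vision_is_active and image_urls:
        content.append(ImagePart(image_urls=list(image_urls)))
    return content
```

A non-vision model therefore still receives the text output, and a `None` or empty URL list never produces an empty image part.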
@@ -447,14 +455,14 @@ class ConversationMemory:
date = obs.date
if obs.runtime_hosts or obs.additional_agent_instructions:
runtime_info = RuntimeInfo(
runtime_info = ContextualInfo(
available_hosts=obs.runtime_hosts,
additional_agent_instructions=obs.additional_agent_instructions,
date=date,
custom_secrets_descriptions=obs.custom_secrets_descriptions,
)
else:
runtime_info = RuntimeInfo(
runtime_info = ContextualInfo(
date=date,
custom_secrets_descriptions=obs.custom_secrets_descriptions,
)
+21 -18
@@ -22,7 +22,7 @@ from openhands.microagent import (
load_microagents_from_dir,
)
from openhands.runtime.base import Runtime
from openhands.utils.prompt import RepositoryInfo, RuntimeInfo
from openhands.utils.prompt import RepositoryInfo, ContextualInfo
GLOBAL_MICROAGENTS_DIR = os.path.join(
os.path.dirname(os.path.dirname(openhands.__file__)),
@@ -31,8 +31,8 @@ GLOBAL_MICROAGENTS_DIR = os.path.join(
class Memory:
"""
Memory is a component that listens to the EventStream for information retrieval actions
"""Memory is a component that listens to the EventStream for information retrieval actions.
(a RecallAction) and publishes observations with the content (such as RecallObservation).
"""
@@ -64,7 +64,7 @@ class Memory:
# Store repository / runtime info to send them to the templating later
self.repository_info: RepositoryInfo | None = None
self.runtime_info: RuntimeInfo | None = None
self.runtime_info: ContextualInfo | None = None
# Load global microagents (Knowledge + Repo)
# from typically OpenHands/microagents (i.e., the PUBLIC microagents)
@@ -131,7 +131,6 @@ class Memory:
This method collects information from all available repo microagents and concatenates their contents.
Multiple repo microagents are supported, and their contents will be concatenated with newlines between them.
"""
# Create WORKSPACE_CONTEXT info:
# - repository_info
# - runtime_info
@@ -180,6 +179,9 @@ class Memory:
custom_secrets_descriptions=self.runtime_info.custom_secrets_descriptions
if self.runtime_info is not None
else {},
context_message=self.runtime_info.context_message
if self.runtime_info and self.runtime_info.context_message is not None
else None,
)
return obs
return None
@@ -189,7 +191,6 @@ class Memory:
event: RecallAction,
) -> RecallObservation | None:
"""When a microagent action triggers microagents, create a RecallObservation with structured data."""
# Find any matched microagents based on the query
microagent_knowledge = self._find_microagent_knowledge(event.query)
@@ -235,8 +236,7 @@ class Memory:
def load_user_workspace_microagents(
self, user_microagents: list[BaseMicroagent]
) -> None:
"""
This method loads microagents from a user's cloned repo or workspace directory.
"""This method loads microagents from a user's cloned repo or workspace directory.
This is typically called from agent_session or setup once the workspace is cloned.
"""
@@ -250,9 +250,7 @@ class Memory:
self.repo_microagents[user_microagent.name] = user_microagent
def _load_global_microagents(self) -> None:
"""
Loads microagents from the global microagents_dir
"""
"""Loads microagents from the global microagents_dir."""
repo_agents, knowledge_agents = load_microagents_from_dir(
GLOBAL_MICROAGENTS_DIR
)
@@ -264,8 +262,7 @@ class Memory:
self.repo_microagents[name] = agent
def get_microagent_mcp_tools(self) -> list[MCPConfig]:
"""
Get MCP tools from all repo microagents (always active)
"""Get MCP tools from all repo microagents (always active).
Returns:
A list of MCP tools configurations from microagents
@@ -289,8 +286,11 @@ class Memory:
else:
self.repository_info = None
def set_runtime_info(
self, runtime: Runtime, custom_secrets_descriptions: dict[str, str]
def set_contextual_info(
self,
runtime: Runtime,
custom_secrets_descriptions: dict[str, str],
context_message: str | None = None,
) -> None:
"""Store runtime info (web hosts, ports, etc.)."""
# e.g. { '127.0.0.1': 8080 }
@@ -298,15 +298,18 @@ class Memory:
date = str(utc_now.date())
if runtime.web_hosts or runtime.additional_agent_instructions:
self.runtime_info = RuntimeInfo(
self.runtime_info = ContextualInfo(
available_hosts=runtime.web_hosts,
additional_agent_instructions=runtime.additional_agent_instructions,
date=date,
custom_secrets_descriptions=custom_secrets_descriptions,
context_message=context_message,
)
else:
self.runtime_info = RuntimeInfo(
date=date, custom_secrets_descriptions=custom_secrets_descriptions
self.runtime_info = ContextualInfo(
date=date,
custom_secrets_descriptions=custom_secrets_descriptions,
context_message=context_message,
)
def send_error_message(self, message_id: str, message: str):
+9 -1
@@ -230,7 +230,7 @@ class IssueResolver:
"""Initialize the runtime for the agent.
This function is called before the runtime is used to run the agent.
Currently it does nothing.
It sets up git configuration and runs the setup script if it exists.
"""
logger.info('-' * 30)
logger.info('BEGIN Runtime Completion Fn')
@@ -257,6 +257,14 @@ class IssueResolver:
if not isinstance(obs, CmdOutputObservation) or obs.exit_code != 0:
raise RuntimeError(f'Failed to set git config.\n{obs}')
# Run setup script if it exists
logger.info('Checking for .openhands/setup.sh script...')
runtime.maybe_run_setup_script()
# Setup git hooks if they exist
logger.info('Checking for .openhands/pre-commit.sh script...')
runtime.maybe_setup_git_hooks()
async def complete_runtime(
self,
runtime: Runtime,
-1
@@ -91,7 +91,6 @@ async def browse(
active_page_index=obs.get(
'active_page_index', -1
), # index of the active page
dom_object=obs.get('dom_object', {}), # DOM object
axtree_object=obs.get('axtree_object', {}), # accessibility tree object
extra_element_properties=obs.get('extra_element_properties', {}),
focused_element_bid=obs.get(
@@ -200,6 +200,11 @@ class LocalRuntime(ActionExecutionClient):
headless_mode,
)
# If there is an API key in the environment, we use it in requests to the runtime
session_api_key = os.getenv("SESSION_API_KEY")
if session_api_key:
self.session.headers['X-Session-API-Key'] = session_api_key
@property
def action_execution_server_url(self) -> str:
return self.api_url
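The header logic in the `LocalRuntime` hunk above can be expressed as a small pure function. This is a sketch under assumptions: `session_headers` is a hypothetical helper, and it returns a plain dict rather than mutating the HTTP session object used in the diff.

```python
import os
from typing import Mapping, Optional


def session_headers(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Return the extra request headers implied by SESSION_API_KEY, if set."""
    # Fall back to the real process environment when none is supplied.
    env = os.environ if environ is None else environ
    headers = {}
    session_api_key = env.get('SESSION_API_KEY')
    # An unset or empty key means no header is sent at all.
    if session_api_key:
        headers['X-Session-API-Key'] = session_api_key
    return headers
```

Passing the environment as a parameter keeps the rule testable without touching `os.environ`; the header name and variable name are the ones shown in the diff.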
@@ -153,10 +153,18 @@ class JupyterPlugin(Plugin):
if not self.kernel.initialized:
await self.kernel.initialize()
# Execute the code and get structured output
output = await self.kernel.execute(action.code, timeout=action.timeout)
# Extract text content and image URLs from the structured output
text_content = output.get('text', '')
image_urls = output.get('images', [])
return IPythonRunCellObservation(
content=output,
content=text_content,
code=action.code,
image_urls=image_urls if image_urls else None,
)
async def run(self, action: Action) -> IPythonRunCellObservation:
@@ -139,7 +139,9 @@ class JupyterKernel:
stop=stop_after_attempt(3),
wait=wait_fixed(2),
) # type: ignore
async def execute(self, code: str, timeout: int = 120) -> str:
async def execute(
self, code: str, timeout: int = 120
) -> dict[str, list[str] | str]:
if not self.ws or self.ws.stream.closed():
await self._connect()
@@ -171,7 +173,7 @@ class JupyterKernel:
)
logging.info(f'Executed code in jupyter kernel:\n{res}')
outputs: list[str] = []
outputs: list[dict] = []
async def wait_for_messages() -> bool:
execution_done = False
@@ -194,17 +196,23 @@ class JupyterKernel:
if msg_type == 'error':
traceback = '\n'.join(msg_dict['content']['traceback'])
outputs.append(traceback)
outputs.append({'type': 'text', 'content': traceback})
execution_done = True
elif msg_type == 'stream':
outputs.append(msg_dict['content']['text'])
outputs.append(
{'type': 'text', 'content': msg_dict['content']['text']}
)
elif msg_type in ['execute_result', 'display_data']:
outputs.append(msg_dict['content']['data']['text/plain'])
outputs.append(
{
'type': 'text',
'content': msg_dict['content']['data']['text/plain'],
}
)
if 'image/png' in msg_dict['content']['data']:
# use markdown to display image (in case of large image)
outputs.append(
f'\n![image](data:image/png;base64,{msg_dict["content"]["data"]["image/png"]})\n'
)
# Store image data in structured format
image_url = f'data:image/png;base64,{msg_dict["content"]["data"]["image/png"]}'
outputs.append({'type': 'image', 'content': image_url})
elif msg_type == 'execute_reply':
execution_done = True
@@ -225,19 +233,28 @@ class JupyterKernel:
execution_done = await asyncio.wait_for(wait_for_messages(), timeout)
except asyncio.TimeoutError:
await interrupt_kernel()
return f'[Execution timed out ({timeout} seconds).]'
return {'text': f'[Execution timed out ({timeout} seconds).]', 'images': []}
if not outputs and execution_done:
ret = '[Code executed successfully with no output]'
# Process structured outputs
text_outputs = []
image_outputs = []
for output in outputs:
if output['type'] == 'text':
text_outputs.append(output['content'])
elif output['type'] == 'image':
image_outputs.append(output['content'])
if not text_outputs and execution_done:
text_content = '[Code executed successfully with no output]'
else:
ret = ''.join(outputs)
text_content = ''.join(text_outputs)
# Remove ANSI
ret = strip_ansi(ret)
# Remove ANSI from text content
text_content = strip_ansi(text_content)
if os.environ.get('DEBUG'):
logging.info(f'OUTPUT:\n{ret}')
return ret
# Return a dictionary with text content and image URLs
return {'text': text_content, 'images': image_outputs}
async def shutdown_async(self) -> None:
if self.kernel_id:
@@ -267,7 +284,9 @@ class ExecuteHandler(tornado.web.RequestHandler):
output = await self.jupyter_kernel.execute(code)
self.write(output)
# Set content type to JSON and return the structured output
self.set_header('Content-Type', 'application/json')
self.write(json_encode(output))
def make_app() -> tornado.web.Application:
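The post-processing step added to `JupyterKernel.execute` above splits the collected kernel outputs into one text string and a list of image data URLs. A minimal sketch of that step, assuming the same `{'type': ..., 'content': ...}` entries as the diff; `collect_outputs` is a hypothetical name, and the ANSI stripping done in the real method is left out here.

```python
def collect_outputs(outputs: list, execution_done: bool = True) -> dict:
    """Split structured kernel outputs into text and image data URLs."""
    text_outputs = []
    image_outputs = []
    for output in outputs:
        if output['type'] == 'text':
            text_outputs.append(output['content'])
        elif output['type'] == 'image':
            image_outputs.append(output['content'])

    # A finished cell with no text still gets a readable placeholder,
    # even if it produced images.
    if not text_outputs and execution_done:
        text_content = '[Code executed successfully with no output]'
    else:
        text_content = ''.join(text_outputs)

    return {'text': text_content, 'images': image_outputs}
```

Note that the placeholder check looks only at text entries, so a cell that emits just a plot reports "no output" in its text while the image is still returned in `images`.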
@@ -6,10 +6,8 @@ import socketio
from openhands.core.config import AppConfig
from openhands.events.action import MessageAction
from openhands.events.event_store import EventStore
from openhands.server.config.server_config import ServerConfig
from openhands.server.data_models.agent_loop_info import AgentLoopInfo
from openhands.server.data_models.conversation_info import ConversationInfo
from openhands.server.monitoring import MonitoringListener
from openhands.server.session.conversation import Conversation
from openhands.storage.conversation.conversation_store import ConversationStore
@@ -83,6 +81,7 @@ class ConversationManager(ABC):
user_id: str | None,
initial_user_msg: MessageAction | None = None,
replay_json: str | None = None,
context_message: str | None = None,
) -> AgentLoopInfo:
"""Start an event loop if one is not already running"""
@@ -245,12 +245,13 @@ class StandaloneConversationManager(ConversationManager):
user_id: str | None,
initial_user_msg: MessageAction | None = None,
replay_json: str | None = None,
context_message: str | None = None,
) -> AgentLoopInfo:
logger.info(f'maybe_start_agent_loop:{sid}', extra={'session_id': sid})
session = self._local_agent_loops_by_sid.get(sid)
if not session:
session = await self._start_agent_loop(
sid, settings, user_id, initial_user_msg, replay_json
sid, settings, user_id, initial_user_msg, replay_json, context_message
)
return self._agent_loop_info_from_session(session)
@@ -261,6 +262,7 @@ class StandaloneConversationManager(ConversationManager):
user_id: str | None,
initial_user_msg: MessageAction | None = None,
replay_json: str | None = None,
context_message: str | None = None,
) -> Session:
logger.info(f'starting_agent_loop:{sid}', extra={'session_id': sid})
@@ -304,7 +306,9 @@ class StandaloneConversationManager(ConversationManager):
)
self._local_agent_loops_by_sid[sid] = session
asyncio.create_task(
session.initialize_agent(settings, initial_user_msg, replay_json)
session.initialize_agent(
settings, initial_user_msg, replay_json, context_message
)
)
# This does not get added when resuming an existing conversation
try:
@@ -475,17 +479,17 @@ class StandaloneConversationManager(ConversationManager):
continue
results.append(self._agent_loop_info_from_session(session))
return results
def _agent_loop_info_from_session(self, session: Session):
return AgentLoopInfo(
conversation_id=session.sid,
url=self._get_conversation_url(session.sid),
api_key=None,
session_api_key=None,
event_store=session.agent_session.event_stream,
)
def _get_conversation_url(self, conversation_id: str):
return f"/conversations/{conversation_id}"
return f'/conversations/{conversation_id}'
def _last_updated_at_key(conversation: ConversationMetadata) -> float:
@@ -10,5 +10,5 @@ class AgentLoopInfo:
"""
conversation_id: str
url: str | None
api_key: str | None
session_api_key: str | None
event_store: EventStoreABC
@@ -20,5 +20,5 @@ class ConversationInfo:
trigger: ConversationTrigger | None = None
num_connections: int = 0
url: str | None = None
api_key: str | None = None
session_api_key: str | None = None
created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
+5
@@ -10,6 +10,7 @@ from openhands.server.middleware import (
InMemoryRateLimiter,
LocalhostCORSMiddleware,
RateLimitMiddleware,
SessionApiKeyMiddleware,
)
from openhands.server.static import SPAStaticFiles
@@ -32,4 +33,8 @@ base_app.add_middleware(
)
base_app.middleware('http')(AttachConversationMiddleware(base_app))
session_api_key = os.getenv('SESSION_API_KEY')
if session_api_key:
base_app.middleware('http')(SessionApiKeyMiddleware(session_api_key))
app = socketio.ASGIApp(sio, other_asgi_app=base_app)
+14
@@ -1,4 +1,5 @@
import asyncio
import os
from types import MappingProxyType
from typing import Any
from urllib.parse import parse_qs
@@ -72,6 +73,9 @@ async def connect(connection_id: str, environ: dict) -> None:
logger.error('No conversation_id in query params')
raise ConnectionRefusedError('No conversation_id in query params')
if _invalid_session_api_key(query_params):
raise ConnectionRefusedError('invalid_session_api_key')
cookies_str = environ.get('HTTP_COOKIE', '')
# Get Authorization header from the environment
# Headers in WSGI/ASGI are prefixed with 'HTTP_' and have dashes replaced with underscores
@@ -160,3 +164,13 @@ async def oh_action(connection_id: str, data: dict[str, Any]) -> None:
async def disconnect(connection_id: str) -> None:
logger.info(f'sio:disconnect:{connection_id}')
await conversation_manager.disconnect_from_session(connection_id)
def _invalid_session_api_key(query_params: dict[str, list[Any]]) -> bool:
    session_api_key = os.getenv('SESSION_API_KEY')
    if not session_api_key:
        return False
    # Use .get so a missing query parameter is rejected instead of raising KeyError
    query_api_keys = query_params.get('session_api_key')
    if not query_api_keys:
        return True
    return query_api_keys[0] != session_api_key
+24
@@ -1,5 +1,6 @@
import asyncio
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any
from urllib.parse import urlparse
@@ -192,3 +193,26 @@ class AttachConversationMiddleware(SessionMiddlewareInterface):
await self._detach_session(request)
return response
@dataclass
class SessionApiKeyMiddleware:
    """Middleware which ensures that requests carry a header with the given token.

    OPTIONS requests and the /alive and /server_info paths are exempt.
    """

    session_api_key: str

    async def __call__(
        self, request: Request, call_next: RequestResponseEndpoint
    ) -> Response:
        if (
            request.method != 'OPTIONS'
            and request.url.path != '/alive'
            and request.url.path != '/server_info'
        ):
            if self.session_api_key != request.headers.get('X-Session-API-Key'):
                return JSONResponse(
                    {'code': 'invalid_session_api_key'},
                    status_code=status.HTTP_401_UNAUTHORIZED,
                )
        response = await call_next(request)
        return response
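The rule the middleware applies can be summarised as a pure function. This is a sketch for illustration only: `is_request_authorized`, `EXEMPT_PATHS`, and the `/api/conversations` path used in the usage line are hypothetical names, and a plain `dict` is used for headers, which (unlike Starlette's header mapping) is case-sensitive.

```python
from collections.abc import Mapping

# Paths the middleware in the diff lets through without a key.
EXEMPT_PATHS = frozenset({'/alive', '/server_info'})


def is_request_authorized(
    method: str, path: str, headers: Mapping[str, str], session_api_key: str
) -> bool:
    """Mirror the middleware's rule: exempt requests pass, others need the key."""
    # CORS preflight requests and the two health/info endpoints skip the check.
    if method == 'OPTIONS' or path in EXEMPT_PATHS:
        return True
    # Every other request must send the exact key in X-Session-API-Key.
    return headers.get('X-Session-API-Key') == session_api_key


# Usage: a client would attach the key returned as `session_api_key`.
allowed = is_request_authorized(
    'POST', '/api/conversations', {'X-Session-API-Key': 'k'}, 'k'
)
```

Requests that fail this rule are the ones for which the middleware returns a 401 response with the code `invalid_session_api_key`.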
+25 -12
@@ -61,6 +61,7 @@ class InitSessionRequest(BaseModel):
image_urls: list[str] | None = None
replay_json: str | None = None
suggested_task: SuggestedTask | None = None
context_message: str | None = None
model_config = {'extra': 'forbid'}
@@ -69,7 +70,7 @@ class InitSessionResponse(BaseModel):
status: str
conversation_id: str
conversation_url: str
api_key: str | None
session_api_key: str | None
message: str | None = None
@@ -84,6 +85,7 @@ async def _create_new_conversation(
replay_json: str | None,
conversation_trigger: ConversationTrigger = ConversationTrigger.GUI,
attach_convo_id: bool = False,
context_message: str | None = None,
) -> AgentLoopInfo:
logger.info(
'Creating conversation',
@@ -120,6 +122,7 @@ async def _create_new_conversation(
session_init_args['selected_repository'] = selected_repository
session_init_args['custom_secrets'] = custom_secrets
session_init_args['selected_branch'] = selected_branch
session_init_args['context_message'] = context_message
conversation_init_data = ConversationInitData(**session_init_args)
logger.info('Loading conversation store')
conversation_store = await ConversationStoreImpl.get_instance(config, user_id)
@@ -169,6 +172,7 @@ async def _create_new_conversation(
user_id,
initial_user_msg=initial_message_action,
replay_json=replay_json,
context_message=context_message,
)
logger.info(f'Finished initializing conversation {agent_loop_info.conversation_id}')
return agent_loop_info
@@ -195,6 +199,7 @@ async def new_conversation(
replay_json = data.replay_json
suggested_task = data.suggested_task
git_provider = data.git_provider
context_message = data.context_message
conversation_trigger = ConversationTrigger.GUI
@@ -222,13 +227,14 @@ async def new_conversation(
image_urls=image_urls,
replay_json=replay_json,
conversation_trigger=conversation_trigger,
context_message=context_message,
)
return InitSessionResponse(
status='ok',
conversation_id=agent_loop_info.conversation_id,
conversation_url=agent_loop_info.url,
api_key=agent_loop_info.api_key,
session_api_key=agent_loop_info.session_api_key,
)
except MissingSettingsError as e:
return JSONResponse(
@@ -287,19 +293,25 @@ async def search_conversations(
running_conversations = await conversation_manager.get_running_agent_loops(
user_id, conversation_ids
)
connection_ids_to_conversation_ids = await conversation_manager.get_connections(filter_to_sids=conversation_ids)
agent_loop_info = await conversation_manager.get_agent_loop_info(filter_to_sids=conversation_ids)
urls_by_conversation_id = {info.conversation_id: info.url for info in agent_loop_info}
connection_ids_to_conversation_ids = await conversation_manager.get_connections(
filter_to_sids=conversation_ids
)
agent_loop_info = await conversation_manager.get_agent_loop_info(
filter_to_sids=conversation_ids
)
agent_loop_info_by_conversation_id = {info.conversation_id: info for info in agent_loop_info}
result = ConversationInfoResultSet(
results=await wait_all(
_get_conversation_info(
conversation=conversation,
is_running=conversation.conversation_id in running_conversations,
num_connections=sum(
1 for conversation_id in connection_ids_to_conversation_ids.values()
1
for conversation_id in connection_ids_to_conversation_ids.values()
if conversation_id == conversation.conversation_id
),
url=urls_by_conversation_id.get(conversation.conversation_id),
agent_loop_info=agent_loop_info_by_conversation_id.get(conversation.conversation_id),
)
for conversation in filtered_results
),
@@ -317,9 +329,9 @@ async def get_conversation(
metadata = await conversation_store.get_metadata(conversation_id)
is_running = await conversation_manager.is_agent_loop_running(conversation_id)
num_connections = len(await conversation_manager.get_connections(filter_to_sids={conversation_id}))
agent_loop_info = await conversation_manager.get_agent_loop_info(filter_to_sids={conversation_id})
url = agent_loop_info[0].url if agent_loop_info else None
conversation_info = await _get_conversation_info(metadata, is_running, num_connections, url)
agent_loop_infos = await conversation_manager.get_agent_loop_info(filter_to_sids={conversation_id})
agent_loop_info = agent_loop_infos[0] if agent_loop_infos else None
conversation_info = await _get_conversation_info(metadata, is_running, num_connections, agent_loop_info)
return conversation_info
except FileNotFoundError:
return None
@@ -348,7 +360,7 @@ async def _get_conversation_info(
conversation: ConversationMetadata,
is_running: bool,
num_connections: int,
url: str | None,
agent_loop_info: AgentLoopInfo | None,
) -> ConversationInfo | None:
try:
title = conversation.title
@@ -365,7 +377,8 @@ async def _get_conversation_info(
ConversationStatus.RUNNING if is_running else ConversationStatus.STOPPED
),
num_connections=num_connections,
url=url,
url=agent_loop_info.url if agent_loop_info else None,
session_api_key=agent_loop_info.session_api_key if agent_loop_info else None,
)
except Exception as e:
logger.error(
+21 -7
@@ -16,7 +16,11 @@ from openhands.core.schema.agent import AgentState
from openhands.events.action import ChangeAgentStateAction, MessageAction
from openhands.events.event import Event, EventSource
from openhands.events.stream import EventStream
from openhands.integrations.provider import CUSTOM_SECRETS_TYPE, PROVIDER_TOKEN_TYPE, ProviderHandler
from openhands.integrations.provider import (
CUSTOM_SECRETS_TYPE,
PROVIDER_TOKEN_TYPE,
ProviderHandler,
)
from openhands.mcp import add_mcp_tools_to_agent
from openhands.memory.memory import Memory
from openhands.microagent.microagent import BaseMicroagent
@@ -91,6 +95,7 @@ class AgentSession:
selected_branch: str | None = None,
initial_message: MessageAction | None = None,
replay_json: str | None = None,
context_message: str | None = None,
) -> None:
"""Starts the Agent session
Parameters:
@@ -116,7 +121,9 @@ class AgentSession:
finished = False # For monitoring
runtime_connected = False
custom_secrets_handler = UserSecrets(custom_secrets=custom_secrets if custom_secrets else {})
custom_secrets_handler = UserSecrets(
custom_secrets=custom_secrets if custom_secrets else {}
)
try:
self._create_security_analyzer(config.security.security_analyzer)
@@ -144,12 +151,13 @@ class AgentSession:
self.memory = await self._create_memory(
selected_repository=selected_repository,
repo_directory=repo_directory,
custom_secrets_descriptions=custom_secrets_handler.get_custom_secrets_descriptions()
custom_secrets_descriptions=custom_secrets_handler.get_custom_secrets_descriptions(),
context_message=context_message,
)
# NOTE: this needs to happen before controller is created
# so MCP tools can be included into the SystemMessageAction
if self.runtime and runtime_connected:
if self.runtime and runtime_connected and agent.config.enable_mcp:
await add_mcp_tools_to_agent(agent, self.runtime, self.memory, config.mcp)
if replay_json:
@@ -315,7 +323,7 @@ class AgentSession:
provider_tokens=git_provider_tokens
or cast(PROVIDER_TOKEN_TYPE, MappingProxyType({}))
)
# Merge git provider tokens with custom secrets before passing over to runtime
env_vars.update(await provider_handler.get_env_vars(expose_secrets=True))
self.runtime = runtime_cls(
@@ -415,7 +423,11 @@ class AgentSession:
return controller
async def _create_memory(
self, selected_repository: str | None, repo_directory: str | None, custom_secrets_descriptions: dict[str, str]
self,
selected_repository: str | None,
repo_directory: str | None,
custom_secrets_descriptions: dict[str, str],
context_message: str | None = None,
) -> Memory:
memory = Memory(
event_stream=self.event_stream,
@@ -425,7 +437,9 @@ class AgentSession:
if self.runtime:
# sets available hosts and other runtime info
memory.set_runtime_info(self.runtime, custom_secrets_descriptions)
memory.set_contextual_info(
self.runtime, custom_secrets_descriptions, context_message
)
# loads microagents from repo/.openhands/microagents
microagents: list[BaseMicroagent] = await call_sync_from_async(
@@ -14,6 +14,7 @@ class ConversationInitData(Settings):
selected_repository: str | None = Field(default=None)
replay_json: str | None = Field(default=None)
selected_branch: str | None = Field(default=None)
context_message: str | None = Field(default=None)
model_config = {
'arbitrary_types_allowed': True,
+3
@@ -91,6 +91,7 @@ class Session:
settings: Settings,
initial_message: MessageAction | None,
replay_json: str | None,
context_message: str | None = None,
) -> None:
self.agent_session.event_stream.add_event(
AgentStateChangedObservation('', AgentState.LOADING),
@@ -160,6 +161,7 @@ class Session:
selected_repository = settings.selected_repository
selected_branch = settings.selected_branch
custom_secrets = settings.custom_secrets
context_message = settings.context_message
try:
await self.agent_session.start(
@@ -176,6 +178,7 @@ class Session:
selected_branch=selected_branch,
initial_message=initial_message,
replay_json=replay_json,
context_message=context_message,
)
except MicroagentValidationError as e:
self.logger.exception(f'Error creating agent_session: {e}')
+7 -7
@@ -10,11 +10,12 @@ from openhands.events.observation.agent import MicroagentKnowledge
@dataclass
class RuntimeInfo:
class ContextualInfo:
date: str
available_hosts: dict[str, int] = field(default_factory=dict)
additional_agent_instructions: str = ''
custom_secrets_descriptions: dict[str, str] = field(default_factory=dict)
context_message: str | None = None
@dataclass
@@ -26,8 +27,7 @@ class RepositoryInfo:
class PromptManager:
"""
Manages prompt templates and includes information from the user's workspace micro-agents and global micro-agents.
"""Manages prompt templates and includes information from the user's workspace micro-agents and global micro-agents.
This class is dedicated to loading and rendering prompts (system prompt, user prompt).
@@ -58,8 +58,9 @@ class PromptManager:
return self.system_template.render().strip()
def get_example_user_message(self) -> str:
"""This is an initial user message that can be provided to the agent
before *actual* user instructions are provided.
"""Return an initial user message that can be provided to the agent.
It is shown before the *actual* user instructions are provided.
It can be used to provide a demonstration of how the agent
should behave in order to solve the user's task. And it may
@@ -67,13 +68,12 @@ class PromptManager:
This additional context will convert the current generic agent
into a more specialized agent that is tailored to the user's task.
"""
return self.user_template.render().strip()
def build_workspace_context(
self,
repository_info: RepositoryInfo | None,
runtime_info: RuntimeInfo | None,
runtime_info: ContextualInfo | None,
repo_instructions: str = '',
) -> str:
"""Renders the additional info template with the stored repository/runtime info."""
Generated
+45 -12
@@ -1,4 +1,4 @@
# This file is automatically @generated by Poetry 2.1.3 and should not be changed by hand.
# This file is automatically @generated by Poetry 2.1.1 and should not be changed by hand.
[[package]]
name = "aiohappyeyeballs"
@@ -2871,7 +2871,7 @@ grpcio = {version = ">=1.49.1,<2.0dev", optional = true, markers = "python_versi
grpcio-status = {version = ">=1.49.1,<2.0.dev0", optional = true, markers = "python_version >= \"3.11\" and extra == \"grpc\""}
proto-plus = [
{version = ">=1.25.0,<2.0.0dev", markers = "python_version >= \"3.13\""},
{version = ">=1.22.3,<2.0.0dev"},
{version = ">=1.22.3,<2.0.0dev", markers = "python_version < \"3.13\""},
]
protobuf = ">=3.19.5,<3.20.0 || >3.20.0,<3.20.1 || >3.20.1,<4.21.0 || >4.21.0,<4.21.1 || >4.21.1,<4.21.2 || >4.21.2,<4.21.3 || >4.21.3,<4.21.4 || >4.21.4,<4.21.5 || >4.21.5,<6.0.0.dev0"
requests = ">=2.18.0,<3.0.0.dev0"
@@ -5144,14 +5144,14 @@ files = [
[[package]]
name = "mcp"
version = "1.7.1"
version = "1.9.0"
description = "Model Context Protocol SDK"
optional = false
python-versions = ">=3.10"
groups = ["main"]
files = [
{file = "mcp-1.7.1-py3-none-any.whl", hash = "sha256:f7e6108977db6d03418495426c7ace085ba2341b75197f8727f96f9cfd30057a"},
{file = "mcp-1.7.1.tar.gz", hash = "sha256:eb4f1f53bd717f75dda8a1416e00804b831a8f3c331e23447a03b78f04b43a6e"},
{file = "mcp-1.9.0-py3-none-any.whl", hash = "sha256:9dfb89c8c56f742da10a5910a1f64b0d2ac2c3ed2bd572ddb1cfab7f35957178"},
{file = "mcp-1.9.0.tar.gz", hash = "sha256:905d8d208baf7e3e71d70c82803b89112e321581bcd2530f9de0fe4103d28749"},
]
[package.dependencies]
@@ -5172,20 +5172,20 @@ ws = ["websockets (>=15.0.1)"]
[[package]]
name = "mcpm"
version = "1.9.0"
version = "1.12.0"
description = "MCPM - Model Context Protocol Manager"
optional = false
python-versions = ">=3.10"
groups = ["main"]
files = [
{file = "mcpm-1.9.0-py3-none-any.whl", hash = "sha256:fc9efe355329bef6a30d201668f9752d6fbc46f9d3a2affda8d45b9c5240475e"},
{file = "mcpm-1.9.0.tar.gz", hash = "sha256:97c112cb6d40e9bbcb4091c1db79da4eeda256bfa48083fa1f3abb260b814686"},
{file = "mcpm-1.12.0-py3-none-any.whl", hash = "sha256:ed3a87300420bcdb9cd12ef290179fda5bd51eb2f4cd3e793084d83eed91b249"},
{file = "mcpm-1.12.0.tar.gz", hash = "sha256:e9d2b852b90d7fd62dede584f035dd6b2b3d068d233e96b82aead835f81a911a"},
]
[package.dependencies]
click = ">=8.1.3"
duckdb = ">=1.2.2"
mcp = ">=1.6.0"
mcp = ">=1.8.0"
prompt-toolkit = ">=3.0.0"
psutil = ">=7.0.0"
pydantic = ">=2.5.1"
@@ -8574,7 +8574,7 @@ description = "C version of reader, parser and emitter for ruamel.yaml derived f
optional = false
python-versions = ">=3.9"
groups = ["main"]
markers = "platform_python_implementation == \"CPython\" and python_version == \"3.12\""
markers = "python_version < \"3.13\" and platform_python_implementation == \"CPython\""
files = [
{file = "ruamel.yaml.clib-0.2.12-cp310-cp310-macosx_13_0_arm64.whl", hash = "sha256:11f891336688faf5156a36293a9c362bdc7c88f03a8a027c2c1d8e0bcde998e5"},
{file = "ruamel.yaml.clib-0.2.12-cp310-cp310-manylinux2014_aarch64.whl", hash = "sha256:a606ef75a60ecf3d924613892cc603b154178ee25abb3055db5062da811fd969"},
@@ -9966,7 +9966,7 @@ description = "A language and compiler for custom Deep Learning operations"
optional = false
python-versions = "*"
groups = ["evaluation"]
markers = "platform_system == \"Linux\" and platform_machine == \"x86_64\" and python_version == \"3.12\""
markers = "platform_system == \"Linux\" and platform_machine == \"x86_64\" and python_version < \"3.13\""
files = [
{file = "triton-3.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:e1efef76935b2febc365bfadf74bcb65a6f959a9872e5bddf44cc9e0adce1e1a"},
{file = "triton-3.0.0-1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:5ce8520437c602fb633f1324cc3871c47bee3b67acf9756c1a66309b60e3216c"},
@@ -10267,6 +10267,39 @@ platformdirs = ">=3.9.1,<5"
docs = ["furo (>=2023.7.26)", "proselint (>=0.13)", "sphinx (>=7.1.2,!=7.3)", "sphinx-argparse (>=0.4)", "sphinxcontrib-towncrier (>=0.2.1a0)", "towncrier (>=23.6)"]
test = ["covdefaults (>=2.3)", "coverage (>=7.2.7)", "coverage-enable-subprocess (>=1)", "flaky (>=3.7)", "packaging (>=23.1)", "pytest (>=7.4)", "pytest-env (>=0.8.2)", "pytest-freezer (>=0.4.8) ; platform_python_implementation == \"PyPy\" or platform_python_implementation == \"CPython\" and sys_platform == \"win32\" and python_version >= \"3.13\"", "pytest-mock (>=3.11.1)", "pytest-randomly (>=3.12)", "pytest-timeout (>=2.1)", "setuptools (>=68)", "time-machine (>=2.10) ; platform_python_implementation == \"CPython\""]
[[package]]
name = "visualswebench"
version = "0.0.0"
description = ""
optional = false
python-versions = ">=3.8"
groups = ["evaluation"]
files = []
develop = false
[package.dependencies]
beautifulsoup4 = "*"
chardet = "*"
datasets = "*"
docker = "*"
ghapi = "*"
GitPython = "*"
pre-commit = "*"
python-dotenv = "*"
requests = "*"
rich = "*"
tqdm = "*"
unidiff = "*"
[package.extras]
inference = ["anthropic", "flash_attn", "jedi", "peft", "protobuf", "sentencepiece", "tenacity", "tiktoken", "torch", "transformers", "triton"]
[package.source]
type = "git"
url = "https://github.com/luolin101/Visual-SWE-bench.git"
reference = "HEAD"
resolved_reference = "e12d06686202a778956bf4faa65330be23feb23e"
[[package]]
name = "watchdog"
version = "6.0.0"
@@ -11265,4 +11298,4 @@ cffi = ["cffi (>=1.11)"]
[metadata]
lock-version = "2.1"
python-versions = "^3.12,<3.14"
content-hash = "998d731b9d5cfa2239ef2019e95b681c44c70294f467cf9fdadbf2d551422ebe"
content-hash = "d45187a67e1326bd1d9c7c6a6c5cd9529688bdf95c34f611e7f9eeb8623c03a3"
+4 -3
@@ -83,8 +83,8 @@ prompt-toolkit = "^3.0.50"
poetry = "^2.1.2"
anyio = "4.9.0"
pythonnet = "*"
mcp = "1.7.1"
mcpm = "1.9.0"
mcp = "1.9.0"
mcpm = "1.12.0"
[tool.poetry.group.dev.dependencies]
ruff = "0.11.10"
@@ -116,6 +116,7 @@ whatthepatch = "*"
retry = "*"
evaluate = "*"
swebench = "^3.0.8"
visualswebench = { git = "https://github.com/luolin101/Visual-SWE-bench.git" }
swegym = { git = "https://github.com/SWE-Gym/SWE-Bench-Package.git" }
commit0 = "*"
func_timeout = "*"
@@ -129,7 +130,7 @@ browsergym-webarena = "0.13.3"
browsergym-miniwob = "0.13.3"
browsergym-visualwebarena = "0.13.3"
boto3-stubs = { extras = [ "s3" ], version = "^1.37.19" }
pyarrow = "20.0.0" # transitive dependency, pinned here to avoid conflicts
pyarrow = "20.0.0" # transitive dependency, pinned here to avoid conflicts
datasets = "*"
[tool.poetry.scripts]
+65 -38
@@ -1,17 +1,17 @@
import os
from unittest import mock
import pytest
from openhands.core.config import SandboxConfig
from openhands.events.action import CmdRunAction
from openhands.resolver.resolve_issue import IssueResolver
import openhands
def assert_sandbox_config(
config: SandboxConfig,
base_container_image = SandboxConfig.model_fields["base_container_image"].default,
runtime_container_image = f'ghcr.io/all-hands-ai/runtime:mock-nikolaik', # Default to mock version
local_runtime_url = SandboxConfig.model_fields["local_runtime_url"].default,
base_container_image=SandboxConfig.model_fields['base_container_image'].default,
runtime_container_image='ghcr.io/all-hands-ai/runtime:mock-nikolaik', # Default to mock version
local_runtime_url=SandboxConfig.model_fields['local_runtime_url'].default,
):
"""Helper function to assert the properties of the SandboxConfig object."""
assert isinstance(config, SandboxConfig)
@@ -22,6 +22,7 @@ def assert_sandbox_config(
assert config.timeout == 300
assert config.local_runtime_url == local_runtime_url
def test_setup_sandbox_config_default():
"""Test default configuration when no images provided and not experimental"""
with mock.patch('openhands.__version__', 'mock'):
@@ -32,23 +33,25 @@ def test_setup_sandbox_config_default():
)
assert_sandbox_config(
config,
runtime_container_image='ghcr.io/all-hands-ai/runtime:mock-nikolaik'
config, runtime_container_image='ghcr.io/all-hands-ai/runtime:mock-nikolaik'
)
def test_setup_sandbox_config_both_images():
"""Test that providing both container images raises ValueError"""
with pytest.raises(ValueError, match="Cannot provide both runtime and base container images."):
with pytest.raises(
ValueError, match='Cannot provide both runtime and base container images.'
):
IssueResolver._setup_sandbox_config(
base_container_image="base-image",
runtime_container_image="runtime-image",
base_container_image='base-image',
runtime_container_image='runtime-image',
is_experimental=False,
)
def test_setup_sandbox_config_base_only():
"""Test configuration when only base_container_image is provided"""
base_image = "custom-base-image"
base_image = 'custom-base-image'
config = IssueResolver._setup_sandbox_config(
base_container_image=base_image,
runtime_container_image=None,
@@ -56,25 +59,22 @@ def test_setup_sandbox_config_base_only():
)
assert_sandbox_config(
config,
base_container_image=base_image,
runtime_container_image=None
config, base_container_image=base_image, runtime_container_image=None
)
def test_setup_sandbox_config_runtime_only():
"""Test configuration when only runtime_container_image is provided"""
runtime_image = "custom-runtime-image"
runtime_image = 'custom-runtime-image'
config = IssueResolver._setup_sandbox_config(
base_container_image=None,
runtime_container_image=runtime_image,
is_experimental=False,
)
assert_sandbox_config(
config,
runtime_container_image=runtime_image
)
assert_sandbox_config(config, runtime_container_image=runtime_image)
def test_setup_sandbox_config_experimental():
"""Test configuration when experimental mode is enabled"""
with mock.patch('openhands.__version__', 'mock'):
@@ -84,40 +84,67 @@ def test_setup_sandbox_config_experimental():
is_experimental=True,
)
assert_sandbox_config(
config,
runtime_container_image=None
)
assert_sandbox_config(config, runtime_container_image=None)
@mock.patch("openhands.resolver.resolve_issue.os.getuid", return_value=0)
@mock.patch("openhands.resolver.resolve_issue.get_unique_uid", return_value=1001)
@mock.patch('openhands.resolver.resolve_issue.os.getuid', return_value=0)
@mock.patch('openhands.resolver.resolve_issue.get_unique_uid', return_value=1001)
def test_setup_sandbox_config_gitlab_ci(mock_get_unique_uid, mock_getuid):
"""Test GitLab CI specific configuration when running as root"""
with mock.patch('openhands.__version__', 'mock'):
with mock.patch.object(IssueResolver, "GITLAB_CI", True):
with mock.patch.object(IssueResolver, 'GITLAB_CI', True):
config = IssueResolver._setup_sandbox_config(
base_container_image=None,
runtime_container_image=None,
is_experimental=False,
)
assert_sandbox_config(
config,
local_runtime_url="http://localhost"
)
@mock.patch("openhands.resolver.resolve_issue.os.getuid", return_value=1000)
assert_sandbox_config(config, local_runtime_url='http://localhost')
@mock.patch('openhands.resolver.resolve_issue.os.getuid', return_value=1000)
def test_setup_sandbox_config_gitlab_ci_non_root(mock_getuid):
"""Test GitLab CI configuration when not running as root"""
with mock.patch('openhands.__version__', 'mock'):
with mock.patch.object(IssueResolver, "GITLAB_CI", True):
with mock.patch.object(IssueResolver, 'GITLAB_CI', True):
config = IssueResolver._setup_sandbox_config(
base_container_image=None,
runtime_container_image=None,
is_experimental=False,
)
assert_sandbox_config(
config,
local_runtime_url="http://localhost"
)
assert_sandbox_config(config, local_runtime_url='http://localhost')
@mock.patch('openhands.events.observation.CmdOutputObservation')
@mock.patch('openhands.runtime.base.Runtime')
def test_initialize_runtime_runs_setup_script_and_git_hooks(
    mock_runtime, mock_cmd_output
):
    """Test that initialize_runtime calls maybe_run_setup_script and maybe_setup_git_hooks"""

    # Create a minimal resolver instance with just the methods we need
    class MinimalResolver:
        def initialize_runtime(self, runtime):
            # This is the method we're testing
            action = CmdRunAction(command='git config --global core.pager ""')
            runtime.run_action(action)
            # Run setup script if it exists
            runtime.maybe_run_setup_script()
            # Setup git hooks if they exist
            runtime.maybe_setup_git_hooks()

    resolver = MinimalResolver()
    # Mock the runtime's run_action method to return a successful CmdOutputObservation
    mock_cmd_output.return_value.exit_code = 0
    mock_runtime.run_action.return_value = mock_cmd_output.return_value
    # Call the method
    resolver.initialize_runtime(mock_runtime)
    # Verify that both methods were called
    mock_runtime.maybe_run_setup_script.assert_called_once()
    mock_runtime.maybe_setup_git_hooks.assert_called_once()
+4
@@ -57,6 +57,10 @@ def mock_agent():
agent.llm.metrics = Metrics()
agent.llm.config = AppConfig().get_llm_config()
# Add config with enable_mcp attribute
agent.config = MagicMock(spec=AgentConfig)
agent.config.enable_mcp = True
# Add a proper system message mock
system_message = SystemMessageAction(
content='Test system message', tools=['test_tool']
-162
@@ -1,162 +0,0 @@
import asyncio
from unittest.mock import MagicMock, Mock
from uuid import uuid4
import pytest
from openhands.agenthub.codeact_agent.codeact_agent import CodeActAgent
from openhands.agenthub.readonly_agent.readonly_agent import ReadOnlyAgent
from openhands.controller.agent import Agent
from openhands.controller.agent_controller import AgentController
from openhands.controller.state.state import State
from openhands.core.config import AgentConfig, LLMConfig
from openhands.core.schema import AgentState
from openhands.events import EventSource, EventStream
from openhands.events.action import (
AgentDelegateAction,
AgentFinishAction,
MessageAction,
)
from openhands.events.observation import AgentDelegateObservation
from openhands.llm.llm import LLM
from openhands.llm.metrics import Metrics
from openhands.storage.memory import InMemoryFileStore
@pytest.fixture
def mock_event_stream():
"""Creates an event stream in memory."""
sid = f'test-{uuid4()}'
file_store = InMemoryFileStore({})
return EventStream(sid=sid, file_store=file_store)
@pytest.fixture
def mock_codeact_agent():
"""Creates a mock CodeActAgent for testing."""
agent = MagicMock(spec=CodeActAgent)
agent.name = 'CodeActAgent'
agent.llm = MagicMock(spec=LLM)
agent.llm.metrics = Metrics()
agent.llm.config = LLMConfig()
agent.config = AgentConfig()
# Add a proper system message mock
from openhands.events.action.message import SystemMessageAction
system_message = SystemMessageAction(content='Test system message for CodeActAgent')
system_message._source = EventSource.AGENT
system_message._id = -1 # Set invalid ID to avoid the ID check
agent.get_system_message.return_value = system_message
return agent
@pytest.fixture
def mock_readonly_agent():
"""Creates a mock ReadOnlyAgent for testing."""
agent = MagicMock(spec=ReadOnlyAgent)
agent.name = 'ReadOnlyAgent'
agent.llm = MagicMock(spec=LLM)
agent.llm.metrics = Metrics()
agent.llm.config = LLMConfig()
agent.config = AgentConfig()
# Add a proper system message mock
from openhands.events.action.message import SystemMessageAction
system_message = SystemMessageAction(content='Test system message for ReadOnlyAgent')
system_message._source = EventSource.AGENT
system_message._id = -1 # Set invalid ID to avoid the ID check
agent.get_system_message.return_value = system_message
return agent
@pytest.mark.asyncio
async def test_agent_mode_toggle(mock_codeact_agent, mock_readonly_agent, mock_event_stream):
"""
Test that the agent mode toggle works correctly:
1. Start with CodeActAgent
2. Toggle to ReadOnlyAgent
3. Toggle back to CodeActAgent
"""
# Mock the agent class resolution so that AgentController can instantiate mock_readonly_agent
original_get_cls = Agent.get_cls
def mock_get_cls(agent_name):
if agent_name == 'ReadOnlyAgent':
return lambda llm, config: mock_readonly_agent
return original_get_cls(agent_name)
Agent.get_cls = Mock(side_effect=mock_get_cls)
# Create parent controller with CodeActAgent
parent_state = State(max_iterations=10)
parent_controller = AgentController(
agent=mock_codeact_agent,
event_stream=mock_event_stream,
max_iterations=10,
sid='parent',
confirmation_mode=False,
headless_mode=True,
initial_state=parent_state,
)
# Verify we're starting with CodeActAgent
assert parent_controller.agent.name == 'CodeActAgent'
assert parent_controller.delegate is None
# Create a delegate action to switch to ReadOnlyAgent
delegate_action = AgentDelegateAction(
agent='ReadOnlyAgent',
inputs={
'task': 'Continue the conversation in READ-ONLY MODE. You can explore and analyze code but cannot make changes.'
},
thought='Switching to read-only mode at user\'s request'
)
# Simulate the delegate action
await parent_controller._on_event(delegate_action)
# Give time for the async step() to execute
await asyncio.sleep(0.5)
# Verify that we've delegated to ReadOnlyAgent
assert parent_controller.delegate is not None
assert parent_controller.delegate.agent.name == 'ReadOnlyAgent'
# Simulate a user message to the ReadOnlyAgent
message_action = MessageAction(content='Show me the files in this directory')
message_action._source = EventSource.USER
await parent_controller.delegate._on_event(message_action)
# Give time for the async step() to execute
await asyncio.sleep(0.5)
# Now simulate switching back to CodeActAgent with a finish action
finish_action = AgentFinishAction(
    final_thought='Switching back to EXECUTE MODE. You now have full capabilities to modify code and execute commands.',
    task_completed=True,
    outputs={'mode_switch': True}
)
# Send the finish action to the delegate
await parent_controller.delegate._on_event(finish_action)
# Give time for the async step() to execute
await asyncio.sleep(0.5)
# Verify that we're back to the parent CodeActAgent
assert parent_controller.delegate is None
assert parent_controller.agent.name == 'CodeActAgent'
# Verify that a delegate observation was added to the event stream
events = list(mock_event_stream.get_events())
assert any(isinstance(event, AgentDelegateObservation) for event in events)
# Cleanup
await parent_controller.close()
# Restore the original get_cls method
Agent.get_cls = original_get_cls
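
Review note: as written in this hunk, `Agent.get_cls` is replaced by hand and only restored on the last line, so a failing assertion earlier in the test would leave the mock in place for later tests. A minimal sketch of an alternative, assuming the same selective override is wanted, uses `unittest.mock.patch.object` as a context manager. The `Agent` class and the string return values below are hypothetical stand-ins for illustration, not the real OpenHands classes.

```python
from unittest.mock import patch


class Agent:
    """Hypothetical stand-in for the real Agent class, for illustration only."""

    @classmethod
    def get_cls(cls, name):
        return f'real:{name}'


# Capture the real lookup before patching so the fake can defer to it.
original_get_cls = Agent.get_cls


def fake_get_cls(agent_name):
    # Intercept only the agent under test; use the real lookup otherwise.
    if agent_name == 'ReadOnlyAgent':
        return 'mock:ReadOnlyAgent'
    return original_get_cls(agent_name)


def lookup_while_patched():
    # patch.object restores Agent.get_cls on exit, even if the body raises.
    with patch.object(Agent, 'get_cls', side_effect=fake_get_cls):
        return Agent.get_cls('ReadOnlyAgent'), Agent.get_cls('CodeActAgent')
```

With this shape the explicit "restore the original" step at the end of the test is no longer needed, because the context manager handles it on both success and failure.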
+1
@@ -35,6 +35,7 @@ def mock_agent():
# Configure the agent config
agent_config.disabled_microagents = []
agent_config.enable_mcp = True
# Set up the chain of mocks
llm.metrics = metrics
+5 -5
@@ -250,7 +250,7 @@ async def test_new_conversation_success(provider_handler_mock):
mock_create_conversation.return_value = MagicMock(
conversation_id='test_conversation_id',
url='https://my-conversation.com',
-api_key=None,
+session_api_key=None,
)
test_request = InitSessionRequest(
@@ -292,7 +292,7 @@ async def test_new_conversation_with_suggested_task(provider_handler_mock):
mock_create_conversation.return_value = MagicMock(
conversation_id='test_conversation_id',
url='https://my-conversation.com',
-api_key=None,
+session_api_key=None,
)
# Mock SuggestedTask.get_prompt_for_task
@@ -375,7 +375,7 @@ async def test_new_conversation_missing_settings(provider_handler_mock):
@pytest.mark.asyncio
-async def test_new_conversation_invalid_api_key(provider_handler_mock):
+async def test_new_conversation_invalid_session_api_key(provider_handler_mock):
"""Test creating a new conversation with an invalid API key."""
with _patch_store():
# Mock the _create_new_conversation function to raise LLMAuthenticationError
@@ -477,7 +477,7 @@ async def test_new_conversation_with_bearer_auth(provider_handler_mock):
mock_create_conversation.return_value = MagicMock(
conversation_id='test_conversation_id',
url='https://my-conversation.com',
-api_key=None,
+session_api_key=None,
)
# Create the request object
@@ -514,7 +514,7 @@ async def test_new_conversation_with_null_repository():
mock_create_conversation.return_value = MagicMock(
conversation_id='test_conversation_id',
url='https://my-conversation.com',
-api_key=None,
+session_api_key=None,
)
# Create the request object with null repository
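
Review note: the keyword rename in these hunks is not cosmetic, because a `MagicMock` only returns a configured value for attributes it was given explicitly. The sketch below assumes the code under test reads `session_api_key` from the returned object (that is an assumption, not something shown in this diff); the `conversation` variable is a hypothetical stand-in for the mocked return value.

```python
from unittest.mock import MagicMock

# Hypothetical stand-in for the mocked return value used in the tests.
conversation = MagicMock(
    conversation_id='test_conversation_id',
    url='https://my-conversation.com',
    session_api_key=None,
)

# Attributes passed as keyword arguments return the configured value.
configured = conversation.session_api_key

# Any other attribute, including the old name, is auto-created as a new
# MagicMock, which is truthy and does not behave like "no key set".
stale = conversation.api_key
```

If the mock still set only `api_key=None`, reading `session_api_key` would yield an auto-created mock object rather than `None`.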

Some files were not shown because too many files have changed in this diff.