Fix lint errors

Fix merge conflicts with main branch
chore(deps): bump the mcp-packages group with 2 updates (#8546)
2026-04-29 03:00:45 -04:00 · 2025-05-19 16:55:19 +00:00 · 2025-05-19 16:53:24 +00:00 · 2025-05-19 18:37:11 +02:00 · 2025-05-19 09:59:22 -06:00 · 2025-05-19 15:49:53 +00:00
104 changed files with 3255 additions and 2758 deletions
@@ -0,0 +1,33 @@
+---
+name: documentation
+type: knowledge
+version: 1.0.0
+agent: CodeActAgent
+triggers:
+- documentation
+- docs
+- document
+---
+
+# Documentation Guidelines
+
+All documentation must be grounded in fact: do not state anything you cannot support with evidence. When you have finished writing documentation, tell the user which reference sources (web pages, source code, or other documentation) you relied on for each new fact. If you cannot cite a source for a statement, do not include it in the pull request.
+
+## Best Practices for Documentation
+
+1. **Be Factual**: Only include information that can be verified from reliable sources.
+2. **Cite Sources**: Always reference the source of information (code, web pages, official documentation).
+3. **Be Clear and Concise**: Use simple language and avoid unnecessary jargon.
+4. **Use Examples**: Include practical examples to illustrate concepts.
+5. **Structure Properly**: Use headings, lists, and code blocks to organize information.
+6. **Keep Updated**: Ensure documentation reflects the current state of the code or system.
+
+## Documentation Process
+
+1. Research and gather information from reliable sources
+2. Draft documentation based on verified facts
+3. Review for accuracy and completeness
+4. Include references for all factual statements
+5. Submit only when all information is properly sourced
+
+Remember: If you cannot verify a piece of information, it's better to exclude it than to include potentially incorrect information.
@@ -5,6 +5,7 @@ SHELL=/usr/bin/env bash
 BACKEND_HOST ?= "127.0.0.1"
 BACKEND_PORT = 3000
 BACKEND_HOST_PORT = "$(BACKEND_HOST):$(BACKEND_PORT)"
+FRONTEND_HOST ?= "127.0.0.1"
 FRONTEND_PORT = 3001
 DEFAULT_WORKSPACE_DIR = "./workspace"
 DEFAULT_MODEL = "gpt-4o"
@@ -288,6 +289,15 @@ setup-config-prompts:
 	@read -p "Enter your LLM base URL [mostly used for local LLMs, leave blank if not needed - example: http://localhost:5001/v1/]: " llm_base_url; \
 	 if [[ ! -z "$$llm_base_url" ]]; then echo "base_url=\"$$llm_base_url\"" >> $(CONFIG_FILE).tmp; fi

+setup-config-basic:
+	@printf '%s\n' \
+	'[core]' \
+	'workspace_base="./workspace"' \
+	> config.toml
+	@echo "$(GREEN)config.toml created.$(RESET)"
+
+openhands-cloud-run:
+	@$(MAKE) run BACKEND_HOST="0.0.0.0" BACKEND_PORT="12000" FRONTEND_HOST="0.0.0.0" FRONTEND_PORT="12001"

 # Develop in container
 docker-dev:
@@ -322,5 +332,4 @@ help:
 	@echo "  $(GREEN)help$(RESET)                - Display this help message, providing information on available targets."

 # Phony targets
-.PHONY: build check-dependencies check-python check-npm check-docker check-poetry install-python-dependencies install-frontend-dependencies install-pre-commit-hooks lint start-backend start-frontend run run-wsl setup-config setup-config-prompts help
-.PHONY: docker-dev docker-run
+.PHONY: build check-dependencies check-system check-python check-npm check-nodejs check-docker check-poetry install-python-dependencies install-frontend-dependencies install-pre-commit-hooks lint-backend lint-frontend lint test-frontend test build-frontend start-backend start-frontend _run_setup run run-wsl setup-config setup-config-prompts setup-config-basic openhands-cloud-run docker-dev docker-run clean help
@@ -1,249 +0,0 @@
-# Agent Mode Toggle Design Document
-
-## Overview
-
-This document outlines the design for implementing a toggle switch between "Read-only mode" and "Execute mode" in the OpenHands application. This feature will allow users to switch between a restricted ReadOnlyAgent that can only explore and analyze code, and the fully capable CodeActAgent that can modify code and execute commands.
-
-## Motivation
-
-Users often want to explore a codebase and discuss implementation details with the agent before making any changes. The ability to switch between read-only and execute modes provides several benefits:
-
-1. **Safety**: Users can ensure no changes are made during the exploration phase
-2. **Clarity**: Clear indication of the agent's current capabilities
-3. **Control**: Users decide when to transition from planning to execution
-4. **Workflow**: Supports a natural workflow of exploration → planning → implementation
-
-## Architecture
-
-The implementation will leverage the existing agent delegation mechanism in OpenHands. When a user toggles the switch:
-
-1. In **Execute Mode** (default): The application uses the standard CodeActAgent
-2. In **Read-only Mode**: The application delegates to a ReadOnlyAgent
-
-### Key Components
-
-#### Frontend
-
-1. **Toggle Switch Component**:
-   - UI element that shows the current mode and allows switching
-   - Sends appropriate actions to the event stream when toggled
-
-2. **Agent State Tracking**:
-   - Redux state to track current agent type and delegation status
-   - Event listeners to update state based on event stream
-
-3. **Visual Indicators**:
-   - Mode indicator showing current agent mode
-   - Visual styling differences between modes
-
-#### Backend
-
-1. **Agent Delegation**:
-   - Uses existing delegation mechanism to switch to ReadOnlyAgent
-   - User-initiated FinishAction to end delegation and return to CodeActAgent
-
-2. **Event Stream Integration**:
-   - AgentDelegateAction to start read-only mode
-   - AgentFinishAction to end read-only mode
-   - System messages to indicate mode changes
-
-## Implementation Details
-
-### Frontend Implementation
-
-#### Redux State Extension
-
-```typescript
-interface AgentState {
-  curAgentState: AgentState;
-  currentAgentType: string; // Track the agent type
-  isDelegated: boolean;     // Track if we're in a delegation
-  // other existing fields...
-}
-
-const initialState: AgentState = {
-  curAgentState: AgentState.IDLE,
-  currentAgentType: "CodeActAgent", // Default agent type
-  isDelegated: false,
-  // other initial values...
-};
-```
-
-#### Action Generators
-
-```typescript
-export const generateDelegateToReadOnlyAction = () => ({
-  action: ActionType.DELEGATE,
-  args: {
-    agent: "ReadOnlyAgent",
-    inputs: {
-      task: "Continue the conversation in READ-ONLY MODE. You can explore and analyze code but cannot make changes."
-    },
-    thought: "Switching to read-only mode at user's request"
-  }
-});
-
-export const generateFinishDelegationAction = () => ({
-  action: ActionType.FINISH,
-  args: {
-    message: "Switching back to EXECUTE MODE. You now have full capabilities to modify code and execute commands.",
-    task_completed: "true",
-    outputs: {
-      mode_switch: true
-    }
-  }
-});
-```
-
-#### Toggle Switch Component
-
-```tsx
-function AgentModeToggle() {
-  const { t } = useTranslation();
-  const dispatch = useDispatch();
-  const { send } = useWsClient();
-  
-  // Get agent type from Redux
-  const { currentAgentType, isDelegated } = useSelector((state: RootState) => state.agent);
-  
-  // Compute if we're in read-only mode
-  const isReadOnly = currentAgentType === "ReadOnlyAgent";
-  
-  const handleToggle = () => {
-    if (isReadOnly) {
-      // Currently in read-only mode, switch back to execute mode
-      send(generateFinishDelegationAction());
-    } else {
-      // Currently in execute mode, switch to read-only mode
-      send(generateDelegateToReadOnlyAction());
-    }
-  };
-  
-  return (
-    <div className="flex items-center gap-2">
-      <span className="text-sm font-medium">
-        {isReadOnly ? "Read-Only Mode" : "Execute Mode"}
-      </span>
-      <Switch 
-        checked={isReadOnly} 
-        onChange={handleToggle}
-        className={`${isReadOnly ? 'bg-amber-600' : 'bg-blue-600'} relative inline-flex h-6 w-11 items-center rounded-full`}
-      >
-        <span className="sr-only">Toggle agent mode</span>
-        <span
-          className={`${isReadOnly ? 'translate-x-6' : 'translate-x-1'} inline-block h-4 w-4 transform rounded-full bg-white transition`}
-        />
-      </Switch>
-    </div>
-  );
-}
-```
-
-#### Event Listener for State Updates
-
-```typescript
-function handleEvent(event) {
-  // Handle agent delegation events
-  if (event.action === ActionType.DELEGATE) {
-    // A delegation is starting
-    dispatch(setDelegationState(true));
-    dispatch(setAgentType(event.args.agent));
-  }
-  
-  // Handle agent delegate observation (delegation ended)
-  else if (event.observation === "delegate") {
-    // Delegation has ended, returning to parent agent
-    dispatch(setDelegationState(false));
-    dispatch(setAgentType("CodeActAgent")); // Reset to default agent
-  }
-  
-  // Handle other events...
-}
-```
-
-### Backend Considerations
-
-The backend implementation will leverage the existing agent delegation mechanism:
-
-1. When the user toggles to read-only mode:
-   - An AgentDelegateAction is sent to the event stream
-   - The AgentController creates a ReadOnlyAgent delegate
-   - All subsequent events are handled by the delegate
-
-2. When the user toggles back to execute mode:
-   - An AgentFinishAction is sent to the event stream
-   - The delegate agent finishes its task
-   - The parent AgentController resumes normal operation
-
-No backend code changes are required as we're using the existing delegation mechanism.
-
-## User Experience
-
-1. **Initial State**: The application starts in Execute Mode with CodeActAgent
-2. **Mode Switching**:
-   - User clicks the toggle switch to enter Read-only Mode
-   - System message indicates the mode change
-   - Agent capabilities are restricted to read-only tools
-   - UI shows visual indicators of the current mode
-   - User clicks the toggle switch again to return to Execute Mode
-   - System message indicates the return to full capabilities
-
-3. **Visual Indicators**:
-   - Toggle switch position (left/right)
-   - Color coding (amber for read-only, blue for execute)
-   - Mode label text
-   - System messages in the conversation
-
-## Future Enhancements
-
-1. **Persistent Mode Preference**: Remember the user's preferred starting mode
-2. **Context Preservation**: Improve context retention when switching modes
-3. **Custom Tool Sets**: Allow users to customize which tools are available in each mode
-4. **Mode-specific Prompts**: Optimize agent prompts for each mode
-
-## Implementation Plan
-
-1. **Frontend Implementation**:
-   - Add Redux state for agent type tracking ✅
-   - Create toggle switch component ✅
-   - Implement event listeners for state updates ✅
-   - Add visual indicators for current mode ✅
-   - Add notifications for mode changes ✅
-
-2. **Testing**:
-   - Test mode switching with various conversation states
-   - Verify proper tool restrictions in read-only mode
-   - Test persistence across page refreshes
-
-3. **Documentation**:
-   - Update user documentation to explain the mode toggle feature
-   - Add developer documentation for the implementation details ✅
-
-## Implementation Status
-
-The agent mode toggle feature has been implemented with the following components:
-
-1. **Redux State**:
-   - Added `currentAgentType` and `isDelegated` properties to the agent slice
-   - Default agent type is set to "CodeActAgent"
-
-2. **Agent Mode Service**:
-   - Created `agent-mode-service.ts` with action generators for delegation
-   - Implemented `generateDelegateToReadOnlyAction()` and `generateFinishDelegationAction()`
-
-3. **UI Components**:
-   - Created `AgentModeToggle` component with toggle switch UI
-   - Integrated toggle into the agent control bar
-   - Updated agent status bar to display current mode
-   - Added color coding (amber for read-only, blue for execute)
-
-4. **Event Handling**:
-   - Updated `use-handle-ws-events.ts` to process agent delegation events
-   - Added state updates when delegation starts/ends
-   - Added notifications to inform users of mode changes
-
-5. **Internationalization**:
-   - Added translations for all UI elements
-   - Supported multiple languages through i18n
-
-The implementation is complete and ready for testing. The feature allows users to seamlessly switch between read-only and execute modes during a conversation, with clear visual indicators and notifications of the current mode.
@@ -1,55 +0,0 @@
-# Agent Mode Toggle
-
-The Agent Mode Toggle feature allows you to switch between two different agent modes:
-
-1. **Execute Mode** (default): Full capabilities with the CodeActAgent, which can modify code and execute commands
-2. **Read-only Mode**: Restricted capabilities with the ReadOnlyAgent, which can only explore and analyze code
-
-## Why Use Different Modes?
-
- **Safety**: Ensure no changes are made during the exploration phase
- **Clarity**: Clear indication of the agent's current capabilities
- **Control**: Decide when to transition from planning to execution
- **Workflow**: Support a natural workflow of exploration → planning → implementation
-
-## How to Use
-
-1. **Toggle Switch**: Click the toggle switch in the agent control bar to switch between modes
-   - Blue toggle: Execute Mode (default)
-   - Amber toggle: Read-only Mode
-
-2. **Mode Indicators**:
-   - The current mode is displayed in the agent status bar
-   - System messages indicate when the mode changes
-
-## Available Tools in Each Mode
-
-### Execute Mode (CodeActAgent)
-
-All tools are available, including:
- File editing (`str_replace_editor`)
- Command execution (`execute_bash`)
- Python code execution (`execute_ipython_cell`)
- Web browsing (`browser`, `web_read`)
- Thinking and finishing (`think`, `finish`)
-
-### Read-only Mode (ReadOnlyAgent)
-
-Only non-destructive tools are available:
- File viewing (`view`)
- File searching (`grep`, `glob`)
- Web reading (`web_read`)
- Thinking and finishing (`think`, `finish`)
-
-## Best Practices
-
-1. **Start in Read-only Mode** for new codebases to safely explore without making changes
-2. **Switch to Execute Mode** when you're ready to implement changes
-3. **Return to Read-only Mode** when you want to explore different parts of the codebase
-
-## Technical Details
-
-The agent mode toggle uses OpenHands' agent delegation mechanism:
- When toggling to Read-only Mode, the system delegates to a ReadOnlyAgent
- When toggling back to Execute Mode, the delegation ends and returns to the CodeActAgent
- Context is preserved between mode switches
@@ -261,6 +261,7 @@ def get_config(
        enable_jupyter=False,
        enable_browsing=RUN_WITH_BROWSING,
        enable_llm_editor=False,
+        enable_mcp=False,
        condenser=metadata.condenser_config,
        enable_prompt_extensions=False,
    )
@@ -0,0 +1,172 @@
+# Visual SWE-Bench Evaluation with Docker Image
+
+This folder contains the evaluation harness that we built on top of the original [Visual SWE-Bench benchmark](https://multi-swe-bench.github.io/#/) ([paper](https://arxiv.org/abs/2412.17315)).
+
+The evaluation consists of three steps:
+
+1. Environment setup: [install python environment](../../README.md#development-environment), [configure LLM config](../../README.md#configure-openhands-and-your-llm), and [pull docker](#openhands-visual-swe-bench-instance-level-docker-support).
+2. [Run inference](#run-inference-on-visual-swe-bench-instances): Generate an edit patch for each GitHub issue.
+3. [Evaluate patches using Visual SWE-Bench docker](#evaluate-generated-patches).
+
+## Setup Environment and LLM Configuration
+
+Please follow the instructions [here](../../README.md#setup) to set up your local development environment and LLM.
+
+## OpenHands Visual SWE-Bench Instance-level Docker Support
+
+OpenHands now supports using the official evaluation docker for both **[inference](#run-inference-on-visual-swe-bench-instances) and [evaluation](#evaluate-generated-patches)**.
+This is now the default behavior.
+
+## Run Inference on Visual SWE-Bench Instances
+
+Make sure your Docker daemon is running, and you have ample disk space for the [instance-level docker image](#openhands-visual-swe-bench-instance-level-docker-support).
+
+When the `run_infer.sh` script is started, it will automatically pull the relevant Visual SWE-Bench images. For example, for instance ID `networkx__networkx-6503`, it will try to pull our pre-built docker image `sweb.eval.x86_64.networkx_s_networkx-6503` from DockerHub. This image will be used to create an OpenHands runtime image in which the agent operates.
+
+```bash
+./evaluation/benchmarks/visual_swe_bench/scripts/run_infer.sh [model_config] [git-version] [agent] [eval_limit] [max_iter] [num_workers]
+
+# Example
+./evaluation/benchmarks/visual_swe_bench/scripts/run_infer.sh llm.eval_gpt4_1106_preview HEAD CodeActAgent 133 30 1
+```
+
+where `model_config` is mandatory, and the rest are optional.
+
+- `model_config`, e.g. `eval_gpt4_1106_preview`, is the config group name for your
+LLM settings, as defined in your `config.toml`.
+- `git-version`, e.g. `HEAD`, is the git commit hash of the OpenHands version you would
+like to evaluate. It could also be a release tag like `0.6.2`.
+- `agent`, e.g. `CodeActAgent`, is the name of the agent for benchmarks, defaulting
+to `CodeActAgent`.
+- `eval_limit`, e.g. `10`, limits the evaluation to the first `eval_limit` instances. By
+default, the script evaluates the entire Visual SWE-bench set (133 issues). Note:
+in order to use `eval_limit`, you must also set `agent`.
+- `max_iter`, e.g. `20`, is the maximum number of iterations for the agent to run. By
+default, it is set to 30.
+- `num_workers`, e.g. `3`, is the number of parallel workers to run the evaluation. By
+default, it is set to 1.
+
+There are also two optional environment variables you can set.
+
+```bash
+export USE_HINT_TEXT=true # use hint text in the evaluation; defaults to false. Ignore this if you are not sure.
+export USE_INSTANCE_IMAGE=true # use instance-level docker images; defaults to true
+```
+
+To run 10 instances using `llm.eval_gpt4_1106_preview` and CodeActAgent, the command would be:
+
+```bash
+./evaluation/benchmarks/visual_swe_bench/scripts/run_infer.sh llm.eval_gpt4_1106_preview HEAD CodeActAgent 10
+```
+
+### Specify a subset of tasks to run inference on
+
+To benchmark only a specific list of tasks, create a `config.toml` under the
+`./evaluation/benchmarks/visual_swe_bench/` folder and add a list
+attribute named `selected_ids`, e.g.
+
+```toml
+selected_ids = ['astropy__astropy-13838', 'matplotlib__matplotlib-21617', 'plotly__plotly.py-1966']
+```
+
+Then only these tasks (rows whose `instance_id` is in the above list) will be evaluated.
+In this case, the `eval_limit` option applies only to tasks that are in the `selected_ids` list.
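
A minimal sketch of how `selected_ids` and `eval_limit` could compose, assuming the ID filter runs before the limit as the README text above describes. `select_instances` and the sample rows are hypothetical names for illustration, not the harness's actual API.

```python
from typing import Optional


def select_instances(
    rows: list[dict],
    selected_ids: Optional[list[str]] = None,
    eval_limit: Optional[int] = None,
) -> list[dict]:
    """Keep rows whose instance_id is selected, then apply the limit."""
    if selected_ids:
        wanted = set(selected_ids)
        rows = [row for row in rows if row["instance_id"] in wanted]
    if eval_limit is not None:
        # The limit is applied to the already-filtered rows.
        rows = rows[:eval_limit]
    return rows


rows = [
    {"instance_id": "astropy__astropy-13838"},
    {"instance_id": "networkx__networkx-6503"},
    {"instance_id": "plotly__plotly.py-1966"},
]
picked = select_instances(
    rows,
    selected_ids=["astropy__astropy-13838", "plotly__plotly.py-1966"],
    eval_limit=1,
)
print([row["instance_id"] for row in picked])
```

With both options set, only the first matching row survives; with neither set, all rows pass through unchanged.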
+
+After running the inference, you will obtain an `output.jsonl` (by default it will be saved to `evaluation/evaluation_outputs`).
+
+## Evaluate Generated Patches
+
+### Download Docker Images
+
+**(Recommended for reproducibility)** If you have extra local space (e.g., 200GB), you can try pulling the instance-level docker images we've prepared by running:
+
+```bash
+evaluation/benchmarks/visual_swe_bench/scripts/docker/pull_all_eval_docker.sh instance
+```
+
+If you want to save some disk space while still speeding up the image pre-build process, you can pull the environment-level docker images instead:
+
+```bash
+evaluation/benchmarks/visual_swe_bench/scripts/docker/pull_all_eval_docker.sh env
+```
+
+If you want to evaluate on the full SWE-Bench test set:
+
+```bash
+evaluation/benchmarks/visual_swe_bench/scripts/docker/pull_all_eval_docker.sh instance full
+```
+
+### Run evaluation
+
+With the `output.jsonl` file, you can run `eval_infer.sh` to evaluate the generated patches and produce a fine-grained report.
+
+**This evaluation is performed using the official dockerized evaluation announced.**
+
+> If you want to evaluate existing results, you should first run this to clone existing outputs
+>
+>```bash
+>git clone https://huggingface.co/spaces/OpenHands/evaluation evaluation/evaluation_outputs
+>```
+
+Note: you should have already pulled the instance-level OR env-level docker images following [this section](#openhands-visual-swe-bench-instance-level-docker-support).
+
+Then you can run the following:
+
+```bash
+./evaluation/benchmarks/visual_swe_bench/scripts/eval_infer.sh $YOUR_OUTPUT_JSONL [instance_id]
+
+# Example
+./evaluation/benchmarks/visual_swe_bench/scripts/eval_infer.sh evaluation/evaluation_outputs/outputs/luolin101__Visual-SWE-bench-test/CodeActAgent/gpt-4-1106-preview_maxiter_50_N_v1.0/output.jsonl
+```
+
+The script accepts one optional argument:
+
+- `instance_id`: a single instance to evaluate
+
+For example, to evaluate a specific instance:
+
+```bash
+./evaluation/benchmarks/visual_swe_bench/scripts/eval_infer.sh $YOUR_OUTPUT_JSONL instance_123
+```
+
+> You can also pass in a JSONL with SWE-Bench format to `./evaluation/benchmarks/visual_swe_bench/scripts/eval_infer.sh`, where each line is a JSON of `{"model_patch": "XXX", "model_name_or_path": "YYY", "instance_id": "ZZZ"}`.
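
A minimal sketch of producing such a file, assuming only the three keys named in the README line above. `to_jsonl`, the model name, and the patch text are hypothetical placeholders.

```python
import json


def to_jsonl(predictions: list[dict]) -> str:
    """Serialize predictions as one JSON object per line."""
    lines = []
    for pred in predictions:
        record = {
            "model_patch": pred["model_patch"],
            "model_name_or_path": pred["model_name_or_path"],
            "instance_id": pred["instance_id"],
        }
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


predictions = [
    {
        "model_patch": "diff --git a/example.py b/example.py\n",  # placeholder patch
        "model_name_or_path": "my-model",  # placeholder model name
        "instance_id": "networkx__networkx-6503",
    }
]
jsonl_text = to_jsonl(predictions)

# Write the text to a file and pass that path to eval_infer.sh.
with open("predictions.jsonl", "w", encoding="utf-8") as f:
    f.write(jsonl_text)
```

`json.dumps` escapes the newlines inside `model_patch`, so each record stays on a single line.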
+
+The final results will be saved to `evaluation/evaluation_outputs/outputs/visual_swe_bench/CodeActAgent/gpt-4-1106-preview_maxiter_50_N_v1.0/` with the following files/directory:
+
+- `README.md`: a report listing which instances passed, failed, etc.
+- `report.json`: a JSON file that contains keys like `"resolved_ids"` pointing to instance IDs that are resolved by the agent.
+- `logs/`: a directory of test logs
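
A minimal sketch of reading the resolved instances back out of `report.json`. Only the `"resolved_ids"` key is taken from the README text above; the `load_resolved_ids` helper and the sample report content are hypothetical.

```python
import json


def load_resolved_ids(report_text: str) -> list[str]:
    """Return the sorted instance IDs listed under "resolved_ids"."""
    report = json.loads(report_text)
    # Fall back to an empty list if the key is absent.
    return sorted(report.get("resolved_ids", []))


# Hypothetical report content for illustration only.
sample_report = json.dumps(
    {"resolved_ids": ["plotly__plotly.py-1966", "astropy__astropy-13838"]}
)
resolved = load_resolved_ids(sample_report)
print(f"{len(resolved)} resolved: {resolved}")
```

For a real run, read the text from the `report.json` file in the output directory and pass it to the helper.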
+
+## Visualize Results
+
+First, clone `https://huggingface.co/spaces/OpenHands/evaluation` and add your own OpenHands run results to the `outputs` directory of the cloned repo.
+
+```bash
+git clone https://huggingface.co/spaces/OpenHands/evaluation
+```
+
+**(optional) set up a streamlit environment with conda**:
+
+```bash
+cd evaluation
+conda create -n streamlit python=3.10
+conda activate streamlit
+pip install -r requirements.txt
+```
+
+**run the visualizer**:
+Then, in a separate Python environment with the `streamlit` library installed, you can run the following:
+
+```bash
+# Make sure you are inside the cloned `evaluation` repo
+conda activate streamlit # if you follow the optional conda env setup above
+streamlit run app.py --server.port 8501 --server.address 0.0.0.0
+```
+
+Then you can access the SWE-Bench trajectory visualizer at `localhost:8501`.
+
+## Submit your evaluation results
+
+You can start your own fork of [our huggingface evaluation outputs](https://huggingface.co/spaces/OpenHands/evaluation) and submit a PR of your evaluation results following the guide [here](https://huggingface.co/docs/hub/en/repositories-pull-requests-discussions#pull-requests-and-discussions).
@@ -0,0 +1,641 @@
+import asyncio
+import json
+import os
+import tempfile
+from typing import Any
+
+import pandas as pd
+import toml
+from datasets import load_dataset
+
+import openhands.agenthub
+from evaluation.benchmarks.swe_bench.resource.mapping import (
+    get_instance_resource_factor,
+)
+from evaluation.utils.shared import (
+    EvalException,
+    EvalMetadata,
+    EvalOutput,
+    assert_and_raise,
+    codeact_user_response,
+    get_default_sandbox_config_for_eval,
+    get_metrics,
+    is_fatal_evaluation_error,
+    make_metadata,
+    prepare_dataset,
+    reset_logger_for_multiprocessing,
+    run_evaluation,
+    update_llm_config_for_completions_logging,
+)
+from openhands.controller.state.state import State
+from openhands.core.config import (
+    AgentConfig,
+    AppConfig,
+    get_llm_config_arg,
+    get_parser,
+)
+from openhands.core.logger import openhands_logger as logger
+from openhands.core.main import create_runtime, run_controller
+from openhands.events.action import CmdRunAction, MessageAction
+from openhands.events.observation import CmdOutputObservation, ErrorObservation
+from openhands.events.serialization.event import event_to_dict
+from openhands.runtime.base import Runtime
+from openhands.utils.async_utils import call_async_from_sync
+from openhands.utils.shutdown_listener import sleep_if_should_continue
+
+USE_HINT_TEXT = os.environ.get('USE_HINT_TEXT', 'false').lower() == 'true'
+RUN_WITH_BROWSING = os.environ.get('RUN_WITH_BROWSING', 'false').lower() == 'true'
+
+
+AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {
+    'CodeActAgent': codeact_user_response,
+}
+
+
+def _get_swebench_workspace_dir_name(instance: pd.Series) -> str:
+    return f'{instance.repo}__{instance.version}'.replace('/', '__')
+
+
+def get_instruction(instance: pd.Series, metadata: EvalMetadata):
+    workspace_dir_name = _get_swebench_workspace_dir_name(instance)
+    # Instruction based on Anthropic's official trajectory
+    # https://github.com/eschluntz/swe-bench-experiments/tree/main/evaluation/verified/20241022_tools_claude-3-5-sonnet-updated/trajs
+    instruction = (
+        '<uploaded_files>\n'
+        f'/workspace/{workspace_dir_name}\n'
+        '</uploaded_files>\n'
+        f"I've uploaded a python code repository in the directory {workspace_dir_name}. Consider the following issue description:\n\n"
+        f'<issue_description>\n'
+        f'{instance.problem_statement}\n'
+        '</issue_description>\n\n'
+        'Can you help me implement the necessary changes to the repository so that the requirements specified in the <issue_description> are met?\n'
+        "I've already taken care of all changes to any of the test files described in the <issue_description>. This means you DON'T have to modify the testing logic or any of the tests in any way!\n"
+        "Also the development Python environment is already set up for you (i.e., all dependencies already installed), so you don't need to install other packages.\n"
+        'Your task is to make the minimal changes to non-test files in the /workspace directory to ensure the <issue_description> is satisfied.\n'
+        'Follow these steps to resolve the issue:\n'
+        '1. As a first step, it might be a good idea to explore the repo to familiarize yourself with its structure.\n'
+        '2. Create a script to reproduce the error and execute it with `python <filename.py>` using the BashTool, to confirm the error\n'
+        '3. Edit the sourcecode of the repo to resolve the issue\n'
+        '4. Rerun your reproduce script and confirm that the error is fixed!\n'
+        '5. Think about edgecases, add comprehensive tests for them in your reproduce script, and run them to make sure your fix handles them as well\n'
+        f'6. Once you are done with the initial implementation, please carefully re-read the problem description and check the difference between the current code and the base commit {instance["base_commit"]}. Do you think that the issue has been completely and comprehensively solved? Write tests to check the correctness of the solution, specifically focusing on tests that may point out any remaining problems that are not yet solved. Run all of the tests in the repo and check if any of them fail, and if they do fix the code. Repeat this process of carefully reading the problem description and current implementation, testing, and fixing any problems until you are confident that the current implementation is correct. Find and run any tests in the repo that are related to:\n'
+        '   - The issue you are fixing\n'
+        '   - The files you modified\n'
+        '   - The functions you changed\n'
+        '   Make sure all these tests pass with your changes.\n'
+        "Your thinking should be thorough and so it's fine if it's very long.\n"
+    )
+
+    if RUN_WITH_BROWSING:
+        instruction += (
+            '<IMPORTANT!>\nYou SHOULD NEVER attempt to browse the web. </IMPORTANT!>\n'
+        )
+    return instruction
+
+
+# TODO: migrate all swe-bench docker to ghcr.io/openhands
+DOCKER_IMAGE_PREFIX = os.environ.get('EVAL_DOCKER_IMAGE_PREFIX', 'docker.io/xingyaoww/')
+logger.info(f'Using docker image prefix: {DOCKER_IMAGE_PREFIX}')
+
+
+def get_instance_docker_image(instance_id: str, official_image: bool = False) -> str:
+    image_name = 'sweb.eval.x86_64.' + instance_id
+    image_name = image_name.replace(
+        '__', '_s_'
+    )  # to comply with docker image naming convention
+    other_list = [
+        'plotly__plotly.py-4083',
+        'plotly__plotly.py-2600',
+        'plotly__plotly.py-2591',
+        'plotly__plotly.py-1966',
+        'networkx__networkx-6503',
+        'networkx__networkx-6098',
+        'networkx__networkx-5616',
+        'networkx__networkx-5354',
+        'networkx__networkx-5058',
+        'networkx__networkx-4378',
+        'networkx__networkx-3764',
+        'vega__altair-2785',
+        'vega__altair-1092',
+        'vega__altair-974',
+        'vega__altair-830',
+        'matplotlib__matplotlib-27754',
+        'matplotlib__matplotlib-26926',
+        'matplotlib__matplotlib-26788',
+        'matplotlib__matplotlib-26586',
+        'sympy__sympy-26941',
+        'mwaskom__seaborn-3458',
+        'mwaskom__seaborn-3454',
+    ]
+    if instance_id in other_list:
+        return ('docker.io/luolin101/'.rstrip('/') + '/' + image_name).lower()
+    return (DOCKER_IMAGE_PREFIX.rstrip('/') + '/' + image_name).lower()
+
+
+def get_config(
+    instance: pd.Series,
+    metadata: EvalMetadata,
+) -> AppConfig:
+    # We use a different instance image for each instance of swe-bench eval
+    use_official_image = bool(
+        'verified' in metadata.dataset.lower() or 'lite' in metadata.dataset.lower()
+    )
+    base_container_image = get_instance_docker_image(
+        instance['instance_id'], use_official_image
+    )
+    logger.info(
+        f'Using instance container image: {base_container_image}. '
+        f'Please make sure this image exists. '
+        f'Submit an issue on https://github.com/All-Hands-AI/OpenHands if you run into any issues.'
+    )
+
+    sandbox_config = get_default_sandbox_config_for_eval()
+    sandbox_config.base_container_image = base_container_image
+    sandbox_config.enable_auto_lint = True
+    sandbox_config.use_host_network = False
+    # Add platform to the sandbox config to solve issue 4401
+    sandbox_config.platform = 'linux/amd64'
+    sandbox_config.remote_runtime_resource_factor = get_instance_resource_factor(
+        dataset_name=metadata.dataset,
+        instance_id=instance['instance_id'],
+    )
+
+    config = AppConfig(
+        default_agent=metadata.agent_class,
+        run_as_openhands=False,
+        max_iterations=metadata.max_iterations,
+        runtime=os.environ.get('RUNTIME', 'docker'),
+        sandbox=sandbox_config,
+        # do not mount workspace
+        workspace_base=None,
+        workspace_mount_path=None,
+    )
+    config.set_llm_config(
+        update_llm_config_for_completions_logging(
+            metadata.llm_config, metadata.eval_output_dir, instance['instance_id']
+        )
+    )
+    agent_config = AgentConfig(
+        enable_jupyter=False,
+        enable_browsing=RUN_WITH_BROWSING,
+        enable_llm_editor=False,
+        condenser=metadata.condenser_config,
+        enable_prompt_extensions=False,
+    )
+    config.set_agent_config(agent_config)
+    return config
+
+
+def initialize_runtime(
+    runtime: Runtime,
+    instance: pd.Series,  # this argument is not required
+):
+    """Initialize the runtime for the agent.
+
+    This function is called before the runtime is used to run the agent.
+    """
+    logger.info('-' * 30)
+    logger.info('BEGIN Runtime Initialization Fn')
+    logger.info('-' * 30)
+    workspace_dir_name = _get_swebench_workspace_dir_name(instance)
+    obs: CmdOutputObservation
+
+    # Set instance id
+    action = CmdRunAction(
+        command=f"""echo 'export SWE_INSTANCE_ID={instance['instance_id']}' >> ~/.bashrc && echo 'export PIP_CACHE_DIR=~/.cache/pip' >> ~/.bashrc && echo "alias git='git --no-pager'" >> ~/.bashrc"""
+    )
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        obs.exit_code == 0, f'Failed to export SWE_INSTANCE_ID: {str(obs)}'
+    )
+
+    action = CmdRunAction(command="""export USER=$(whoami); echo USER=${USER} """)
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(obs.exit_code == 0, f'Failed to export USER: {str(obs)}')
+
+    # inject the init script
+    script_dir = os.path.dirname(__file__)
+
+    # inject the instance info
+    action = CmdRunAction(command='mkdir -p /swe_util/eval_data/instances')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        obs.exit_code == 0,
+        f'Failed to create /swe_util/eval_data/instances: {str(obs)}',
+    )
+
+    swe_instance_json_name = 'swe-bench-instance.json'
+    with tempfile.TemporaryDirectory() as temp_dir:
+        # Construct the full path for the desired file name within the temporary directory
+        temp_file_path = os.path.join(temp_dir, swe_instance_json_name)
+        # Write to the file with the desired name within the temporary directory
+        with open(temp_file_path, 'w') as f:
+            if not isinstance(instance, dict):
+                json.dump([instance.to_dict()], f)
+            else:
+                json.dump([instance], f)
+
+        # Copy the file to the desired location
+        runtime.copy_to(temp_file_path, '/swe_util/eval_data/instances/')
+
+        # inject the instance swe entry
+        runtime.copy_to(
+            str(os.path.join(script_dir, 'scripts/setup/instance_swe_entry.sh')),
+            '/swe_util/',
+        )
+
+    action = CmdRunAction(command='cat ~/.bashrc')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(obs.exit_code == 0, f'Failed to cat ~/.bashrc: {str(obs)}')
+
+    action = CmdRunAction(command='source ~/.bashrc')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    if isinstance(obs, ErrorObservation):
+        logger.error(f'Failed to source ~/.bashrc: {str(obs)}')
+    assert_and_raise(obs.exit_code == 0, f'Failed to source ~/.bashrc: {str(obs)}')
+
+    action = CmdRunAction(command='source /swe_util/instance_swe_entry.sh')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        obs.exit_code == 0,
+        f'Failed to source /swe_util/instance_swe_entry.sh: {str(obs)}',
+    )
+
+    action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        obs.exit_code == 0,
+        f'Failed to cd to /workspace/{workspace_dir_name}: {str(obs)}',
+    )
+
+    action = CmdRunAction(command='git reset --hard')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(obs.exit_code == 0, f'Failed to git reset --hard: {str(obs)}')
+
+    action = CmdRunAction(
+        command='for remote_name in $(git remote); do git remote remove "${remote_name}"; done'
+    )
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(obs.exit_code == 0, f'Failed to remove git remotes: {str(obs)}')
+
+    action = CmdRunAction(command='which python')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        obs.exit_code == 0 and 'testbed' in obs.content,
+        f'Expected to find python interpreter from testbed, but got: {str(obs)}',
+    )
+
+    logger.info('-' * 30)
+    logger.info('END Runtime Initialization Fn')
+    logger.info('-' * 30)
+
+
+def complete_runtime(
+    runtime: Runtime,
+    instance: pd.Series,  # this argument is not required, but it is used to get the workspace_dir_name
+) -> dict[str, Any]:
+    """Complete the runtime for the agent.
+
+    This function is called after the agent has finished running in the runtime.
+    If you need to do something in the sandbox to get the correctness metric after
+    the agent has run, modify this function.
+    """
+    logger.info('-' * 30)
+    logger.info('BEGIN Runtime Completion Fn')
+    logger.info('-' * 30)
+    obs: CmdOutputObservation
+    workspace_dir_name = _get_swebench_workspace_dir_name(instance)
+
+    action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+
+    if obs.exit_code == -1:
+        # The previous command is still running
+        # We need to kill previous command
+        logger.info('The previous command is still running, trying to kill it...')
+        action = CmdRunAction(command='C-c')
+        obs = runtime.run_action(action)
+        logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+
+        # Then run the command again
+        action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
+        action.set_hard_timeout(600)
+        logger.info(action, extra={'msg_type': 'ACTION'})
+        obs = runtime.run_action(action)
+        logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+
+    assert_and_raise(
+        isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
+        f'Failed to cd to /workspace/{workspace_dir_name}: {str(obs)}',
+    )
+
+    action = CmdRunAction(command='git config --global core.pager ""')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
+        f'Failed to git config --global core.pager "": {str(obs)}',
+    )
+
+    # First check for any git repositories in subdirectories
+    action = CmdRunAction(command='find . -type d -name .git -not -path "./.git"')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
+        f'Failed to find git repositories: {str(obs)}',
+    )
+
+    git_dirs = [p for p in obs.content.strip().split('\n') if p]
+    if git_dirs:
+        # Remove all .git directories in subdirectories
+        for git_dir in git_dirs:
+            action = CmdRunAction(command=f'rm -rf "{git_dir}"')
+            action.set_hard_timeout(600)
+            logger.info(action, extra={'msg_type': 'ACTION'})
+            obs = runtime.run_action(action)
+            logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+            assert_and_raise(
+                isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
+                f'Failed to remove git directory {git_dir}: {str(obs)}',
+            )
+
+    # add all files
+    action = CmdRunAction(command='git add -A')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
+        f'Failed to git add -A: {str(obs)}',
+    )
+
+    n_retries = 0
+    git_patch = None
+    while n_retries < 5:
+        action = CmdRunAction(
+            command=f'git diff --no-color --cached {instance["base_commit"]}'
+        )
+        action.set_hard_timeout(max(300 + 100 * n_retries, 600))
+        logger.info(action, extra={'msg_type': 'ACTION'})
+        obs = runtime.run_action(action)
+        logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+        n_retries += 1
+        if isinstance(obs, CmdOutputObservation):
+            if obs.exit_code == 0:
+                git_patch = obs.content.strip()
+                break
+            else:
+                logger.info('Failed to get git diff, retrying...')
+                sleep_if_should_continue(10)
+        elif isinstance(obs, ErrorObservation):
+            logger.error(f'Error occurred: {obs.content}. Retrying...')
+            sleep_if_should_continue(10)
+        else:
+            assert_and_raise(False, f'Unexpected observation type: {str(obs)}')
+
+    assert_and_raise(git_patch is not None, 'Failed to get git diff (None)')
+
+    logger.info('-' * 30)
+    logger.info('END Runtime Completion Fn')
+    logger.info('-' * 30)
+    return {'git_patch': git_patch}
+
+
+def process_instance(
+    instance: pd.Series,
+    metadata: EvalMetadata,
+    reset_logger: bool = True,
+    runtime_failure_count: int = 0,
+) -> EvalOutput:
+    config = get_config(instance, metadata)
+
+    # Setup the logger properly, so you can run multi-processing to parallelize the evaluation
+    if reset_logger:
+        log_dir = os.path.join(metadata.eval_output_dir, 'infer_logs')
+        reset_logger_for_multiprocessing(logger, instance.instance_id, log_dir)
+    else:
+        logger.info(f'Starting evaluation for instance {instance.instance_id}.')
+
+    # Increase resource_factor with increasing attempt_id
+    if runtime_failure_count > 0:
+        config.sandbox.remote_runtime_resource_factor = min(
+            config.sandbox.remote_runtime_resource_factor * (2**runtime_failure_count),
+            8,
+        )
+        logger.warning(
+            f'This is attempt #{runtime_failure_count + 1} for instance {instance.instance_id}, setting resource factor to {config.sandbox.remote_runtime_resource_factor}'
+        )
+    runtime = create_runtime(config)
+    call_async_from_sync(runtime.connect)
+
+    try:
+        initialize_runtime(runtime, instance)
+
+        instruction = get_instruction(instance, metadata)
+
+        # Here's how you can run the agent (similar to the `main` function) and get the final task state
+        state: State | None = asyncio.run(
+            run_controller(
+                config=config,
+                initial_user_action=MessageAction(content=instruction),
+                runtime=runtime,
+                fake_user_response_fn=AGENT_CLS_TO_FAKE_USER_RESPONSE_FN[
+                    metadata.agent_class
+                ],
+            )
+        )
+
+        # if fatal error, raise EvalException to trigger re-run
+        if state is not None and is_fatal_evaluation_error(state.last_error):
+            raise EvalException('Fatal error detected: ' + state.last_error)
+
+        # ======= THIS IS SWE-Bench specific =======
+        # Get git patch
+        return_val = complete_runtime(runtime, instance)
+        git_patch = return_val['git_patch']
+        logger.info(
+            f'Got git diff for instance {instance.instance_id}:\n--------\n{git_patch}\n--------'
+        )
+    finally:
+        runtime.close()
+    # ==========================================
+
+    # ======= Attempt to evaluate the agent's edits =======
+    # we use eval_infer.sh to evaluate the agent's edits, not here
+    # because the agent may alter the environment / testcases
+    test_result = {
+        'git_patch': git_patch,
+    }
+
+    # If you are working on some simpler benchmark that only evaluates the final model output (e.g., in a MessageAction)
+    # You can simply get the LAST `MessageAction` from the returned `state.history` and parse it for evaluation.
+    if state is None:
+        raise ValueError('State should not be None.')
+
+    # NOTE: this is NO LONGER the event stream, but an agent history that includes delegate agent's events
+    histories = [event_to_dict(event) for event in state.history]
+    metrics = get_metrics(state)
+
+    # Save the output
+    output = EvalOutput(
+        instance_id=instance.instance_id,
+        instruction=instruction,
+        instance=instance.to_dict(),  # SWE Bench specific
+        test_result=test_result,
+        metadata=metadata,
+        history=histories,
+        metrics=metrics,
+        error=state.last_error if state and state.last_error else None,
+    )
+    return output
+
+
+def filter_dataset(dataset: pd.DataFrame, filter_column: str) -> pd.DataFrame:
+    file_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'config.toml')
+    if os.path.exists(file_path):
+        with open(file_path, 'r') as file:
+            data = toml.load(file)
+            if 'selected_ids' in data:
+                selected_ids = data['selected_ids']
+                logger.info(
+                    f'Filtering {len(selected_ids)} tasks from "selected_ids"...'
+                )
+                subset = dataset[dataset[filter_column].isin(selected_ids)]
+                logger.info(f'Retained {subset.shape[0]} tasks after filtering')
+                return subset
+    skip_ids = [i for i in os.environ.get('SKIP_IDS', '').split(',') if i]
+    if len(skip_ids) > 0:
+        logger.info(f'Filtering {len(skip_ids)} tasks from "SKIP_IDS"...')
+        return dataset[~dataset[filter_column].isin(skip_ids)]
+    return dataset
+
+
+# A list of instances that are known to be tricky to infer
+# (will cause runtime failure even with resource factor = 8)
+SWEGYM_EXCLUDE_IDS = [
+    'dask__dask-10422',
+    'pandas-dev__pandas-50548',
+    'pandas-dev__pandas-53672',
+    'pandas-dev__pandas-54174',
+    'pandas-dev__pandas-55518',
+    'pandas-dev__pandas-58383',
+    'pydata__xarray-6721',
+    'pytest-dev__pytest-10081',
+    'pytest-dev__pytest-7236',
+]
+
+if __name__ == '__main__':
+    parser = get_parser()
+    parser.add_argument(
+        '--dataset',
+        type=str,
+        default='princeton-nlp/SWE-bench',
+        help='data set to evaluate on, either full-test or lite-test',
+    )
+    parser.add_argument(
+        '--split',
+        type=str,
+        default='test',
+        help='split to evaluate on',
+    )
+    args, _ = parser.parse_known_args()
+
+    # NOTE: It is preferable to load datasets from huggingface datasets and perform post-processing
+    # so we don't need to manage file uploading to OpenHands's repo
+    dataset = load_dataset(args.dataset, split=args.split)
+    swe_bench_tests = filter_dataset(dataset.to_pandas(), 'instance_id')
+    logger.info(
+        f'Loaded dataset {args.dataset} with split {args.split}: {len(swe_bench_tests)} tasks'
+    )
+    if 'SWE-Gym' in args.dataset:
+        swe_bench_tests = swe_bench_tests[
+            ~swe_bench_tests['instance_id'].isin(SWEGYM_EXCLUDE_IDS)
+        ]
+        logger.info(
+            f'{len(swe_bench_tests)} tasks left after removing the SWE-Gym excluded instances'
+        )
+
+    llm_config = None
+    if args.llm_config:
+        llm_config = get_llm_config_arg(args.llm_config)
+        llm_config.log_completions = True
+        # modify_params must be False for evaluation, for reproducibility and accuracy of results
+        llm_config.modify_params = False
+
+    if llm_config is None:
+        raise ValueError(f'Could not find LLM config: --llm_config {args.llm_config}')
+
+    details = {}
+    _agent_cls = openhands.agenthub.Agent.get_cls(args.agent_cls)
+
+    dataset_description = (
+        args.dataset.replace('/', '__') + '-' + args.split.replace('/', '__')
+    )
+    metadata = make_metadata(
+        llm_config,
+        dataset_description,
+        args.agent_cls,
+        args.max_iterations,
+        args.eval_note,
+        args.eval_output_dir,
+        details=details,
+    )
+
+    output_file = os.path.join(metadata.eval_output_dir, 'output.jsonl')
+    print(f'### OUTPUT FILE: {output_file} ###')
+    instances = prepare_dataset(swe_bench_tests, output_file, args.eval_n_limit)
+
+    if len(instances) > 0 and not isinstance(
+        instances['PASS_TO_PASS'][instances['PASS_TO_PASS'].index[0]], str
+    ):
+        for col in ['PASS_TO_PASS', 'FAIL_TO_PASS']:
+            instances[col] = instances[col].apply(lambda x: str(x))
+
+    run_evaluation(
+        instances,
+        metadata,
+        output_file,
+        args.eval_num_workers,
+        process_instance,
+        timeout_seconds=8 * 60 * 60,  # 8 hours per instance should be more than enough
+        max_retries=5,
+    )
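The retry handling in `process_instance` above doubles the remote runtime resource factor once for each earlier runtime failure and caps the result at 8. A minimal sketch of that arithmetic follows. The helper name `escalated_resource_factor` and its `cap` parameter are hypothetical and do not appear in the patch; only the doubling rule and the cap of 8 are taken from it.

```python
def escalated_resource_factor(
    base_factor: int, runtime_failure_count: int, cap: int = 8
) -> int:
    """Return the resource factor to use after a number of runtime failures.

    Hypothetical helper: mirrors min(factor * 2**failures, 8) from the patch.
    """
    if runtime_failure_count <= 0:
        # First attempt: keep the per-instance factor unchanged.
        return base_factor
    # Each earlier failure doubles the factor, but never beyond the cap.
    return min(base_factor * (2**runtime_failure_count), cap)


if __name__ == '__main__':
    for failures in range(5):
        print(failures, escalated_resource_factor(1, failures))
```

With a base factor of 1 the cap is reached on the fourth attempt (three earlier failures), so later retries no longer request more resources.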
@@ -0,0 +1,157 @@
+xingyaoww/sweb.eval.x86_64.astropy_s_astropy-11693:latest
+xingyaoww/sweb.eval.x86_64.astropy_s_astropy-13838:latest
+xingyaoww/sweb.eval.x86_64.astropy_s_astropy-14295:latest
+xingyaoww/sweb.eval.x86_64.astropy_s_astropy-8292:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-13908:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-13980:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-13983:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-13984:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-14043:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-14623:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-19763:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-20470:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-20518:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-20584:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-20761:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-20826:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-21443:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-21490:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-21550:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-21568:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-21617:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-22865:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-22871:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-22931:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-23047:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-23111:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-23412:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-24088:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-24177:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-24189:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-24570:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-24691:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-24749:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-24768:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-24849:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-24870:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-24971:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-25287:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-25334:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-25340:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-25346:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-25405:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-25499:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-25565:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-25640:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-25667:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-25779:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-26078:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-26466:latest
+xingyaoww/sweb.eval.x86_64.mwaskom_s_seaborn-2576:latest
+xingyaoww/sweb.eval.x86_64.mwaskom_s_seaborn-2846:latest
+xingyaoww/sweb.eval.x86_64.mwaskom_s_seaborn-2979:latest
+xingyaoww/sweb.eval.x86_64.mwaskom_s_seaborn-3180:latest
+xingyaoww/sweb.eval.x86_64.mwaskom_s_seaborn-3187:latest
+xingyaoww/sweb.eval.x86_64.mwaskom_s_seaborn-3202:latest
+xingyaoww/sweb.eval.x86_64.mwaskom_s_seaborn-3216:latest
+xingyaoww/sweb.eval.x86_64.mwaskom_s_seaborn-3217:latest
+xingyaoww/sweb.eval.x86_64.mwaskom_s_seaborn-3276:latest
+xingyaoww/sweb.eval.x86_64.mwaskom_s_seaborn-3394:latest
+xingyaoww/sweb.eval.x86_64.pydata_s_xarray-4182:latest
+xingyaoww/sweb.eval.x86_64.pydata_s_xarray-5682:latest
+xingyaoww/sweb.eval.x86_64.pylint-dev_s_pylint-4551:latest
+xingyaoww/sweb.eval.x86_64.scikit-learn_s_scikit-learn-13087:latest
+xingyaoww/sweb.eval.x86_64.scikit-learn_s_scikit-learn-13618:latest
+xingyaoww/sweb.eval.x86_64.scikit-learn_s_scikit-learn-14067:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-10048:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-10097:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-10191:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-10435:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-11266:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-11502:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-7615:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-7757:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-8028:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-8056:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-8075:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-8120:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-8265:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-8278:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-8620:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-8621:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-8638:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-8658:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9229:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9230:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9289:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9320:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9350:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9464:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9673:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9698:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9797:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9982:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9987:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9997:latest
+xingyaoww/sweb.eval.x86_64.sphinx-doc_s_sphinx-9999:latest
+xingyaoww/sweb.eval.x86_64.sympy_s_sympy-11787:latest
+xingyaoww/sweb.eval.x86_64.sympy_s_sympy-11788:latest
+xingyaoww/sweb.eval.x86_64.sympy_s_sympy-13264:latest
+xingyaoww/sweb.eval.x86_64.sympy_s_sympy-13840:latest
+xingyaoww/sweb.eval.x86_64.sympy_s_sympy-15151:latest
+xingyaoww/sweb.eval.x86_64.sympy_s_sympy-15304:latest
+xingyaoww/sweb.eval.x86_64.sympy_s_sympy-15625:latest
+xingyaoww/sweb.eval.x86_64.sympy_s_sympy-15976:latest
+xingyaoww/sweb.eval.x86_64.sympy_s_sympy-16003:latest
+xingyaoww/sweb.eval.x86_64.sympy_s_sympy-17067:latest
+xingyaoww/sweb.eval.x86_64.sympy_s_sympy-17115:latest
+xingyaoww/sweb.eval.x86_64.sympy_s_sympy-18922:latest
+xingyaoww/sweb.eval.x86_64.sympy_s_sympy-21769:latest
+xingyaoww/sweb.eval.x86_64.sympy_s_sympy-24723:latest
+luolin101/sweb.eval.x86_64.plotly_s_plotly.py-4083:latest
+luolin101/sweb.eval.x86_64.plotly_s_plotly.py-2600:latest
+luolin101/sweb.eval.x86_64.plotly_s_plotly.py-2591:latest
+luolin101/sweb.eval.x86_64.plotly_s_plotly.py-1966:latest
+luolin101/sweb.eval.x86_64.networkx_s_networkx-6503:latest
+luolin101/sweb.eval.x86_64.networkx_s_networkx-6098:latest
+luolin101/sweb.eval.x86_64.networkx_s_networkx-5616:latest
+luolin101/sweb.eval.x86_64.networkx_s_networkx-5354:latest
+luolin101/sweb.eval.x86_64.networkx_s_networkx-5058:latest
+luolin101/sweb.eval.x86_64.networkx_s_networkx-4378:latest
+luolin101/sweb.eval.x86_64.networkx_s_networkx-3764:latest
+luolin101/sweb.eval.x86_64.vega_s_altair-2785:latest
+luolin101/sweb.eval.x86_64.vega_s_altair-1092:latest
+luolin101/sweb.eval.x86_64.vega_s_altair-974:latest
+luolin101/sweb.eval.x86_64.vega_s_altair-830:latest
+luolin101/sweb.eval.x86_64.matplotlib_s_matplotlib-27754:latest
+luolin101/sweb.eval.x86_64.matplotlib_s_matplotlib-26926:latest
+luolin101/sweb.eval.x86_64.matplotlib_s_matplotlib-26788:latest
+luolin101/sweb.eval.x86_64.matplotlib_s_matplotlib-26586:latest
+luolin101/sweb.eval.x86_64.sympy_s_sympy-26941:latest
+luolin101/sweb.eval.x86_64.mwaskom_s_seaborn-3458:latest
+luolin101/sweb.eval.x86_64.mwaskom_s_seaborn-3454:latest
+xingyaoww/sweb.eval.x86_64.matplotlib_s_matplotlib-25631:latest
+xingyaoww/sweb.env.x86_64.428468730904ff6b4232aa:latest
+xingyaoww/sweb.env.x86_64.89a9e6df7ab7bcb9e010c8:latest
+xingyaoww/sweb.env.x86_64.15374367de368534f261e3:latest
+xingyaoww/sweb.env.x86_64.6b007979cf533f0f3016e8:latest
+xingyaoww/sweb.env.x86_64.b382c45e0a94d34ef0fc86:latest
+xingyaoww/sweb.env.x86_64.7037e8c448a4b8ebfe9b13:latest
+xingyaoww/sweb.env.x86_64.31244378a92e3bcce809ac:latest
+xingyaoww/sweb.env.x86_64.efa6065ed5bf204410fd53:latest
+xingyaoww/sweb.env.x86_64.a0efca7a0fe6719dbf65c2:latest
+xingyaoww/sweb.env.x86_64.502d8fc6ebccd881244091:latest
+luolin101/sweb.env.x86_64.eb002359cfcbe2edb56088:latest
+xingyaoww/sweb.env.x86_64.d905bb51fb68acc5d4221b:latest
+xingyaoww/sweb.env.x86_64.aa92880033da20ca313928:latest
+luolin101/sweb.env.x86_64.c6d251a05e0af7688b64fd:latest
+xingyaoww/sweb.env.x86_64.c795f4b88616b8462021ed:latest
+luolin101/sweb.env.x86_64.1e5a06e76ee016d067d77e:latest
+luolin101/sweb.env.x86_64.2e03d8e4d4bd373937a9ef:latest
+luolin101/sweb.env.x86_64.4c16026920d27ea78f3b7a:latest
+luolin101/sweb.env.x86_64.d15120dfdbda9831e9646b:latest
+luolin101/sweb.env.x86_64.c581ba273c3275679773dd:latest
+luolin101/sweb.env.x86_64.dc800a1bbe275c5de0c4aa:latest
+luolin101/sweb.env.x86_64.59bd7d84a0939c7caba7e6:latest
+xingyaoww/sweb.env.x86_64.0d80c7dec81ee2f2f513e2:latest
+xingyaoww/sweb.base.x86_64:latest
@@ -0,0 +1,62 @@
+#!/bin/bash
+set -e
+
+LEVEL=$1
+# three levels:
+# - base, keyword "sweb.base"
+# - env, keyword "sweb.env"
+# - instance, keyword "sweb.eval"
+SET=$2
+
+if [ -z "$LEVEL" ]; then
+    echo "Usage: $0 <cache_level> <set>"
+    echo "cache_level: base, env, or instance"
+    echo "set: lite, full"
+    exit 1
+fi
+
+if [ -z "$SET" ]; then
+    echo "Usage: $0 <cache_level> <set>"
+    echo "cache_level: base, env, or instance"
+    echo "set: lite, full, default is lite"
+    SET="lite"
+fi
+
+
+if [ "$SET" == "full" ]; then
+    IMAGE_FILE="$(dirname "$0")/all-visualswebench-full-instance-images.txt"
+else
+    IMAGE_FILE="$(dirname "$0")/all-visualswebench-full-instance-images.txt"
+fi
+
+# Define a pattern based on the level
+case $LEVEL in
+    base)
+        PATTERN="sweb.base"
+        ;;
+    env)
+        PATTERN="sweb.base\|sweb.env"
+        ;;
+    instance)
+        PATTERN="sweb.base\|sweb.env\|sweb.eval"
+        ;;
+    *)
+        echo "Invalid cache level: $LEVEL"
+        echo "Valid levels are: base, env, instance"
+        exit 1
+        ;;
+esac
+
+echo "Pulling docker images for [$LEVEL] level"
+
+echo "Pattern: $PATTERN"
+echo "Image file: $IMAGE_FILE"
+
+# Read each line from the file, filter by pattern, and pull the docker image
+grep "$PATTERN" "$IMAGE_FILE" | while IFS= read -r image; do
+    echo "Pulling $image"
+    docker pull "$image"
+    # strip the registry namespace and replace _s_ with __ in the image name
+    renamed_image=$(echo "$image" | sed 's|.*/||; s/_s_/__/g')
+    docker tag "$image" "$renamed_image"
+done
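The pull script above selects images by cache level and then retags each one without its registry namespace, turning `_s_` back into `__`. The sketch below restates those two steps in Python for illustration. The names `LEVEL_KEYWORDS`, `select_images`, and `local_tag` are hypothetical, and the selection uses plain substring matching, which is a simplification of the script's `grep` pattern (where `.` matches any character).

```python
# Keywords per cache level, mirroring the case statement in the shell script.
LEVEL_KEYWORDS = {
    'base': ('sweb.base',),
    'env': ('sweb.base', 'sweb.env'),
    'instance': ('sweb.base', 'sweb.env', 'sweb.eval'),
}


def select_images(lines, level):
    """Keep the image names that belong to the requested cache level."""
    keywords = LEVEL_KEYWORDS[level]
    return [line.strip() for line in lines if any(k in line for k in keywords)]


def local_tag(image):
    """Drop everything up to the last '/' and replace '_s_' with '__'."""
    return image.rsplit('/', 1)[-1].replace('_s_', '__')


if __name__ == '__main__':
    images = [
        'xingyaoww/sweb.base.x86_64:latest',
        'xingyaoww/sweb.env.x86_64.428468730904ff6b4232aa:latest',
        'xingyaoww/sweb.eval.x86_64.astropy_s_astropy-11693:latest',
    ]
    for image in select_images(images, 'instance'):
        print(image, '->', local_tag(image))
```

Each level includes the levels below it, so `env` also pulls the base image and `instance` pulls all three kinds.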
@@ -0,0 +1,141 @@
+#!/bin/bash
+
+PROCESS_FILEPATH=$1
+if [ -z "$PROCESS_FILEPATH" ]; then
+    echo "Error: PROCESS_FILEPATH is empty. Usage: ./eval_infer.sh <output_file> [instance_id] [dataset_name] [split]"
+    exit 1
+fi
+
+if [ ! -f "$PROCESS_FILEPATH" ]; then
+    echo "Error: $PROCESS_FILEPATH is not a file"
+    exit 1
+fi
+
+# If instance_id is empty, it means we want to eval on the whole $PROCESS_FILEPATH
+# otherwise, we want to eval on the instance_id
+INSTANCE_ID=$2
+DATASET_NAME=${3:-"luolin101/Visual-SWE-bench"}
+SPLIT=${4:-"test"}
+
+echo "INSTANCE_ID: $INSTANCE_ID"
+echo "DATASET_NAME: $DATASET_NAME"
+echo "SPLIT: $SPLIT"
+
+PROCESS_FILEPATH=$(realpath "$PROCESS_FILEPATH")
+FILE_DIR=$(dirname "$PROCESS_FILEPATH")
+FILE_NAME=$(basename "$PROCESS_FILEPATH")
+
+echo "Evaluating $FILE_NAME @ $FILE_DIR"
+
+# ================================================
+# detect whether PROCESS_FILEPATH is in OH format or in SWE-bench format
+echo "=============================================================="
+echo "Detecting whether PROCESS_FILEPATH is in OH format or in SWE-bench format"
+echo "=============================================================="
+# SWE-bench format is a JSONL where every line has three fields: model_name_or_path, instance_id, and model_patch
+function is_swebench_format() {
+    # Read the first line of the file
+    read -r first_line < "$PROCESS_FILEPATH"
+
+    # Use jq to check if the first line has the required fields
+    echo "$first_line" | jq -e '. | has("model_name_or_path") and has("instance_id") and has("model_patch")' > /dev/null
+
+    if [ $? -ne 0 ]; then
+        return 1 # Return 1 if the first line does not have the required fields
+    fi
+
+    return 0 # Return 0 if the first line has the required fields
+}
+# Call the function with the file path
+is_swebench_format "$PROCESS_FILEPATH"
+IS_SWEBENCH_FORMAT=$?
+# Use the result in an if-else statement
+if [ $IS_SWEBENCH_FORMAT -eq 0 ]; then
+    echo "The file IS in SWE-bench format."
+    SWEBENCH_FORMAT_JSONL=$PROCESS_FILEPATH
+else
+    echo "The file IS NOT in SWE-bench format."
+
+    # ==== Convert OH format to SWE-bench format ====
+    echo "Merged output file with fine-grained report will be saved to $FILE_DIR"
+    poetry run python3 evaluation/benchmarks/swe_bench/scripts/eval/convert_oh_output_to_swe_json.py $PROCESS_FILEPATH
+    # replace .jsonl with .swebench.jsonl in filename
+    SWEBENCH_FORMAT_JSONL=${PROCESS_FILEPATH/.jsonl/.swebench.jsonl}
+    echo "SWEBENCH_FORMAT_JSONL: $SWEBENCH_FORMAT_JSONL"
+    # assert that the file exists
+    if [ ! -f $SWEBENCH_FORMAT_JSONL ]; then
+        echo "Error: $SWEBENCH_FORMAT_JSONL does not exist. There is probably an error in the conversion process."
+        exit 1
+    fi
+    SWEBENCH_FORMAT_JSONL=$(realpath $SWEBENCH_FORMAT_JSONL)
+fi
+# ================================================
+
+echo "=============================================================="
+echo "Running SWE-bench evaluation"
+echo "=============================================================="
+
+RUN_ID=$(date +"%Y%m%d_%H%M%S")
+N_PROCESS=16
+
+if [ -z "$INSTANCE_ID" ]; then
+    echo "Running SWE-bench evaluation on the whole input file..."
+    # Defaults to the Visual-SWE-bench test split (see DATASET_NAME and SPLIT above);
+    # pass the dataset name and split as the 3rd and 4th arguments to change them
+
+    poetry run python -m visualswebench.harness.run_evaluation \
+        --dataset_name "$DATASET_NAME" \
+        --split "$SPLIT" \
+        --predictions_path $SWEBENCH_FORMAT_JSONL \
+        --timeout 1800 \
+        --cache_level instance \
+        --max_workers $N_PROCESS \
+        --run_id $RUN_ID
+
+    # get the "model_name_or_path" from the first line of the SWEBENCH_FORMAT_JSONL
+    MODEL_NAME_OR_PATH=$(jq -r '.model_name_or_path' $SWEBENCH_FORMAT_JSONL | head -n 1)
+    echo "MODEL_NAME_OR_PATH: $MODEL_NAME_OR_PATH"
+
+    RESULT_OUTPUT_DIR=$(dirname $SWEBENCH_FORMAT_JSONL)
+    echo "RESULT_OUTPUT_DIR: $RESULT_OUTPUT_DIR"
+
+    # move the eval results to the target directory
+    mkdir -p $RESULT_OUTPUT_DIR
+    # rm eval_outputs directory if it exists
+    if [ -d $RESULT_OUTPUT_DIR/eval_outputs ]; then
+        rm -rf $RESULT_OUTPUT_DIR/eval_outputs
+    fi
+
+    mv logs/run_evaluation/$RUN_ID/$MODEL_NAME_OR_PATH $RESULT_OUTPUT_DIR
+    mv $RESULT_OUTPUT_DIR/$MODEL_NAME_OR_PATH $RESULT_OUTPUT_DIR/eval_outputs
+    echo "RUN_ID: $RUN_ID" > $RESULT_OUTPUT_DIR/run_id.txt
+
+    # move report file
+    REPORT_PATH=$MODEL_NAME_OR_PATH.$RUN_ID.json
+    if [ -f $REPORT_PATH ]; then
+        # check if $RESULT_OUTPUT_DIR/report.json exists
+        if [ -f $RESULT_OUTPUT_DIR/report.json ]; then
+            echo "Report file $RESULT_OUTPUT_DIR/report.json already exists. Overwriting..."
+            if [ -f $RESULT_OUTPUT_DIR/report.json.bak ]; then
+                rm $RESULT_OUTPUT_DIR/report.json.bak
+            fi
+            mv $RESULT_OUTPUT_DIR/report.json $RESULT_OUTPUT_DIR/report.json.bak
+        fi
+
+        mv $REPORT_PATH $RESULT_OUTPUT_DIR/report.json
+    fi
+
+    poetry run python evaluation/benchmarks/swe_bench/scripts/eval/update_output_with_eval.py $PROCESS_FILEPATH
+
+else
+    echo "Running SWE-bench evaluation on the instance_id: $INSTANCE_ID"
+    poetry run python -m visualswebench.harness.run_evaluation \
+        --dataset_name "$DATASET_NAME" \
+        --split "$SPLIT" \
+        --predictions_path $SWEBENCH_FORMAT_JSONL \
+        --timeout 1800 \
+        --instance_ids $INSTANCE_ID \
+        --cache_level instance \
+        --max_workers $N_PROCESS \
+        --run_id $RUN_ID
+fi
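Review note on the report handling in the script above: a minimal sketch of the same backup-then-replace step as a reusable function. The name `install_report` and its two arguments are hypothetical and not part of the script. The sketch assumes that `mv -f` may overwrite an existing `report.json.bak`, which would make the separate `rm` unnecessary.

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): install a new report as
# <dest_dir>/report.json, keeping the previous one as report.json.bak.
install_report() {
    src=$1
    dest_dir=$2

    # Nothing to do when the harness did not produce a report.
    [ -f "$src" ] || return 0

    mkdir -p "$dest_dir"
    if [ -f "$dest_dir/report.json" ]; then
        echo "Report file $dest_dir/report.json already exists. Overwriting..."
        # mv -f replaces any older backup, so no explicit rm is needed.
        mv -f "$dest_dir/report.json" "$dest_dir/report.json.bak"
    fi
    mv "$src" "$dest_dir/report.json"
}
```

In the script this would be called as `install_report "$REPORT_PATH" "$RESULT_OUTPUT_DIR"`, with both arguments quoted so paths containing spaces survive.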
@@ -0,0 +1,117 @@
+#!/bin/bash
+set -eo pipefail
+
+source "evaluation/utils/version_control.sh"
+
+MODEL_CONFIG=$1
+COMMIT_HASH=$2
+AGENT=$3
+EVAL_LIMIT=$4
+MAX_ITER=$5
+NUM_WORKERS=$6
+DATASET=$7
+SPLIT=$8
+N_RUNS=$9
+
+if [ -z "$NUM_WORKERS" ]; then
+  NUM_WORKERS=1
+  echo "Number of workers not specified, use default $NUM_WORKERS"
+fi
+checkout_eval_branch
+
+if [ -z "$AGENT" ]; then
+  echo "Agent not specified, use default CodeActAgent"
+  AGENT="CodeActAgent"
+fi
+
+if [ -z "$MAX_ITER" ]; then
+  echo "MAX_ITER not specified, use default 100"
+  MAX_ITER=100
+fi
+
+if [ -z "$USE_INSTANCE_IMAGE" ]; then
+  echo "USE_INSTANCE_IMAGE not specified, use default true"
+  USE_INSTANCE_IMAGE=true
+fi
+
+if [ -z "$RUN_WITH_BROWSING" ]; then
+  echo "RUN_WITH_BROWSING not specified, use default false"
+  RUN_WITH_BROWSING=false
+fi
+
+
+if [ -z "$DATASET" ]; then
+  echo "DATASET not specified, use default luolin101/Visual-SWE-bench"
+  DATASET="luolin101/Visual-SWE-bench"
+fi
+
+if [ -z "$SPLIT" ]; then
+  echo "SPLIT not specified, use default test"
+  SPLIT="test"
+fi
+
+export USE_INSTANCE_IMAGE=$USE_INSTANCE_IMAGE
+echo "USE_INSTANCE_IMAGE: $USE_INSTANCE_IMAGE"
+export RUN_WITH_BROWSING=$RUN_WITH_BROWSING
+echo "RUN_WITH_BROWSING: $RUN_WITH_BROWSING"
+
+get_openhands_version
+
+echo "AGENT: $AGENT"
+echo "OPENHANDS_VERSION: $OPENHANDS_VERSION"
+echo "MODEL_CONFIG: $MODEL_CONFIG"
+echo "DATASET: $DATASET"
+echo "SPLIT: $SPLIT"
+
+# Default to NOT use Hint
+if [ -z "$USE_HINT_TEXT" ]; then
+  export USE_HINT_TEXT=false
+fi
+echo "USE_HINT_TEXT: $USE_HINT_TEXT"
+EVAL_NOTE="$OPENHANDS_VERSION"
+# if not using Hint, add -no-hint to the eval note
+if [ "$USE_HINT_TEXT" = false ]; then
+  EVAL_NOTE="$EVAL_NOTE-no-hint"
+fi
+
+if [ "$RUN_WITH_BROWSING" = true ]; then
+  EVAL_NOTE="$EVAL_NOTE-with-browsing"
+fi
+
+if [ -n "$EXP_NAME" ]; then
+  EVAL_NOTE="$EVAL_NOTE-$EXP_NAME"
+fi
+
+function run_eval() {
+  local eval_note=$1
+  COMMAND="poetry run python evaluation/benchmarks/visual_swe_bench/run_infer.py \
+    --agent-cls $AGENT \
+    --llm-config $MODEL_CONFIG \
+    --max-iterations $MAX_ITER \
+    --eval-num-workers $NUM_WORKERS \
+    --eval-note $eval_note \
+    --dataset $DATASET \
+    --split $SPLIT"
+
+  if [ -n "$EVAL_LIMIT" ]; then
+    echo "EVAL_LIMIT: $EVAL_LIMIT"
+    COMMAND="$COMMAND --eval-n-limit $EVAL_LIMIT"
+  fi
+
+  # Run the command
+  eval $COMMAND
+}
+
+unset SANDBOX_ENV_GITHUB_TOKEN # prevent the agent from using the GitHub token to push
+if [ -z "$N_RUNS" ]; then
+  N_RUNS=1
+  echo "N_RUNS not specified, use default $N_RUNS"
+fi
+
+for i in $(seq 1 $N_RUNS); do
+  current_eval_note="$EVAL_NOTE-run_$i"
+  echo "EVAL_NOTE: $current_eval_note"
+  run_eval $current_eval_note
+done
+
+checkout_original_branch
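Review note on how the script above assembles `EVAL_NOTE`: a minimal sketch of the same suffix logic as a pure function, which may be easier to check in isolation. The name `build_eval_note` is hypothetical, and it takes its inputs as arguments instead of reading `USE_HINT_TEXT`, `RUN_WITH_BROWSING`, and `EXP_NAME` from the environment.

```shell
#!/bin/sh
# Hypothetical helper: compose the eval note from explicit arguments.
#   $1 = OpenHands version, $2 = USE_HINT_TEXT, $3 = RUN_WITH_BROWSING,
#   $4 = optional experiment name
# The body runs in a subshell, so `note` does not leak into the caller.
build_eval_note() (
    note=$1
    if [ "$2" = false ]; then
        note="$note-no-hint"
    fi
    if [ "$3" = true ]; then
        note="$note-with-browsing"
    fi
    if [ -n "$4" ]; then
        note="$note-$4"
    fi
    printf '%s\n' "$note"
)
```

For example, `build_eval_note "v1" false true "ablation"` prints `v1-no-hint-with-browsing-ablation` (the version string and experiment name here are made up). The per-run suffix `-run_$i` would still be appended by the loop.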
@@ -0,0 +1,40 @@
+#!/bin/bash
+
+source ~/.bashrc
+SWEUTIL_DIR=/swe_util
+
+# FIXME: Cannot read SWE_INSTANCE_ID from the environment variable
+# SWE_INSTANCE_ID=django__django-11099
+if [ -z "$SWE_INSTANCE_ID" ]; then
+    echo "Error: SWE_INSTANCE_ID is not set." >&2
+    exit 1
+fi
+
+# Read swe-bench-instance.json and extract the item whose instance_id matches SWE_INSTANCE_ID
+item=$(jq --arg INSTANCE_ID "$SWE_INSTANCE_ID" '.[] | select(.instance_id == $INSTANCE_ID)' $SWEUTIL_DIR/eval_data/instances/swe-bench-instance.json)
+
+if [[ -z "$item" ]]; then
+  echo "No item found for the provided instance ID."
+  exit 1
+fi
+
+WORKSPACE_NAME=$(echo "$item" | jq -r '(.repo | tostring) + "__" + (.version | tostring) | gsub("/"; "__")')
+
+echo "WORKSPACE_NAME: $WORKSPACE_NAME"
+
+# Clear the workspace
+if [ -d /workspace ]; then
+    rm -rf /workspace/*
+else
+    mkdir /workspace
+fi
+# Copy repo to workspace
+if [ -d /workspace/$WORKSPACE_NAME ]; then
+    rm -rf /workspace/$WORKSPACE_NAME
+fi
+mkdir -p /workspace
+cp -r /testbed /workspace/$WORKSPACE_NAME
+
+# Activate instance-specific environment
+. /opt/miniconda3/etc/profile.d/conda.sh
+conda activate testbed
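Review note on the `WORKSPACE_NAME` derivation in the script above: in jq, `+` binds more tightly than `|`, so the `gsub` applies to the whole concatenated string. A minimal sketch of the same computation without jq follows. The name `workspace_name` is hypothetical, and the sketch assumes that `repo` and `version` are plain single-line strings.

```shell
#!/bin/sh
# Hypothetical helper mirroring the jq expression:
#   (.repo | tostring) + "__" + (.version | tostring) | gsub("/"; "__")
#   $1 = repo (e.g. owner/name), $2 = version
workspace_name() {
    # Join repo and version with "__", then replace every "/" with "__".
    printf '%s__%s\n' "$1" "$2" | sed 's|/|__|g'
}
```

With these assumptions, `workspace_name django/django 3.0` prints `django__django__3.0`.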
@@ -10,11 +10,7 @@ describe("ChatMessage", () => {
    expect(screen.getByText("Hello, World!")).toBeInTheDocument();
  });

-  it("should render an assistant message", () => {
-    render(<ChatMessage type="assistant" message="Hello, World!" />);
-    expect(screen.getByTestId("assistant-message")).toBeInTheDocument();
-    expect(screen.getByText("Hello, World!")).toBeInTheDocument();
-  });
+  it.todo("should render an assistant message");

  it.skip("should support code syntax highlighting", () => {
    const code = "```js\nconsole.log('Hello, World!')\n```";
@@ -66,10 +62,7 @@ describe("ChatMessage", () => {

  it("should apply correct styles to inline code", () => {
    render(
-      <ChatMessage
-        type="assistant"
-        message="Here is some `inline code` text"
-      />,
+      <ChatMessage type="agent" message="Here is some `inline code` text" />,
    );
    const codeElement = screen.getByText("inline code");

@@ -1,11 +1,9 @@
 import { afterEach, beforeAll, describe, expect, it, vi } from "vitest";
-import { act, screen, waitFor, within } from "@testing-library/react";
+import { screen, waitFor, within } from "@testing-library/react";
 import userEvent from "@testing-library/user-event";
 import { renderWithProviders } from "test-utils";
 import type { Message } from "#/message";
-import { addUserMessage } from "#/state/chat-slice";
 import { SUGGESTIONS } from "#/utils/suggestions";
-import * as ChatSlice from "#/state/chat-slice";
 import { WsClientProviderStatus } from "#/context/ws-client-provider";
 import { ChatInterface } from "#/components/features/chat/chat-interface";

@@ -42,51 +40,10 @@ describe("Empty state", () => {
    vi.clearAllMocks();
  });

-  it("should render suggestions if empty", () => {
-    const { store } = renderWithProviders(<ChatInterface />, {
-      preloadedState: {
-        chat: {
-          messages: [],
-          systemMessage: {
-            content: "",
-            tools: [],
-            openhands_version: null,
-            agent_class: null
-          }
-        },
-      },
-    });
-
-    expect(screen.getByTestId("suggestions")).toBeInTheDocument();
-
-    act(() => {
-      store.dispatch(
-        addUserMessage({
-          content: "Hello",
-          imageUrls: [],
-          timestamp: new Date().toISOString(),
-          pending: true,
-        }),
-      );
-    });
-
-    expect(screen.queryByTestId("suggestions")).not.toBeInTheDocument();
-  });
+  it.todo("should render suggestions if empty");

  it("should render the default suggestions", () => {
-    renderWithProviders(<ChatInterface />, {
-      preloadedState: {
-        chat: {
-          messages: [],
-          systemMessage: {
-            content: "",
-            tools: [],
-            openhands_version: null,
-            agent_class: null
-          }
-        },
-      },
-    });
+    renderWithProviders(<ChatInterface />);

    const suggestions = screen.getByTestId("suggestions");
    const repoSuggestions = Object.keys(SUGGESTIONS.repo);
@@ -110,21 +67,8 @@ describe("Empty state", () => {
        status: WsClientProviderStatus.CONNECTED,
        isLoadingMessages: false,
      }));
-      const addUserMessageSpy = vi.spyOn(ChatSlice, "addUserMessage");
      const user = userEvent.setup();
-      const { store } = renderWithProviders(<ChatInterface />, {
-        preloadedState: {
-          chat: {
-            messages: [],
-            systemMessage: {
-              content: "",
-              tools: [],
-              openhands_version: null,
-              agent_class: null
-            }
-          },
-        },
-      });
+      renderWithProviders(<ChatInterface />);

      const suggestions = screen.getByTestId("suggestions");
      const displayedSuggestions = within(suggestions).getAllByRole("button");
@@ -133,9 +77,7 @@ describe("Empty state", () => {
      await user.click(displayedSuggestions[0]);

      // user message loaded to input
-      expect(addUserMessageSpy).not.toHaveBeenCalled();
      expect(screen.queryByTestId("suggestions")).toBeInTheDocument();
-      expect(store.getState().chat.messages).toHaveLength(0);
      expect(input).toHaveValue(displayedSuggestions[0].textContent);
    },
  );
@@ -149,19 +91,7 @@ describe("Empty state", () => {
        isLoadingMessages: false,
      }));
      const user = userEvent.setup();
-      const { rerender } = renderWithProviders(<ChatInterface />, {
-        preloadedState: {
-          chat: {
-            messages: [],
-            systemMessage: {
-              content: "",
-              tools: [],
-              openhands_version: null,
-              agent_class: null
-            }
-          },
-        },
-      });
+      const { rerender } = renderWithProviders(<ChatInterface />);

      const suggestions = screen.getByTestId("suggestions");
      const displayedSuggestions = within(suggestions).getAllByRole("button");
@@ -22,7 +22,7 @@ const renderRepoConnector = () => {
      path: "/conversations/:conversationId",
    },
    {
-      Component: Outlet,
+      Component: () => <Outlet />,
      path: "/settings",
      children: [
        {
@@ -11,7 +11,7 @@ import { MOCK_TASKS } from "#/mocks/task-suggestions-handlers";
 const renderTaskSuggestions = () => {
  const RouterStub = createRoutesStub([
    {
-      Component: TaskSuggestions,
+      Component: () => <TaskSuggestions />,
      path: "/",
    },
    {
@@ -1,92 +1,11 @@
-import { render, screen } from "@testing-library/react";
-import { describe, it, expect, vi } from "vitest";
-import { Messages } from "#/components/features/chat/messages";
-import type { Message } from "#/message";
-import { renderWithProviders } from "test-utils";
-
-// Mock the useParams hook to provide a conversationId
-vi.mock("react-router", async () => {
-  const actual = await vi.importActual<typeof import("react-router")>("react-router");
-  return {
-    ...actual,
-    useParams: () => ({ conversationId: "test-conversation-id" }),
-  };
-});
+import { describe, it } from "vitest";

 describe("File Operations Messages", () => {
-  it("should show success indicator for successful file read operation", () => {
-    const messages: Message[] = [
-      {
-        type: "action",
-        translationID: "read_file_contents",
-        content: "Successfully read file contents",
-        success: true,
-        sender: "assistant",
-        timestamp: new Date().toISOString(),
-      },
-    ];
+  it.todo("should show success indicator for successful file read operation");

-    renderWithProviders(<Messages messages={messages} isAwaitingUserConfirmation={false} />);
+  it.todo("should show failure indicator for failed file read operation");

-    const statusIcon = screen.getByTestId("status-icon");
-    expect(statusIcon).toBeInTheDocument();
-    expect(statusIcon.closest("svg")).toHaveClass("fill-success");
-  });
+  it.todo("should show success indicator for successful file edit operation");

-  it("should show failure indicator for failed file read operation", () => {
-    const messages: Message[] = [
-      {
-        type: "action",
-        translationID: "read_file_contents",
-        content: "Failed to read file contents",
-        success: false,
-        sender: "assistant",
-        timestamp: new Date().toISOString(),
-      },
-    ];
-
-    renderWithProviders(<Messages messages={messages} isAwaitingUserConfirmation={false} />);
-
-    const statusIcon = screen.getByTestId("status-icon");
-    expect(statusIcon).toBeInTheDocument();
-    expect(statusIcon.closest("svg")).toHaveClass("fill-danger");
-  });
-
-  it("should show success indicator for successful file edit operation", () => {
-    const messages: Message[] = [
-      {
-        type: "action",
-        translationID: "edit_file_contents",
-        content: "Successfully edited file contents",
-        success: true,
-        sender: "assistant",
-        timestamp: new Date().toISOString(),
-      },
-    ];
-
-    renderWithProviders(<Messages messages={messages} isAwaitingUserConfirmation={false} />);
-
-    const statusIcon = screen.getByTestId("status-icon");
-    expect(statusIcon).toBeInTheDocument();
-    expect(statusIcon.closest("svg")).toHaveClass("fill-success");
-  });
-
-  it("should show failure indicator for failed file edit operation", () => {
-    const messages: Message[] = [
-      {
-        type: "action",
-        translationID: "edit_file_contents",
-        content: "Failed to edit file contents",
-        success: false,
-        sender: "assistant",
-        timestamp: new Date().toISOString(),
-      },
-    ];
-
-    renderWithProviders(<Messages messages={messages} isAwaitingUserConfirmation={false} />);
-
-    const statusIcon = screen.getByTestId("status-icon");
-    expect(statusIcon).toBeInTheDocument();
-    expect(statusIcon.closest("svg")).toHaveClass("fill-danger");
-  });
+  it.todo("should show failure indicator for failed file edit operation");
 });
@@ -2,7 +2,6 @@ import { describe, it, expect, vi, beforeEach } from "vitest";
 import { render, waitFor } from "@testing-library/react";
 import React from "react";
 import { QueryClient, QueryClientProvider } from "@tanstack/react-query";
-import * as ChatSlice from "#/state/chat-slice";
 import {
  updateStatusWhenErrorMessagePresent,
  WsClientProvider,
@@ -11,42 +10,15 @@ import {

 describe("Propagate error message", () => {
  it("should do nothing when no message was passed from server", () => {
-    const addErrorMessageSpy = vi.spyOn(ChatSlice, "addErrorMessage");
    updateStatusWhenErrorMessagePresent(null);
    updateStatusWhenErrorMessagePresent(undefined);
    updateStatusWhenErrorMessagePresent({});
    updateStatusWhenErrorMessagePresent({ message: null });
-
-    expect(addErrorMessageSpy).not.toHaveBeenCalled();
  });

-  it("should display error to user when present", () => {
-    const message = "We have a problem!";
-    const addErrorMessageSpy = vi.spyOn(ChatSlice, "addErrorMessage");
-    updateStatusWhenErrorMessagePresent({ message });
+  it.todo("should display error to user when present");

-    expect(addErrorMessageSpy).toHaveBeenCalledWith({
-      message,
-      status_update: true,
-      type: "error",
-    });
-  });
-
-  it("should display error including translation id when present", () => {
-    const message = "We have a problem!";
-    const addErrorMessageSpy = vi.spyOn(ChatSlice, "addErrorMessage");
-    updateStatusWhenErrorMessagePresent({
-      message,
-      data: { msg_id: "..id.." },
-    });
-
-    expect(addErrorMessageSpy).toHaveBeenCalledWith({
-      message,
-      id: "..id..",
-      status_update: true,
-      type: "error",
-    });
-  });
+  it.todo("should display error including translation id when present");
 });

 // Create a mock for socket.io-client
@@ -59,11 +59,7 @@ describe("useTerminal", () => {
  it("should render", () => {
    renderWithProviders(<TestTerminalComponent commands={[]} />, {
      preloadedState: {
-        agent: { 
-          curAgentState: AgentState.RUNNING,
-          currentAgentType: "CodeActAgent",
-          isDelegated: false
-        },
+        agent: { curAgentState: AgentState.RUNNING },
        cmd: { commands: [] },
      },
    });
@@ -77,11 +73,7 @@ describe("useTerminal", () => {

    renderWithProviders(<TestTerminalComponent commands={commands} />, {
      preloadedState: {
-        agent: { 
-          curAgentState: AgentState.RUNNING,
-          currentAgentType: "CodeActAgent",
-          isDelegated: false
-        },
+        agent: { curAgentState: AgentState.RUNNING },
        cmd: { commands },
      },
    });
@@ -108,11 +100,7 @@ describe("useTerminal", () => {
      />,
      {
        preloadedState: {
-          agent: { 
-            curAgentState: AgentState.RUNNING,
-            currentAgentType: "CodeActAgent",
-            isDelegated: false
-          },
+          agent: { curAgentState: AgentState.RUNNING },
          cmd: { commands },
        },
      },
@@ -22,7 +22,7 @@ const MOCK_GET_SECRETS_RESPONSE: GetSecretsResponse["custom_secrets"] = [

 const RouterStub = createRoutesStub([
  {
-    Component: Outlet,
+    Component: () => <Outlet />,
    path: "/settings",
    children: [
      {
@@ -1,146 +0,0 @@
-import { describe, it, expect, vi, beforeEach } from "vitest";
-import { handleStatusMessage, handleActionMessage } from "#/services/actions";
-import store from "#/store";
-import { trackError } from "#/utils/error-handler";
-import ActionType from "#/types/action-type";
-import { ActionMessage } from "#/types/message";
-
-// Mock dependencies
-vi.mock("#/utils/error-handler", () => ({
-  trackError: vi.fn(),
-}));
-
-vi.mock("#/store", () => ({
-  default: {
-    dispatch: vi.fn(),
-  },
-}));
-
-describe("Actions Service", () => {
-  beforeEach(() => {
-    vi.clearAllMocks();
-  });
-
-  describe("handleStatusMessage", () => {
-    it("should dispatch info messages to status state", () => {
-      const message = {
-        type: "info",
-        message: "Runtime is not available",
-        id: "runtime.unavailable",
-        status_update: true as const,
-      };
-
-      handleStatusMessage(message);
-
-      expect(store.dispatch).toHaveBeenCalledWith(expect.objectContaining({
-        payload: message,
-      }));
-    });
-
-    it("should log error messages and display them in chat", () => {
-      const message = {
-        type: "error",
-        message: "Runtime connection failed",
-        id: "runtime.connection.failed",
-        status_update: true as const,
-      };
-
-      handleStatusMessage(message);
-
-      expect(trackError).toHaveBeenCalledWith({
-        message: "Runtime connection failed",
-        source: "chat",
-        metadata: { msgId: "runtime.connection.failed" },
-      });
-
-      expect(store.dispatch).toHaveBeenCalledWith(expect.objectContaining({
-        payload: message,
-      }));
-    });
-  });
-
-  describe("handleActionMessage", () => {
-    it("should use first-person perspective for task completion messages", () => {
-      // Test partial completion
-      const messagePartial: ActionMessage = {
-        id: 1,
-        action: ActionType.FINISH,
-        source: "agent",
-        message: "",
-        timestamp: new Date().toISOString(),
-        args: {
-          final_thought: "",
-          task_completed: "partial",
-          outputs: "",
-          thought: ""
-        }
-      };
-
-      // Mock implementation to capture the message
-      let capturedPartialMessage = "";
-      (store.dispatch as any).mockImplementation((action: any) => {
-        if (action.type === "chat/addAssistantMessage" &&
-            action.payload.includes("believe that the task was **completed partially**")) {
-          capturedPartialMessage = action.payload;
-        }
-      });
-
-      handleActionMessage(messagePartial);
-      expect(capturedPartialMessage).toContain("I believe that the task was **completed partially**");
-
-      // Test not completed
-      const messageNotCompleted: ActionMessage = {
-        id: 2,
-        action: ActionType.FINISH,
-        source: "agent",
-        message: "",
-        timestamp: new Date().toISOString(),
-        args: {
-          final_thought: "",
-          task_completed: "false",
-          outputs: "",
-          thought: ""
-        }
-      };
-
-      // Mock implementation to capture the message
-      let capturedNotCompletedMessage = "";
-      (store.dispatch as any).mockImplementation((action: any) => {
-        if (action.type === "chat/addAssistantMessage" &&
-            action.payload.includes("believe that the task was **not completed**")) {
-          capturedNotCompletedMessage = action.payload;
-        }
-      });
-
-      handleActionMessage(messageNotCompleted);
-      expect(capturedNotCompletedMessage).toContain("I believe that the task was **not completed**");
-
-      // Test completed successfully
-      const messageCompleted: ActionMessage = {
-        id: 3,
-        action: ActionType.FINISH,
-        source: "agent",
-        message: "",
-        timestamp: new Date().toISOString(),
-        args: {
-          final_thought: "",
-          task_completed: "true",
-          outputs: "",
-          thought: ""
-        }
-      };
-
-      // Mock implementation to capture the message
-      let capturedCompletedMessage = "";
-      (store.dispatch as any).mockImplementation((action: any) => {
-        if (action.type === "chat/addAssistantMessage" &&
-            action.payload.includes("believe that the task was **completed successfully**")) {
-          capturedCompletedMessage = action.payload;
-        }
-      });
-
-      handleActionMessage(messageCompleted);
-      expect(capturedCompletedMessage).toContain("I believe that the task was **completed successfully**");
-    });
-  });
-});
@@ -1,51 +0,0 @@
-import { beforeEach, describe, expect, it, vi } from "vitest";
-import { handleObservationMessage } from "#/services/observations";
-import store from "#/store";
-import { ObservationMessage } from "#/types/message";
-
-// Mock dependencies
-vi.mock("#/store", () => ({
-  default: {
-    dispatch: vi.fn(),
-  },
-}));
-
-describe("Observations Service", () => {
-  beforeEach(() => {
-    vi.clearAllMocks();
-  });
-
-  describe("handleObservationMessage", () => {
-    const createErrorMessage = (): ObservationMessage => ({
-      id: 14,
-      timestamp: "2025-04-14T13:37:54.451843",
-      message: "The action has not been executed.",
-      cause: 12,
-      observation: "error",
-      content: "The action has not been executed.",
-      extras: {
-        error_id: "",
-        metadata: {},
-      },
-    });
-
-    it("should dispatch error messages exactly once", () => {
-      const errorMessage = createErrorMessage();
-
-      handleObservationMessage(errorMessage);
-
-      expect(store.dispatch).toHaveBeenCalledTimes(1);
-      expect(store.dispatch).toHaveBeenCalledWith({
-        type: "chat/addAssistantObservation",
-        payload: expect.objectContaining({
-          observation: "error",
-          content: "The action has not been executed.",
-          source: "user",
-          extras: {
-            error_id: "",
-          },
-        }),
-      });
-    });
-  });
-});
@@ -1,8 +1,4 @@
-import { describe, it, expect, vi, beforeEach, afterEach } from "vitest";
-import { handleObservationMessage } from "#/services/observations";
-import { setScreenshotSrc, setUrl } from "#/state/browser-slice";
-import ObservationType from "#/types/observation-type";
-import store from "#/store";
+import { describe, it, vi, beforeEach, afterEach } from "vitest";

 // Mock the store module
 vi.mock("#/store", () => ({
@@ -20,43 +16,9 @@ describe("handleObservationMessage", () => {
    vi.resetAllMocks();
  });

-  it("updates browser state when receiving a browse observation", () => {
-    const message = {
-      id: "test-id",
-      cause: "test-cause",
-      observation: ObservationType.BROWSE,
-      content: "test content",
-      message: "test message",
-      extras: {
-        url: "https://example.com",
-        screenshot: "base64-screenshot-data",
-      },
-    };
+  it.todo("updates browser state when receiving a browse observation");

-    handleObservationMessage(message);
-
-    // Check that setScreenshotSrc and setUrl were called with the correct values
-    expect(store.dispatch).toHaveBeenCalledWith(setScreenshotSrc("base64-screenshot-data"));
-    expect(store.dispatch).toHaveBeenCalledWith(setUrl("https://example.com"));
-  });
-
-  it("updates browser state when receiving a browse_interactive observation", () => {
-    const message = {
-      id: "test-id",
-      cause: "test-cause",
-      observation: ObservationType.BROWSE_INTERACTIVE,
-      content: "test content",
-      message: "test message",
-      extras: {
-        url: "https://example.com",
-        screenshot: "base64-screenshot-data",
-      },
-    };
-
-    handleObservationMessage(message);
-
-    // Check that setScreenshotSrc and setUrl were called with the correct values
-    expect(store.dispatch).toHaveBeenCalledWith(setScreenshotSrc("base64-screenshot-data"));
-    expect(store.dispatch).toHaveBeenCalledWith(setUrl("https://example.com"));
-  });
+  it.todo(
+    "updates browser state when receiving a browse_interactive observation",
+  );
 });
@@ -8,30 +8,30 @@
  },
  "dependencies": {
    "@heroui/react": "2.7.8",
-    "@microlink/react-json-view": "^1.26.1",
+    "@microlink/react-json-view": "^1.26.2",
    "@monaco-editor/react": "^4.7.0-rc.0",
-    "@react-router/node": "^7.5.3",
-    "@react-router/serve": "^7.5.3",
+    "@react-router/node": "^7.6.0",
+    "@react-router/serve": "^7.6.0",
    "@react-types/shared": "^3.29.0",
-    "@reduxjs/toolkit": "^2.7.0",
+    "@reduxjs/toolkit": "^2.8.2",
    "@stripe/react-stripe-js": "^3.7.0",
    "@stripe/stripe-js": "^7.3.0",
-    "@tanstack/react-query": "^5.75.4",
+    "@tanstack/react-query": "^5.76.1",
    "@vitejs/plugin-react": "^4.4.0",
    "@xterm/addon-fit": "^0.10.0",
    "@xterm/xterm": "^5.4.0",
    "axios": "^1.9.0",
    "clsx": "^2.1.1",
    "eslint-config-airbnb-typescript": "^18.0.0",
-    "framer-motion": "^12.10.0",
-    "i18next": "^25.1.1",
+    "framer-motion": "^12.12.1",
+    "i18next": "^25.1.3",
    "i18next-browser-languagedetector": "^8.1.0",
    "i18next-http-backend": "^3.0.2",
-    "isbot": "^5.1.27",
+    "isbot": "^5.1.28",
    "jose": "^6.0.11",
-    "lucide-react": "^0.507.0",
+    "lucide-react": "^0.511.0",
    "monaco-editor": "^0.52.2",
-    "posthog-js": "^1.239.1",
+    "posthog-js": "^1.242.2",
    "react": "^19.1.0",
    "react-dom": "^19.1.0",
    "react-highlight": "^0.15.0",
@@ -40,15 +40,15 @@
    "react-icons": "^5.5.0",
    "react-markdown": "^10.1.0",
    "react-redux": "^9.2.0",
-    "react-router": "^7.5.3",
+    "react-router": "^7.6.0",
    "react-syntax-highlighter": "^15.6.1",
    "react-textarea-autosize": "^8.5.9",
    "remark-gfm": "^4.0.1",
    "sirv-cli": "^3.0.1",
    "socket.io-client": "^4.8.1",
-    "tailwind-merge": "^3.2.0",
+    "tailwind-merge": "^3.3.0",
    "vite": "^6.3.5",
-    "web-vitals": "^3.5.2",
+    "web-vitals": "^5.0.1",
    "ws": "^8.18.2"
  },
  "scripts": {
@@ -83,16 +83,16 @@
    "@babel/types": "^7.27.0",
    "@mswjs/socket.io-binding": "^0.1.1",
    "@playwright/test": "^1.52.0",
-    "@react-router/dev": "^7.5.3",
+    "@react-router/dev": "^7.6.0",
    "@tailwindcss/typography": "^0.5.16",
    "@tanstack/eslint-plugin-query": "^5.74.7",
    "@testing-library/dom": "^10.4.0",
    "@testing-library/jest-dom": "^6.6.1",
    "@testing-library/react": "^16.3.0",
    "@testing-library/user-event": "^14.6.1",
-    "@types/node": "^22.15.12",
-    "@types/react": "^19.1.3",
-    "@types/react-dom": "^19.1.3",
+    "@types/node": "^22.15.18",
+    "@types/react": "^19.1.4",
+    "@types/react-dom": "^19.1.5",
    "@types/react-highlight": "^0.12.8",
    "@types/react-syntax-highlighter": "^15.5.13",
    "@types/ws": "^8.18.1",
@@ -104,7 +104,7 @@
    "eslint": "^8.57.0",
    "eslint-config-airbnb": "^19.0.4",
    "eslint-config-airbnb-typescript": "^18.0.0",
-    "eslint-config-prettier": "^10.1.3",
+    "eslint-config-prettier": "^10.1.5",
    "eslint-plugin-import": "^2.29.1",
    "eslint-plugin-jsx-a11y": "^6.10.2",
    "eslint-plugin-prettier": "^5.4.0",
@@ -113,7 +113,7 @@
    "eslint-plugin-unused-imports": "^4.1.4",
    "husky": "^9.1.7",
    "jsdom": "^26.1.0",
-    "lint-staged": "^15.5.2",
+    "lint-staged": "^16.0.0",
    "msw": "^2.6.6",
    "postcss": "^8.5.2",
    "prettier": "^3.5.3",
@@ -1,4 +1,4 @@
-import { useDispatch, useSelector } from "react-redux";
+import { useSelector } from "react-redux";
 import React from "react";
 import posthog from "posthog-js";
 import { useParams } from "react-router";
@@ -8,7 +8,6 @@ import { convertImageToBase64 } from "#/utils/convert-image-to-base-64";
 import { TrajectoryActions } from "../trajectory/trajectory-actions";
 import { createChatMessage } from "#/services/chat-service";
 import { InteractiveChatBox } from "./interactive-chat-box";
-import { addUserMessage } from "#/state/chat-slice";
 import { RootState } from "#/store";
 import { AgentState } from "#/types/agent-state";
 import { generateAgentStateChangeEvent } from "#/services/agent-state-service";
@@ -25,6 +24,11 @@ import { LoadingSpinner } from "#/components/shared/loading-spinner";
 import { useGetTrajectory } from "#/hooks/mutation/use-get-trajectory";
 import { downloadTrajectory } from "#/utils/download-trajectory";
 import { displayErrorToast } from "#/utils/custom-toast-handlers";
+import { useOptimisticUserMessage } from "#/hooks/use-optimistic-user-message";
+import { useWSErrorMessage } from "#/hooks/use-ws-error-message";
+import i18n from "#/i18n";
+import { ErrorMessageBanner } from "./error-message-banner";
+import { shouldRenderEvent } from "./event-content-helpers/should-render-event";

 function getEntryPoint(
  hasRepository: boolean | null,
@@ -36,14 +40,15 @@ function getEntryPoint(
 }

 export function ChatInterface() {
-  const { send, isLoadingMessages } = useWsClient();
-  const dispatch = useDispatch();
+  const { getErrorMessage } = useWSErrorMessage();
+  const { send, isLoadingMessages, parsedEvents } = useWsClient();
+  const { setOptimisticUserMessage, getOptimisticUserMessage } =
+    useOptimisticUserMessage();
  const { t } = useTranslation();
  const scrollRef = React.useRef<HTMLDivElement>(null);
  const { scrollDomToBottom, onChatBodyScroll, hitBottom } =
    useScrollToBottom(scrollRef);

-  const { messages } = useSelector((state: RootState) => state.chat);
  const { curAgentState } = useSelector((state: RootState) => state.agent);

  const [feedbackPolarity, setFeedbackPolarity] = React.useState<
@@ -57,8 +62,13 @@ export function ChatInterface() {
  const params = useParams();
  const { mutate: getTrajectory } = useGetTrajectory();

+  const optimisticUserMessage = getOptimisticUserMessage();
+  const errorMessage = getErrorMessage();
+
+  const events = parsedEvents.filter(shouldRenderEvent);
+
  const handleSendMessage = async (content: string, files: File[]) => {
-    if (messages.length === 0) {
+    if (events.length === 0) {
      posthog.capture("initial_query_submitted", {
        entry_point: getEntryPoint(
          selectedRepository !== null,
@@ -69,7 +79,7 @@ export function ChatInterface() {
      });
    } else {
      posthog.capture("user_message_sent", {
-        session_message_count: messages.length,
+        session_message_count: events.length,
        current_message_length: content.length,
      });
    }
@@ -77,9 +87,8 @@ export function ChatInterface() {
    const imageUrls = await Promise.all(promises);

    const timestamp = new Date().toISOString();
-    const pending = true;
-    dispatch(addUserMessage({ content, imageUrls, timestamp, pending }));
    send(createChatMessage(content, imageUrls, timestamp));
+    setOptimisticUserMessage(content);
    setMessageToSend(null);
  };

@@ -120,7 +129,7 @@ export function ChatInterface() {

  return (
    <div className="h-full flex flex-col justify-between">
-      {messages.length === 0 && (
+      {events.length === 0 && !optimisticUserMessage && (
        <ChatSuggestions onSuggestionsClick={setMessageToSend} />
      )}

@@ -137,7 +146,7 @@ export function ChatInterface() {

        {!isLoadingMessages && (
          <Messages
-            messages={messages}
+            messages={events}
            isAwaitingUserConfirmation={
              curAgentState === AgentState.AWAITING_USER_CONFIRMATION
            }
@@ -170,6 +179,12 @@ export function ChatInterface() {
          {!hitBottom && <ScrollToBottomButton onClick={scrollDomToBottom} />}
        </div>

+        {errorMessage && (
+          <ErrorMessageBanner
+            message={i18n.exists(errorMessage) ? t(errorMessage) : errorMessage}
+          />
+        )}
+
        <InteractiveChatBox
          onSubmit={handleSendMessage}
          onStop={handleStop}
@@ -6,10 +6,11 @@ import { cn } from "#/utils/utils";
 import { ul, ol } from "../markdown/list";
 import { CopyToClipboardButton } from "#/components/shared/buttons/copy-to-clipboard-button";
 import { anchor } from "../markdown/anchor";
+import { OpenHandsSourceType } from "#/types/core/base";
 import { paragraph } from "../markdown/paragraph";

 interface ChatMessageProps {
-  type: "user" | "assistant";
+  type: OpenHandsSourceType;
  message: string;
 }

@@ -49,7 +50,7 @@ export function ChatMessage({
        "rounded-xl relative",
        "flex flex-col gap-2",
        type === "user" && " max-w-[305px] p-4 bg-tertiary self-end",
-        type === "assistant" && "mt-6 max-w-full bg-transparent",
+        type === "agent" && "mt-6 max-w-full bg-transparent",
      )}
    >
      <CopyToClipboardButton
@@ -0,0 +1,11 @@
+interface ErrorMessageBannerProps {
+  message: string;
+}
+
+export function ErrorMessageBanner({ message }: ErrorMessageBannerProps) {
+  return (
+    <div className="w-full rounded-lg p-2 text-black border border-red-800 bg-red-500">
+      {message}
+    </div>
+  );
+}
@@ -0,0 +1,56 @@
+import React from "react";
+import Markdown from "react-markdown";
+import remarkGfm from "remark-gfm";
+import { useTranslation } from "react-i18next";
+import { code } from "../markdown/code";
+import { ol, ul } from "../markdown/list";
+import ArrowDown from "#/icons/angle-down-solid.svg?react";
+import ArrowUp from "#/icons/angle-up-solid.svg?react";
+import i18n from "#/i18n";
+
+interface ErrorMessageProps {
+  errorId?: string;
+  defaultMessage: string;
+}
+
+export function ErrorMessage({ errorId, defaultMessage }: ErrorMessageProps) {
+  const { t } = useTranslation();
+  const [showDetails, setShowDetails] = React.useState(false);
+
+  const hasValidTranslationId = !!errorId && i18n.exists(errorId);
+  const errorKey = hasValidTranslationId
+    ? errorId
+    : "CHAT_INTERFACE$AGENT_ERROR_MESSAGE";
+
+  return (
+    <div className="flex flex-col gap-2 border-l-2 pl-2 my-2 py-2 border-danger text-sm w-full">
+      <div className="font-bold text-danger">
+        {t(errorKey)}
+        <button
+          type="button"
+          onClick={() => setShowDetails((prev) => !prev)}
+          className="cursor-pointer text-left"
+        >
+          {showDetails ? (
+            <ArrowUp className="h-4 w-4 ml-2 inline fill-danger" />
+          ) : (
+            <ArrowDown className="h-4 w-4 ml-2 inline fill-danger" />
+          )}
+        </button>
+      </div>
+
+      {showDetails && (
+        <Markdown
+          components={{
+            code,
+            ul,
+            ol,
+          }}
+          remarkPlugins={[remarkGfm]}
+        >
+          {defaultMessage}
+        </Markdown>
+      )}
+    </div>
+  );
+}
@@ -0,0 +1,125 @@
+import { ActionSecurityRisk } from "#/state/security-analyzer-slice";
+import {
+  FileWriteAction,
+  CommandAction,
+  IPythonAction,
+  BrowseAction,
+  BrowseInteractiveAction,
+  MCPAction,
+  ThinkAction,
+  OpenHandsAction,
+  FinishAction,
+} from "#/types/core/actions";
+import { getDefaultEventContent, MAX_CONTENT_LENGTH } from "./shared";
+
+const getRiskText = (risk: ActionSecurityRisk) => {
+  switch (risk) {
+    case ActionSecurityRisk.LOW:
+      return "Low Risk";
+    case ActionSecurityRisk.MEDIUM:
+      return "Medium Risk";
+    case ActionSecurityRisk.HIGH:
+      return "High Risk";
+    case ActionSecurityRisk.UNKNOWN:
+    default:
+      return "Unknown Risk";
+  }
+};
+
+const getWriteActionContent = (event: FileWriteAction): string => {
+  let { content } = event.args;
+  if (content.length > MAX_CONTENT_LENGTH) {
+    content = `${content.slice(0, MAX_CONTENT_LENGTH)}...`;
+  }
+  return `${event.args.path}\n${content}`;
+};
+
+const getRunActionContent = (event: CommandAction): string => {
+  let content = `Command:\n\`${event.args.command}\``;
+
+  if (event.args.confirmation_state === "awaiting_confirmation") {
+    content += `\n\n${getRiskText(event.args.security_risk)}`;
+  }
+
+  return content;
+};
+
+const getIPythonActionContent = (event: IPythonAction): string => {
+  let content = `\`\`\`\n${event.args.code}\n\`\`\``;
+
+  if (event.args.confirmation_state === "awaiting_confirmation") {
+    content += `\n\n${getRiskText(event.args.security_risk)}`;
+  }
+
+  return content;
+};
+
+const getBrowseActionContent = (event: BrowseAction): string =>
+  `Browsing ${event.args.url}`;
+
+const getBrowseInteractiveActionContent = (event: BrowseInteractiveAction) =>
+  `**Action:**\n\n\`\`\`python\n${event.args.browser_actions}\n\`\`\``;
+
+const getMcpActionContent = (event: MCPAction): string => {
+  // Format MCP action with name and arguments
+  const name = event.args.name || "";
+  const args = event.args.arguments || {};
+  let details = `**MCP Tool Call:** ${name}\n\n`;
+  // Include thought if available
+  if (event.args.thought) {
+    details += `\n\n**Thought:**\n${event.args.thought}`;
+  }
+  details += `\n\n**Arguments:**\n\`\`\`json\n${JSON.stringify(args, null, 2)}\n\`\`\``;
+  return details;
+};
+
+const getThinkActionContent = (event: ThinkAction): string =>
+  event.args.thought;
+
+const getFinishActionContent = (event: FinishAction): string => {
+  let content = event.args.final_thought;
+
+  switch (event.args.task_completed) {
+    case "success":
+      content +=
+        "\n\n\nI believe that the task was **completed successfully**.";
+      break;
+    case "failure":
+      content += "\n\n\nI believe that the task was **not completed**.";
+      break;
+    case "partial":
+    default:
+      content += "\n\n\nI believe that the task was **completed partially**.";
+      break;
+  }
+
+  return content.trim();
+};
+
+const getNoContentActionContent = (): string => "";
+
+export const getActionContent = (event: OpenHandsAction): string => {
+  switch (event.action) {
+    case "read":
+    case "edit":
+      return getNoContentActionContent();
+    case "write":
+      return getWriteActionContent(event);
+    case "run":
+      return getRunActionContent(event);
+    case "run_ipython":
+      return getIPythonActionContent(event);
+    case "browse":
+      return getBrowseActionContent(event);
+    case "browse_interactive":
+      return getBrowseInteractiveActionContent(event);
+    case "call_tool_mcp":
+      return getMcpActionContent(event);
+    case "think":
+      return getThinkActionContent(event);
+    case "finish":
+      return getFinishActionContent(event);
+    default:
+      return getDefaultEventContent(event);
+  }
+};
@@ -0,0 +1,70 @@
+import { Trans } from "react-i18next";
+import { OpenHandsAction } from "#/types/core/actions";
+import { isOpenHandsAction, isOpenHandsObservation } from "#/types/core/guards";
+import { OpenHandsObservation } from "#/types/core/observations";
+import { MonoComponent } from "../mono-component";
+import { PathComponent } from "../path-component";
+import { getActionContent } from "./get-action-content";
+import { getObservationContent } from "./get-observation-content";
+
+const hasPathProperty = (
+  obj: Record<string, unknown>,
+): obj is { path: string } => typeof obj.path === "string";
+
+const hasCommandProperty = (
+  obj: Record<string, unknown>,
+): obj is { command: string } => typeof obj.command === "string";
+
+const trimText = (text: string, maxLength: number): string => {
+  if (!text) return "";
+  return text.length > maxLength ? `${text.substring(0, maxLength)}...` : text;
+};
+
+export const getEventContent = (
+  event: OpenHandsAction | OpenHandsObservation,
+) => {
+  let title: React.ReactNode = "";
+  let details: string = "";
+
+  if (isOpenHandsAction(event)) {
+    title = (
+      <Trans
+        i18nKey={`ACTION_MESSAGE$${event.action.toUpperCase()}`}
+        values={{
+          path: hasPathProperty(event.args) && event.args.path,
+          command:
+            hasCommandProperty(event.args) && trimText(event.args.command, 80),
+        }}
+        components={{
+          path: <PathComponent />,
+          cmd: <MonoComponent />,
+        }}
+      />
+    );
+    details = getActionContent(event);
+  }
+
+  if (isOpenHandsObservation(event)) {
+    title = (
+      <Trans
+        i18nKey={`OBSERVATION_MESSAGE$${event.observation.toUpperCase()}`}
+        values={{
+          path: hasPathProperty(event.extras) && event.extras.path,
+          command:
+            hasCommandProperty(event.extras) &&
+            trimText(event.extras.command, 80),
+        }}
+        components={{
+          path: <PathComponent />,
+          cmd: <MonoComponent />,
+        }}
+      />
+    );
+    details = getObservationContent(event);
+  }
+
+  return {
+    title: title || "Unknown event",
+    details: details ?? "Unknown event",
+  };
+};
@@ -0,0 +1,133 @@
+import {
+  ReadObservation,
+  CommandObservation,
+  IPythonObservation,
+  EditObservation,
+  BrowseObservation,
+  OpenHandsObservation,
+  RecallObservation,
+} from "#/types/core/observations";
+import { getObservationResult } from "./get-observation-result";
+import { getDefaultEventContent, MAX_CONTENT_LENGTH } from "./shared";
+
+const getReadObservationContent = (event: ReadObservation): string =>
+  `\`\`\`\n${event.content}\n\`\`\``;
+
+const getCommandObservationContent = (
+  event: CommandObservation | IPythonObservation,
+): string => {
+  let { content } = event;
+  if (content.length > MAX_CONTENT_LENGTH) {
+    content = `${content.slice(0, MAX_CONTENT_LENGTH)}...`;
+  }
+  return `Output:\n\`\`\`sh\n${content.trim() || "[Command finished execution with no output]"}\n\`\`\``;
+};
+
+const getEditObservationContent = (
+  event: EditObservation,
+  successMessage: boolean,
+): string => {
+  if (successMessage) {
+    return `\`\`\`diff\n${event.extras.diff}\n\`\`\``; // Content is already truncated by the ACI
+  }
+  return event.content;
+};
+
+const getBrowseObservationContent = (event: BrowseObservation) => {
+  let contentDetails = `**URL:** ${event.extras.url}\n`;
+  if (event.extras.error) {
+    contentDetails += `\n\n**Error:**\n${event.extras.error}\n`;
+  }
+  contentDetails += `\n\n**Output:**\n${event.content}`;
+  if (contentDetails.length > MAX_CONTENT_LENGTH) {
+    contentDetails = `${contentDetails.slice(0, MAX_CONTENT_LENGTH)}...(truncated)`;
+  }
+  return contentDetails;
+};
+
+const getMcpObservationContent = (event: OpenHandsObservation): string => {
+  let { content } = event;
+  if (content.length > MAX_CONTENT_LENGTH) {
+    content = `${content.slice(0, MAX_CONTENT_LENGTH)}...`;
+  }
+  return `**Output:**\n\`\`\`\n${content.trim() || "[MCP Tool finished execution with no output]"}\n\`\`\``;
+};
+
+const getRecallObservationContent = (event: RecallObservation): string => {
+  let content = "";
+
+  if (event.extras.recall_type === "workspace_context") {
+    if (event.extras.repo_name) {
+      content += `\n\n**Repository:** ${event.extras.repo_name}`;
+    }
+    if (event.extras.repo_directory) {
+      content += `\n\n**Directory:** ${event.extras.repo_directory}`;
+    }
+    if (event.extras.date) {
+      content += `\n\n**Date:** ${event.extras.date}`;
+    }
+    if (
+      event.extras.runtime_hosts &&
+      Object.keys(event.extras.runtime_hosts).length > 0
+    ) {
+      content += `\n\n**Available Hosts**`;
+      for (const [host, port] of Object.entries(event.extras.runtime_hosts)) {
+        content += `\n\n- ${host} (port ${port})`;
+      }
+    }
+    if (event.extras.repo_instructions) {
+      content += `\n\n**Repository Instructions:**\n\n${event.extras.repo_instructions}`;
+    }
+    if (event.extras.additional_agent_instructions) {
+      content += `\n\n**Additional Instructions:**\n\n${event.extras.additional_agent_instructions}`;
+    }
+  }
+
+  // Handle microagent knowledge
+  if (
+    event.extras.microagent_knowledge &&
+    event.extras.microagent_knowledge.length > 0
+  ) {
+    content += `\n\n**Triggered Microagent Knowledge:**`;
+    for (const knowledge of event.extras.microagent_knowledge) {
+      content += `\n\n- **${knowledge.name}** (triggered by keyword: ${knowledge.trigger})\n\n\`\`\`\n${knowledge.content}\n\`\`\``;
+    }
+  }
+
+  if (
+    event.extras.custom_secrets_descriptions &&
+    Object.keys(event.extras.custom_secrets_descriptions).length > 0
+  ) {
+    content += `\n\n**Custom Secrets**`;
+    for (const [name, description] of Object.entries(
+      event.extras.custom_secrets_descriptions,
+    )) {
+      content += `\n\n- $${name}: ${description}`;
+    }
+  }
+
+  return content;
+};
+
+export const getObservationContent = (event: OpenHandsObservation): string => {
+  switch (event.observation) {
+    case "read":
+      return getReadObservationContent(event);
+    case "edit":
+      return getEditObservationContent(
+        event,
+        getObservationResult(event) === "success",
+      );
+    case "run_ipython":
+    case "run":
+      return getCommandObservationContent(event);
+    case "browse":
+      return getBrowseObservationContent(event);
+    case "mcp":
+      return getMcpObservationContent(event);
+    case "recall":
+      return getRecallObservationContent(event);
+    default:
+      return getDefaultEventContent(event);
+  }
+};
@@ -0,0 +1,26 @@
+import { OpenHandsObservation } from "#/types/core/observations";
+
+export type ObservationResultStatus = "success" | "error" | "timeout";
+
+export const getObservationResult = (event: OpenHandsObservation) => {
+  const hasContent = event.content.length > 0;
+  const contentIncludesError = event.content.toLowerCase().includes("error:");
+
+  switch (event.observation) {
+    case "run": {
+      const exitCode = event.extras.metadata.exit_code;
+
+      if (exitCode === -1) return "timeout"; // Command timed out
+      if (exitCode === 0) return "success"; // Command executed successfully
+      return "error"; // Command failed
+    }
+    case "run_ipython":
+    case "read":
+    case "edit":
+    case "mcp":
+      if (!hasContent || contentIncludesError) return "error";
+      return "success"; // Content is valid
+    default:
+      return "success";
+  }
+};
@@ -0,0 +1,8 @@
+import { OpenHandsAction } from "#/types/core/actions";
+import { OpenHandsObservation } from "#/types/core/observations";
+
+export const MAX_CONTENT_LENGTH = 1000; // Max characters of event content shown before truncation
+
+export const getDefaultEventContent = (
+  event: OpenHandsAction | OpenHandsObservation,
+): string => `\`\`\`json\n${JSON.stringify(event, null, 2)}\n\`\`\``;
@@ -0,0 +1,27 @@
+import { OpenHandsAction } from "#/types/core/actions";
+import { OpenHandsEventType } from "#/types/core/base";
+import { isOpenHandsAction, isOpenHandsObservation } from "#/types/core/guards";
+import { OpenHandsObservation } from "#/types/core/observations";
+
+const COMMON_NO_RENDER_LIST: OpenHandsEventType[] = [
+  "system",
+  "agent_state_changed",
+  "change_agent_state",
+];
+
+const ACTION_NO_RENDER_LIST: OpenHandsEventType[] = ["recall"];
+
+export const shouldRenderEvent = (
+  event: OpenHandsAction | OpenHandsObservation,
+) => {
+  if (isOpenHandsAction(event)) {
+    const noRenderList = COMMON_NO_RENDER_LIST.concat(ACTION_NO_RENDER_LIST);
+    return !noRenderList.includes(event.action);
+  }
+
+  if (isOpenHandsObservation(event)) {
+    return !COMMON_NO_RENDER_LIST.includes(event.observation);
+  }
+
+  return true;
+};
@@ -0,0 +1,123 @@
+import { ConfirmationButtons } from "#/components/shared/buttons/confirmation-buttons";
+import { I18nKey } from "#/i18n/declaration";
+import { OpenHandsAction } from "#/types/core/actions";
+import {
+  isUserMessage,
+  isErrorObservation,
+  isAssistantMessage,
+  isOpenHandsAction,
+  isOpenHandsObservation,
+  isFinishAction,
+  isRejectObservation,
+} from "#/types/core/guards";
+import { OpenHandsObservation } from "#/types/core/observations";
+import { ImageCarousel } from "../images/image-carousel";
+import { ChatMessage } from "./chat-message";
+import { ErrorMessage } from "./error-message";
+import { getObservationResult } from "./event-content-helpers/get-observation-result";
+import { getEventContent } from "./event-content-helpers/get-event-content";
+import { ExpandableMessage } from "./expandable-message";
+import { GenericEventMessage } from "./generic-event-message";
+
+const hasThoughtProperty = (
+  obj: Record<string, unknown>,
+): obj is { thought: string } => typeof obj.thought === "string" && !!obj.thought;
+
+interface EventMessageProps {
+  event: OpenHandsAction | OpenHandsObservation;
+  hasObservationPair: boolean;
+  isFirstMessageWithResolverTrigger: boolean;
+  isAwaitingUserConfirmation: boolean;
+  isLastMessage: boolean;
+}
+
+export function EventMessage({
+  event,
+  hasObservationPair,
+  isFirstMessageWithResolverTrigger,
+  isAwaitingUserConfirmation,
+  isLastMessage,
+}: EventMessageProps) {
+  const shouldShowConfirmationButtons =
+    isLastMessage && event.source === "agent" && isAwaitingUserConfirmation;
+
+  const isFirstUserMessageWithResolverTrigger =
+    isFirstMessageWithResolverTrigger && isUserMessage(event);
+
+  // Special case: First user message with resolver trigger
+  if (isFirstUserMessageWithResolverTrigger) {
+    return (
+      <div>
+        <ExpandableMessage
+          type="action"
+          message={event.args.content}
+          id={I18nKey.CHAT$RESOLVER_INSTRUCTIONS}
+        />
+        {event.args.image_urls && event.args.image_urls.length > 0 && (
+          <ImageCarousel size="small" images={event.args.image_urls} />
+        )}
+      </div>
+    );
+  }
+
+  if (isErrorObservation(event)) {
+    return (
+      <ErrorMessage
+        errorId={event.extras.error_id}
+        defaultMessage={event.message}
+      />
+    );
+  }
+
+  if (
+    hasObservationPair &&
+    isOpenHandsAction(event) &&
+    hasThoughtProperty(event.args)
+  ) {
+    return <ChatMessage type="agent" message={event.args.thought} />;
+  }
+
+  if (isFinishAction(event)) {
+    return (
+      <ChatMessage type="agent" message={getEventContent(event).details} />
+    );
+  }
+
+  if (isUserMessage(event) || isAssistantMessage(event)) {
+    return (
+      <ChatMessage
+        type={event.source}
+        message={isUserMessage(event) ? event.args.content : event.message}
+      >
+        {event.args.image_urls && event.args.image_urls.length > 0 && (
+          <ImageCarousel size="small" images={event.args.image_urls} />
+        )}
+        {shouldShowConfirmationButtons && <ConfirmationButtons />}
+      </ChatMessage>
+    );
+  }
+
+  if (isRejectObservation(event)) {
+    return <ChatMessage type="agent" message={event.content} />;
+  }
+
+  return (
+    <div>
+      {isOpenHandsAction(event) && hasThoughtProperty(event.args) && (
+        <ChatMessage type="agent" message={event.args.thought} />
+      )}
+
+      <GenericEventMessage
+        title={getEventContent(event).title}
+        details={getEventContent(event).details}
+        success={
+          isOpenHandsObservation(event)
+            ? getObservationResult(event)
+            : undefined
+        }
+      />
+
+      {shouldShowConfirmationButtons && <ConfirmationButtons />}
+    </div>
+  );
+}
@@ -0,0 +1,61 @@
+import React from "react";
+import Markdown from "react-markdown";
+import remarkGfm from "remark-gfm";
+import { code } from "../markdown/code";
+import { ol, ul } from "../markdown/list";
+import ArrowDown from "#/icons/angle-down-solid.svg?react";
+import ArrowUp from "#/icons/angle-up-solid.svg?react";
+import { SuccessIndicator } from "./success-indicator";
+import { ObservationResultStatus } from "./event-content-helpers/get-observation-result";
+
+interface GenericEventMessageProps {
+  title: React.ReactNode;
+  details: string;
+  success?: ObservationResultStatus;
+}
+
+export function GenericEventMessage({
+  title,
+  details,
+  success,
+}: GenericEventMessageProps) {
+  const [showDetails, setShowDetails] = React.useState(false);
+
+  return (
+    <div className="flex flex-col gap-2 border-l-2 pl-2 my-2 py-2 border-neutral-300 text-sm w-full">
+      <div className="flex items-center justify-between font-bold text-neutral-300">
+        <div>
+          {title}
+          {details && (
+            <button
+              type="button"
+              onClick={() => setShowDetails((prev) => !prev)}
+              className="cursor-pointer text-left"
+            >
+              {showDetails ? (
+                <ArrowUp className="h-4 w-4 ml-2 inline fill-neutral-300" />
+              ) : (
+                <ArrowDown className="h-4 w-4 ml-2 inline fill-neutral-300" />
+              )}
+            </button>
+          )}
+        </div>
+
+        {success && <SuccessIndicator status={success} />}
+      </div>
+
+      {showDetails && (
+        <Markdown
+          components={{
+            code,
+            ul,
+            ol,
+          }}
+          remarkPlugins={[remarkGfm]}
+        >
+          {details}
+        </Markdown>
+      )}
+    </div>
+  );
+}
@@ -1,80 +1,82 @@
 import React from "react";
-import type { Message } from "#/message";
-import { ChatMessage } from "#/components/features/chat/chat-message";
-import { ConfirmationButtons } from "#/components/shared/buttons/confirmation-buttons";
-import { ImageCarousel } from "../images/image-carousel";
-import { ExpandableMessage } from "./expandable-message";
 import { useUserConversation } from "#/hooks/query/use-user-conversation";
 import { useConversation } from "#/context/conversation-context";
-import { I18nKey } from "#/i18n/declaration";
+import { OpenHandsAction } from "#/types/core/actions";
+import { OpenHandsObservation } from "#/types/core/observations";
+import { isOpenHandsAction, isOpenHandsObservation } from "#/types/core/guards";
+import { OpenHandsEventType } from "#/types/core/base";
+import { EventMessage } from "./event-message";
+import { ChatMessage } from "./chat-message";
+import { useOptimisticUserMessage } from "#/hooks/use-optimistic-user-message";
+
+const COMMON_NO_RENDER_LIST: OpenHandsEventType[] = [
+  "system",
+  "agent_state_changed",
+  "change_agent_state",
+];
+
+const ACTION_NO_RENDER_LIST: OpenHandsEventType[] = ["recall"];
+
+const shouldRenderEvent = (event: OpenHandsAction | OpenHandsObservation) => {
+  if (isOpenHandsAction(event)) {
+    const noRenderList = COMMON_NO_RENDER_LIST.concat(ACTION_NO_RENDER_LIST);
+    return !noRenderList.includes(event.action);
+  }
+
+  if (isOpenHandsObservation(event)) {
+    return !COMMON_NO_RENDER_LIST.includes(event.observation);
+  }
+
+  return true;
+};

 interface MessagesProps {
-  messages: Message[];
+  messages: (OpenHandsAction | OpenHandsObservation)[];
  isAwaitingUserConfirmation: boolean;
 }

 export const Messages: React.FC<MessagesProps> = React.memo(
  ({ messages, isAwaitingUserConfirmation }) => {
+    const { getOptimisticUserMessage } = useOptimisticUserMessage();
    const { conversationId } = useConversation();
    const { data: conversation } = useUserConversation(conversationId || null);

+    const optimisticUserMessage = getOptimisticUserMessage();
+
    // Check if conversation metadata has trigger=resolver
    const isResolverTrigger = conversation?.trigger === "resolver";

-    return messages.map((message, index) => {
-      const shouldShowConfirmationButtons =
-        messages.length - 1 === index &&
-        message.sender === "assistant" &&
-        isAwaitingUserConfirmation;
+    const actionHasObservationPair = React.useCallback(
+      (event: OpenHandsAction | OpenHandsObservation): boolean => {
+        if (isOpenHandsAction(event)) {
+          return messages.some(
+            (msg) => isOpenHandsObservation(msg) && msg.cause === event.id,
+          );
+        }

-      const isFirstUserMessageWithResolverTrigger =
-        index === 0 && message.sender === "user" && isResolverTrigger;
+        return false;
+      },
+      [messages],
+    );

-      // Special case: First user message with resolver trigger
-      if (isFirstUserMessageWithResolverTrigger) {
-        return (
-          <div key={index}>
-            <ExpandableMessage
-              type="action"
-              message={message.content}
-              id={I18nKey.CHAT$RESOLVER_INSTRUCTIONS}
-            />
-            {message.imageUrls && message.imageUrls.length > 0 && (
-              <ImageCarousel size="small" images={message.imageUrls} />
-            )}
-          </div>
-        );
-      }
+    return (
+      <>
+        {messages.filter(shouldRenderEvent).map((message, index, rendered) => (
+          <EventMessage
+            key={index}
+            event={message}
+            hasObservationPair={actionHasObservationPair(message)}
+            isFirstMessageWithResolverTrigger={index === 0 && isResolverTrigger}
+            isAwaitingUserConfirmation={isAwaitingUserConfirmation}
+            isLastMessage={rendered.length - 1 === index}
+          />
+        ))}

-      if (message.type === "error" || message.type === "action") {
-        return (
-          <div key={index}>
-            <ExpandableMessage
-              type={message.type}
-              id={message.translationID}
-              message={message.content}
-              success={message.success}
-              observation={message.observation}
-              action={message.action}
-            />
-            {shouldShowConfirmationButtons && <ConfirmationButtons />}
-          </div>
-        );
-      }
-
-      return (
-        <ChatMessage
-          key={index}
-          type={message.sender}
-          message={message.content}
-        >
-          {message.imageUrls && message.imageUrls.length > 0 && (
-            <ImageCarousel size="small" images={message.imageUrls} />
-          )}
-          {shouldShowConfirmationButtons && <ConfirmationButtons />}
-        </ChatMessage>
-      );
-    });
+        {optimisticUserMessage && (
+          <ChatMessage type="user" message={optimisticUserMessage} />
+        )}
+      </>
+    );
  },
 );

@@ -0,0 +1,35 @@
+import { FaClock } from "react-icons/fa";
+import CheckCircle from "#/icons/check-circle-solid.svg?react";
+import XCircle from "#/icons/x-circle-solid.svg?react";
+import { ObservationResultStatus } from "./event-content-helpers/get-observation-result";
+
+interface SuccessIndicatorProps {
+  status: ObservationResultStatus;
+}
+
+export function SuccessIndicator({ status }: SuccessIndicatorProps) {
+  return (
+    <span className="flex-shrink-0">
+      {status === "success" && (
+        <CheckCircle
+          data-testid="status-icon"
+          className="h-4 w-4 ml-2 inline fill-success"
+        />
+      )}
+
+      {status === "error" && (
+        <XCircle
+          data-testid="status-icon"
+          className="h-4 w-4 ml-2 inline fill-danger"
+        />
+      )}
+
+      {status === "timeout" && (
+        <FaClock
+          data-testid="status-icon"
+          className="h-4 w-4 ml-2 inline fill-yellow-500"
+        />
+      )}
+    </span>
+  );
+}
@@ -9,7 +9,6 @@ import { AgentState } from "#/types/agent-state";
 import { useWsClient } from "#/context/ws-client-provider";
 import { IGNORE_TASK_STATE_MAP } from "#/ignore-task-state-map.constant";
 import { ActionButton } from "#/components/shared/buttons/action-button";
-import { AgentModeToggle } from "./agent-mode-toggle";

 export function AgentControlBar() {
  const { t } = useTranslation();
@@ -24,29 +23,25 @@ export function AgentControlBar() {

  return (
    <div className="flex justify-between items-center gap-20">
-      <div className="flex items-center gap-4">
-        <ActionButton
-          isDisabled={
-            curAgentState !== AgentState.RUNNING &&
-            curAgentState !== AgentState.PAUSED
-          }
-          content={
-            curAgentState === AgentState.PAUSED
-              ? t(I18nKey.AGENT$RESUME_TASK)
-              : t(I18nKey.AGENT$PAUSE_TASK)
-          }
-          action={
-            curAgentState === AgentState.PAUSED
-              ? AgentState.RUNNING
-              : AgentState.PAUSED
-          }
-          handleAction={handleAction}
-        >
-          {curAgentState === AgentState.PAUSED ? <PlayIcon /> : <PauseIcon />}
-        </ActionButton>
-
-        <AgentModeToggle />
-      </div>
+      <ActionButton
+        isDisabled={
+          curAgentState !== AgentState.RUNNING &&
+          curAgentState !== AgentState.PAUSED
+        }
+        content={
+          curAgentState === AgentState.PAUSED
+            ? t(I18nKey.AGENT$RESUME_TASK)
+            : t(I18nKey.AGENT$PAUSE_TASK)
+        }
+        action={
+          curAgentState === AgentState.PAUSED
+            ? AgentState.RUNNING
+            : AgentState.PAUSED
+        }
+        handleAction={handleAction}
+      >
+        {curAgentState === AgentState.PAUSED ? <PlayIcon /> : <PauseIcon />}
+      </ActionButton>
    </div>
  );
 }
@@ -1,72 +0,0 @@
-import { useSelector } from "react-redux";
-import { useTranslation } from "react-i18next";
-import { Switch } from "@heroui/react";
-import { useWsClient } from "#/context/ws-client-provider";
-import { RootState } from "#/store";
-import { cn } from "#/utils/utils";
-import {
-  generateDelegateToReadOnlyAction,
-  generateFinishDelegationAction,
-} from "#/services/agent-mode-service";
-import { AgentState } from "#/types/agent-state";
-import { I18nKey } from "#/i18n/declaration";
-
-export function AgentModeToggle() {
-  const { t } = useTranslation();
-  const { send } = useWsClient();
-
-  // Get agent type and state from Redux
-  const { currentAgentType, curAgentState } = useSelector(
-    (state: RootState) => state.agent,
-  );
-
-  // Compute if we're in read-only mode
-  const isReadOnly = currentAgentType === "ReadOnlyAgent";
-
-  // Check if toggle is disabled (should be disabled during certain agent states)
-  const isDisabled = [
-    AgentState.LOADING,
-    AgentState.INIT,
-    AgentState.ERROR,
-    AgentState.RATE_LIMITED,
-  ].includes(curAgentState);
-
-  const handleToggle = () => {
-    if (isReadOnly) {
-      // Currently in read-only mode, switch back to execute mode
-      send(generateFinishDelegationAction());
-    } else {
-      // Currently in execute mode, switch to read-only mode
-      send(generateDelegateToReadOnlyAction());
-    }
-  };
-
-  return (
-    <div className="flex items-center gap-2">
-      <Switch
-        isDisabled={isDisabled}
-        name="agent-mode"
-        isSelected={isReadOnly}
-        onValueChange={handleToggle}
-        classNames={{
-          thumb: cn("bg-white w-3 h-3"),
-          wrapper: cn(
-            "border border-[#D4D4D4] bg-white px-[6px] w-12 h-6",
-            "group-data-[selected=true]:border-transparent",
-            isReadOnly
-              ? "group-data-[selected=true]:bg-amber-600"
-              : "group-data-[selected=true]:bg-blue-600",
-          ),
-          label: "text-[#A3A3A3] text-xs",
-        }}
-      >
-        <span className="sr-only">{t(I18nKey.AGENT$MODE_TOGGLE_LABEL)}</span>
-        <span className="text-sm font-medium ml-2">
-          {isReadOnly
-            ? t(I18nKey.AGENT$MODE_READ_ONLY)
-            : t(I18nKey.AGENT$MODE_EXECUTE)}
-        </span>
-      </Switch>
-    </div>
-  );
-}
@@ -24,9 +24,7 @@ const notificationStates = [

 export function AgentStatusBar() {
  const { t, i18n } = useTranslation();
-  const { curAgentState, currentAgentType } = useSelector(
-    (state: RootState) => state.agent,
-  );
+  const { curAgentState } = useSelector((state: RootState) => state.agent);
  const { curStatusMessage } = useSelector((state: RootState) => state.status);
  const { status } = useWsClient();
  const { notify } = useNotification();
@@ -101,10 +99,6 @@ export function AgentStatusBar() {
    }
  }, [curAgentState, status, notify, t]);

-  // Determine agent mode badge color
-  const agentModeBadgeColor =
-    currentAgentType === "ReadOnlyAgent" ? "bg-amber-600" : "bg-blue-600";
-
  return (
    <div className="flex flex-col items-center">
      <div className="flex items-center bg-base-secondary px-2 py-1 text-gray-400 rounded-[100px] text-sm gap-[6px]">
@@ -112,15 +106,6 @@ export function AgentStatusBar() {
          className={`w-2 h-2 rounded-full animate-pulse ${indicatorColor}`}
        />
        <span className="text-sm text-stone-400">{t(statusMessage)}</span>
-
-        {/* Agent Mode Badge */}
-        <div
-          className={`ml-2 px-2 py-0.5 rounded-full text-xs text-white ${agentModeBadgeColor}`}
-        >
-          {currentAgentType === "ReadOnlyAgent"
-            ? t(I18nKey.AGENT$MODE_READ_ONLY)
-            : t(I18nKey.AGENT$MODE_EXECUTE)}
-        </div>
      </div>
    </div>
  );
@@ -15,8 +15,9 @@ import { cn } from "#/utils/utils";
 import { BaseModal } from "../../shared/modals/base-modal/base-modal";
 import { RootState } from "#/store";
 import { I18nKey } from "#/i18n/declaration";
-import { selectSystemMessage } from "#/state/chat-slice";
 import { transformVSCodeUrl } from "#/utils/vscode-url-helper";
+import { useWsClient } from "#/context/ws-client-provider";
+import { isSystemMessage } from "#/types/core/guards";

 interface ConversationCardProps {
  onClick?: () => void;
@@ -52,15 +53,17 @@ export function ConversationCard({
  conversationId,
 }: ConversationCardProps) {
  const { t } = useTranslation();
+  const { parsedEvents } = useWsClient();
  const [contextMenuVisible, setContextMenuVisible] = React.useState(false);
  const [titleMode, setTitleMode] = React.useState<"view" | "edit">("view");
  const [metricsModalVisible, setMetricsModalVisible] = React.useState(false);
  const [systemModalVisible, setSystemModalVisible] = React.useState(false);
  const inputRef = React.useRef<HTMLInputElement>(null);

+  const systemMessage = parsedEvents.find(isSystemMessage);
+
  // Subscribe to metrics data from Redux store
  const metrics = useSelector((state: RootState) => state.metrics);
-  const systemMessage = useSelector(selectSystemMessage);

  const handleBlur = () => {
    if (inputRef.current?.value) {
@@ -365,7 +368,7 @@ export function ConversationCard({
      <SystemMessageModal
        isOpen={systemModalVisible}
        onClose={() => setSystemModalVisible(false)}
-        systemMessage={systemMessage}
+        systemMessage={systemMessage ? systemMessage.args : null}
      />
    </>
  );
@@ -6,6 +6,7 @@ import { cn } from "#/utils/utils";
 import { useUserRepositories } from "#/hooks/query/use-user-repositories";
 import { TaskIssueNumber } from "./task-issue-number";
 import { Provider } from "#/types/settings";
+import { useOptimisticUserMessage } from "#/hooks/use-optimistic-user-message";

 const getTaskTypeMap = (
  t: (key: string) => string,
@@ -21,6 +22,7 @@ interface TaskCardProps {
 }

 export function TaskCard({ task }: TaskCardProps) {
+  const { setOptimisticUserMessage } = useOptimisticUserMessage();
  const { data: repositories } = useUserRepositories();
  const { mutate: createConversation, isPending } = useCreateConversation();
  const isCreatingConversation = useIsCreatingConversation();
@@ -38,6 +40,7 @@ export function TaskCard({ task }: TaskCardProps) {

  const handleLaunchConversation = () => {
    const repo = getRepo(task.repo, task.git_provider);
+    setOptimisticUserMessage("Addressing task...");

    return createConversation({
      selectedRepository: repo,
@@ -24,6 +24,10 @@ export function JupyterCellOutput({ lines }: JupyterCellOutputProps) {
        {/* display the lines as plaintext or image */}
        {lines.map((line, index) => {
          if (line.type === "image") {
+            // Use markdown to display the image
+            const imageMarkdown = line.url
+              ? `![image](${line.url})`
+              : line.content;
            return (
              <div key={index}>
                <Markdown
@@ -32,7 +36,7 @@ export function JupyterCellOutput({ lines }: JupyterCellOutputProps) {
                  }}
                  urlTransform={(value: string) => value}
                >
-                  {line.content}
+                  {imageMarkdown}
                </Markdown>
              </div>
            );
@@ -12,8 +12,8 @@ export function JupyterCell({ cell }: JupyterCellProps) {
  const [lines, setLines] = React.useState<JupyterLine[]>([]);

  React.useEffect(() => {
-    setLines(parseCellContent(cell.content));
-  }, [cell.content]);
+    setLines(parseCellContent(cell.content, cell.imageUrls));
+  }, [cell.content, cell.imageUrls]);

  if (cell.type === "input") {
    return <JupytrerCellInput code={cell.content} />;
@@ -3,7 +3,7 @@ import { io, Socket } from "socket.io-client";
 import { useQueryClient } from "@tanstack/react-query";
 import EventLogger from "#/utils/event-logger";
 import { handleAssistantMessage } from "#/services/actions";
-import { showChatError } from "#/utils/error-handler";
+import { showChatError, trackError } from "#/utils/error-handler";
 import { useRate } from "#/hooks/use-rate";
 import { OpenHandsParsedEvent } from "#/types/core";
 import {
@@ -11,10 +11,26 @@ import {
  CommandAction,
  FileEditAction,
  FileWriteAction,
+  OpenHandsAction,
  UserMessageAction,
 } from "#/types/core/actions";
 import { Conversation } from "#/api/open-hands.types";
 import { useUserProviders } from "#/hooks/use-user-providers";
+import { OpenHandsObservation } from "#/types/core/observations";
+import {
+  isErrorObservation,
+  isOpenHandsAction,
+  isOpenHandsObservation,
+  isUserMessage,
+} from "#/types/core/guards";
+import { useOptimisticUserMessage } from "#/hooks/use-optimistic-user-message";
+import { useWSErrorMessage } from "#/hooks/use-ws-error-message";
+
+const hasValidMessageProperty = (obj: unknown): obj is { message: string } =>
+  typeof obj === "object" &&
+  obj !== null &&
+  "message" in obj &&
+  typeof obj.message === "string";

 const isOpenHandsEvent = (event: unknown): event is OpenHandsParsedEvent =>
  typeof event === "object" &&
@@ -35,14 +51,6 @@ const isFileEditAction = (
 const isCommandAction = (event: OpenHandsParsedEvent): event is CommandAction =>
  "action" in event && event.action === "run";

-const isUserMessage = (
-  event: OpenHandsParsedEvent,
-): event is UserMessageAction =>
-  "source" in event &&
-  "type" in event &&
-  event.source === "user" &&
-  event.type === "message";
-
 const isAssistantMessage = (
  event: OpenHandsParsedEvent,
 ): event is AssistantMessageAction =>
@@ -65,6 +73,7 @@ interface UseWsClient {
  status: WsClientProviderStatus;
  isLoadingMessages: boolean;
  events: Record<string, unknown>[];
+  parsedEvents: (OpenHandsAction | OpenHandsObservation)[];
  send: (event: Record<string, unknown>) => void;
 }

@@ -72,6 +81,7 @@ const WsClientContext = React.createContext<UseWsClient>({
  status: WsClientProviderStatus.DISCONNECTED,
  isLoadingMessages: true,
  events: [],
+  parsedEvents: [],
  send: () => {
    throw new Error("not connected");
  },
@@ -121,12 +131,17 @@ export function WsClientProvider({
  conversationId,
  children,
 }: React.PropsWithChildren<WsClientProviderProps>) {
+  const { removeOptimisticUserMessage } = useOptimisticUserMessage();
+  const { setErrorMessage, removeErrorMessage } = useWSErrorMessage();
  const queryClient = useQueryClient();
  const sioRef = React.useRef<Socket | null>(null);
  const [status, setStatus] = React.useState(
    WsClientProviderStatus.DISCONNECTED,
  );
  const [events, setEvents] = React.useState<Record<string, unknown>[]>([]);
+  const [parsedEvents, setParsedEvents] = React.useState<
+    (OpenHandsAction | OpenHandsObservation)[]
+  >([]);
  const lastEventRef = React.useRef<Record<string, unknown> | null>(null);
  const { providers } = useUserProviders();

@@ -146,6 +161,24 @@ export function WsClientProvider({

  function handleMessage(event: Record<string, unknown>) {
    if (isOpenHandsEvent(event)) {
+      if (isOpenHandsAction(event) || isOpenHandsObservation(event)) {
+        setParsedEvents((prevEvents) => [...prevEvents, event]);
+      }
+
+      if (isErrorObservation(event)) {
+        trackError({
+          message: event.message,
+          source: "chat",
+          metadata: { msgId: event.id },
+        });
+      } else {
+        removeErrorMessage();
+      }
+
+      if (isUserMessage(event)) {
+        removeOptimisticUserMessage();
+      }
+
      if (isMessageAction(event)) {
        messageRateHandler.record(new Date().getTime());
      }
@@ -156,7 +189,7 @@ export function WsClientProvider({
        isFileWriteAction(event) ||
        isCommandAction(event)
      ) {
-        queryClient.invalidateQueries({
+        queryClient.removeQueries({
          queryKey: ["file_changes", conversationId],
        });

@@ -202,11 +235,23 @@ export function WsClientProvider({
    sio.io.opts.query = sio.io.opts.query || {};
    sio.io.opts.query.latest_event_id = lastEventRef.current?.id;
    updateStatusWhenErrorMessagePresent(data);
+
+    setErrorMessage(
+      hasValidMessageProperty(data)
+        ? data.message
+        : "The WebSocket connection was closed.",
+    );
  }

  function handleError(data: unknown) {
    setStatus(WsClientProviderStatus.DISCONNECTED);
    updateStatusWhenErrorMessagePresent(data);
+
+    setErrorMessage(
+      hasValidMessageProperty(data)
+        ? data.message
+        : "An unknown error occurred on the WebSocket connection.",
+    );
  }

  React.useEffect(() => {
@@ -267,9 +312,10 @@ export function WsClientProvider({
      status,
      isLoadingMessages: messageRateHandler.isUnderThreshold,
      events,
+      parsedEvents,
      send,
    }),
-    [status, messageRateHandler.isUnderThreshold, events],
+    [status, messageRateHandler.isUnderThreshold, events, parsedEvents],
  );

  return <WsClientContext value={value}>{children}</WsClientContext>;
@@ -1,47 +0,0 @@
-import { useEffect } from "react";
-import { useDispatch } from "react-redux";
-import { setAgentType, setDelegationState } from "#/state/agent-slice";
-import ActionType from "#/types/action-type";
-
-/**
- * Hook to handle agent mode changes based on WebSocket events
- */
-export function useAgentModeHandler(events: Record<string, unknown>[]) {
-  const dispatch = useDispatch();
-
-  useEffect(() => {
-    // Process only the latest event
-    if (events.length === 0) return;
-
-    const latestEvent = events[events.length - 1];
-
-    // Handle agent delegation events
-    if (
-      "action" in latestEvent &&
-      latestEvent.action === ActionType.DELEGATE &&
-      "args" in latestEvent &&
-      typeof latestEvent.args === "object" &&
-      latestEvent.args !== null &&
-      "agent" in latestEvent.args
-    ) {
-      // A delegation is starting
-      dispatch(setDelegationState(true));
-      dispatch(setAgentType(latestEvent.args.agent as string));
-    }
-
-    // Handle agent delegate observation (delegation ended)
-    else if (
-      "observation" in latestEvent &&
-      latestEvent.observation === "delegate" &&
-      "data" in latestEvent &&
-      typeof latestEvent.data === "object" &&
-      latestEvent.data !== null &&
-      "status" in latestEvent.data &&
-      latestEvent.data.status === "finished"
-    ) {
-      // Delegation has ended, returning to parent agent
-      dispatch(setDelegationState(false));
-      dispatch(setAgentType("CodeActAgent")); // Reset to default agent
-    }
-  }, [events, dispatch]);
-}
@@ -1,19 +1,8 @@
 import React from "react";
-import { useDispatch } from "react-redux";
-import { useTranslation } from "react-i18next";
 import { useWsClient } from "#/context/ws-client-provider";
 import { generateAgentStateChangeEvent } from "#/services/agent-state-service";
-import { addErrorMessage } from "#/state/chat-slice";
 import { AgentState } from "#/types/agent-state";
-import { ErrorObservation } from "#/types/core/observations";
-import { useEndSession } from "./use-end-session";
-import {
-  displayErrorToast,
-  displaySuccessToast,
-} from "#/utils/custom-toast-handlers";
-import { setAgentType, setDelegationState } from "#/state/agent-slice";
-import ActionType from "#/types/action-type";
-import { I18nKey } from "#/i18n/declaration";
+import { displayErrorToast } from "#/utils/custom-toast-handlers";

 interface ServerError {
  error: boolean | string;
@@ -23,13 +12,8 @@ interface ServerError {

 const isServerError = (data: object): data is ServerError => "error" in data;

-const isErrorObservation = (data: object): data is ErrorObservation =>
-  "observation" in data && data.observation === "error";
-
 export const useHandleWSEvents = () => {
  const { events, send } = useWsClient();
-  const dispatch = useDispatch();
-  const { t } = useTranslation();

  React.useEffect(() => {
    if (!events.length) {
@@ -58,52 +42,5 @@ export const useHandleWSEvents = () => {
        send(generateAgentStateChangeEvent(AgentState.PAUSED));
      }
    }
-
-    if (isErrorObservation(event)) {
-      dispatch(
-        addErrorMessage({
-          id: event.extras?.error_id,
-          message: event.message,
-        }),
-      );
-    }
-
-    // Handle agent mode changes
-    // Handle agent delegation events
-    if (
-      "action" in event &&
-      event.action === ActionType.DELEGATE &&
-      "args" in event &&
-      typeof event.args === "object" &&
-      event.args !== null &&
-      "agent" in event.args
-    ) {
-      // A delegation is starting
-      const agentType = event.args.agent as string;
-      dispatch(setDelegationState(true));
-      dispatch(setAgentType(agentType));
-
-      // Show notification
-      if (agentType === "ReadOnlyAgent") {
-        displaySuccessToast(t(I18nKey.AGENT$MODE_READ_ONLY));
-      }
-    }
-    // Handle agent delegate observation (delegation ended)
-    else if (
-      "observation" in event &&
-      event.observation === "delegate" &&
-      "data" in event &&
-      typeof event.data === "object" &&
-      event.data !== null &&
-      "status" in event.data &&
-      event.data.status === "finished"
-    ) {
-      // Delegation has ended, returning to parent agent
-      dispatch(setDelegationState(false));
-      dispatch(setAgentType("CodeActAgent")); // Reset to default agent
-
-      // Show notification
-      displaySuccessToast(t(I18nKey.AGENT$MODE_EXECUTE));
-    }
-  }, [events.length, dispatch, send, t]);
+  }, [events.length]);
 };
@@ -0,0 +1,23 @@
+import { useQueryClient } from "@tanstack/react-query";
+
+export const useOptimisticUserMessage = () => {
+  const queryKey = ["optimistic_user_message"] as const;
+  const queryClient = useQueryClient();
+
+  const setOptimisticUserMessage = (message: string) => {
+    queryClient.setQueryData<string>(queryKey, message);
+  };
+
+  const getOptimisticUserMessage = () =>
+    queryClient.getQueryData<string>(queryKey);
+
+  const removeOptimisticUserMessage = () => {
+    queryClient.removeQueries({ queryKey });
+  };
+
+  return {
+    setOptimisticUserMessage,
+    getOptimisticUserMessage,
+    removeOptimisticUserMessage,
+  };
+};
@@ -0,0 +1,22 @@
+import { useQueryClient } from "@tanstack/react-query";
+
+export const useWSErrorMessage = () => {
+  const queryClient = useQueryClient();
+
+  const setErrorMessage = (message: string) => {
+    queryClient.setQueryData<string>(["error_message"], message);
+  };
+
+  const getErrorMessage = () =>
+    queryClient.getQueryData<string>(["error_message"]);
+
+  const removeErrorMessage = () => {
+    queryClient.removeQueries({ queryKey: ["error_message"] });
+  };
+
+  return {
+    setErrorMessage,
+    getErrorMessage,
+    removeErrorMessage,
+  };
+};
@@ -1,8 +1,5 @@
 // this file generate by script, don't modify it manually!!!
 export enum I18nKey {
-  AGENT$MODE_READ_ONLY = "AGENT$MODE_READ_ONLY",
-  AGENT$MODE_EXECUTE = "AGENT$MODE_EXECUTE",
-  AGENT$MODE_TOGGLE_LABEL = "AGENT$MODE_TOGGLE_LABEL",
  SECRETS$SECRET_VALUE_REQUIRED = "SECRETS$SECRET_VALUE_REQUIRED",
  SECRETS$ADD_SECRET = "SECRETS$ADD_SECRET",
  SECRETS$EDIT_SECRET = "SECRETS$EDIT_SECRET",
@@ -1,49 +1,4 @@
 {
-    "AGENT$MODE_READ_ONLY": {
-        "en": "Read-Only Mode",
-        "ja": "読み取り専用モード",
-        "zh-CN": "只读模式",
-        "zh-TW": "唯讀模式",
-        "ko-KR": "읽기 전용 모드",
-        "no": "Skrivebeskyttet modus",
-        "it": "Modalità di sola lettura",
-        "pt": "Modo somente leitura",
-        "es": "Modo de solo lectura",
-        "ar": "وضع القراءة فقط",
-        "fr": "Mode lecture seule",
-        "tr": "Salt okunur mod",
-        "de": "Nur-Lese-Modus"
-    },
-    "AGENT$MODE_EXECUTE": {
-        "en": "Execute Mode",
-        "ja": "実行モード",
-        "zh-CN": "执行模式",
-        "zh-TW": "執行模式",
-        "ko-KR": "실행 모드",
-        "no": "Utførelsesmodus",
-        "it": "Modalità di esecuzione",
-        "pt": "Modo de execução",
-        "es": "Modo de ejecución",
-        "ar": "وضع التنفيذ",
-        "fr": "Mode d'exécution",
-        "tr": "Yürütme modu",
-        "de": "Ausführungsmodus"
-    },
-    "AGENT$MODE_TOGGLE_LABEL": {
-        "en": "Toggle agent mode",
-        "ja": "エージェントモードを切り替える",
-        "zh-CN": "切换代理模式",
-        "zh-TW": "切換代理模式",
-        "ko-KR": "에이전트 모드 전환",
-        "no": "Bytt agentmodus",
-        "it": "Cambia modalità agente",
-        "pt": "Alternar modo do agente",
-        "es": "Cambiar modo del agente",
-        "ar": "تبديل وضع الوكيل",
-        "fr": "Basculer le mode de l'agent",
-        "tr": "Ajan modunu değiştir",
-        "de": "Agentenmodus umschalten"
-    },
    "SECRETS$SECRET_VALUE_REQUIRED": {
        "en": "Secret value is required",
        "ja": "シークレット値は必須です",
@@ -6429,20 +6384,20 @@
        "uk": "Завантажити файл"
    },
    "ACTION_MESSAGE$RUN": {
-        "en": "Running <cmd>{{action.payload.args.command}}</cmd>",
-        "zh-CN": "运行 <cmd>{{action.payload.args.command}}</cmd>",
-        "zh-TW": "執行 <cmd>{{action.payload.args.command}}</cmd>",
-        "ko-KR": "실행 <cmd>{{action.payload.args.command}}</cmd>",
-        "ja": "実行 <cmd>{{action.payload.args.command}}</cmd>",
-        "no": "Kjører <cmd>{{action.payload.args.command}}</cmd>",
-        "ar": "تشغيل <cmd>{{action.payload.args.command}}</cmd>",
-        "de": "Führt <cmd>{{action.payload.args.command}}</cmd> aus",
-        "fr": "Exécution de <cmd>{{action.payload.args.command}}</cmd>",
-        "it": "Esecuzione di <cmd>{{action.payload.args.command}}</cmd>",
-        "pt": "Executando <cmd>{{action.payload.args.command}}</cmd>",
-        "es": "Ejecutando <cmd>{{action.payload.args.command}}</cmd>",
-        "tr": "<cmd>{{action.payload.args.command}}</cmd> çalıştırılıyor",
-        "uk": "Виконую <cmd>{{action.payload.args.command}}</cmd>"
+        "en": "Running <cmd>{{command}}</cmd>",
+        "zh-CN": "运行 <cmd>{{command}}</cmd>",
+        "zh-TW": "執行 <cmd>{{command}}</cmd>",
+        "ko-KR": "실행 <cmd>{{command}}</cmd>",
+        "ja": "実行 <cmd>{{command}}</cmd>",
+        "no": "Kjører <cmd>{{command}}</cmd>",
+        "ar": "تشغيل <cmd>{{command}}</cmd>",
+        "de": "Führt <cmd>{{command}}</cmd> aus",
+        "fr": "Exécution de <cmd>{{command}}</cmd>",
+        "it": "Esecuzione di <cmd>{{command}}</cmd>",
+        "pt": "Executando <cmd>{{command}}</cmd>",
+        "es": "Ejecutando <cmd>{{command}}</cmd>",
+        "tr": "<cmd>{{command}}</cmd> çalıştırılıyor",
+        "uk": "Виконую <cmd>{{command}}</cmd>"
    },
    "ACTION_MESSAGE$RUN_IPYTHON": {
        "en": "Running a Python command",
@@ -6477,52 +6432,52 @@
        "uk": "Викликаю інструмент MCP: {{action.payload.args.name}}"
    },
    "ACTION_MESSAGE$READ": {
-        "en": "Reading <path>{{action.payload.args.path}}</path>",
-        "zh-CN": "读取 <path>{{action.payload.args.path}}</path>",
-        "zh-TW": "讀取 <path>{{action.payload.args.path}}</path>",
-        "ko-KR": "읽기 <path>{{action.payload.args.path}}</path>",
-        "ja": "読み取り <path>{{action.payload.args.path}}</path>",
-        "no": "Leser <path>{{action.payload.args.path}}</path>",
-        "ar": "قراءة <path>{{action.payload.args.path}}</path>",
-        "de": "Liest <path>{{action.payload.args.path}}</path>",
-        "fr": "Lecture de <path>{{action.payload.args.path}}</path>",
-        "it": "Lettura di <path>{{action.payload.args.path}}</path>",
-        "pt": "Lendo <path>{{action.payload.args.path}}</path>",
-        "es": "Leyendo <path>{{action.payload.args.path}}</path>",
-        "tr": "<path>{{action.payload.args.path}}</path> okunuyor",
-        "uk": "Читаю <path>{{action.payload.args.path}}</path>"
+        "en": "Reading <path>{{path}}</path>",
+        "zh-CN": "读取 <path>{{path}}</path>",
+        "zh-TW": "讀取 <path>{{path}}</path>",
+        "ko-KR": "읽기 <path>{{path}}</path>",
+        "ja": "読み取り <path>{{path}}</path>",
+        "no": "Leser <path>{{path}}</path>",
+        "ar": "قراءة <path>{{path}}</path>",
+        "de": "Liest <path>{{path}}</path>",
+        "fr": "Lecture de <path>{{path}}</path>",
+        "it": "Lettura di <path>{{path}}</path>",
+        "pt": "Lendo <path>{{path}}</path>",
+        "es": "Leyendo <path>{{path}}</path>",
+        "tr": "<path>{{path}}</path> okunuyor",
+        "uk": "Читаю <path>{{path}}</path>"
    },
    "ACTION_MESSAGE$EDIT": {
-        "en": "Editing <path>{{action.payload.args.path}}</path>",
-        "zh-CN": "编辑 <path>{{action.payload.args.path}}</path>",
-        "zh-TW": "編輯 <path>{{action.payload.args.path}}</path>",
-        "ko-KR": "편집 <path>{{action.payload.args.path}}</path>",
-        "ja": "編集 <path>{{action.payload.args.path}}</path>",
-        "no": "Redigerer <path>{{action.payload.args.path}}</path>",
-        "ar": "تحرير <path>{{action.payload.args.path}}</path>",
-        "de": "Bearbeitet <path>{{action.payload.args.path}}</path>",
-        "fr": "Modification de <path>{{action.payload.args.path}}</path>",
-        "it": "Modifica di <path>{{action.payload.args.path}}</path>",
-        "pt": "Editando <path>{{action.payload.args.path}}</path>",
-        "es": "Editando <path>{{action.payload.args.path}}</path>",
-        "tr": "<path>{{action.payload.args.path}}</path> düzenleniyor",
-        "uk": "Редагую <path>{{action.payload.args.path}}</path>"
+        "en": "Editing <path>{{path}}</path>",
+        "zh-CN": "编辑 <path>{{path}}</path>",
+        "zh-TW": "編輯 <path>{{path}}</path>",
+        "ko-KR": "편집 <path>{{path}}</path>",
+        "ja": "編集 <path>{{path}}</path>",
+        "no": "Redigerer <path>{{path}}</path>",
+        "ar": "تحرير <path>{{path}}</path>",
+        "de": "Bearbeitet <path>{{path}}</path>",
+        "fr": "Modification de <path>{{path}}</path>",
+        "it": "Modifica di <path>{{path}}</path>",
+        "pt": "Editando <path>{{path}}</path>",
+        "es": "Editando <path>{{path}}</path>",
+        "tr": "<path>{{path}}</path> düzenleniyor",
+        "uk": "Редагую <path>{{path}}</path>"
    },
    "ACTION_MESSAGE$WRITE": {
-        "en": "Writing to <path>{{action.payload.args.path}}</path>",
-        "zh-CN": "写入 <path>{{action.payload.args.path}}</path>",
-        "zh-TW": "寫入 <path>{{action.payload.args.path}}</path>",
-        "ko-KR": "쓰기 <path>{{action.payload.args.path}}</path>",
-        "ja": "書き込み <path>{{action.payload.args.path}}</path>",
-        "no": "Skriver til <path>{{action.payload.args.path}}</path>",
-        "ar": "الكتابة إلى <path>{{action.payload.args.path}}</path>",
-        "de": "Schreibt in <path>{{action.payload.args.path}}</path>",
-        "fr": "Écriture dans <path>{{action.payload.args.path}}</path>",
-        "it": "Scrittura su <path>{{action.payload.args.path}}</path>",
-        "pt": "Escrevendo em <path>{{action.payload.args.path}}</path>",
-        "es": "Escribiendo en <path>{{action.payload.args.path}}</path>",
-        "tr": "<path>{{action.payload.args.path}}</path> dosyasına yazılıyor",
-        "uk": "Записую в <path>{{action.payload.args.path}}</path>"
+        "en": "Writing to <path>{{path}}</path>",
+        "zh-CN": "写入 <path>{{path}}</path>",
+        "zh-TW": "寫入 <path>{{path}}</path>",
+        "ko-KR": "쓰기 <path>{{path}}</path>",
+        "ja": "書き込み <path>{{path}}</path>",
+        "no": "Skriver til <path>{{path}}</path>",
+        "ar": "الكتابة إلى <path>{{path}}</path>",
+        "de": "Schreibt in <path>{{path}}</path>",
+        "fr": "Écriture dans <path>{{path}}</path>",
+        "it": "Scrittura su <path>{{path}}</path>",
+        "pt": "Escrevendo em <path>{{path}}</path>",
+        "es": "Escribiendo en <path>{{path}}</path>",
+        "tr": "<path>{{path}}</path> dosyasına yazılıyor",
+        "uk": "Записую в <path>{{path}}</path>"
    },
    "ACTION_MESSAGE$BROWSE": {
        "en": "Browsing the web",
@@ -6589,20 +6544,20 @@
        "uk": "Системне повідомлення"
    },
    "OBSERVATION_MESSAGE$RUN": {
-        "en": "Ran <cmd>{{observation.payload.extras.command}}</cmd>",
-        "zh-CN": "运行 <cmd>{{observation.payload.extras.command}}</cmd>",
-        "zh-TW": "執行 <cmd>{{observation.payload.extras.command}}</cmd>",
-        "ko-KR": "실행 <cmd>{{observation.payload.extras.command}}</cmd>",
-        "ja": "実行 <cmd>{{observation.payload.extras.command}}</cmd>",
-        "no": "Kjørte <cmd>{{observation.payload.extras.command}}</cmd>",
-        "ar": "تم تشغيل <cmd>{{observation.payload.extras.command}}</cmd>",
-        "de": "Führte <cmd>{{observation.payload.extras.command}}</cmd> aus",
-        "fr": "A exécuté <cmd>{{observation.payload.extras.command}}</cmd>",
-        "it": "Ha eseguito <cmd>{{observation.payload.extras.command}}</cmd>",
-        "pt": "Executou <cmd>{{observation.payload.extras.command}}</cmd>",
-        "es": "Ejecutó <cmd>{{observation.payload.extras.command}}</cmd>",
-        "tr": "<cmd>{{observation.payload.extras.command}}</cmd> çalıştırıldı",
-        "uk": "Запустив <cmd>{{observation.payload.extras.command}}</cmd>"
+        "en": "Ran <cmd>{{command}}</cmd>",
+        "zh-CN": "运行 <cmd>{{command}}</cmd>",
+        "zh-TW": "執行 <cmd>{{command}}</cmd>",
+        "ko-KR": "실행 <cmd>{{command}}</cmd>",
+        "ja": "実行 <cmd>{{command}}</cmd>",
+        "no": "Kjørte <cmd>{{command}}</cmd>",
+        "ar": "تم تشغيل <cmd>{{command}}</cmd>",
+        "de": "Führte <cmd>{{command}}</cmd> aus",
+        "fr": "A exécuté <cmd>{{command}}</cmd>",
+        "it": "Ha eseguito <cmd>{{command}}</cmd>",
+        "pt": "Executou <cmd>{{command}}</cmd>",
+        "es": "Ejecutó <cmd>{{command}}</cmd>",
+        "tr": "<cmd>{{command}}</cmd> çalıştırıldı",
+        "uk": "Запустив <cmd>{{command}}</cmd>"
    },
    "OBSERVATION_MESSAGE$RUN_IPYTHON": {
        "en": "Ran a Python command",
@@ -6621,52 +6576,52 @@
        "uk": "Виконав команду Python"
    },
    "OBSERVATION_MESSAGE$READ": {
-        "en": "Read <path>{{observation.payload.extras.path}}</path>",
-        "zh-CN": "读取 <path>{{observation.payload.extras.path}}</path>",
-        "zh-TW": "讀取 <path>{{observation.payload.extras.path}}</path>",
-        "ko-KR": "읽기 <path>{{observation.payload.extras.path}}</path>",
-        "ja": "読み取り <path>{{observation.payload.extras.path}}</path>",
-        "no": "Leste <path>{{observation.payload.extras.path}}</path>",
-        "ar": "تمت قراءة <path>{{observation.payload.extras.path}}</path>",
-        "de": "Las <path>{{observation.payload.extras.path}}</path>",
-        "fr": "A lu <path>{{observation.payload.extras.path}}</path>",
-        "it": "Ha letto <path>{{observation.payload.extras.path}}</path>",
-        "pt": "Leu <path>{{observation.payload.extras.path}}</path>",
-        "es": "Leyó <path>{{observation.payload.extras.path}}</path>",
-        "tr": "<path>{{observation.payload.extras.path}}</path> okundu",
-        "uk": "Прочитав <path>{{observation.payload.extras.path}}</path>"
+        "en": "Read <path>{{path}}</path>",
+        "zh-CN": "读取 <path>{{path}}</path>",
+        "zh-TW": "讀取 <path>{{path}}</path>",
+        "ko-KR": "읽기 <path>{{path}}</path>",
+        "ja": "読み取り <path>{{path}}</path>",
+        "no": "Leste <path>{{path}}</path>",
+        "ar": "تمت قراءة <path>{{path}}</path>",
+        "de": "Las <path>{{path}}</path>",
+        "fr": "A lu <path>{{path}}</path>",
+        "it": "Ha letto <path>{{path}}</path>",
+        "pt": "Leu <path>{{path}}</path>",
+        "es": "Leyó <path>{{path}}</path>",
+        "tr": "<path>{{path}}</path> okundu",
+        "uk": "Прочитав <path>{{path}}</path>"
    },
    "OBSERVATION_MESSAGE$EDIT": {
-        "en": "Edited <path>{{observation.payload.extras.path}}</path>",
-        "zh-CN": "编辑 <path>{{observation.payload.extras.path}}</path>",
-        "zh-TW": "編輯 <path>{{observation.payload.extras.path}}</path>",
-        "ko-KR": "편집 <path>{{observation.payload.extras.path}}</path>",
-        "ja": "編集 <path>{{observation.payload.extras.path}}</path>",
-        "no": "Redigerte <path>{{observation.payload.extras.path}}</path>",
-        "ar": "تم تحرير <path>{{observation.payload.extras.path}}</path>",
-        "de": "Hat <path>{{observation.payload.extras.path}}</path> bearbeitet",
-        "fr": "A modifié <path>{{observation.payload.extras.path}}</path>",
-        "it": "Ha modificato <path>{{observation.payload.extras.path}}</path>",
-        "pt": "Editou <path>{{observation.payload.extras.path}}</path>",
-        "es": "Editó <path>{{observation.payload.extras.path}}</path>",
-        "tr": "<path>{{observation.payload.extras.path}}</path> düzenlendi",
-        "uk": "Відредагував <path>{{observation.payload.extras.path}}</path>"
+        "en": "Edited <path>{{path}}</path>",
+        "zh-CN": "编辑 <path>{{path}}</path>",
+        "zh-TW": "編輯 <path>{{path}}</path>",
+        "ko-KR": "편집 <path>{{path}}</path>",
+        "ja": "編集 <path>{{path}}</path>",
+        "no": "Redigerte <path>{{path}}</path>",
+        "ar": "تم تحرير <path>{{path}}</path>",
+        "de": "Hat <path>{{path}}</path> bearbeitet",
+        "fr": "A modifié <path>{{path}}</path>",
+        "it": "Ha modificato <path>{{path}}</path>",
+        "pt": "Editou <path>{{path}}</path>",
+        "es": "Editó <path>{{path}}</path>",
+        "tr": "<path>{{path}}</path> düzenlendi",
+        "uk": "Відредагував <path>{{path}}</path>"
    },
    "OBSERVATION_MESSAGE$WRITE": {
-        "en": "Wrote to <path>{{observation.payload.extras.path}}</path>",
-        "zh-CN": "写入 <path>{{observation.payload.extras.path}}</path>",
-        "zh-TW": "寫入 <path>{{observation.payload.extras.path}}</path>",
-        "ko-KR": "쓰기 <path>{{observation.payload.extras.path}}</path>",
-        "ja": "書き込み <path>{{observation.payload.extras.path}}</path>",
-        "no": "Skrev til <path>{{observation.payload.extras.path}}</path>",
-        "ar": "تمت الكتابة إلى <path>{{observation.payload.extras.path}}</path>",
-        "de": "Hat in <path>{{observation.payload.extras.path}}</path> geschrieben",
-        "fr": "A écrit dans <path>{{observation.payload.extras.path}}</path>",
-        "it": "Ha scritto su <path>{{observation.payload.extras.path}}</path>",
-        "pt": "Escreveu em <path>{{observation.payload.extras.path}}</path>",
-        "es": "Escribió en <path>{{observation.payload.extras.path}}</path>",
-        "tr": "<path>{{observation.payload.extras.path}}</path> dosyasına yazıldı",
-        "uk": "Записав на <path>{{observation.payload.extras.path}}</path>"
+        "en": "Wrote to <path>{{path}}</path>",
+        "zh-CN": "写入 <path>{{path}}</path>",
+        "zh-TW": "寫入 <path>{{path}}</path>",
+        "ko-KR": "쓰기 <path>{{path}}</path>",
+        "ja": "書き込み <path>{{path}}</path>",
+        "no": "Skrev til <path>{{path}}</path>",
+        "ar": "تمت الكتابة إلى <path>{{path}}</path>",
+        "de": "Hat in <path>{{path}}</path> geschrieben",
+        "fr": "A écrit dans <path>{{path}}</path>",
+        "it": "Ha scritto su <path>{{path}}</path>",
+        "pt": "Escreveu em <path>{{path}}</path>",
+        "es": "Escribió en <path>{{path}}</path>",
+        "tr": "<path>{{path}}</path> dosyasına yazıldı",
+        "uk": "Записав на <path>{{path}}</path>"
    },
    "OBSERVATION_MESSAGE$BROWSE": {
        "en": "Browsing completed",
@@ -13,7 +13,6 @@ import {
  useConversation,
 } from "#/context/conversation-context";
 import { Controls } from "#/components/features/controls/controls";
-import { clearMessages, addUserMessage } from "#/state/chat-slice";
 import { clearTerminal } from "#/state/command-slice";
 import { useEffectOnce } from "#/hooks/use-effect-once";
 import GlobeIcon from "#/icons/globe.svg?react";
@@ -34,7 +33,6 @@ import Security from "#/components/shared/modals/security/security";
 import { useUserConversation } from "#/hooks/query/use-user-conversation";
 import { ServedAppLabel } from "#/components/layout/served-app-label";
 import { useSettings } from "#/hooks/query/use-settings";
-import { clearFiles, clearInitialPrompt } from "#/state/initial-query-slice";
 import { RootState } from "#/store";
 import { displayErrorToast } from "#/utils/custom-toast-handlers";
 import { useDocumentTitleFromState } from "#/hooks/use-document-title-from-state";
@@ -49,9 +47,7 @@ function AppContent() {
  const { data: conversation, isFetched } = useUserConversation(
    conversationId || null,
  );
-  const { initialPrompt, files } = useSelector(
-    (state: RootState) => state.initialQuery,
-  );
+
  const { curAgentState } = useSelector((state: RootState) => state.agent);
  const dispatch = useDispatch();
  const navigate = useNavigate();
@@ -71,25 +67,11 @@ function AppContent() {
  }, [conversation, isFetched]);

  React.useEffect(() => {
-    dispatch(clearMessages());
    dispatch(clearTerminal());
    dispatch(clearJupyter());
-    if (conversationId && (initialPrompt || files.length > 0)) {
-      dispatch(
-        addUserMessage({
-          content: initialPrompt || "",
-          imageUrls: files || [],
-          timestamp: new Date().toISOString(),
-          pending: true,
-        }),
-      );
-      dispatch(clearInitialPrompt());
-      dispatch(clearFiles());
-    }
  }, [conversationId]);

  useEffectOnce(() => {
-    dispatch(clearMessages());
    dispatch(clearTerminal());
    dispatch(clearJupyter());
  });
@@ -4,7 +4,6 @@ import { StatusMessage } from "#/types/message";
 import { queryClient } from "#/query-client-config";
 import store from "#/store";
 import { setCurStatusMessage } from "#/state/status-slice";
-import { addErrorMessage } from "#/state/chat-slice";
 import { trackError } from "#/utils/error-handler";

 // Mock dependencies
@@ -101,9 +100,6 @@ describe("handleStatusMessage", () => {
      metadata: { msgId: "ERROR_ID" },
    });

-    // Verify that store.dispatch was called with addErrorMessage
-    expect(store.dispatch).toHaveBeenCalledWith(addErrorMessage(statusMessage));
-
    // Verify that queryClient.invalidateQueries was not called
    expect(queryClient.invalidateQueries).not.toHaveBeenCalled();
  });
@@ -1,13 +1,5 @@
-import {
-  addAssistantMessage,
-  addAssistantAction,
-  addUserMessage,
-  addErrorMessage,
-} from "#/state/chat-slice";
 import { trackError } from "#/utils/error-handler";
 import { appendSecurityAnalyzerInput } from "#/state/security-analyzer-slice";
-import { setCode, setActiveFilepath } from "#/state/code-slice";
-import { appendJupyterInput } from "#/state/jupyter-slice";
 import { setCurStatusMessage } from "#/state/status-slice";
 import { setMetrics } from "#/state/metrics-slice";
 import store from "#/store";
@@ -21,67 +13,6 @@ import { handleObservationMessage } from "./observations";
 import { appendInput } from "#/state/command-slice";
 import { queryClient } from "#/query-client-config";

-const messageActions = {
-  [ActionType.BROWSE]: (message: ActionMessage) => {
-    if (!message.args.thought && message.message) {
-      store.dispatch(addAssistantMessage(message.message));
-    }
-  },
-  [ActionType.BROWSE_INTERACTIVE]: (message: ActionMessage) => {
-    if (!message.args.thought && message.message) {
-      store.dispatch(addAssistantMessage(message.message));
-    }
-  },
-  [ActionType.WRITE]: (message: ActionMessage) => {
-    const { path, content } = message.args;
-    store.dispatch(setActiveFilepath(path));
-    store.dispatch(setCode(content));
-  },
-  [ActionType.MESSAGE]: (message: ActionMessage) => {
-    if (message.source === "user") {
-      store.dispatch(
-        addUserMessage({
-          content: message.args.content,
-          imageUrls:
-            typeof message.args.image_urls === "string"
-              ? [message.args.image_urls]
-              : message.args.image_urls,
-          timestamp: message.timestamp,
-          pending: false,
-        }),
-      );
-    } else {
-      store.dispatch(addAssistantMessage(message.args.content));
-    }
-  },
-  [ActionType.RUN_IPYTHON]: (message: ActionMessage) => {
-    if (message.args.confirmation_state !== "rejected") {
-      store.dispatch(appendJupyterInput(message.args.code));
-    }
-  },
-  [ActionType.FINISH]: (message: ActionMessage) => {
-    store.dispatch(addAssistantMessage(message.args.final_thought));
-    let successPrediction = "";
-    if (message.args.task_completed === "partial") {
-      successPrediction =
-        "I believe that the task was **completed partially**.";
-    } else if (message.args.task_completed === "false") {
-      successPrediction = "I believe that the task was **not completed**.";
-    } else if (message.args.task_completed === "true") {
-      successPrediction =
-        "I believe that the task was **completed successfully**.";
-    }
-    if (successPrediction) {
-      // if final_thought is not empty, add a new line before the success prediction
-      if (message.args.final_thought) {
-        store.dispatch(addAssistantMessage(`\n${successPrediction}`));
-      } else {
-        store.dispatch(addAssistantMessage(successPrediction));
-      }
-    }
-  },
-};
-
 export function handleActionMessage(message: ActionMessage) {
  if (message.args?.hidden) {
    return;
@@ -103,26 +34,6 @@ export function handleActionMessage(message: ActionMessage) {
  if ("args" in message && "security_risk" in message.args) {
    store.dispatch(appendSecurityAnalyzerInput(message));
  }
-
-  if (message.source === "agent") {
-    // Only add thought as a message if it's not a "think" action
-    if (
-      message.args &&
-      message.args.thought &&
-      message.action !== ActionType.THINK
-    ) {
-      store.dispatch(addAssistantMessage(message.args.thought));
-    }
-    // Need to convert ActionMessage to RejectAction
-    // @ts-expect-error TODO: fix
-    store.dispatch(addAssistantAction(message));
-  }
-
-  if (message.action in messageActions) {
-    const actionFn =
-      messageActions[message.action as keyof typeof messageActions];
-    actionFn(message);
-  }
 }

 export function handleStatusMessage(message: StatusMessage) {
@@ -146,11 +57,6 @@ export function handleStatusMessage(message: StatusMessage) {
      source: "chat",
      metadata: { msgId: message.id },
    });
-    store.dispatch(
-      addErrorMessage({
-        ...message,
-      }),
-    );
  }
 }

@@ -161,33 +67,5 @@ export function handleAssistantMessage(message: Record<string, unknown>) {
    handleObservationMessage(message as unknown as ObservationMessage);
  } else if (message.status_update) {
    handleStatusMessage(message as unknown as StatusMessage);
-  } else if (message.error) {
-    // Handle error messages from the server
-    const errorMessage =
-      typeof message.message === "string"
-        ? message.message
-        : String(message.message || "Unknown error");
-    trackError({
-      message: errorMessage,
-      source: "websocket",
-      metadata: { raw_message: message },
-    });
-    store.dispatch(
-      addErrorMessage({
-        message: errorMessage,
-      }),
-    );
-  } else {
-    const errorMsg = "Unknown message type received";
-    trackError({
-      message: errorMsg,
-      source: "chat",
-      metadata: { raw_message: message },
-    });
-    store.dispatch(
-      addErrorMessage({
-        message: errorMsg,
-      }),
-    );
  }
 }
@@ -1,24 +0,0 @@
-import ActionType from "#/types/action-type";
-
-export const generateDelegateToReadOnlyAction = () => ({
-  action: ActionType.DELEGATE,
-  args: {
-    agent: "ReadOnlyAgent",
-    inputs: {
-      task: "Continue the conversation in READ-ONLY MODE. You can explore and analyze code but cannot make changes.",
-    },
-    thought: "Switching to read-only mode at user's request",
-  },
-});
-
-export const generateFinishDelegationAction = () => ({
-  action: ActionType.FINISH,
-  args: {
-    message:
-      "Switching back to EXECUTE MODE. You now have full capabilities to modify code and execute commands.",
-    task_completed: "true",
-    outputs: {
-      mode_switch: true,
-    },
-  },
-});
@@ -2,14 +2,9 @@ import { setCurrentAgentState } from "#/state/agent-slice";
 import { setUrl, setScreenshotSrc } from "#/state/browser-slice";
 import store from "#/store";
 import { ObservationMessage } from "#/types/message";
-import { AgentState } from "#/types/agent-state";
 import { appendOutput } from "#/state/command-slice";
 import { appendJupyterOutput } from "#/state/jupyter-slice";
 import ObservationType from "#/types/observation-type";
-import {
-  addAssistantMessage,
-  addAssistantObservation,
-} from "#/state/chat-slice";

 export function handleObservationMessage(message: ObservationMessage) {
  switch (message.observation) {
@@ -26,8 +21,14 @@ export function handleObservationMessage(message: ObservationMessage) {
      break;
    }
    case ObservationType.RUN_IPYTHON:
-      // FIXME: render this as markdown
-      store.dispatch(appendJupyterOutput(message.content));
+      store.dispatch(
+        appendJupyterOutput({
+          content: message.content,
+          imageUrls: Array.isArray(message.extras?.image_urls)
+            ? message.extras.image_urls
+            : undefined,
+        }),
+      );
      break;
    case ObservationType.BROWSE:
    case ObservationType.BROWSE_INTERACTIVE:
@@ -42,11 +43,6 @@ export function handleObservationMessage(message: ObservationMessage) {
      store.dispatch(setCurrentAgentState(message.extras.agent_state));
      break;
    case ObservationType.DELEGATE:
-      // TODO: better UI for delegation result (#2309)
-      if (message.content) {
-        store.dispatch(addAssistantMessage(message.content));
-      }
-      break;
    case ObservationType.READ:
    case ObservationType.EDIT:
    case ObservationType.THINK:
@@ -56,107 +52,13 @@ export function handleObservationMessage(message: ObservationMessage) {
    case ObservationType.MCP:
      break; // We don't display the default message for these observations
    default:
-      store.dispatch(addAssistantMessage(message.message));
      break;
  }
  if (!message.extras?.hidden) {
    // Convert the message to the appropriate observation type
    const { observation } = message;
-    const baseObservation = {
-      ...message,
-      source: "agent" as const,
-    };

    switch (observation) {
-      case "agent_state_changed":
-        store.dispatch(
-          addAssistantObservation({
-            ...baseObservation,
-            observation: "agent_state_changed" as const,
-            extras: {
-              agent_state: (message.extras.agent_state as AgentState) || "idle",
-            },
-          }),
-        );
-        break;
-      case "recall":
-        store.dispatch(
-          addAssistantObservation({
-            ...baseObservation,
-            observation: "recall" as const,
-            extras: {
-              ...(message.extras || {}),
-              recall_type:
-                (message.extras?.recall_type as
-                  | "workspace_context"
-                  | "knowledge") || "knowledge",
-            },
-          }),
-        );
-        break;
-      case "run":
-        store.dispatch(
-          addAssistantObservation({
-            ...baseObservation,
-            observation: "run" as const,
-            extras: {
-              command: String(message.extras.command || ""),
-              metadata: message.extras.metadata,
-              hidden: Boolean(message.extras.hidden),
-            },
-          }),
-        );
-        break;
-      case "read":
-        store.dispatch(
-          addAssistantObservation({
-            ...baseObservation,
-            observation,
-            extras: {
-              path: String(message.extras.path || ""),
-              impl_source: String(message.extras.impl_source || ""),
-            },
-          }),
-        );
-        break;
-      case "edit":
-        store.dispatch(
-          addAssistantObservation({
-            ...baseObservation,
-            observation,
-            extras: {
-              path: String(message.extras.path || ""),
-              diff: String(message.extras.diff || ""),
-              impl_source: String(message.extras.impl_source || ""),
-            },
-          }),
-        );
-        break;
-      case "run_ipython":
-        store.dispatch(
-          addAssistantObservation({
-            ...baseObservation,
-            observation: "run_ipython" as const,
-            extras: {
-              code: String(message.extras.code || ""),
-            },
-          }),
-        );
-        break;
-      case "delegate":
-        store.dispatch(
-          addAssistantObservation({
-            ...baseObservation,
-            observation: "delegate" as const,
-            extras: {
-              outputs:
-                typeof message.extras.outputs === "object"
-                  ? (message.extras.outputs as Record<string, unknown>)
-                  : {},
-            },
-          }),
-        );
-        break;
      case "browse":
        if (message.extras?.screenshot) {
          store.dispatch(setScreenshotSrc(message.extras.screenshot));
@@ -164,45 +66,6 @@ export function handleObservationMessage(message: ObservationMessage) {
        if (message.extras?.url) {
          store.dispatch(setUrl(message.extras.url));
        }
-
-        store.dispatch(
-          addAssistantObservation({
-            ...baseObservation,
-            observation: "browse" as const,
-            extras: {
-              url: String(message.extras.url || ""),
-              screenshot: String(message.extras.screenshot || ""),
-              error: Boolean(message.extras.error),
-              open_page_urls: Array.isArray(message.extras.open_page_urls)
-                ? message.extras.open_page_urls
-                : [],
-              active_page_index: Number(message.extras.active_page_index || 0),
-              dom_object:
-                typeof message.extras.dom_object === "object"
-                  ? (message.extras.dom_object as Record<string, unknown>)
-                  : {},
-              axtree_object:
-                typeof message.extras.axtree_object === "object"
-                  ? (message.extras.axtree_object as Record<string, unknown>)
-                  : {},
-              extra_element_properties:
-                typeof message.extras.extra_element_properties === "object"
-                  ? (message.extras.extra_element_properties as Record<
-                      string,
-                      unknown
-                    >)
-                  : {},
-              last_browser_action: String(
-                message.extras.last_browser_action || "",
-              ),
-              last_browser_action_error:
-                message.extras.last_browser_action_error,
-              focused_element_bid: String(
-                message.extras.focused_element_bid || "",
-              ),
-            },
-          }),
-        );
        break;
      case "browse_interactive":
        if (message.extras?.screenshot) {
@@ -211,65 +74,6 @@ export function handleObservationMessage(message: ObservationMessage) {
        if (message.extras?.url) {
          store.dispatch(setUrl(message.extras.url));
        }
-
-        store.dispatch(
-          addAssistantObservation({
-            ...baseObservation,
-            observation: "browse_interactive" as const,
-            extras: {
-              url: String(message.extras.url || ""),
-              screenshot: String(message.extras.screenshot || ""),
-              error: Boolean(message.extras.error),
-              open_page_urls: Array.isArray(message.extras.open_page_urls)
-                ? message.extras.open_page_urls
-                : [],
-              active_page_index: Number(message.extras.active_page_index || 0),
-              dom_object:
-                typeof message.extras.dom_object === "object"
-                  ? (message.extras.dom_object as Record<string, unknown>)
-                  : {},
-              axtree_object:
-                typeof message.extras.axtree_object === "object"
-                  ? (message.extras.axtree_object as Record<string, unknown>)
-                  : {},
-              extra_element_properties:
-                typeof message.extras.extra_element_properties === "object"
-                  ? (message.extras.extra_element_properties as Record<
-                      string,
-                      unknown
-                    >)
-                  : {},
-              last_browser_action: String(
-                message.extras.last_browser_action || "",
-              ),
-              last_browser_action_error:
-                message.extras.last_browser_action_error,
-              focused_element_bid: String(
-                message.extras.focused_element_bid || "",
-              ),
-            },
-          }),
-        );
-        break;
-      case "error":
-        store.dispatch(
-          addAssistantObservation({
-            ...baseObservation,
-            observation: "error" as const,
-            source: "user" as const,
-            extras: {
-              error_id: message.extras.error_id,
-            },
-          }),
-        );
-        break;
-      case "mcp":
-        store.dispatch(
-          addAssistantObservation({
-            ...baseObservation,
-            observation: "mcp" as const,
-          }),
-        );
        break;
      default:
        // For any unhandled observation types, just ignore them
@@ -5,23 +5,14 @@ export const agentSlice = createSlice({
  name: "agent",
  initialState: {
    curAgentState: AgentState.LOADING,
-    currentAgentType: "CodeActAgent", // Default agent type
-    isDelegated: false, // Track if we're in a delegation
  },
  reducers: {
    setCurrentAgentState: (state, action) => {
      state.curAgentState = action.payload;
    },
-    setAgentType: (state, action) => {
-      state.currentAgentType = action.payload;
-    },
-    setDelegationState: (state, action) => {
-      state.isDelegated = action.payload;
-    },
  },
 });

-export const { setCurrentAgentState, setAgentType, setDelegationState } =
-  agentSlice.actions;
+export const { setCurrentAgentState } = agentSlice.actions;

 export default agentSlice.reducer;
@@ -1,380 +0,0 @@
-import { createSlice, PayloadAction } from "@reduxjs/toolkit";
-import type { Message } from "#/message";
-
-import { ActionSecurityRisk } from "#/state/security-analyzer-slice";
-import { OpenHandsAction } from "#/types/core/actions";
-import { OpenHandsEventType } from "#/types/core/base";
-import {
-  CommandObservation,
-  IPythonObservation,
-  OpenHandsObservation,
-  RecallObservation,
-} from "#/types/core/observations";
-
-type SliceState = {
-  messages: Message[];
-  systemMessage: {
-    content: string;
-    tools: Array<Record<string, unknown>> | null;
-    openhands_version: string | null;
-    agent_class: string | null;
-  } | null;
-};
-
-const MAX_CONTENT_LENGTH = 1000;
-
-const HANDLED_ACTIONS: OpenHandsEventType[] = [
-  "run",
-  "run_ipython",
-  "write",
-  "read",
-  "browse",
-  "browse_interactive",
-  "edit",
-  "recall",
-  "think",
-  "system",
-  "call_tool_mcp",
-  "mcp",
-];
-
-function getRiskText(risk: ActionSecurityRisk) {
-  switch (risk) {
-    case ActionSecurityRisk.LOW:
-      return "Low Risk";
-    case ActionSecurityRisk.MEDIUM:
-      return "Medium Risk";
-    case ActionSecurityRisk.HIGH:
-      return "High Risk";
-    case ActionSecurityRisk.UNKNOWN:
-    default:
-      return "Unknown Risk";
-  }
-}
-
-const initialState: SliceState = {
-  messages: [],
-  systemMessage: null,
-};
-
-export const chatSlice = createSlice({
-  name: "chat",
-  initialState,
-  reducers: {
-    addUserMessage(
-      state,
-      action: PayloadAction<{
-        content: string;
-        imageUrls: string[];
-        timestamp: string;
-        pending?: boolean;
-      }>,
-    ) {
-      const message: Message = {
-        type: "thought",
-        sender: "user",
-        content: action.payload.content,
-        imageUrls: action.payload.imageUrls,
-        timestamp: action.payload.timestamp || new Date().toISOString(),
-        pending: !!action.payload.pending,
-      };
-      // Remove any pending messages
-      let i = state.messages.length;
-      while (i) {
-        i -= 1;
-        const m = state.messages[i] as Message;
-        if (m.pending) {
-          state.messages.splice(i, 1);
-        }
-      }
-      state.messages.push(message);
-    },
-
-    addAssistantMessage(state: SliceState, action: PayloadAction<string>) {
-      const message: Message = {
-        type: "thought",
-        sender: "assistant",
-        content: action.payload,
-        imageUrls: [],
-        timestamp: new Date().toISOString(),
-        pending: false,
-      };
-      state.messages.push(message);
-    },
-
-    addAssistantAction(
-      state: SliceState,
-      action: PayloadAction<OpenHandsAction>,
-    ) {
-      const actionID = action.payload.action;
-      if (!HANDLED_ACTIONS.includes(actionID)) {
-        return;
-      }
-      const translationID = `ACTION_MESSAGE$${actionID.toUpperCase()}`;
-      let text = "";
-
-      if (actionID === "system") {
-        // Store the system message in the state
-        state.systemMessage = {
-          content: action.payload.args.content,
-          tools: action.payload.args.tools,
-          openhands_version: action.payload.args.openhands_version,
-          agent_class: action.payload.args.agent_class,
-        };
-        // Don't add a message for system actions
-        return;
-      }
-      if (actionID === "run") {
-        text = `Command:\n\`${action.payload.args.command}\``;
-      } else if (actionID === "run_ipython") {
-        text = `\`\`\`\n${action.payload.args.code}\n\`\`\``;
-      } else if (actionID === "write") {
-        let { content } = action.payload.args;
-        if (content.length > MAX_CONTENT_LENGTH) {
-          content = `${content.slice(0, MAX_CONTENT_LENGTH)}...`;
-        }
-        text = `${action.payload.args.path}\n${content}`;
-      } else if (actionID === "browse") {
-        text = `Browsing ${action.payload.args.url}`;
-      } else if (actionID === "browse_interactive") {
-        // Include the browser_actions in the content
-        text = `**Action:**\n\n\`\`\`python\n${action.payload.args.browser_actions}\n\`\`\``;
-      } else if (actionID === "recall") {
-        // skip recall actions
-        return;
-      } else if (actionID === "call_tool_mcp") {
-        // Format MCP action with name and arguments
-        const name = action.payload.args.name || "";
-        const args = action.payload.args.arguments || {};
-        text = `**MCP Tool Call:** ${name}\n\n`;
-        // Include thought if available
-        if (action.payload.args.thought) {
-          text += `\n\n**Thought:**\n${action.payload.args.thought}`;
-        }
-        text += `\n\n**Arguments:**\n\`\`\`json\n${JSON.stringify(args, null, 2)}\n\`\`\``;
-      }
-      if (actionID === "run" || actionID === "run_ipython") {
-        if (
-          action.payload.args.confirmation_state === "awaiting_confirmation"
-        ) {
-          text += `\n\n${getRiskText(action.payload.args.security_risk as unknown as ActionSecurityRisk)}`;
-        }
-      } else if (actionID === "think") {
-        text = action.payload.args.thought;
-      }
-      const message: Message = {
-        type: "action",
-        sender: "assistant",
-        translationID,
-        eventID: action.payload.id,
-        content: text,
-        imageUrls: [],
-        timestamp: new Date().toISOString(),
-        action,
-      };
-
-      state.messages.push(message);
-    },
-
-    addAssistantObservation(
-      state: SliceState,
-      observation: PayloadAction<OpenHandsObservation>,
-    ) {
-      const observationID = observation.payload.observation;
-      if (!HANDLED_ACTIONS.includes(observationID)) {
-        return;
-      }
-
-      // Special handling for RecallObservation - create a new message instead of updating an existing one
-      if (observationID === "recall") {
-        const recallObs = observation.payload as RecallObservation;
-        let content = ``;
-
-        // Handle workspace context
-        if (recallObs.extras.recall_type === "workspace_context") {
-          if (recallObs.extras.repo_name) {
-            content += `\n\n**Repository:** ${recallObs.extras.repo_name}`;
-          }
-          if (recallObs.extras.repo_directory) {
-            content += `\n\n**Directory:** ${recallObs.extras.repo_directory}`;
-          }
-          if (recallObs.extras.date) {
-            content += `\n\n**Date:** ${recallObs.extras.date}`;
-          }
-          if (
-            recallObs.extras.runtime_hosts &&
-            Object.keys(recallObs.extras.runtime_hosts).length > 0
-          ) {
-            content += `\n\n**Available Hosts**`;
-            for (const [host, port] of Object.entries(
-              recallObs.extras.runtime_hosts,
-            )) {
-              content += `\n\n- ${host} (port ${port})`;
-            }
-          }
-          if (
-            recallObs.extras.custom_secrets_descriptions &&
-            Object.keys(recallObs.extras.custom_secrets_descriptions).length > 0
-          ) {
-            content += `\n\n**Custom Secrets**`;
-            for (const [name, description] of Object.entries(
-              recallObs.extras.custom_secrets_descriptions,
-            )) {
-              content += `\n\n- $${name}: ${description}`;
-            }
-          }
-          if (recallObs.extras.repo_instructions) {
-            content += `\n\n**Repository Instructions:**\n\n${recallObs.extras.repo_instructions}`;
-          }
-          if (recallObs.extras.additional_agent_instructions) {
-            content += `\n\n**Additional Instructions:**\n\n${recallObs.extras.additional_agent_instructions}`;
-          }
-        }
-
-        // Create a new message for the observation
-        // Use the correct translation ID format that matches what's in the i18n file
-        const translationID = `OBSERVATION_MESSAGE$${observationID.toUpperCase()}`;
-
-        // Handle microagent knowledge
-        if (
-          recallObs.extras.microagent_knowledge &&
-          recallObs.extras.microagent_knowledge.length > 0
-        ) {
-          content += `\n\n**Triggered Microagent Knowledge:**`;
-          for (const knowledge of recallObs.extras.microagent_knowledge) {
-            content += `\n\n- **${knowledge.name}** (triggered by keyword: ${knowledge.trigger})\n\n\`\`\`\n${knowledge.content}\n\`\`\``;
-          }
-        }
-
-        const message: Message = {
-          type: "action",
-          sender: "assistant",
-          translationID,
-          eventID: observation.payload.id,
-          content,
-          imageUrls: [],
-          timestamp: new Date().toISOString(),
-          success: true,
-        };
-
-        state.messages.push(message);
-        return; // Skip the normal observation handling below
-      }
-
-      // Normal handling for other observation types
-      const translationID = `OBSERVATION_MESSAGE$${observationID.toUpperCase()}`;
-      const causeID = observation.payload.cause;
-      const causeMessage = state.messages.find(
-        (message) => message.eventID === causeID,
-      );
-      if (!causeMessage) {
-        return;
-      }
-      causeMessage.translationID = translationID;
-      causeMessage.observation = observation;
-      // Set success property based on observation type
-      if (observationID === "run") {
-        const commandObs = observation.payload as CommandObservation;
-        // If exit_code is -1, it means the command timed out, so we set success to undefined
-        // to not show any status indicator
-        if (commandObs.extras.metadata.exit_code === -1) {
-          causeMessage.success = undefined;
-        } else {
-          causeMessage.success = commandObs.extras.metadata.exit_code === 0;
-        }
-      } else if (observationID === "run_ipython") {
-        // For IPython, we consider it successful if there's no error message
-        const ipythonObs = observation.payload as IPythonObservation;
-        causeMessage.success = !ipythonObs.content
-          .toLowerCase()
-          .includes("error:");
-      } else if (observationID === "read" || observationID === "edit") {
-        // For read/edit operations, we consider it successful if there's content and no error
-
-        if (observation.payload.extras.impl_source === "oh_aci") {
-          causeMessage.success =
-            observation.payload.content.length > 0 &&
-            !observation.payload.content.startsWith("ERROR:\n");
-        } else {
-          causeMessage.success =
-            observation.payload.content.length > 0 &&
-            !observation.payload.content.toLowerCase().includes("error:");
-        }
-      }
-
-      if (observationID === "run" || observationID === "run_ipython") {
-        let { content } = observation.payload;
-        if (content.length > MAX_CONTENT_LENGTH) {
-          content = `${content.slice(0, MAX_CONTENT_LENGTH)}...`;
-        }
-        content = `${causeMessage.content}\n\nOutput:\n\`\`\`\n${content.trim() || "[Command finished execution with no output]"}\n\`\`\``;
-        causeMessage.content = content; // Observation content includes the action
-      } else if (observationID === "read") {
-        causeMessage.content = `\`\`\`\n${observation.payload.content}\n\`\`\``; // Content is already truncated by the ACI
-      } else if (observationID === "edit") {
-        if (causeMessage.success) {
-          causeMessage.content = `\`\`\`diff\n${observation.payload.extras.diff}\n\`\`\``; // Content is already truncated by the ACI
-        } else {
-          causeMessage.content = observation.payload.content;
-        }
-      } else if (observationID === "browse") {
-        let content = `**URL:** ${observation.payload.extras.url}\n`;
-        if (observation.payload.extras.error) {
-          content += `\n\n**Error:**\n${observation.payload.extras.error}\n`;
-        }
-        content += `\n\n**Output:**\n${observation.payload.content}`;
-        if (content.length > MAX_CONTENT_LENGTH) {
-          content = `${content.slice(0, MAX_CONTENT_LENGTH)}...(truncated)`;
-        }
-        causeMessage.content = content;
-      } else if (observationID === "mcp") {
-        // For MCP observations, we want to show the content as formatted output
-        // similar to how run/run_ipython actions are handled
-        let { content } = observation.payload;
-        if (content.length > MAX_CONTENT_LENGTH) {
-          content = `${content.slice(0, MAX_CONTENT_LENGTH)}...`;
-        }
-        content = `${causeMessage.content}\n\n**Output:**\n\`\`\`\n${content.trim() || "[MCP Tool finished execution with no output]"}\n\`\`\``;
-        causeMessage.content = content; // Observation content includes the action
-        // Set success based on whether there's an error message
-        causeMessage.success = !observation.payload.content
-          .toLowerCase()
-          .includes("error:");
-      }
-    },
-
-    addErrorMessage(
-      state: SliceState,
-      action: PayloadAction<{ id?: string; message: string }>,
-    ) {
-      const { id, message } = action.payload;
-      state.messages.push({
-        translationID: id,
-        content: message,
-        type: "error",
-        sender: "assistant",
-        timestamp: new Date().toISOString(),
-      });
-    },
-
-    clearMessages(state: SliceState) {
-      state.messages = [];
-      state.systemMessage = null;
-    },
-  },
-});
-
-export const {
-  addUserMessage,
-  addAssistantMessage,
-  addAssistantAction,
-  addAssistantObservation,
-  addErrorMessage,
-  clearMessages,
-} = chatSlice.actions;
-
-// Selectors
-export const selectSystemMessage = (state: { chat: SliceState }) =>
-  state.chat.systemMessage;
-
-export default chatSlice.reducer;
@@ -3,6 +3,7 @@ import { createSlice } from "@reduxjs/toolkit";
 export type Cell = {
  content: string;
  type: "input" | "output";
+  imageUrls?: string[];
 };

 const initialCells: Cell[] = [];
@@ -17,7 +18,11 @@ export const jupyterSlice = createSlice({
      state.cells.push({ content: action.payload, type: "input" });
    },
    appendJupyterOutput: (state, action) => {
-      state.cells.push({ content: action.payload, type: "output" });
+      state.cells.push({
+        content: action.payload.content,
+        type: "output",
+        imageUrls: action.payload.imageUrls,
+      });
    },
    clearJupyter: (state) => {
      state.cells = [];
@@ -1,7 +1,6 @@
 import { combineReducers, configureStore } from "@reduxjs/toolkit";
 import agentReducer from "./state/agent-slice";
 import browserReducer from "./state/browser-slice";
-import chatReducer from "./state/chat-slice";
 import codeReducer from "./state/code-slice";
 import fileStateReducer from "./state/file-state-slice";
 import initialQueryReducer from "./state/initial-query-slice";
@@ -15,7 +14,6 @@ export const rootReducer = combineReducers({
  fileState: fileStateReducer,
  initialQuery: initialQueryReducer,
  browser: browserReducer,
-  chat: chatReducer,
  code: codeReducer,
  cmd: commandReducer,
  agent: agentReducer,
@@ -2,6 +2,7 @@ export type OpenHandsEventType =
  | "message"
  | "system"
  | "agent_state_changed"
+  | "change_agent_state"
  | "run"
  | "read"
  | "write"
@@ -16,11 +17,14 @@ export type OpenHandsEventType =
  | "error"
  | "recall"
  | "mcp"
-  | "call_tool_mcp";
+  | "call_tool_mcp"
+  | "user_rejected";
+
+export type OpenHandsSourceType = "agent" | "user" | "environment";

 interface OpenHandsBaseEvent {
  id: number;
-  source: "agent" | "user";
+  source: OpenHandsSourceType;
  message: string;
  timestamp: string; // ISO 8601
 }
@@ -0,0 +1,59 @@
+import { OpenHandsParsedEvent } from ".";
+import {
+  UserMessageAction,
+  AssistantMessageAction,
+  OpenHandsAction,
+  SystemMessageAction,
+} from "./actions";
+import {
+  CommandObservation,
+  ErrorObservation,
+  OpenHandsObservation,
+} from "./observations";
+
+export const isOpenHandsAction = (
+  event: OpenHandsParsedEvent,
+): event is OpenHandsAction => "action" in event;
+
+export const isOpenHandsObservation = (
+  event: OpenHandsParsedEvent,
+): event is OpenHandsObservation => "observation" in event;
+
+export const isUserMessage = (
+  event: OpenHandsParsedEvent,
+): event is UserMessageAction =>
+  isOpenHandsAction(event) &&
+  event.source === "user" &&
+  event.action === "message";
+
+export const isAssistantMessage = (
+  event: OpenHandsParsedEvent,
+): event is AssistantMessageAction =>
+  isOpenHandsAction(event) &&
+  event.source === "agent" &&
+  (event.action === "message" || event.action === "finish");
+
+export const isErrorObservation = (
+  event: OpenHandsParsedEvent,
+): event is ErrorObservation =>
+  isOpenHandsObservation(event) && event.observation === "error";
+
+export const isCommandObservation = (
+  event: OpenHandsParsedEvent,
+): event is CommandObservation =>
+  isOpenHandsObservation(event) && event.observation === "run";
+
+export const isFinishAction = (
+  event: OpenHandsParsedEvent,
+): event is AssistantMessageAction =>
+  isOpenHandsAction(event) && event.action === "finish";
+
+export const isSystemMessage = (
+  event: OpenHandsParsedEvent,
+): event is SystemMessageAction =>
+  isOpenHandsAction(event) && event.action === "system";
+
+export const isRejectObservation = (
+  event: OpenHandsParsedEvent,
+): event is OpenHandsObservation =>
+  isOpenHandsObservation(event) && event.observation === "user_rejected";
@@ -23,6 +23,7 @@ export interface IPythonObservation
  source: "agent";
  extras: {
    code: string;
+    image_urls?: string[];
  };
 }

@@ -137,6 +138,14 @@ export interface MCPObservation extends OpenHandsObservationEvent<"mcp"> {
  };
 }

+export interface UserRejectedObservation
+  extends OpenHandsObservationEvent<"user_rejected"> {
+  source: "agent";
+  extras: {
+    // Add any specific fields for user_rejected observations
+  };
+}
+
 export type OpenHandsObservation =
  | AgentStateChangeObservation
  | AgentThinkObservation
@@ -150,4 +159,5 @@ export type OpenHandsObservation =
  | EditObservation
  | ErrorObservation
  | RecallObservation
-  | MCPObservation;
+  | MCPObservation
+  | UserRejectedObservation;
@@ -1,26 +1,32 @@
-export type JupyterLine = { type: "plaintext" | "image"; content: string };
+export type JupyterLine = {
+  type: "plaintext" | "image";
+  content: string;
+  url?: string;
+};

-const IMAGE_PREFIX = "![image](data:image/png;base64,";
-
-export const parseCellContent = (content: string) => {
+export const parseCellContent = (content: string, imageUrls?: string[]) => {
  const lines: JupyterLine[] = [];
  let currentText = "";

+  // First, process the text content
  for (const line of content.split("\n")) {
-    if (line.startsWith(IMAGE_PREFIX)) {
-      if (currentText) {
-        lines.push({ type: "plaintext", content: currentText });
-        currentText = ""; // Reset after pushing plaintext
-      }
-      lines.push({ type: "image", content: line });
-    } else {
-      currentText += `${line}\n`;
-    }
+    currentText += `${line}\n`;
  }

  if (currentText) {
    lines.push({ type: "plaintext", content: currentText });
  }

+  // Then, add image lines if we have image URLs
+  if (imageUrls && imageUrls.length > 0) {
+    imageUrls.forEach((url) => {
+      lines.push({
+        type: "image",
+        content: `![image](${url})`,
+        url,
+      });
+    });
+  }
+
  return lines;
 };
@@ -37,3 +37,8 @@ Today's date is {{ runtime_info.date }} (UTC).
 {% endif %}
 </RUNTIME_INFORMATION>
 {% endif %}
+{% if runtime_info and runtime_info.context_message -%}
+<CONTEXT_MESSAGE>
+{{ runtime_info.context_message }}
+</CONTEXT_MESSAGE>
+{% endif %}
@@ -251,7 +251,8 @@ async def run_session(
    )

    # Add MCP tools to the agent
-    await add_mcp_tools_to_agent(agent, runtime, memory, config.mcp)
+    if agent.config.enable_mcp:
+        await add_mcp_tools_to_agent(agent, runtime, memory, config.mcp)

    # Clear loading animation
    is_loaded.set()
@@ -28,6 +28,8 @@ class AgentConfig(BaseModel):
    """Whether to enable finish tool"""
    enable_prompt_extensions: bool = Field(default=True)
    """Whether to enable prompt extensions"""
+    enable_mcp: bool = Field(default=True)
+    """Whether to enable MCP tools"""
    disabled_microagents: list[str] = Field(default_factory=list)
    """A list of microagents to disable (by name, without .py extension, e.g. ["github", "lint"]). Default is None."""
    enable_history_truncation: bool = Field(default=True)
@@ -129,7 +129,8 @@ async def run_controller(
        )

    # Add MCP tools to the agent
-    await add_mcp_tools_to_agent(agent, runtime, memory, config.mcp)
+    if agent.config.enable_mcp:
+        await add_mcp_tools_to_agent(agent, runtime, memory, config.mcp)

    replay_events: list[Event] | None = None
    if config.replay_trajectory_path:
@@ -154,7 +154,7 @@ def create_memory(

    if runtime:
        # sets available hosts
-        memory.set_runtime_info(runtime, {})
+        memory.set_contextual_info(runtime, {})

        # loads microagents from repo/.openhands/microagents
        microagents: list[BaseMicroagent] = runtime.get_microagents_from_selected_repo(
@@ -75,6 +75,7 @@ class RecallObservation(Observation):
    additional_agent_instructions: str = ''
    date: str = ''
    custom_secrets_descriptions: dict[str, str] = field(default_factory=dict)
+    context_message: str | None = None

    # knowledge
    microagent_knowledge: list[MicroagentKnowledge] = field(default_factory=list)
@@ -170,6 +170,7 @@ class IPythonRunCellObservation(Observation):

    code: str
    observation: str = ObservationType.RUN_IPYTHON
+    image_urls: list[str] | None = None

    @property
    def error(self) -> bool:
@@ -184,4 +185,7 @@ class IPythonRunCellObservation(Observation):
        return True  # IPython cells are always considered successful

    def __str__(self) -> str:
-        return f'**IPythonRunCellObservation**\n{self.content}'
+        result = f'**IPythonRunCellObservation**\n{self.content}'
+        if self.image_urls:
+            result += f'\nImages: {len(self.image_urls)}'
+        return result
@@ -41,7 +41,7 @@ from openhands.events.observation.error import ErrorObservation
 from openhands.events.observation.mcp import MCPObservation
 from openhands.events.observation.observation import Observation
 from openhands.events.serialization.event import truncate_content
-from openhands.utils.prompt import PromptManager, RepositoryInfo, RuntimeInfo
+from openhands.utils.prompt import ContextualInfo, PromptManager, RepositoryInfo


 class ConversationMemory:
@@ -360,7 +360,7 @@ class ConversationMemory:
            message = Message(role='user', content=[TextContent(text=text)])
        elif isinstance(obs, IPythonRunCellObservation):
            text = obs.content
-            # replace base64 images with a placeholder
+            # Clean up any remaining base64 images in text content
            splitted = text.split('\n')
            for i, line in enumerate(splitted):
                if '![image](data:image/png;base64,' in line:
@@ -369,7 +369,15 @@ class ConversationMemory:
                    )
            text = '\n'.join(splitted)
            text = truncate_content(text, max_message_chars)
-            message = Message(role='user', content=[TextContent(text=text)])
+
+            # Create message content with text
+            content = [TextContent(text=text)]
+
+            # Add image URLs if available and vision is active
+            if vision_is_active and obs.image_urls:
+                content.append(ImageContent(image_urls=obs.image_urls))
+
+            message = Message(role='user', content=content)
        elif isinstance(obs, FileEditObservation):
            text = truncate_content(str(obs), max_message_chars)
            message = Message(role='user', content=[TextContent(text=text)])
@@ -447,14 +455,14 @@ class ConversationMemory:
                date = obs.date

                if obs.runtime_hosts or obs.additional_agent_instructions:
-                    runtime_info = RuntimeInfo(
+                    runtime_info = ContextualInfo(
                        available_hosts=obs.runtime_hosts,
                        additional_agent_instructions=obs.additional_agent_instructions,
                        date=date,
                        custom_secrets_descriptions=obs.custom_secrets_descriptions,
                    )
                else:
-                    runtime_info = RuntimeInfo(
+                    runtime_info = ContextualInfo(
                        date=date,
                        custom_secrets_descriptions=obs.custom_secrets_descriptions,
                    )
@@ -22,7 +22,7 @@ from openhands.microagent import (
    load_microagents_from_dir,
 )
 from openhands.runtime.base import Runtime
-from openhands.utils.prompt import RepositoryInfo, RuntimeInfo
+from openhands.utils.prompt import ContextualInfo, RepositoryInfo

 GLOBAL_MICROAGENTS_DIR = os.path.join(
    os.path.dirname(os.path.dirname(openhands.__file__)),
@@ -31,8 +31,8 @@ GLOBAL_MICROAGENTS_DIR = os.path.join(


 class Memory:
-    """
-    Memory is a component that listens to the EventStream for information retrieval actions
+    """Memory is a component that listens to the EventStream for information retrieval actions.
+
    (a RecallAction) and publishes observations with the content (such as RecallObservation).
    """

@@ -64,7 +64,7 @@ class Memory:

        # Store repository / runtime info to send them to the templating later
        self.repository_info: RepositoryInfo | None = None
-        self.runtime_info: RuntimeInfo | None = None
+        self.runtime_info: ContextualInfo | None = None

        # Load global microagents (Knowledge + Repo)
        # from typically OpenHands/microagents (i.e., the PUBLIC microagents)
@@ -131,7 +131,6 @@ class Memory:
        This method collects information from all available repo microagents and concatenates their contents.
        Multiple repo microagents are supported, and their contents will be concatenated with newlines between them.
        """
-
        # Create WORKSPACE_CONTEXT info:
        # - repository_info
        # - runtime_info
@@ -180,6 +179,9 @@ class Memory:
                custom_secrets_descriptions=self.runtime_info.custom_secrets_descriptions
                if self.runtime_info is not None
                else {},
+                context_message=self.runtime_info.context_message
+                if self.runtime_info is not None
+                else None,
            )
            return obs
        return None
@@ -189,7 +191,6 @@ class Memory:
        event: RecallAction,
    ) -> RecallObservation | None:
        """When a microagent action triggers microagents, create a RecallObservation with structured data."""
-
        # Find any matched microagents based on the query
        microagent_knowledge = self._find_microagent_knowledge(event.query)

@@ -235,8 +236,7 @@ class Memory:
    def load_user_workspace_microagents(
        self, user_microagents: list[BaseMicroagent]
    ) -> None:
-        """
-        This method loads microagents from a user's cloned repo or workspace directory.
+        """This method loads microagents from a user's cloned repo or workspace directory.

        This is typically called from agent_session or setup once the workspace is cloned.
        """
@@ -250,9 +250,7 @@ class Memory:
                self.repo_microagents[user_microagent.name] = user_microagent

    def _load_global_microagents(self) -> None:
-        """
-        Loads microagents from the global microagents_dir
-        """
+        """Loads microagents from the global microagents_dir."""
        repo_agents, knowledge_agents = load_microagents_from_dir(
            GLOBAL_MICROAGENTS_DIR
        )
@@ -264,8 +262,7 @@ class Memory:
                self.repo_microagents[name] = agent

    def get_microagent_mcp_tools(self) -> list[MCPConfig]:
-        """
-        Get MCP tools from all repo microagents (always active)
+        """Get MCP tools from all repo microagents (always active).

        Returns:
            A list of MCP tools configurations from microagents
@@ -289,8 +286,11 @@ class Memory:
        else:
            self.repository_info = None

-    def set_runtime_info(
-        self, runtime: Runtime, custom_secrets_descriptions: dict[str, str]
+    def set_contextual_info(
+        self,
+        runtime: Runtime,
+        custom_secrets_descriptions: dict[str, str],
+        context_message: str | None = None,
    ) -> None:
        """Store runtime info (web hosts, ports, etc.)."""
        # e.g. { '127.0.0.1': 8080 }
@@ -298,15 +298,18 @@ class Memory:
        date = str(utc_now.date())

        if runtime.web_hosts or runtime.additional_agent_instructions:
-            self.runtime_info = RuntimeInfo(
+            self.runtime_info = ContextualInfo(
                available_hosts=runtime.web_hosts,
                additional_agent_instructions=runtime.additional_agent_instructions,
                date=date,
                custom_secrets_descriptions=custom_secrets_descriptions,
+                context_message=context_message,
            )
        else:
-            self.runtime_info = RuntimeInfo(
-                date=date, custom_secrets_descriptions=custom_secrets_descriptions
+            self.runtime_info = ContextualInfo(
+                date=date,
+                custom_secrets_descriptions=custom_secrets_descriptions,
+                context_message=context_message,
            )

    def send_error_message(self, message_id: str, message: str):
@@ -230,7 +230,7 @@ class IssueResolver:
        """Initialize the runtime for the agent.

        This function is called before the runtime is used to run the agent.
-        Currently it does nothing.
+        It sets up git configuration and runs the setup script if it exists.
        """
        logger.info('-' * 30)
        logger.info('BEGIN Runtime Completion Fn')
@@ -257,6 +257,14 @@ class IssueResolver:
        if not isinstance(obs, CmdOutputObservation) or obs.exit_code != 0:
            raise RuntimeError(f'Failed to set git config.\n{obs}')

+        # Run setup script if it exists
+        logger.info('Checking for .openhands/setup.sh script...')
+        runtime.maybe_run_setup_script()
+
+        # Setup git hooks if they exist
+        logger.info('Checking for .openhands/pre-commit.sh script...')
+        runtime.maybe_setup_git_hooks()
+
    async def complete_runtime(
        self,
        runtime: Runtime,
@@ -91,7 +91,6 @@ async def browse(
            active_page_index=obs.get(
                'active_page_index', -1
            ),  # index of the active page
-            dom_object=obs.get('dom_object', {}),  # DOM object
            axtree_object=obs.get('axtree_object', {}),  # accessibility tree object
            extra_element_properties=obs.get('extra_element_properties', {}),
            focused_element_bid=obs.get(
@@ -200,6 +200,11 @@ class LocalRuntime(ActionExecutionClient):
            headless_mode,
        )

+        # If there is an API key in the environment, use it in requests to the runtime
+        session_api_key = os.getenv('SESSION_API_KEY')
+        if session_api_key:
+            self.session.headers['X-Session-API-Key'] = session_api_key
+
    @property
    def action_execution_server_url(self) -> str:
        return self.api_url
@@ -153,10 +153,18 @@ class JupyterPlugin(Plugin):

        if not self.kernel.initialized:
            await self.kernel.initialize()
+
+        # Execute the code and get structured output
        output = await self.kernel.execute(action.code, timeout=action.timeout)
+
+        # Extract text content and image URLs from the structured output
+        text_content = output.get('text', '')
+        image_urls = output.get('images', [])
+
        return IPythonRunCellObservation(
-            content=output,
+            content=text_content,
            code=action.code,
+            image_urls=image_urls if image_urls else None,
        )

    async def run(self, action: Action) -> IPythonRunCellObservation:
@@ -139,7 +139,9 @@ class JupyterKernel:
        stop=stop_after_attempt(3),
        wait=wait_fixed(2),
    )  # type: ignore
-    async def execute(self, code: str, timeout: int = 120) -> str:
+    async def execute(
+        self, code: str, timeout: int = 120
+    ) -> dict[str, list[str] | str]:
        if not self.ws or self.ws.stream.closed():
            await self._connect()

@@ -171,7 +173,7 @@ class JupyterKernel:
        )
        logging.info(f'Executed code in jupyter kernel:\n{res}')

-        outputs: list[str] = []
+        outputs: list[dict] = []

        async def wait_for_messages() -> bool:
            execution_done = False
@@ -194,17 +196,23 @@ class JupyterKernel:

                if msg_type == 'error':
                    traceback = '\n'.join(msg_dict['content']['traceback'])
-                    outputs.append(traceback)
+                    outputs.append({'type': 'text', 'content': traceback})
                    execution_done = True
                elif msg_type == 'stream':
-                    outputs.append(msg_dict['content']['text'])
+                    outputs.append(
+                        {'type': 'text', 'content': msg_dict['content']['text']}
+                    )
                elif msg_type in ['execute_result', 'display_data']:
-                    outputs.append(msg_dict['content']['data']['text/plain'])
+                    outputs.append(
+                        {
+                            'type': 'text',
+                            'content': msg_dict['content']['data']['text/plain'],
+                        }
+                    )
                    if 'image/png' in msg_dict['content']['data']:
-                        # use markdone to display image (in case of large image)
-                        outputs.append(
-                            f'\n![image](data:image/png;base64,{msg_dict["content"]["data"]["image/png"]})\n'
-                        )
+                        # Store image data in structured format
+                        image_url = f'data:image/png;base64,{msg_dict["content"]["data"]["image/png"]}'
+                        outputs.append({'type': 'image', 'content': image_url})

                elif msg_type == 'execute_reply':
                    execution_done = True
@@ -225,19 +233,28 @@ class JupyterKernel:
            execution_done = await asyncio.wait_for(wait_for_messages(), timeout)
        except asyncio.TimeoutError:
            await interrupt_kernel()
-            return f'[Execution timed out ({timeout} seconds).]'
+            return {'text': f'[Execution timed out ({timeout} seconds).]', 'images': []}

-        if not outputs and execution_done:
-            ret = '[Code executed successfully with no output]'
+        # Process structured outputs
+        text_outputs = []
+        image_outputs = []
+
+        for output in outputs:
+            if output['type'] == 'text':
+                text_outputs.append(output['content'])
+            elif output['type'] == 'image':
+                image_outputs.append(output['content'])
+
+        if not text_outputs and execution_done:
+            text_content = '[Code executed successfully with no output]'
        else:
-            ret = ''.join(outputs)
+            text_content = ''.join(text_outputs)

-        # Remove ANSI
-        ret = strip_ansi(ret)
+        # Remove ANSI from text content
+        text_content = strip_ansi(text_content)

-        if os.environ.get('DEBUG'):
-            logging.info(f'OUTPUT:\n{ret}')
-        return ret
+        # Return a dictionary with text content and image URLs
+        return {'text': text_content, 'images': image_outputs}

    async def shutdown_async(self) -> None:
        if self.kernel_id:
@@ -267,7 +284,9 @@ class ExecuteHandler(tornado.web.RequestHandler):

        output = await self.jupyter_kernel.execute(code)

-        self.write(output)
+        # Set content type to JSON and return the structured output
+        self.set_header('Content-Type', 'application/json')
+        self.write(json_encode(output))


 def make_app() -> tornado.web.Application:
@@ -6,10 +6,8 @@ import socketio

 from openhands.core.config import AppConfig
 from openhands.events.action import MessageAction
-from openhands.events.event_store import EventStore
 from openhands.server.config.server_config import ServerConfig
 from openhands.server.data_models.agent_loop_info import AgentLoopInfo
-from openhands.server.data_models.conversation_info import ConversationInfo
 from openhands.server.monitoring import MonitoringListener
 from openhands.server.session.conversation import Conversation
 from openhands.storage.conversation.conversation_store import ConversationStore
@@ -83,6 +81,7 @@ class ConversationManager(ABC):
        user_id: str | None,
        initial_user_msg: MessageAction | None = None,
        replay_json: str | None = None,
+        context_message: str | None = None,
    ) -> AgentLoopInfo:
        """Start an event loop if one is not already running"""

@@ -245,12 +245,13 @@ class StandaloneConversationManager(ConversationManager):
        user_id: str | None,
        initial_user_msg: MessageAction | None = None,
        replay_json: str | None = None,
+        context_message: str | None = None,
    ) -> AgentLoopInfo:
        logger.info(f'maybe_start_agent_loop:{sid}', extra={'session_id': sid})
        session = self._local_agent_loops_by_sid.get(sid)
        if not session:
            session = await self._start_agent_loop(
-                sid, settings, user_id, initial_user_msg, replay_json
+                sid, settings, user_id, initial_user_msg, replay_json, context_message
            )
        return self._agent_loop_info_from_session(session)

@@ -261,6 +262,7 @@ class StandaloneConversationManager(ConversationManager):
        user_id: str | None,
        initial_user_msg: MessageAction | None = None,
        replay_json: str | None = None,
+        context_message: str | None = None,
    ) -> Session:
        logger.info(f'starting_agent_loop:{sid}', extra={'session_id': sid})

@@ -304,7 +306,9 @@ class StandaloneConversationManager(ConversationManager):
        )
        self._local_agent_loops_by_sid[sid] = session
        asyncio.create_task(
-            session.initialize_agent(settings, initial_user_msg, replay_json)
+            session.initialize_agent(
+                settings, initial_user_msg, replay_json, context_message
+            )
        )
        # This does not get added when resuming an existing conversation
        try:
@@ -475,17 +479,17 @@ class StandaloneConversationManager(ConversationManager):
                continue
            results.append(self._agent_loop_info_from_session(session))
        return results
-    
+
    def _agent_loop_info_from_session(self, session: Session):
        return AgentLoopInfo(
            conversation_id=session.sid,
            url=self._get_conversation_url(session.sid),
-            api_key=None,
+            session_api_key=None,
            event_store=session.agent_session.event_stream,
        )

    def _get_conversation_url(self, conversation_id: str):
-        return f"/conversations/{conversation_id}"
+        return f'/conversations/{conversation_id}'


 def _last_updated_at_key(conversation: ConversationMetadata) -> float:
@@ -10,5 +10,5 @@ class AgentLoopInfo:
    """
    conversation_id: str
    url: str | None
-    api_key: str | None
+    session_api_key: str | None
    event_store: EventStoreABC
@@ -20,5 +20,5 @@ class ConversationInfo:
    trigger: ConversationTrigger | None = None
    num_connections: int = 0
    url: str | None = None
-    api_key: str | None = None
+    session_api_key: str | None = None
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
@@ -10,6 +10,7 @@ from openhands.server.middleware import (
    InMemoryRateLimiter,
    LocalhostCORSMiddleware,
    RateLimitMiddleware,
+    SessionApiKeyMiddleware,
 )
 from openhands.server.static import SPAStaticFiles

@@ -32,4 +33,8 @@ base_app.add_middleware(
 )
 base_app.middleware('http')(AttachConversationMiddleware(base_app))

+session_api_key = os.getenv('SESSION_API_KEY')
+if session_api_key:
+    base_app.middleware('http')(SessionApiKeyMiddleware(session_api_key))
+
 app = socketio.ASGIApp(sio, other_asgi_app=base_app)
@@ -1,4 +1,5 @@
 import asyncio
+import os
 from types import MappingProxyType
 from typing import Any
 from urllib.parse import parse_qs
@@ -72,6 +73,9 @@ async def connect(connection_id: str, environ: dict) -> None:
            logger.error('No conversation_id in query params')
            raise ConnectionRefusedError('No conversation_id in query params')

+        if _invalid_session_api_key(query_params):
+            raise ConnectionRefusedError('invalid_session_api_key')
+
        cookies_str = environ.get('HTTP_COOKIE', '')
        # Get Authorization header from the environment
        # Headers in WSGI/ASGI are prefixed with 'HTTP_' and have dashes replaced with underscores
@@ -160,3 +164,13 @@ async def oh_action(connection_id: str, data: dict[str, Any]) -> None:
 async def disconnect(connection_id: str) -> None:
    logger.info(f'sio:disconnect:{connection_id}')
    await conversation_manager.disconnect_from_session(connection_id)
+
+
+def _invalid_session_api_key(query_params: dict[str, list[Any]]) -> bool:
+    session_api_key = os.getenv('SESSION_API_KEY')
+    if not session_api_key:
+        return False
+    query_api_keys = query_params.get('session_api_key')
+    if not query_api_keys:
+        return True
+    return query_api_keys[0] != session_api_key
@@ -1,5 +1,6 @@
 import asyncio
 from collections import defaultdict
+from dataclasses import dataclass
 from datetime import datetime, timedelta
 from typing import Any
 from urllib.parse import urlparse
@@ -192,3 +193,26 @@ class AttachConversationMiddleware(SessionMiddlewareInterface):
            await self._detach_session(request)

        return response
+
+
+@dataclass
+class SessionApiKeyMiddleware:
+    """Middleware which rejects non-exempt requests lacking the expected X-Session-API-Key header"""
+
+    session_api_key: str
+
+    async def __call__(
+        self, request: Request, call_next: RequestResponseEndpoint
+    ) -> Response:
+        if (
+            request.method != 'OPTIONS'
+            and request.url.path != '/alive'
+            and request.url.path != '/server_info'
+        ):
+            if self.session_api_key != request.headers.get('X-Session-API-Key'):
+                return JSONResponse(
+                    {'code': 'invalid_session_api_key'},
+                    status_code=status.HTTP_401_UNAUTHORIZED,
+                )
+        response = await call_next(request)
+        return response
@@ -61,6 +61,7 @@ class InitSessionRequest(BaseModel):
    image_urls: list[str] | None = None
    replay_json: str | None = None
    suggested_task: SuggestedTask | None = None
+    context_message: str | None = None

    model_config = {'extra': 'forbid'}

@@ -69,7 +70,7 @@ class InitSessionResponse(BaseModel):
    status: str
    conversation_id: str
    conversation_url: str
-    api_key: str | None
+    session_api_key: str | None
    message: str | None = None


@@ -84,6 +85,7 @@ async def _create_new_conversation(
    replay_json: str | None,
    conversation_trigger: ConversationTrigger = ConversationTrigger.GUI,
    attach_convo_id: bool = False,
+    context_message: str | None = None,
 ) -> AgentLoopInfo:
    logger.info(
        'Creating conversation',
@@ -120,6 +122,7 @@ async def _create_new_conversation(
    session_init_args['selected_repository'] = selected_repository
    session_init_args['custom_secrets'] = custom_secrets
    session_init_args['selected_branch'] = selected_branch
+    session_init_args['context_message'] = context_message
    conversation_init_data = ConversationInitData(**session_init_args)
    logger.info('Loading conversation store')
    conversation_store = await ConversationStoreImpl.get_instance(config, user_id)
@@ -169,6 +172,7 @@ async def _create_new_conversation(
        user_id,
        initial_user_msg=initial_message_action,
        replay_json=replay_json,
+        context_message=context_message,
    )
    logger.info(f'Finished initializing conversation {agent_loop_info.conversation_id}')
    return agent_loop_info
@@ -195,6 +199,7 @@ async def new_conversation(
    replay_json = data.replay_json
    suggested_task = data.suggested_task
    git_provider = data.git_provider
+    context_message = data.context_message

    conversation_trigger = ConversationTrigger.GUI

@@ -222,13 +227,14 @@ async def new_conversation(
            image_urls=image_urls,
            replay_json=replay_json,
            conversation_trigger=conversation_trigger,
+            context_message=context_message,
        )

        return InitSessionResponse(
            status='ok',
            conversation_id=agent_loop_info.conversation_id,
            conversation_url=agent_loop_info.url,
-            api_key=agent_loop_info.api_key,
+            session_api_key=agent_loop_info.session_api_key,
        )
    except MissingSettingsError as e:
        return JSONResponse(
@@ -287,19 +293,25 @@ async def search_conversations(
    running_conversations = await conversation_manager.get_running_agent_loops(
        user_id, conversation_ids
    )
-    connection_ids_to_conversation_ids = await conversation_manager.get_connections(filter_to_sids=conversation_ids)
-    agent_loop_info = await conversation_manager.get_agent_loop_info(filter_to_sids=conversation_ids)
-    urls_by_conversation_id = {info.conversation_id: info.url for info in agent_loop_info}
+    connection_ids_to_conversation_ids = await conversation_manager.get_connections(
+        filter_to_sids=conversation_ids
+    )
+    agent_loop_info = await conversation_manager.get_agent_loop_info(
+        filter_to_sids=conversation_ids
+    )
+    agent_loop_info_by_conversation_id = {info.conversation_id: info for info in agent_loop_info}
    result = ConversationInfoResultSet(
        results=await wait_all(
            _get_conversation_info(
                conversation=conversation,
                is_running=conversation.conversation_id in running_conversations,
                num_connections=sum(
-                    1 for conversation_id in connection_ids_to_conversation_ids.values()
+                    1
+                    for conversation_id in connection_ids_to_conversation_ids.values()
                    if conversation_id == conversation.conversation_id
                ),
-                url=urls_by_conversation_id.get(conversation.conversation_id),
+                agent_loop_info=agent_loop_info_by_conversation_id.get(conversation.conversation_id),
+
            )
            for conversation in filtered_results
        ),
@@ -317,9 +329,9 @@ async def get_conversation(
        metadata = await conversation_store.get_metadata(conversation_id)
        is_running = await conversation_manager.is_agent_loop_running(conversation_id)
        num_connections = len(await conversation_manager.get_connections(filter_to_sids={conversation_id}))
-        agent_loop_info = await conversation_manager.get_agent_loop_info(filter_to_sids={conversation_id})
-        url = agent_loop_info[0].url if agent_loop_info else None
-        conversation_info = await _get_conversation_info(metadata, is_running, num_connections, url)
+        agent_loop_infos = await conversation_manager.get_agent_loop_info(filter_to_sids={conversation_id})
+        agent_loop_info = agent_loop_infos[0] if agent_loop_infos else None
+        conversation_info = await _get_conversation_info(metadata, is_running, num_connections, agent_loop_info)
        return conversation_info
    except FileNotFoundError:
        return None
@@ -348,7 +360,7 @@ async def _get_conversation_info(
    conversation: ConversationMetadata,
    is_running: bool,
    num_connections: int,
-    url: str | None,
+    agent_loop_info: AgentLoopInfo | None,
 ) -> ConversationInfo | None:
    try:
        title = conversation.title
@@ -365,7 +377,8 @@ async def _get_conversation_info(
                ConversationStatus.RUNNING if is_running else ConversationStatus.STOPPED
            ),
            num_connections=num_connections,
-            url=url,
+            url=agent_loop_info.url if agent_loop_info else None,
+            session_api_key=agent_loop_info.session_api_key if agent_loop_info else None,
        )
    except Exception as e:
        logger.error(
@@ -16,7 +16,11 @@ from openhands.core.schema.agent import AgentState
 from openhands.events.action import ChangeAgentStateAction, MessageAction
 from openhands.events.event import Event, EventSource
 from openhands.events.stream import EventStream
-from openhands.integrations.provider import CUSTOM_SECRETS_TYPE, PROVIDER_TOKEN_TYPE, ProviderHandler
+from openhands.integrations.provider import (
+    CUSTOM_SECRETS_TYPE,
+    PROVIDER_TOKEN_TYPE,
+    ProviderHandler,
+)
 from openhands.mcp import add_mcp_tools_to_agent
 from openhands.memory.memory import Memory
 from openhands.microagent.microagent import BaseMicroagent
@@ -91,6 +95,7 @@ class AgentSession:
        selected_branch: str | None = None,
        initial_message: MessageAction | None = None,
        replay_json: str | None = None,
+        context_message: str | None = None,
    ) -> None:
        """Starts the Agent session
        Parameters:
@@ -116,7 +121,9 @@ class AgentSession:
        finished = False  # For monitoring
        runtime_connected = False

-        custom_secrets_handler = UserSecrets(custom_secrets=custom_secrets if custom_secrets else {})
+        custom_secrets_handler = UserSecrets(
+            custom_secrets=custom_secrets if custom_secrets else {}
+        )

        try:
            self._create_security_analyzer(config.security.security_analyzer)
@@ -144,12 +151,13 @@ class AgentSession:
            self.memory = await self._create_memory(
                selected_repository=selected_repository,
                repo_directory=repo_directory,
-                custom_secrets_descriptions=custom_secrets_handler.get_custom_secrets_descriptions()
+                custom_secrets_descriptions=custom_secrets_handler.get_custom_secrets_descriptions(),
+                context_message=context_message,
            )

            # NOTE: this needs to happen before controller is created
            # so MCP tools can be included into the SystemMessageAction
-            if self.runtime and runtime_connected:
+            if self.runtime and runtime_connected and agent.config.enable_mcp:
                await add_mcp_tools_to_agent(agent, self.runtime, self.memory, config.mcp)

            if replay_json:
@@ -315,7 +323,7 @@ class AgentSession:
                provider_tokens=git_provider_tokens
                or cast(PROVIDER_TOKEN_TYPE, MappingProxyType({}))
            )
-            
+
            # Merge git provider tokens with custom secrets before passing over to runtime
            env_vars.update(await provider_handler.get_env_vars(expose_secrets=True))
            self.runtime = runtime_cls(
@@ -415,7 +423,11 @@ class AgentSession:
        return controller

    async def _create_memory(
-        self, selected_repository: str | None, repo_directory: str | None, custom_secrets_descriptions: dict[str, str]
+        self,
+        selected_repository: str | None,
+        repo_directory: str | None,
+        custom_secrets_descriptions: dict[str, str],
+        context_message: str | None = None,
    ) -> Memory:
        memory = Memory(
            event_stream=self.event_stream,
@@ -425,7 +437,9 @@ class AgentSession:

        if self.runtime:
            # sets available hosts and other runtime info
-            memory.set_runtime_info(self.runtime, custom_secrets_descriptions)
+            memory.set_contextual_info(
+                self.runtime, custom_secrets_descriptions, context_message
+            )

            # loads microagents from repo/.openhands/microagents
            microagents: list[BaseMicroagent] = await call_sync_from_async(
@@ -14,6 +14,7 @@ class ConversationInitData(Settings):
    selected_repository: str | None = Field(default=None)
    replay_json: str | None = Field(default=None)
    selected_branch: str | None = Field(default=None)
+    context_message: str | None = Field(default=None)

    model_config = {
        'arbitrary_types_allowed': True,
@@ -91,6 +91,7 @@ class Session:
        settings: Settings,
        initial_message: MessageAction | None,
        replay_json: str | None,
+        context_message: str | None = None,
    ) -> None:
        self.agent_session.event_stream.add_event(
            AgentStateChangedObservation('', AgentState.LOADING),
@@ -160,6 +161,7 @@ class Session:
            selected_repository = settings.selected_repository
            selected_branch = settings.selected_branch
            custom_secrets = settings.custom_secrets
+            context_message = settings.context_message

        try:
            await self.agent_session.start(
@@ -176,6 +178,7 @@ class Session:
                selected_branch=selected_branch,
                initial_message=initial_message,
                replay_json=replay_json,
+                context_message=context_message,
            )
        except MicroagentValidationError as e:
            self.logger.exception(f'Error creating agent_session: {e}')
@@ -10,11 +10,12 @@ from openhands.events.observation.agent import MicroagentKnowledge


@dataclass
-class RuntimeInfo:
+class ContextualInfo:
    date: str
    available_hosts: dict[str, int] = field(default_factory=dict)
    additional_agent_instructions: str = ''
    custom_secrets_descriptions: dict[str, str] = field(default_factory=dict)
+    context_message: str | None = None


@dataclass
@@ -26,8 +27,7 @@ class RepositoryInfo:


 class PromptManager:
-    """
-    Manages prompt templates and includes information from the user's workspace micro-agents and global micro-agents.
+    """Manages prompt templates and includes information from the user's workspace micro-agents and global micro-agents.

    This class is dedicated to loading and rendering prompts (system prompt, user prompt).

@@ -58,8 +58,9 @@ class PromptManager:
        return self.system_template.render().strip()

     def get_example_user_message(self) -> str:
-        """This is an initial user message that can be provided to the agent
-        before *actual* user instructions are provided.
+        """Return an initial user message that can be provided to the agent.
+
+        It is provided before the *actual* user instructions.

        It can be used to provide a demonstration of how the agent
        should behave in order to solve the user's task. And it may
@@ -67,13 +68,12 @@ class PromptManager:
        These additional context will convert the current generic agent
        into a more specialized agent that is tailored to the user's task.
        """
-
        return self.user_template.render().strip()

    def build_workspace_context(
        self,
        repository_info: RepositoryInfo | None,
-        runtime_info: RuntimeInfo | None,
+        runtime_info: ContextualInfo | None,
        repo_instructions: str = '',
    ) -> str:
        """Renders the additional info template with the stored repository/runtime info."""
@@ -1,4 +1,4 @@
-# This file is automatically @generated by Poetry 2.1.3 and should not be changed by hand.
+# This file is automatically @generated by Poetry 2.1.1 and should not be changed by hand.

 [[package]]
 name = "aiohappyeyeballs"
@@ -2871,7 +2871,7 @@ grpcio = {version = ">=1.49.1,<2.0dev", optional = true, markers = "python_versi
 grpcio-status = {version = ">=1.49.1,<2.0.dev0", optional = true, markers = "python_version >= \"3.11\" and extra == \"grpc\""}
 proto-plus = [
    {version = ">=1.25.0,<2.0.0dev", markers = "python_version >= \"3.13\""},
-    {version = ">=1.22.3,<2.0.0dev"},
+    {version = ">=1.22.3,<2.0.0dev", markers = "python_version < \"3.13\""},
 ]
 protobuf = ">=3.19.5,<3.20.0 || >3.20.0,<3.20.1 || >3.20.1,<4.21.0 || >4.21.0,<4.21.1 || >4.21.1,<4.21.2 || >4.21.2,<4.21.3 || >4.21.3,<4.21.4 || >4.21.4,<4.21.5 || >4.21.5,<6.0.0.dev0"
 requests = ">=2.18.0,<3.0.0.dev0"
@@ -5144,14 +5144,14 @@ files = [

 [[package]]
 name = "mcp"
-version = "1.7.1"
+version = "1.9.0"
 description = "Model Context Protocol SDK"
 optional = false
 python-versions = ">=3.10"
 groups = ["main"]
 files = [
-    {file = "mcp-1.7.1-py3-none-any.whl", hash = "sha256:f7e6108977db6d03418495426c7ace085ba2341b75197f8727f96f9cfd30057a"},
-    {file = "mcp-1.7.1.tar.gz", hash = "sha256:eb4f1f53bd717f75dda8a1416e00804b831a8f3c331e23447a03b78f04b43a6e"},
+    {file = "mcp-1.9.0-py3-none-any.whl", hash = "sha256:9dfb89c8c56f742da10a5910a1f64b0d2ac2c3ed2bd572ddb1cfab7f35957178"},
+    {file = "mcp-1.9.0.tar.gz", hash = "sha256:905d8d208baf7e3e71d70c82803b89112e321581bcd2530f9de0fe4103d28749"},
 ]

 [package.dependencies]
@@ -5172,20 +5172,20 @@ ws = ["websockets (>=15.0.1)"]

 [[package]]
 name = "mcpm"
-version = "1.9.0"
+version = "1.12.0"
 description = "MCPM - Model Context Protocol Manager"
 optional = false
 python-versions = ">=3.10"
 groups = ["main"]
 files = [
-    {file = "mcpm-1.9.0-py3-none-any.whl", hash = "sha256:fc9efe355329bef6a30d201668f9752d6fbc46f9d3a2affda8d45b9c5240475e"},
-    {file = "mcpm-1.9.0.tar.gz", hash = "sha256:97c112cb6d40e9bbcb4091c1db79da4eeda256bfa48083fa1f3abb260b814686"},
+    {file = "mcpm-1.12.0-py3-none-any.whl", hash = "sha256:ed3a87300420bcdb9cd12ef290179fda5bd51eb2f4cd3e793084d83eed91b249"},
+    {file = "mcpm-1.12.0.tar.gz", hash = "sha256:e9d2b852b90d7fd62dede584f035dd6b2b3d068d233e96b82aead835f81a911a"},
 ]

 [package.dependencies]
 click = ">=8.1.3"
 duckdb = ">=1.2.2"
-mcp = ">=1.6.0"
+mcp = ">=1.8.0"
 prompt-toolkit = ">=3.0.0"
 psutil = ">=7.0.0"
 pydantic = ">=2.5.1"
@@ -8574,7 +8574,7 @@ description = "C version of reader, parser and emitter for ruamel.yaml derived f
 optional = false
 python-versions = ">=3.9"
 groups = ["main"]
-markers = "platform_python_implementation == \"CPython\" and python_version == \"3.12\""
+markers = "python_version < \"3.13\" and platform_python_implementation == \"CPython\""
 files = [
    {file = "ruamel.yaml.clib-0.2.12-cp310-cp310-macosx_13_0_arm64.whl", hash = "sha256:11f891336688faf5156a36293a9c362bdc7c88f03a8a027c2c1d8e0bcde998e5"},
    {file = "ruamel.yaml.clib-0.2.12-cp310-cp310-manylinux2014_aarch64.whl", hash = "sha256:a606ef75a60ecf3d924613892cc603b154178ee25abb3055db5062da811fd969"},
@@ -9966,7 +9966,7 @@ description = "A language and compiler for custom Deep Learning operations"
 optional = false
 python-versions = "*"
 groups = ["evaluation"]
-markers = "platform_system == \"Linux\" and platform_machine == \"x86_64\" and python_version == \"3.12\""
+markers = "platform_system == \"Linux\" and platform_machine == \"x86_64\" and python_version < \"3.13\""
 files = [
    {file = "triton-3.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:e1efef76935b2febc365bfadf74bcb65a6f959a9872e5bddf44cc9e0adce1e1a"},
    {file = "triton-3.0.0-1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:5ce8520437c602fb633f1324cc3871c47bee3b67acf9756c1a66309b60e3216c"},
@@ -10267,6 +10267,39 @@ platformdirs = ">=3.9.1,<5"
 docs = ["furo (>=2023.7.26)", "proselint (>=0.13)", "sphinx (>=7.1.2,!=7.3)", "sphinx-argparse (>=0.4)", "sphinxcontrib-towncrier (>=0.2.1a0)", "towncrier (>=23.6)"]
 test = ["covdefaults (>=2.3)", "coverage (>=7.2.7)", "coverage-enable-subprocess (>=1)", "flaky (>=3.7)", "packaging (>=23.1)", "pytest (>=7.4)", "pytest-env (>=0.8.2)", "pytest-freezer (>=0.4.8) ; platform_python_implementation == \"PyPy\" or platform_python_implementation == \"CPython\" and sys_platform == \"win32\" and python_version >= \"3.13\"", "pytest-mock (>=3.11.1)", "pytest-randomly (>=3.12)", "pytest-timeout (>=2.1)", "setuptools (>=68)", "time-machine (>=2.10) ; platform_python_implementation == \"CPython\""]

+[[package]]
+name = "visualswebench"
+version = "0.0.0"
+description = ""
+optional = false
+python-versions = ">=3.8"
+groups = ["evaluation"]
+files = []
+develop = false
+
+[package.dependencies]
+beautifulsoup4 = "*"
+chardet = "*"
+datasets = "*"
+docker = "*"
+ghapi = "*"
+GitPython = "*"
+pre-commit = "*"
+python-dotenv = "*"
+requests = "*"
+rich = "*"
+tqdm = "*"
+unidiff = "*"
+
+[package.extras]
+inference = ["anthropic", "flash_attn", "jedi", "peft", "protobuf", "sentencepiece", "tenacity", "tiktoken", "torch", "transformers", "triton"]
+
+[package.source]
+type = "git"
+url = "https://github.com/luolin101/Visual-SWE-bench.git"
+reference = "HEAD"
+resolved_reference = "e12d06686202a778956bf4faa65330be23feb23e"
+
 [[package]]
 name = "watchdog"
 version = "6.0.0"
@@ -11265,4 +11298,4 @@ cffi = ["cffi (>=1.11)"]
 [metadata]
 lock-version = "2.1"
 python-versions = "^3.12,<3.14"
-content-hash = "998d731b9d5cfa2239ef2019e95b681c44c70294f467cf9fdadbf2d551422ebe"
+content-hash = "d45187a67e1326bd1d9c7c6a6c5cd9529688bdf95c34f611e7f9eeb8623c03a3"
@@ -83,8 +83,8 @@ prompt-toolkit = "^3.0.50"
 poetry = "^2.1.2"
 anyio = "4.9.0"
 pythonnet = "*"
-mcp = "1.7.1"
-mcpm = "1.9.0"
+mcp = "1.9.0"
+mcpm = "1.12.0"

 [tool.poetry.group.dev.dependencies]
 ruff = "0.11.10"
@@ -116,6 +116,7 @@ whatthepatch = "*"
 retry = "*"
 evaluate = "*"
 swebench = "^3.0.8"
+visualswebench = { git = "https://github.com/luolin101/Visual-SWE-bench.git" }
 swegym = { git = "https://github.com/SWE-Gym/SWE-Bench-Package.git" }
 commit0 = "*"
 func_timeout = "*"
@@ -129,7 +130,7 @@ browsergym-webarena = "0.13.3"
 browsergym-miniwob = "0.13.3"
 browsergym-visualwebarena = "0.13.3"
 boto3-stubs = { extras = [ "s3" ], version = "^1.37.19" }
-pyarrow = "20.0.0"                                                    # transitive dependency, pinned here to avoid conflicts
+pyarrow = "20.0.0"                                                             # transitive dependency, pinned here to avoid conflicts
 datasets = "*"

 [tool.poetry.scripts]
@@ -1,17 +1,17 @@
-import os
 from unittest import mock

 import pytest

 from openhands.core.config import SandboxConfig
+from openhands.events.action import CmdRunAction
 from openhands.resolver.resolve_issue import IssueResolver
-import openhands
+

 def assert_sandbox_config(
    config: SandboxConfig,
-    base_container_image = SandboxConfig.model_fields["base_container_image"].default,
-    runtime_container_image = f'ghcr.io/all-hands-ai/runtime:mock-nikolaik',  # Default to mock version
-    local_runtime_url = SandboxConfig.model_fields["local_runtime_url"].default,
+    base_container_image=SandboxConfig.model_fields['base_container_image'].default,
+    runtime_container_image='ghcr.io/all-hands-ai/runtime:mock-nikolaik',  # Default to mock version
+    local_runtime_url=SandboxConfig.model_fields['local_runtime_url'].default,
 ):
    """Helper function to assert the properties of the SandboxConfig object."""
    assert isinstance(config, SandboxConfig)
@@ -22,6 +22,7 @@ def assert_sandbox_config(
    assert config.timeout == 300
    assert config.local_runtime_url == local_runtime_url

+
 def test_setup_sandbox_config_default():
    """Test default configuration when no images provided and not experimental"""
    with mock.patch('openhands.__version__', 'mock'):
@@ -32,23 +33,25 @@ def test_setup_sandbox_config_default():
        )

        assert_sandbox_config(
-            config,
-            runtime_container_image='ghcr.io/all-hands-ai/runtime:mock-nikolaik'
+            config, runtime_container_image='ghcr.io/all-hands-ai/runtime:mock-nikolaik'
        )


 def test_setup_sandbox_config_both_images():
    """Test that providing both container images raises ValueError"""
-    with pytest.raises(ValueError, match="Cannot provide both runtime and base container images."):
+    with pytest.raises(
+        ValueError, match='Cannot provide both runtime and base container images.'
+    ):
        IssueResolver._setup_sandbox_config(
-            base_container_image="base-image",
-            runtime_container_image="runtime-image",
+            base_container_image='base-image',
+            runtime_container_image='runtime-image',
            is_experimental=False,
        )

+
 def test_setup_sandbox_config_base_only():
    """Test configuration when only base_container_image is provided"""
-    base_image = "custom-base-image"
+    base_image = 'custom-base-image'
    config = IssueResolver._setup_sandbox_config(
        base_container_image=base_image,
        runtime_container_image=None,
@@ -56,25 +59,22 @@ def test_setup_sandbox_config_base_only():
    )

    assert_sandbox_config(
-        config,
-        base_container_image=base_image,
-        runtime_container_image=None
+        config, base_container_image=base_image, runtime_container_image=None
    )

+
 def test_setup_sandbox_config_runtime_only():
    """Test configuration when only runtime_container_image is provided"""
-    runtime_image = "custom-runtime-image"
+    runtime_image = 'custom-runtime-image'
    config = IssueResolver._setup_sandbox_config(
        base_container_image=None,
        runtime_container_image=runtime_image,
        is_experimental=False,
    )

-    assert_sandbox_config(
-        config,
-        runtime_container_image=runtime_image
-    )
- 
+    assert_sandbox_config(config, runtime_container_image=runtime_image)
+
+
 def test_setup_sandbox_config_experimental():
    """Test configuration when experimental mode is enabled"""
    with mock.patch('openhands.__version__', 'mock'):
@@ -84,40 +84,67 @@ def test_setup_sandbox_config_experimental():
            is_experimental=True,
        )

-        assert_sandbox_config(
-            config,
-            runtime_container_image=None
-        )
+        assert_sandbox_config(config, runtime_container_image=None)

-@mock.patch("openhands.resolver.resolve_issue.os.getuid", return_value=0)
-@mock.patch("openhands.resolver.resolve_issue.get_unique_uid", return_value=1001)
+
+@mock.patch('openhands.resolver.resolve_issue.os.getuid', return_value=0)
+@mock.patch('openhands.resolver.resolve_issue.get_unique_uid', return_value=1001)
 def test_setup_sandbox_config_gitlab_ci(mock_get_unique_uid, mock_getuid):
    """Test GitLab CI specific configuration when running as root"""
    with mock.patch('openhands.__version__', 'mock'):
-        with mock.patch.object(IssueResolver, "GITLAB_CI", True):
+        with mock.patch.object(IssueResolver, 'GITLAB_CI', True):
            config = IssueResolver._setup_sandbox_config(
                base_container_image=None,
                runtime_container_image=None,
                is_experimental=False,
            )
-            
-            assert_sandbox_config(
-                config,
-                local_runtime_url="http://localhost"
-            )

-@mock.patch("openhands.resolver.resolve_issue.os.getuid", return_value=1000)
+            assert_sandbox_config(config, local_runtime_url='http://localhost')
+
+
+@mock.patch('openhands.resolver.resolve_issue.os.getuid', return_value=1000)
 def test_setup_sandbox_config_gitlab_ci_non_root(mock_getuid):
    """Test GitLab CI configuration when not running as root"""
    with mock.patch('openhands.__version__', 'mock'):
-        with mock.patch.object(IssueResolver, "GITLAB_CI", True):
+        with mock.patch.object(IssueResolver, 'GITLAB_CI', True):
            config = IssueResolver._setup_sandbox_config(
                base_container_image=None,
                runtime_container_image=None,
                is_experimental=False,
            )

-            assert_sandbox_config(
-                config,
-                local_runtime_url="http://localhost"
-            )
+            assert_sandbox_config(config, local_runtime_url='http://localhost')
+
+
+@mock.patch('openhands.events.observation.CmdOutputObservation')
+@mock.patch('openhands.runtime.base.Runtime')
+def test_initialize_runtime_runs_setup_script_and_git_hooks(
+    mock_runtime, mock_cmd_output
+):
+    """Test that initialize_runtime calls maybe_run_setup_script and maybe_setup_git_hooks"""
+
+    # Create a minimal resolver instance with just the methods we need
+    class MinimalResolver:
+        def initialize_runtime(self, runtime):
+            # This is the method we're testing
+            action = CmdRunAction(command='git config --global core.pager ""')
+            runtime.run_action(action)
+
+            # Run setup script if it exists
+            runtime.maybe_run_setup_script()
+
+            # Setup git hooks if they exist
+            runtime.maybe_setup_git_hooks()
+
+    resolver = MinimalResolver()
+
+    # Mock the runtime's run_action method to return a successful CmdOutputObservation
+    mock_cmd_output.return_value.exit_code = 0
+    mock_runtime.run_action.return_value = mock_cmd_output.return_value
+
+    # Call the method
+    resolver.initialize_runtime(mock_runtime)
+
+    # Verify that both methods were called
+    mock_runtime.maybe_run_setup_script.assert_called_once()
+    mock_runtime.maybe_setup_git_hooks.assert_called_once()
@@ -57,6 +57,10 @@ def mock_agent():
    agent.llm.metrics = Metrics()
    agent.llm.config = AppConfig().get_llm_config()

+    # Add config with enable_mcp attribute
+    agent.config = MagicMock(spec=AgentConfig)
+    agent.config.enable_mcp = True
+
    # Add a proper system message mock
    system_message = SystemMessageAction(
        content='Test system message', tools=['test_tool']
@@ -1,162 +0,0 @@
-import asyncio
-from unittest.mock import MagicMock, Mock
-from uuid import uuid4
-
-import pytest
-
-from openhands.agenthub.codeact_agent.codeact_agent import CodeActAgent
-from openhands.agenthub.readonly_agent.readonly_agent import ReadOnlyAgent
-from openhands.controller.agent import Agent
-from openhands.controller.agent_controller import AgentController
-from openhands.controller.state.state import State
-from openhands.core.config import AgentConfig, LLMConfig
-from openhands.core.schema import AgentState
-from openhands.events import EventSource, EventStream
-from openhands.events.action import (
-    AgentDelegateAction,
-    AgentFinishAction,
-    MessageAction,
-)
-from openhands.events.observation import AgentDelegateObservation
-from openhands.llm.llm import LLM
-from openhands.llm.metrics import Metrics
-from openhands.storage.memory import InMemoryFileStore
-
-
-@pytest.fixture
-def mock_event_stream():
-    """Creates an event stream in memory."""
-    sid = f'test-{uuid4()}'
-    file_store = InMemoryFileStore({})
-    return EventStream(sid=sid, file_store=file_store)
-
-
-@pytest.fixture
-def mock_codeact_agent():
-    """Creates a mock CodeActAgent for testing."""
-    agent = MagicMock(spec=CodeActAgent)
-    agent.name = 'CodeActAgent'
-    agent.llm = MagicMock(spec=LLM)
-    agent.llm.metrics = Metrics()
-    agent.llm.config = LLMConfig()
-    agent.config = AgentConfig()
-
-    # Add a proper system message mock
-    from openhands.events.action.message import SystemMessageAction
-
-    system_message = SystemMessageAction(content='Test system message for CodeActAgent')
-    system_message._source = EventSource.AGENT
-    system_message._id = -1  # Set invalid ID to avoid the ID check
-    agent.get_system_message.return_value = system_message
-
-    return agent
-
-
-@pytest.fixture
-def mock_readonly_agent():
-    """Creates a mock ReadOnlyAgent for testing."""
-    agent = MagicMock(spec=ReadOnlyAgent)
-    agent.name = 'ReadOnlyAgent'
-    agent.llm = MagicMock(spec=LLM)
-    agent.llm.metrics = Metrics()
-    agent.llm.config = LLMConfig()
-    agent.config = AgentConfig()
-
-    # Add a proper system message mock
-    from openhands.events.action.message import SystemMessageAction
-
-    system_message = SystemMessageAction(content='Test system message for ReadOnlyAgent')
-    system_message._source = EventSource.AGENT
-    system_message._id = -1  # Set invalid ID to avoid the ID check
-    agent.get_system_message.return_value = system_message
-
-    return agent
-
-
-@pytest.mark.asyncio
-async def test_agent_mode_toggle(mock_codeact_agent, mock_readonly_agent, mock_event_stream):
-    """
-    Test that the agent mode toggle works correctly:
-    1. Start with CodeActAgent
-    2. Toggle to ReadOnlyAgent
-    3. Toggle back to CodeActAgent
-    """
-    # Mock the agent class resolution so that AgentController can instantiate mock_readonly_agent
-    original_get_cls = Agent.get_cls
-    
-    def mock_get_cls(agent_name):
-        if agent_name == 'ReadOnlyAgent':
-            return lambda llm, config: mock_readonly_agent
-        return original_get_cls(agent_name)
-    
-    Agent.get_cls = Mock(side_effect=mock_get_cls)
-
-    # Create parent controller with CodeActAgent
-    parent_state = State(max_iterations=10)
-    parent_controller = AgentController(
-        agent=mock_codeact_agent,
-        event_stream=mock_event_stream,
-        max_iterations=10,
-        sid='parent',
-        confirmation_mode=False,
-        headless_mode=True,
-        initial_state=parent_state,
-    )
-
-    # Verify we're starting with CodeActAgent
-    assert parent_controller.agent.name == 'CodeActAgent'
-    assert parent_controller.delegate is None
-
-    # Create a delegate action to switch to ReadOnlyAgent
-    delegate_action = AgentDelegateAction(
-        agent='ReadOnlyAgent',
-        inputs={
-            'task': 'Continue the conversation in READ-ONLY MODE. You can explore and analyze code but cannot make changes.'
-        },
-        thought='Switching to read-only mode at user\'s request'
-    )
-    
-    # Simulate the delegate action
-    await parent_controller._on_event(delegate_action)
-    
-    # Give time for the async step() to execute
-    await asyncio.sleep(0.5)
-    
-    # Verify that we've delegated to ReadOnlyAgent
-    assert parent_controller.delegate is not None
-    assert parent_controller.delegate.agent.name == 'ReadOnlyAgent'
-    
-    # Simulate a user message to the ReadOnlyAgent
-    message_action = MessageAction(content='Show me the files in this directory')
-    message_action._source = EventSource.USER
-    await parent_controller.delegate._on_event(message_action)
-    
-    # Give time for the async step() to execute
-    await asyncio.sleep(0.5)
-    
-    # Now simulate switching back to CodeActAgent with a finish action
-    finish_action = AgentFinishAction(
-        final_thought='Switching back to EXECUTE MODE. You now have full capabilities to modify code and execute commands.',
-        task_completed=True,
-        outputs={'mode_switch': True}
-    )
-    
-    # Send the finish action to the delegate
-    await parent_controller.delegate._on_event(finish_action)
-    
-    # Give time for the async step() to execute
-    await asyncio.sleep(0.5)
-    
-    # Verify that we're back to the parent CodeActAgent
-    assert parent_controller.delegate is None
-    assert parent_controller.agent.name == 'CodeActAgent'
-    
-    # Verify that a delegate observation was added to the event stream
-    events = list(mock_event_stream.get_events())
-    assert any(isinstance(event, AgentDelegateObservation) for event in events)
-    
-    # Cleanup
-    await parent_controller.close()
-    
-    # Restore the original get_cls method
-    Agent.get_cls = original_get_cls
@@ -35,6 +35,7 @@ def mock_agent():

    # Configure the agent config
    agent_config.disabled_microagents = []
+    agent_config.enable_mcp = True

    # Set up the chain of mocks
    llm.metrics = metrics
@@ -250,7 +250,7 @@ async def test_new_conversation_success(provider_handler_mock):
            mock_create_conversation.return_value = MagicMock(
                conversation_id='test_conversation_id',
                url='https://my-conversation.com',
-                api_key=None,
+                session_api_key=None,
            )

            test_request = InitSessionRequest(
@@ -292,7 +292,7 @@ async def test_new_conversation_with_suggested_task(provider_handler_mock):
            mock_create_conversation.return_value = MagicMock(
                conversation_id='test_conversation_id',
                url='https://my-conversation.com',
-                api_key=None,
+                session_api_key=None,
            )

            # Mock SuggestedTask.get_prompt_for_task
@@ -375,7 +375,7 @@ async def test_new_conversation_missing_settings(provider_handler_mock):


@pytest.mark.asyncio
-async def test_new_conversation_invalid_api_key(provider_handler_mock):
+async def test_new_conversation_invalid_session_api_key(provider_handler_mock):
    """Test creating a new conversation with an invalid API key."""
    with _patch_store():
        # Mock the _create_new_conversation function to raise LLMAuthenticationError
@@ -477,7 +477,7 @@ async def test_new_conversation_with_bearer_auth(provider_handler_mock):
            mock_create_conversation.return_value = MagicMock(
                conversation_id='test_conversation_id',
                url='https://my-conversation.com',
-                api_key=None,
+                session_api_key=None,
            )

            # Create the request object
@@ -514,7 +514,7 @@ async def test_new_conversation_with_null_repository():
            mock_create_conversation.return_value = MagicMock(
                conversation_id='test_conversation_id',
                url='https://my-conversation.com',
-                api_key=None,
+                session_api_key=None,
            )

            # Create the request object with null repository
Author	SHA1	Message	Date
openhands	7561518e4c	Fix lint errors	2025-05-19 16:55:19 +00:00
openhands	2cd503f033	Fix merge conflicts with main branch	2025-05-19 16:53:24 +00:00
dependabot[bot]	470687f826	chore(deps): bump the mcp-packages group with 2 updates (#8546) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-05-19 18:37:11 +02:00
tofarr	38b4d93237	Add Session API Key Authentication for Runtime Communication (#8550)	2025-05-19 09:59:22 -06:00
dependabot[bot]	872b97a3c8	chore(deps): bump the version-all group across 1 directory with 20 updates (#8545) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: amanape <83104063+amanape@users.noreply.github.com>	2025-05-19 15:49:53 +00:00
sp.wack	14334040f1	chore(frontend): Refactor chat interface-related event handling (#8403)	2025-05-19 15:15:09 +00:00
sp.wack	b244138ec5	fix(frontend): Prevent making too many calls to `/git/changes` on conversation load (#8579)	2025-05-19 18:57:18 +04:00
Xingyao Wang	4a3d2e6859	Fix #8551: Show images produced in Jupyter Notebook to LLM directly (#8552) Co-authored-by: openhands <openhands@all-hands.dev>	2025-05-19 14:14:00 +00:00
luolin101	1a3cb16ba6	add Visual SWE-bench benchmark (#7131) Co-authored-by: tsukimi <yuailun@pku.edu.cn> Co-authored-by: Ryan H. Tran <descience.thh10@gmail.com>	2025-05-19 12:08:46 +07:00
Xingyao Wang	2ecc39ffcc	[eval]: disable MCP for SWE-Bench evaluation (#8574) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Engel Nyst <engel.nyst@gmail.com>	2025-05-19 01:32:46 +00:00
Graham Neubig	0b26174d60	Add documentation microagent (#8563) Co-authored-by: openhands <openhands@all-hands.dev>	2025-05-18 16:34:15 -04:00
Calvin Smith	b0005d4e09	Limit size of browser events (#8559) Co-authored-by: Calvin Smith <calvin@all-hands.dev>	2025-05-18 11:35:09 -06:00
Graham Neubig	2dc7b37fe8	Fix flaky TestLocalFileStore tests (#8569) Co-authored-by: openhands <openhands@all-hands.dev>	2025-05-18 06:37:28 -04:00
openhands	2d434ad49f	Rename RuntimeInfo to ContextualInfo	2025-05-17 22:31:31 +00:00
openhands	81c0253d53	Rename memory.set_runtime_info to memory.set_contextual_info	2025-05-17 22:27:48 +00:00
openhands	9cdde313d8	Add context_message parameter to conversation creation endpoint	2025-05-17 22:22:15 +00:00
Carlos Freund	27c18f5bdd	build(makefile) Develop in OpenhandsCloud (#7440) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2025-05-17 14:40:46 +00:00
Graham Neubig	5077fea5c7	Fix: Run setup.sh script in GitHub resolver (#8548) Co-authored-by: openhands <openhands@all-hands.dev>	2025-05-17 09:52:34 -04:00