Fix SubprocessBashSession to allow multiple commands by default

- Add allow_multiple_commands parameter to SubprocessBashSession constructor
- Default to True to maintain compatibility with original CLIRuntime behavior
- When False, rejects multiple commands separated by newlines for security
- Fixes test_cliruntime_multiple_newline_commands test failure
- Maintains security by allowing fine-grained control over command execution
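
For readers skimming the commit message, the flag's described behavior can be sketched roughly as follows. This is a minimal illustration only: `BashSessionSketch`, `MultipleCommandsError`, and `split_commands` are hypothetical names, not the actual `SubprocessBashSession` code, and the newline split is deliberately naive (it ignores quoting and heredocs).

```python
class MultipleCommandsError(ValueError):
    """Raised when a multi-line command is submitted but not allowed."""


def split_commands(command: str) -> list[str]:
    """Split on newlines and drop blank lines (naive: ignores quoting/heredocs)."""
    return [line.strip() for line in command.splitlines() if line.strip()]


class BashSessionSketch:
    """Hypothetical stand-in for SubprocessBashSession; not the real class."""

    def __init__(self, allow_multiple_commands: bool = True) -> None:
        # Defaulting to True mirrors the compatibility goal in the commit message.
        self.allow_multiple_commands = allow_multiple_commands

    def check(self, command: str) -> str:
        """Return the command unchanged if permitted, otherwise raise."""
        parts = split_commands(command)
        if len(parts) > 1 and not self.allow_multiple_commands:
            raise MultipleCommandsError(
                f'Got {len(parts)} newline-separated commands; only one is allowed.'
            )
        return command
```

With the default, `BashSessionSketch().check('echo a\necho b')` passes the input through; with `allow_multiple_commands=False` the same input raises `MultipleCommandsError`.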
Fix CLI runtime tests: handle interactive input and background processes
2026-04-29 03:00:45 -04:00 · 2025-06-18 18:32:29 +00:00 · 2025-06-18 00:54:09 +00:00 · 2025-06-18 00:46:43 +00:00 · 2025-06-18 00:10:23 +00:00 · 2025-06-17 21:59:13 +00:00
100 changed files with 2985 additions and 1137 deletions
--- a/.devcontainer/setup.sh
+++ b/.devcontainer/setup.sh
@@ -1,5 +1,9 @@
 #!/bin/bash

+# Mark the current repository as safe for Git to prevent "dubious ownership" errors,
+# which can occur in containerized environments when directory ownership doesn't match the current user.
+git config --global --add safe.directory "$(realpath .)"
+
 # Install `nc`
 sudo apt update && sudo apt install netcat -y

--- a/.openhands/microagents/repo.md
+++ b/.openhands/microagents/repo.md
@@ -5,6 +5,18 @@ This repository contains the code for OpenHands, an automated AI software engine
 To set up the entire repo, including frontend and backend, run `make build`.
 You don't need to do this unless the user asks you to, or if you're trying to run the entire application.

+## Running OpenHands with OpenHands:
+To run the full application for development or self-improvement:
+```bash
+export INSTALL_DOCKER=0
+export RUNTIME=local
+make build && make run
+```
+For external access (cloud environments), use:
+```bash
+make run FRONTEND_PORT=12000 FRONTEND_HOST=0.0.0.0 BACKEND_HOST=0.0.0.0
+```
+
 IMPORTANT: Before making any changes to the codebase, ALWAYS run `make install-pre-commit-hooks` to ensure pre-commit hooks are properly installed.

 Before pushing any changes, you MUST ensure that any lint errors or simple test errors have been fixed.
--- a/Development.md
+++ b/Development.md
@@ -103,6 +103,29 @@ components or interface enhancements.
  make start-frontend
  ```

+### 5. Running OpenHands with OpenHands
+
+You can use OpenHands to develop and improve OpenHands itself! This is a powerful way to leverage AI assistance for contributing to the project.
+
+#### Quick Start
+
+1. **Build and run OpenHands:**
+   ```bash
+   export INSTALL_DOCKER=0
+   export RUNTIME=local
+   make build && make run
+   ```
+
+2. **Access the interface:**
+   - Local development: http://localhost:3001
+   - Remote/cloud environments: Use the appropriate external URL
+
+3. **Configure for external access (if needed):**
+   ```bash
+   # For external access (e.g., cloud environments)
+   make run FRONTEND_PORT=12000 FRONTEND_HOST=0.0.0.0 BACKEND_HOST=0.0.0.0
+   ```
+
 ### 6. LLM Debugging

 If you encounter any issues with the Language Model (LM) or you're simply curious, export DEBUG=1 in the environment and restart the backend.
@@ -136,7 +159,7 @@ poetry run pytest ./tests/unit/test_*.py
 To reduce build time (e.g., if no changes were made to the client-runtime component), you can use an existing Docker
 container image by setting the SANDBOX_RUNTIME_CONTAINER_IMAGE environment variable to the desired Docker image.

-Example: `export SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.43-nikolaik`
+Example: `export SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.44-nikolaik`

 ## Develop inside Docker container

--- a/README.md
+++ b/README.md
@@ -62,19 +62,21 @@ system requirements and more information.


 ```bash
-docker pull docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik

 docker run -it --rm --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
-    -v ~/.openhands-state:/.openhands-state \
+    -v ~/.openhands:/.openhands \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
-    docker.all-hands.dev/all-hands-ai/openhands:0.43
+    docker.all-hands.dev/all-hands-ai/openhands:0.44
 ```

+> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
+
 You'll find OpenHands running at [http://localhost:3000](http://localhost:3000)!

 When you open the application, you'll be asked to choose an LLM provider and add an API key.
--- a/README_CN.md
+++ b/README_CN.md
@@ -51,19 +51,21 @@ OpenHands也可以使用Docker在本地系统上运行。


 ```bash
-docker pull docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik

 docker run -it --rm --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
-    -v ~/.openhands-state:/.openhands-state \
+    -v ~/.openhands:/.openhands \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
-    docker.all-hands.dev/all-hands-ai/openhands:0.43
+    docker.all-hands.dev/all-hands-ai/openhands:0.44
 ```

+> **注意**: 如果您在0.44版本之前使用过OpenHands，您可能需要运行 `mv ~/.openhands-state ~/.openhands` 来将对话历史迁移到新位置。
+
 您将在[http://localhost:3000](http://localhost:3000)找到运行中的OpenHands！

 打开应用程序时，您将被要求选择一个LLM提供商并添加API密钥。
--- a/containers/app/Dockerfile
+++ b/containers/app/Dockerfile
@@ -44,7 +44,7 @@ ENV WORKSPACE_BASE=/opt/workspace_base
 ENV OPENHANDS_BUILD_VERSION=$OPENHANDS_BUILD_VERSION
 ENV SANDBOX_USER_ID=0
 ENV FILE_STORE=local
-ENV FILE_STORE_PATH=/.openhands-state
+ENV FILE_STORE_PATH=/.openhands
 RUN mkdir -p $FILE_STORE_PATH
 RUN mkdir -p $WORKSPACE_BASE

--- a/containers/dev/compose.yml
+++ b/containers/dev/compose.yml
@@ -12,7 +12,7 @@ services:
      - SANDBOX_API_HOSTNAME=host.docker.internal
      - DOCKER_HOST_ADDR=host.docker.internal
      #
-      - SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-ghcr.io/all-hands-ai/runtime:0.43-nikolaik}
+      - SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-ghcr.io/all-hands-ai/runtime:0.44-nikolaik}
      - SANDBOX_USER_ID=${SANDBOX_USER_ID:-1234}
      - WORKSPACE_MOUNT_PATH=${WORKSPACE_BASE:-$PWD/workspace}
    ports:
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -7,8 +7,8 @@ services:
    image: openhands:latest
    container_name: openhands-app-${DATE:-}
    environment:
-      - SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik}
-      #- SANDBOX_USER_ID=${SANDBOX_USER_ID:-1234} # enable this only if you want a specific non-root sandbox user but you will have to manually adjust permissions of openhands-state for this user
+      - SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik}
+      #- SANDBOX_USER_ID=${SANDBOX_USER_ID:-1234} # enable this only if you want a specific non-root sandbox user but you will have to manually adjust permissions of ~/.openhands for this user
      - WORKSPACE_MOUNT_PATH=${WORKSPACE_BASE:-$PWD/workspace}
    ports:
      - "3000:3000"
@@ -16,7 +16,7 @@ services:
      - "host.docker.internal:host-gateway"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
-      - ~/.openhands-state:/.openhands-state
+      - ~/.openhands:/.openhands
      - ${WORKSPACE_BASE:-$PWD/workspace}:/opt/workspace_base
    pull_policy: build
    stdin_open: true
--- a/docs/usage/cloud/slack-installation.mdx
+++ b/docs/usage/cloud/slack-installation.mdx
@@ -5,15 +5,38 @@ description: This guide walks you through installing the OpenHands Slack app.

 ## Prerequisites

- You are a slack workspace admin
 - Access to OpenHands Cloud

 ## Installation Steps

-1. Log in to [OpenHands Cloud](https://app.all-hands.dev)
-2. Click the button below to OpenHands Slack App <a target="_blank" href="https://slack.com/oauth/v2/authorize?client_id=7477886716822.8729519890534&scope=app_mentions:read,chat:write,users:read,channels:history,groups:history,mpim:history,im:history&user_scope=channels:history,groups:history,im:history,mpim:history"><img alt="Add to Slack" height="40" width="139" src="https://platform.slack-edge.com/img/add_to_slack.png" srcSet="https://platform.slack-edge.com/img/add_to_slack.png 1x, https://platform.slack-edge.com/img/add_to_slack@2x.png 2x" /></a>
-3. In the top right corner, select the workspace to install the OpenHands Slack app.
-4. Review permissions and click allow
+<AccordionGroup>
+<Accordion title="Install Slack App (only for Slack admins/owners)">
+
+  **This step is for Slack admins/owners**
+
+  1. Make sure you have permissions to install Apps to your workspace.
+  2. Click the button below to install OpenHands Slack App <a target="_blank" href="https://slack.com/oauth/v2/authorize?client_id=7477886716822.8729519890534&scope=app_mentions:read,chat:write,users:read,channels:history,groups:history,mpim:history,im:history&user_scope=channels:history,groups:history,im:history,mpim:history"><img alt="Add to Slack" height="40" width="139" src="https://platform.slack-edge.com/img/add_to_slack.png" srcSet="https://platform.slack-edge.com/img/add_to_slack.png 1x, https://platform.slack-edge.com/img/add_to_slack@2x.png 2x" /></a>
+  3. In the top right corner, select the workspace to install the OpenHands Slack app.
+  4. Review permissions and click allow.
+
+</Accordion>
+
+<Accordion title="Authorize Slack App (for all Slack workspace members)">
+
+  **Make sure your Slack workspace admin/owner has installed OpenHands Slack App first**
+
+  Every user in the Slack workspace (including admins/owners) must link their OpenHands Cloud account to the OpenHands Slack App. To do this:
+  1. Visit [integrations settings](https://app.all-hands.dev/settings/integrations) in OpenHands Cloud.
+  2. Click the button "Install Slack App".
+  3. In the top right corner, select the workspace to install the OpenHands Slack app.
+  4. Review permissions and click allow.
+
+  Depending on the workspace settings, you may need approval from your Slack admin to authorize the Slack App.
+
+</Accordion>
+
+</AccordionGroup>
+

 ## Working With the Slack App

@@ -45,6 +68,6 @@ You can mention a repo name when starting a new conversation in the following fo
 2. "All-Hands-AI/OpenHands" (e.g `@openhands in All-Hands-AI/OpenHands ...`)

 The repo match is case insensitive. If a repo name match is made, it will kick off the conversation.
-If the repo name partially matches against, multiple repos, you'll be asked to select a repo from the filtered list.
+If the repo name partially matches against multiple repos, you'll be asked to select a repo from the filtered list.

 ![slack-pro-tip.png](/static/img/slack-pro-tip.png)
--- a/docs/usage/how-to/cli-mode.mdx
+++ b/docs/usage/how-to/cli-mode.mdx
@@ -11,10 +11,18 @@ for scripting.

 ### Running with Python

+**Note** - OpenHands requires Python version 3.12 or higher (Python 3.14 is not currently supported)
+
 1. Install OpenHands using pip:

 ```bash
 pip install openhands-ai
+```
+
+   Or if you prefer not to manage your own Python environment, you can use `uvx`:
+
+```bash
+uvx --python 3.12 --from openhands-ai openhands
 ```

 2. Launch an interactive OpenHands conversation from the command line:
@@ -47,19 +55,21 @@ poetry run python -m openhands.cli.main
 ```bash
 docker run -it \
    --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik \
    -e SANDBOX_USER_ID=$(id -u) \
    -e SANDBOX_VOLUMES=$SANDBOX_VOLUMES \
    -e LLM_API_KEY=$LLM_API_KEY \
    -e LLM_MODEL=$LLM_MODEL \
    -v /var/run/docker.sock:/var/run/docker.sock \
-    -v ~/.openhands-state:/.openhands-state \
+    -v ~/.openhands:/.openhands \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app-$(date +%Y%m%d%H%M%S) \
-    docker.all-hands.dev/all-hands-ai/openhands:0.43 \
+    docker.all-hands.dev/all-hands-ai/openhands:0.44 \
    python -m openhands.cli.main --override-cli-mode true
 ```

+> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
+
 This launches the CLI in Docker, allowing you to interact with OpenHands as described above.

 The `-e SANDBOX_USER_ID=$(id -u)` ensures files created by the agent in your workspace have the correct permissions.
--- a/docs/usage/how-to/headless-mode.mdx
+++ b/docs/usage/how-to/headless-mode.mdx
@@ -32,19 +32,20 @@ To run OpenHands in Headless mode with Docker:
 ```bash
 docker run -it \
    --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik \
    -e SANDBOX_USER_ID=$(id -u) \
    -e SANDBOX_VOLUMES=$SANDBOX_VOLUMES \
    -e LLM_API_KEY=$LLM_API_KEY \
    -e LLM_MODEL=$LLM_MODEL \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
-    -v ~/.openhands-state:/.openhands-state \
+    -v ~/.openhands:/.openhands \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app-$(date +%Y%m%d%H%M%S) \
-    docker.all-hands.dev/all-hands-ai/openhands:0.43 \
+    docker.all-hands.dev/all-hands-ai/openhands:0.44 \
    python -m openhands.core.main -t "write a bash script that prints hi"
 ```
+> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.

 The `-e SANDBOX_USER_ID=$(id -u)` is passed to the Docker command to ensure the sandbox user matches the host user’s
 permissions. This prevents the agent from creating root-owned files in the mounted workspace.
--- a/docs/usage/llms/local-llms.mdx
+++ b/docs/usage/llms/local-llms.mdx
@@ -54,25 +54,27 @@ Check [the installation guide](/usage/local-setup) to make sure you have all the
 export LMSTUDIO_MODEL_NAME="imported-models/uncategorized/devstralq4_k_m.gguf" # <- Replace this with the model name you copied from LMStudio
 export LMSTUDIO_URL="http://host.docker.internal:1234"  # <- Replace this with the port from LMStudio

-docker pull docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik

-mkdir -p ~/.openhands-state && echo '{"language":"en","agent":"CodeActAgent","max_iterations":null,"security_analyzer":null,"confirmation_mode":false,"llm_model":"lm_studio/'$LMSTUDIO_MODEL_NAME'","llm_api_key":"dummy","llm_base_url":"'$LMSTUDIO_URL/v1'","remote_runtime_resource_factor":null,"github_token":null,"enable_default_condenser":true,"user_consents_to_analytics":true}' > ~/.openhands-state/settings.json
+mkdir -p ~/.openhands && echo '{"language":"en","agent":"CodeActAgent","max_iterations":null,"security_analyzer":null,"confirmation_mode":false,"llm_model":"lm_studio/'$LMSTUDIO_MODEL_NAME'","llm_api_key":"dummy","llm_base_url":"'$LMSTUDIO_URL/v1'","remote_runtime_resource_factor":null,"github_token":null,"enable_default_condenser":true,"user_consents_to_analytics":true}' > ~/.openhands/settings.json

 docker run -it --rm --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
-    -v ~/.openhands-state:/.openhands-state \
+    -v ~/.openhands:/.openhands \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
-    docker.all-hands.dev/all-hands-ai/openhands:0.43
+    docker.all-hands.dev/all-hands-ai/openhands:0.44
 ```

+> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
+
 Once your server is running -- you can visit `http://localhost:3000` in your browser to use OpenHands with local Devstral model:
 ```
 Digest: sha256:e72f9baecb458aedb9afc2cd5bc935118d1868719e55d50da73190d3a85c674f
-Status: Image is up to date for docker.all-hands.dev/all-hands-ai/openhands:0.43
+Status: Image is up to date for docker.all-hands.dev/all-hands-ai/openhands:0.44
 Starting OpenHands...
 Running OpenHands as root
 14:22:13 - openhands:INFO: server_config.py:50 - Using config class None
@@ -126,6 +128,18 @@ vllm serve all-hands/openhands-lm-32b-v0.1 \
    --enable-prefix-caching
 ```

+### Create an OpenAI-Compatible Endpoint with Ollama
+
+- Install Ollama following [the official documentation](https://ollama.com/download).
+- For Ollama configuration, use `ollama/<modelname>` as the custom model in the web UI. The API key can also be set to `ollama`.
+- Example launch command for Devstral LM 24B:
+
+```bash
+OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_KEEP_ALIVE=-1 nohup ollama serve&
+# The minimum context size is ~8196; even the system prompt won't fit in anything smaller
+ollama pull devstral:latest
+```
+
 ## Advanced: Run and Configure OpenHands

 ### Run OpenHands
--- a/docs/usage/local-setup.mdx
+++ b/docs/usage/local-setup.mdx
@@ -67,19 +67,21 @@ A system with a modern processor and a minimum of **4GB RAM** is recommended to
 ### Start the App

 ```bash
-docker pull docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik

 docker run -it --rm --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
-    -v ~/.openhands-state:/.openhands-state \
+    -v ~/.openhands:/.openhands \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
-    docker.all-hands.dev/all-hands-ai/openhands:0.43
+    docker.all-hands.dev/all-hands-ai/openhands:0.44
 ```

+> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
+
 You'll find OpenHands running at http://localhost:3000!

 ### Setup
--- a/docs/usage/troubleshooting/troubleshooting.mdx
+++ b/docs/usage/troubleshooting/troubleshooting.mdx
@@ -31,9 +31,9 @@ On initial prompt, an error is seen with `Permission Denied` or `PermissionError

 **Resolution**

-* Check if the `~/.openhands-state` is owned by `root`. If so, you can:
-  * Change the directory's ownership: `sudo chown <user>:<user> ~/.openhands-state`.
-  * or update permissions on the directory: `sudo chmod 777 ~/.openhands-state`
+* Check if the `~/.openhands` is owned by `root`. If so, you can:
+  * Change the directory's ownership: `sudo chown <user>:<user> ~/.openhands`.
+  * or update permissions on the directory: `sudo chmod 777 ~/.openhands`
  * or delete it if you don’t need previous data. OpenHands will recreate it. You'll need to re-enter LLM settings.
 * If mounting a local directory, ensure your `WORKSPACE_BASE` has the necessary permissions for the user running
  OpenHands.
@@ -56,13 +56,16 @@ To fix this:
       -e SANDBOX_VSCODE_PORT=41234 \
       -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:latest \
       -v /var/run/docker.sock:/var/run/docker.sock \
-       -v ~/.openhands-state:/.openhands-state \
+       -v ~/.openhands:/.openhands \
       -p 3000:3000 \
       -p 41234:41234 \
       --add-host host.docker.internal:host-gateway \
       --name openhands-app \
       docker.all-hands.dev/all-hands-ai/openhands:latest
   ```
+
+   > **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
+
 2. Make sure to expose the same port with `-p 41234:41234` in your Docker command.
 3. If running with the development workflow, you can set this in your `config.toml` file:
   ```toml
--- a/evaluation/benchmarks/gaia/.gitignore
+++ b/evaluation/benchmarks/gaia/.gitignore
@@ -0,0 +1 @@
+data/
--- a/evaluation/benchmarks/gaia/README.md
+++ b/evaluation/benchmarks/gaia/README.md
@@ -6,6 +6,13 @@ This folder contains evaluation harness for evaluating agents on the [GAIA bench

 Please follow instruction [here](../../README.md#setup) to setup your local development environment and LLM.

+To enable the Tavily MCP Server, you can add the Tavily API key under the `core` section of your `config.toml` file, like below:
+
+```toml
+[core]
+search_api_key = "tvly-******"
+```
+
 ## Run the evaluation

 We are using the GAIA dataset hosted on [Hugging Face](https://huggingface.co/datasets/gaia-benchmark/GAIA).
--- a/evaluation/benchmarks/gaia/run_infer.py
+++ b/evaluation/benchmarks/gaia/run_infer.py
@@ -1,4 +1,5 @@
 import asyncio
+import copy
 import functools
 import os
 import re
@@ -6,6 +7,7 @@ import re
 import huggingface_hub
 import pandas as pd
 from datasets import load_dataset
+from pydantic import SecretStr

 from evaluation.benchmarks.gaia.scorer import question_scorer
 from evaluation.utils.shared import (
@@ -24,6 +26,7 @@ from openhands.core.config import (
    OpenHandsConfig,
    get_llm_config_arg,
    get_parser,
+    load_from_toml,
 )
 from openhands.core.config.utils import get_agent_config_arg
 from openhands.core.logger import openhands_logger as logger
@@ -41,7 +44,7 @@ AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {
 }

 AGENT_CLS_TO_INST_SUFFIX = {
-    'CodeActAgent': 'When you think you have solved the question, please first send your answer to user through message and then exit.\n'
+    'CodeActAgent': 'When you think you have solved the question, please use the finish tool and include your final answer in the message parameter of the finish tool. Your final answer MUST be encapsulated within <solution> and </solution>.\n'
 }


@@ -49,7 +52,7 @@ def get_config(
    metadata: EvalMetadata,
 ) -> OpenHandsConfig:
    sandbox_config = get_default_sandbox_config_for_eval()
-    sandbox_config.base_container_image = 'python:3.12-bookworm'
+    sandbox_config.base_container_image = 'nikolaik/python-nodejs:python3.12-nodejs22'
    config = OpenHandsConfig(
        default_agent=metadata.agent_class,
        run_as_openhands=False,
@@ -67,6 +70,11 @@ def get_config(
        logger.info('Agent config not provided, using default settings')
        agent_config = config.get_agent_config(metadata.agent_class)
        agent_config.enable_prompt_extensions = False
+
+    config_copy = copy.deepcopy(config)
+    load_from_toml(config_copy)
+    if config_copy.search_api_key:
+        config.search_api_key = SecretStr(config_copy.search_api_key)
    return config


@@ -134,16 +142,26 @@ def process_instance(
        dest_file = None

    # Prepare instruction
-    instruction = f'{instance["Question"]}\n'
+    instruction = """You have one question to answer. It is paramount that you provide a correct answer.
+Give it all you can: I know for a fact that you have access to all the relevant tools to solve it and find the correct answer (the answer does exist). Failure or 'I cannot answer' or 'None found' will not be tolerated, success will be rewarded.
+You must make sure you find the correct answer! You MUST strictly follow the task-specific formatting instructions for your final answer.
+Here is the task:
+{task_question}
+""".format(
+        task_question=instance['Question'],
+    )
    logger.info(f'Instruction: {instruction}')
    if dest_file:
        instruction += f'\n\nThe mentioned file is provided in the workspace at: {dest_file.split("/")[-1]}'

-    instruction += 'IMPORTANT: You should ONLY interact with the environment provided to you AND NEVER ASK FOR HUMAN HELP.\n'
-    instruction += 'Please encapsulate your final answer (answer ONLY) within <solution> and </solution>.\n'
+    instruction += """IMPORTANT: When seeking information from a website, REFRAIN from arbitrary URL navigation. You should utilize the designated search engine tool with precise keywords to obtain relevant URLs or use the specific website's search interface. DO NOT navigate directly to specific URLs as they may not exist.\n\nFor example: if you want to search for a research paper on Arxiv, either use the search engine tool with specific keywords or navigate to arxiv.org and then use its interface.\n"""
+    instruction += 'IMPORTANT: You should NEVER ask for Human Help.\n'
+    instruction += 'IMPORTANT: Please encapsulate your final answer (answer ONLY) within <solution> and </solution>. Your answer will be evaluated using string matching approaches so it is important that you STRICTLY adhere to the output formatting instructions specified in the task (e.g., alphabetization, sequencing, units, rounding, decimal places, etc.)\n'
    instruction += (
        'For example: The answer to the question is <solution> 42 </solution>.\n'
    )
+    instruction += "IMPORTANT: Your final answer should be a number OR as few words as possible OR a comma separated list of numbers and/or strings. If you are asked for a number, express it numerically (i.e., with digits rather than words), do not use commas, and do not include units such as $ or percent signs unless specified otherwise. If you are asked for a string, don't use articles, neither abbreviations (e.g. for cities). If you are asked for a comma separated list, apply the above rules depending of whether the element to be put in the list is a number or a string.\n"
+
    # NOTE: You can actually set slightly different instruction for different agents
    instruction += AGENT_CLS_TO_INST_SUFFIX.get(metadata.agent_class, '')
    logger.info(f'Instruction:\n{instruction}', extra={'msg_type': 'OBSERVATION'})
@@ -175,7 +193,7 @@ def process_instance(
    for event in reversed(state.history):
        if event.source == 'agent':
            if isinstance(event, AgentFinishAction):
-                model_answer_raw = event.thought
+                model_answer_raw = event.final_thought
                break
            elif isinstance(event, CmdRunAction):
                model_answer_raw = event.thought
@@ -222,6 +240,7 @@ def process_instance(
        error=state.last_error if state and state.last_error else None,
        test_result=test_result,
    )
+    runtime.close()
    return output


@@ -253,6 +272,8 @@ if __name__ == '__main__':
    if llm_config is None:
        raise ValueError(f'Could not find LLM config: --llm_config {args.llm_config}')

+    toml_config = OpenHandsConfig()
+    load_from_toml(toml_config)
    metadata = make_metadata(
        llm_config=llm_config,
        dataset_name='gaia',
@@ -261,7 +282,10 @@ if __name__ == '__main__':
        eval_note=args.eval_note,
        eval_output_dir=args.eval_output_dir,
        data_split=args.data_split,
-        details={'gaia-level': args.level},
+        details={
+            'gaia-level': args.level,
+            'mcp-servers': ['tavily'] if toml_config.search_api_key else [],
+        },
        agent_config=agent_config,
    )

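
Since the updated GAIA prompt requires the final answer inside `<solution>` tags and notes that answers are scored by string matching, a rough sketch of pulling that answer out of the finish message may help when reviewing this change. `extract_solution` and `SOLUTION_RE` are hypothetical names for illustration; the parsing actually used in `run_infer.py` is not shown in this diff and may differ.

```python
import re
from typing import Optional

# Non-greedy match so each tag pair is captured separately; DOTALL lets an
# answer span several lines.
SOLUTION_RE = re.compile(r'<solution>(.*?)</solution>', re.DOTALL)


def extract_solution(text: str) -> Optional[str]:
    """Return the contents of the last <solution>...</solution> pair, if any."""
    matches = SOLUTION_RE.findall(text)
    if not matches:
        return None
    # Assumption: if the model repeats itself, the last tagged answer wins.
    return matches[-1].strip()
```

Stripping surrounding whitespace matters here because the prompt's own example writes `<solution> 42 </solution>` with padding around the value.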
--- a/evaluation/benchmarks/gaia/scripts/run_infer.sh
+++ b/evaluation/benchmarks/gaia/scripts/run_infer.sh
@@ -39,7 +39,7 @@ echo "LEVELS: $LEVELS"
 COMMAND="poetry run python ./evaluation/benchmarks/gaia/run_infer.py \
  --agent-cls $AGENT \
  --llm-config $MODEL_CONFIG \
-  --max-iterations 30 \
+  --max-iterations 60 \
  --level $LEVELS \
  --data-split validation \
  --eval-num-workers $NUM_WORKERS \
--- a/evaluation/benchmarks/versicode/metric/compute_ism_pm_score.py
+++ b/evaluation/benchmarks/versicode/metric/compute_ism_pm_score.py
@@ -116,7 +116,7 @@ def get_token_per_line(code: str):
    return identifiers_per_line


-def get_ISM(answer_code: str, model_output_list: list, asnwer_name: str) -> list:
+def get_ISM(answer_code: str, model_output_list: list, answer_name: str) -> list:
    """
    Compute ISM and return an ordered list of scores
    :return:
@@ -126,13 +126,13 @@ def get_ISM(answer_code: str, model_output_list: list, asnwer_name: str) -> list
        if '```python' in code:
            code = code.replace('```python', '')
            code = code.replace('```', '')
-        if not re.search(rf'\b{re.escape(asnwer_name)}\b', code) or not is_code_valid(
+        if not re.search(rf'\b{re.escape(answer_name)}\b', code) or not is_code_valid(
            code
        ):
            score_list.append(0)
            continue

-        # if asnwer_name not in code:
+        # if answer_name not in code:
        #     score_list.append(0)
        #     continue

@@ -155,7 +155,7 @@ def get_ISM(answer_code: str, model_output_list: list, asnwer_name: str) -> list


 def get_ISM_without_verification(
-    answer_code: str, model_output_list: list, asnwer_name: str
+    answer_code: str, model_output_list: list, answer_name: str
 ) -> list:
    """
    Compute ISM and return an ordered list of scores
@@ -163,11 +163,11 @@ def get_ISM_without_verification(
    """
    score_list = []
    for code in model_output_list:
-        if asnwer_name not in code:
+        if answer_name not in code:
            score_list.append(0)
            continue

-        # if asnwer_name not in code:
+        # if answer_name not in code:
        #     score_list.append(0)
        #     continue

@@ -215,7 +215,7 @@ def longest_common_prefix_with_lengths(list1, list2):
    return max_length, len_list1, len_list2


-def get_PM(answer_code: str, model_output_list: list, asnwer_name: str) -> list:
+def get_PM(answer_code: str, model_output_list: list, answer_name: str) -> list:
    """
    Compute PM and return an ordered list of scores
    :return:
@@ -225,14 +225,14 @@ def get_PM(answer_code: str, model_output_list: list, asnwer_name: str) -> list:
        if '```python' in code:
            code = code.replace('```python', '')
            code = code.replace('```', '')
-        if not re.search(rf'\b{re.escape(asnwer_name)}\b', code) or not is_code_valid(
+        if not re.search(rf'\b{re.escape(answer_name)}\b', code) or not is_code_valid(
            code
        ):
-            # if asnwer_name not in code or is_code_valid(code) == False:
+            # if answer_name not in code or is_code_valid(code) == False:
            score_list.append(0)
            continue

-        # if asnwer_name not in code:
+        # if answer_name not in code:
        #     score_list.append(0)
        #     continue

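
The renamed `answer_name` parameter is checked two different ways in this file: `get_ISM` and `get_PM` use a word-boundary regex, while `get_ISM_without_verification` uses a plain substring test. The sketch below illustrates how the two can disagree; `mentions_identifier` is a hypothetical helper name, not a function in the module.

```python
import re


def mentions_identifier(code: str, name: str) -> bool:
    """True if `name` appears in `code` as a whole token, not inside a longer one."""
    # re.escape keeps dots in dotted names (e.g. "linalg.norm") literal.
    return re.search(rf'\b{re.escape(name)}\b', code) is not None


snippet = 'x = reload(path)'
substring_hit = 'load' in snippet                      # substring test matches
boundary_hit = mentions_identifier(snippet, 'load')    # word-boundary test does not
```

The substring test accepts `reload` as a use of `load`, whereas the boundary version rejects it, so the two scoring paths can assign different results to the same model output.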
--- a/frontend/tests/components/features/home/repo-connector.test.tsx
+++ b/frontend/tests/components/features/home/repo-connector.test.tsx
@@ -31,7 +31,7 @@ const renderRepoConnector = () => {
        },
        {
          Component: () => <div data-testid="git-settings-screen" />,
-          path: "/settings/git",
+          path: "/settings/integrations",
        },
      ],
    },
--- a/frontend/tests/routes/git-settings.test.tsx
+++ b/frontend/tests/routes/git-settings.test.tsx
@@ -35,13 +35,13 @@ const queryClient = new QueryClient();
 const GitSettingsRouterStub = createRoutesStub([
  {
    Component: GitSettingsScreen,
-    path: "/settings/github",
+    path: "/settings/integrations",
  },
 ]);

 const renderGitSettingsScreen = () => {
  const { rerender, ...rest } = render(
-    <GitSettingsRouterStub initialEntries={["/settings/github"]} />,
+    <GitSettingsRouterStub initialEntries={["/settings/integrations"]} />,
    {
      wrapper: ({ children }) => (
        <QueryClientProvider client={queryClient}>
@@ -54,7 +54,7 @@ const renderGitSettingsScreen = () => {
  const rerenderGitSettingsScreen = () =>
    rerender(
      <QueryClientProvider client={queryClient}>
-        <GitSettingsRouterStub initialEntries={["/settings/github"]} />
+        <GitSettingsRouterStub initialEntries={["/settings/integrations"]} />
      </QueryClientProvider>,
    );

--- a/frontend/tests/routes/secrets-settings.test.tsx
+++ b/frontend/tests/routes/secrets-settings.test.tsx
@@ -31,7 +31,7 @@ const RouterStub = createRoutesStub([
      },
      {
        Component: () => <div data-testid="git-settings-screen" />,
-        path: "/settings/git",
+        path: "/settings/integrations",
      },
    ],
  },
--- a/frontend/tests/routes/settings-with-payment.test.tsx
+++ b/frontend/tests/routes/settings-with-payment.test.tsx
@@ -30,7 +30,7 @@ vi.mock("react-i18next", async () => {
    useTranslation: () => ({
      t: (key: string) => {
        const translations: Record<string, string> = {
-          "SETTINGS$NAV_GIT": "Git",
+          "SETTINGS$NAV_INTEGRATIONS": "Integrations",
          "SETTINGS$NAV_APPLICATION": "Application",
          "SETTINGS$NAV_CREDITS": "Credits",
          "SETTINGS$NAV_API_KEYS": "API Keys",
@@ -61,7 +61,7 @@ describe("Settings Billing", () => {
        },
        {
          Component: () => <div data-testid="git-settings-screen" />,
-          path: "/settings/git",
+          path: "/settings/integrations",
        },
        {
          Component: () => <div data-testid="user-settings-screen" />,
--- a/frontend/tests/routes/settings.test.tsx
+++ b/frontend/tests/routes/settings.test.tsx
@@ -14,7 +14,7 @@ vi.mock("react-i18next", async () => {
    useTranslation: () => ({
      t: (key: string) => {
        const translations: Record<string, string> = {
-          SETTINGS$NAV_GIT: "Git",
+          SETTINGS$NAV_INTEGRATIONS: "Integrations",
          SETTINGS$NAV_APPLICATION: "Application",
          SETTINGS$NAV_CREDITS: "Credits",
          SETTINGS$NAV_API_KEYS: "API Keys",
@@ -49,7 +49,7 @@ describe("Settings Screen", () => {
        },
        {
          Component: () => <div data-testid="git-settings-screen" />,
-          path: "/settings/git",
+          path: "/settings/integrations",
        },
        {
          Component: () => <div data-testid="application-settings-screen" />,
@@ -79,7 +79,7 @@ describe("Settings Screen", () => {
  };

  it("should render the navbar", async () => {
-    const sectionsToInclude = ["llm", "git", "application", "secrets"];
+    const sectionsToInclude = ["llm", "integrations", "application", "secrets"];
    const sectionsToExclude = ["api keys", "credits"];
    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
    // @ts-expect-error - only return app mode
@@ -111,7 +111,7 @@ describe("Settings Screen", () => {
      APP_MODE: "saas",
    });
    const sectionsToInclude = [
-      "git",
+      "integrations",
      "application",
      "credits",
      "secrets",
--- a/frontend/package-lock.json
+++ b/frontend/package-lock.json
@@ -1,12 +1,12 @@
 {
  "name": "openhands-frontend",
-  "version": "0.43.0",
+  "version": "0.44.0",
  "lockfileVersion": 3,
  "requires": true,
  "packages": {
    "": {
      "name": "openhands-frontend",
-      "version": "0.43.0",
+      "version": "0.44.0",
      "dependencies": {
        "@heroui/react": "^2.8.0-beta.7",
        "@microlink/react-json-view": "^1.26.2",
--- a/frontend/package.json
+++ b/frontend/package.json
@@ -1,6 +1,6 @@
 {
  "name": "openhands-frontend",
-  "version": "0.43.0",
+  "version": "0.44.0",
  "private": true,
  "type": "module",
  "engines": {
--- a/frontend/src/api/open-hands.ts
+++ b/frontend/src/api/open-hands.ts
@@ -111,6 +111,59 @@ class OpenHands {
    return data;
  }

+  /**
+   * Submit conversation feedback with rating
+   * @param conversationId The conversation ID
+   * @param rating The rating (1-5)
+   * @param eventId Optional event ID this feedback corresponds to
+   * @param reason Optional reason for the rating
+   * @returns Response from the feedback endpoint
+   */
+  static async submitConversationFeedback(
+    conversationId: string,
+    rating: number,
+    eventId?: number,
+    reason?: string,
+  ): Promise<{ status: string; message: string }> {
+    const url = `/feedback/conversation`;
+    const payload = {
+      conversation_id: conversationId,
+      event_id: eventId,
+      rating,
+      reason,
+      metadata: { source: "likert-scale" },
+    };
+    const { data } = await openHands.post<{ status: string; message: string }>(
+      url,
+      payload,
+    );
+    return data;
+  }
+
+  /**
+   * Check if feedback exists for a specific conversation and event
+   * @param conversationId The conversation ID
+   * @param eventId The event ID to check
+   * @returns Feedback data including existence, rating, and reason
+   */
+  static async checkFeedbackExists(
+    conversationId: string,
+    eventId: number,
+  ): Promise<{ exists: boolean; rating?: number; reason?: string }> {
+    try {
+      const url = `/feedback/conversation/${conversationId}/${eventId}`;
+      const { data } = await openHands.get<{
+        exists: boolean;
+        rating?: number;
+        reason?: string;
+      }>(url);
+      return data;
+    } catch (error) {
+      // On any request error, fall back to reporting that no feedback exists
+      return { exists: false };
+    }
+  }
+
  /**
   * Authenticate with GitHub token
   * @returns Response with authentication status and user info if successful
--- a/frontend/src/components/features/chat/chat-interface.tsx
+++ b/frontend/src/components/features/chat/chat-interface.tsx
@@ -18,6 +18,7 @@ import { useWsClient } from "#/context/ws-client-provider";
 import { Messages } from "./messages";
 import { ChatSuggestions } from "./chat-suggestions";
 import { ActionSuggestions } from "./action-suggestions";
+import { ScrollProvider } from "#/context/scroll-context";

 import { ScrollToBottomButton } from "#/components/shared/buttons/scroll-to-bottom-button";
 import { LoadingSpinner } from "#/components/shared/loading-spinner";
@@ -28,6 +29,7 @@ import { useOptimisticUserMessage } from "#/hooks/use-optimistic-user-message";
 import { useWSErrorMessage } from "#/hooks/use-ws-error-message";
 import { ErrorMessageBanner } from "./error-message-banner";
 import { shouldRenderEvent } from "./event-content-helpers/should-render-event";
+import { useConfig } from "#/hooks/query/use-config";

 function getEntryPoint(
  hasRepository: boolean | null,
@@ -45,8 +47,15 @@ export function ChatInterface() {
    useOptimisticUserMessage();
  const { t } = useTranslation();
  const scrollRef = React.useRef<HTMLDivElement>(null);
-  const { scrollDomToBottom, onChatBodyScroll, hitBottom } =
-    useScrollToBottom(scrollRef);
+  const {
+    scrollDomToBottom,
+    onChatBodyScroll,
+    hitBottom,
+    autoScroll,
+    setAutoScroll,
+    setHitBottom,
+  } = useScrollToBottom(scrollRef);
+  const { data: config } = useConfig();

  const { curAgentState } = useSelector((state: RootState) => state.agent);

@@ -126,80 +135,97 @@ export function ChatInterface() {
    curAgentState === AgentState.AWAITING_USER_INPUT ||
    curAgentState === AgentState.FINISHED;

+  // Build the value passed to ScrollProvider from the scroll hook's return values
+  const scrollProviderValue = {
+    scrollRef,
+    autoScroll,
+    setAutoScroll,
+    scrollDomToBottom,
+    hitBottom,
+    setHitBottom,
+    onChatBodyScroll,
+  };
+
  return (
-    <div className="h-full flex flex-col justify-between">
-      {events.length === 0 && !optimisticUserMessage && (
-        <ChatSuggestions onSuggestionsClick={setMessageToSend} />
-      )}
-
-      <div
-        ref={scrollRef}
-        onScroll={(e) => onChatBodyScroll(e.currentTarget)}
-        className="scrollbar scrollbar-thin scrollbar-thumb-gray-400 scrollbar-thumb-rounded-full scrollbar-track-gray-800 hover:scrollbar-thumb-gray-300 flex flex-col grow overflow-y-auto overflow-x-hidden px-4 pt-4 gap-2 fast-smooth-scroll"
-      >
-        {isLoadingMessages && (
-          <div className="flex justify-center">
-            <LoadingSpinner size="small" />
-          </div>
+    <ScrollProvider value={scrollProviderValue}>
+      <div className="h-full flex flex-col justify-between">
+        {events.length === 0 && !optimisticUserMessage && (
+          <ChatSuggestions onSuggestionsClick={setMessageToSend} />
        )}

-        {!isLoadingMessages && (
-          <Messages
-            messages={events}
-            isAwaitingUserConfirmation={
-              curAgentState === AgentState.AWAITING_USER_CONFIRMATION
-            }
-          />
-        )}
+        <div
+          ref={scrollRef}
+          onScroll={(e) => onChatBodyScroll(e.currentTarget)}
+          className="scrollbar scrollbar-thin scrollbar-thumb-gray-400 scrollbar-thumb-rounded-full scrollbar-track-gray-800 hover:scrollbar-thumb-gray-300 flex flex-col grow overflow-y-auto overflow-x-hidden px-4 pt-4 gap-2 fast-smooth-scroll"
+        >
+          {isLoadingMessages && (
+            <div className="flex justify-center">
+              <LoadingSpinner size="small" />
+            </div>
+          )}

-        {isWaitingForUserInput &&
-          events.length > 0 &&
-          !optimisticUserMessage && (
-            <ActionSuggestions
-              onSuggestionsClick={(value) => handleSendMessage(value, [])}
+          {!isLoadingMessages && (
+            <Messages
+              messages={events}
+              isAwaitingUserConfirmation={
+                curAgentState === AgentState.AWAITING_USER_CONFIRMATION
+              }
            />
          )}
-      </div>

-      <div className="flex flex-col gap-[6px] px-4 pb-4">
-        <div className="flex justify-between relative">
-          <TrajectoryActions
-            onPositiveFeedback={() =>
-              onClickShareFeedbackActionButton("positive")
-            }
-            onNegativeFeedback={() =>
-              onClickShareFeedbackActionButton("negative")
-            }
-            onExportTrajectory={() => onClickExportTrajectoryButton()}
-          />
-
-          <div className="absolute left-1/2 transform -translate-x-1/2 bottom-0">
-            {curAgentState === AgentState.RUNNING && <TypingIndicator />}
-          </div>
-
-          {!hitBottom && <ScrollToBottomButton onClick={scrollDomToBottom} />}
+          {isWaitingForUserInput &&
+            events.length > 0 &&
+            !optimisticUserMessage && (
+              <ActionSuggestions
+                onSuggestionsClick={(value) => handleSendMessage(value, [])}
+              />
+            )}
        </div>

-        {errorMessage && <ErrorMessageBanner message={errorMessage} />}
+        <div className="flex flex-col gap-[6px] px-4 pb-4">
+          <div className="flex justify-between relative">
+            {config?.APP_MODE !== "saas" && (
+              <TrajectoryActions
+                onPositiveFeedback={() =>
+                  onClickShareFeedbackActionButton("positive")
+                }
+                onNegativeFeedback={() =>
+                  onClickShareFeedbackActionButton("negative")
+                }
+                onExportTrajectory={() => onClickExportTrajectoryButton()}
+              />
+            )}

-        <InteractiveChatBox
-          onSubmit={handleSendMessage}
-          onStop={handleStop}
-          isDisabled={
-            curAgentState === AgentState.LOADING ||
-            curAgentState === AgentState.AWAITING_USER_CONFIRMATION
-          }
-          mode={curAgentState === AgentState.RUNNING ? "stop" : "submit"}
-          value={messageToSend ?? undefined}
-          onChange={setMessageToSend}
-        />
+            <div className="absolute left-1/2 transform -translate-x-1/2 bottom-0">
+              {curAgentState === AgentState.RUNNING && <TypingIndicator />}
+            </div>
+
+            {!hitBottom && <ScrollToBottomButton onClick={scrollDomToBottom} />}
+          </div>
+
+          {errorMessage && <ErrorMessageBanner message={errorMessage} />}
+
+          <InteractiveChatBox
+            onSubmit={handleSendMessage}
+            onStop={handleStop}
+            isDisabled={
+              curAgentState === AgentState.LOADING ||
+              curAgentState === AgentState.AWAITING_USER_CONFIRMATION
+            }
+            mode={curAgentState === AgentState.RUNNING ? "stop" : "submit"}
+            value={messageToSend ?? undefined}
+            onChange={setMessageToSend}
+          />
+        </div>
+
+        {config?.APP_MODE !== "saas" && (
+          <FeedbackModal
+            isOpen={feedbackModalIsOpen}
+            onClose={() => setFeedbackModalIsOpen(false)}
+            polarity={feedbackPolarity}
+          />
+        )}
      </div>
-
-      <FeedbackModal
-        isOpen={feedbackModalIsOpen}
-        onClose={() => setFeedbackModalIsOpen(false)}
-        polarity={feedbackPolarity}
-      />
-    </div>
+    </ScrollProvider>
  );
 }
--- a/frontend/src/components/features/chat/event-message.tsx
+++ b/frontend/src/components/features/chat/event-message.tsx
@@ -1,3 +1,4 @@
+import React from "react";
 import { ConfirmationButtons } from "#/components/shared/buttons/confirmation-buttons";
 import { OpenHandsAction } from "#/types/core/actions";
 import {
@@ -18,6 +19,10 @@ import { MCPObservationContent } from "./mcp-observation-content";
 import { getObservationResult } from "./event-content-helpers/get-observation-result";
 import { getEventContent } from "./event-content-helpers/get-event-content";
 import { GenericEventMessage } from "./generic-event-message";
+import { LikertScale } from "../feedback/likert-scale";
+
+import { useConfig } from "#/hooks/query/use-config";
+import { useFeedbackExists } from "#/hooks/query/use-feedback-exists";

 const hasThoughtProperty = (
  obj: Record<string, unknown>,
@@ -39,6 +44,14 @@ export function EventMessage({
  const shouldShowConfirmationButtons =
    isLastMessage && event.source === "agent" && isAwaitingUserConfirmation;

+  const { data: config } = useConfig();
+
+  // Use our query hook to check if feedback exists and get rating/reason
+  const {
+    data: feedbackData = { exists: false },
+    isLoading: isCheckingFeedback,
+  } = useFeedbackExists(isFinishAction(event) ? event.id : undefined);
+
  if (isErrorObservation(event)) {
    return (
      <ErrorMessage
@@ -55,9 +68,25 @@ export function EventMessage({
    return null;
  }

+  const showLikertScale =
+    config?.APP_MODE === "saas" &&
+    isFinishAction(event) &&
+    isLastMessage &&
+    !isCheckingFeedback;
+
  if (isFinishAction(event)) {
    return (
-      <ChatMessage type="agent" message={getEventContent(event).details} />
+      <>
+        <ChatMessage type="agent" message={getEventContent(event).details} />
+        {showLikertScale && (
+          <LikertScale
+            eventId={event.id}
+            initiallySubmitted={feedbackData.exists}
+            initialRating={feedbackData.rating}
+            initialReason={feedbackData.reason}
+          />
+        )}
+      </>
    );
  }

--- a/frontend/src/components/features/feedback/likert-scale.tsx
+++ b/frontend/src/components/features/feedback/likert-scale.tsx
@@ -0,0 +1,248 @@
+import React, { useState, useEffect, useContext } from "react";
+import { cn } from "#/utils/utils";
+import i18n from "#/i18n";
+import { useSubmitConversationFeedback } from "#/hooks/mutation/use-submit-conversation-feedback";
+import { ScrollContext } from "#/context/scroll-context";
+
+// Delay in milliseconds before a low rating is auto-submitted without a reason
+const AUTO_SUBMIT_TIMEOUT = 10000;
+
+interface LikertScaleProps {
+  eventId?: number;
+  initiallySubmitted?: boolean;
+  initialRating?: number;
+  initialReason?: string;
+}
+
+const FEEDBACK_REASONS = [
+  i18n.t("FEEDBACK$REASON_MISUNDERSTOOD_INSTRUCTION"),
+  i18n.t("FEEDBACK$REASON_FORGOT_CONTEXT"),
+  i18n.t("FEEDBACK$REASON_UNNECESSARY_CHANGES"),
+  i18n.t("FEEDBACK$REASON_OTHER"),
+];
+
+export function LikertScale({
+  eventId,
+  initiallySubmitted = false,
+  initialRating,
+  initialReason,
+}: LikertScaleProps) {
+  const [selectedRating, setSelectedRating] = useState<number | null>(
+    initialRating || null,
+  );
+  const [selectedReason, setSelectedReason] = useState<string | null>(
+    initialReason || null,
+  );
+  const [showReasons, setShowReasons] = useState(false);
+  const [reasonTimeout, setReasonTimeout] = useState<NodeJS.Timeout | null>(
+    null,
+  );
+  const [isSubmitted, setIsSubmitted] = useState(initiallySubmitted);
+  const [countdown, setCountdown] = useState<number>(0);
+
+  // Get scroll context
+  const scrollContext = useContext(ScrollContext);
+
+  // If scrollContext is undefined, we're not inside a ScrollProvider
+  const scrollToBottom = scrollContext?.scrollDomToBottom;
+  const autoScroll = scrollContext?.autoScroll;
+
+  // Use our mutation hook
+  const { mutate: submitConversationFeedback } =
+    useSubmitConversationFeedback();
+
+  // Update isSubmitted if initiallySubmitted changes
+  useEffect(() => {
+    setIsSubmitted(initiallySubmitted);
+  }, [initiallySubmitted]);
+
+  // Update selectedRating if initialRating changes
+  useEffect(() => {
+    if (initialRating) {
+      setSelectedRating(initialRating);
+    }
+  }, [initialRating]);
+
+  // Update selectedReason if initialReason changes
+  useEffect(() => {
+    if (initialReason) {
+      setSelectedReason(initialReason);
+    }
+  }, [initialReason]);
+
+  // Submit feedback and disable the component
+  const submitFeedback = (rating: number, reason?: string) => {
+    submitConversationFeedback(
+      {
+        rating,
+        eventId,
+        reason,
+      },
+      {
+        onSuccess: () => {
+          setSelectedReason(reason || null);
+          setShowReasons(false);
+          setIsSubmitted(true);
+        },
+      },
+    );
+  };
+
+  // Handle star rating selection
+  const handleRatingClick = (rating: number) => {
+    if (isSubmitted) return; // Prevent changes after submission
+
+    setSelectedRating(rating);
+
+    // Ratings of 3 or less (1, 2, or 3 stars) show the reason list and start
+    // the auto-submit countdown; higher ratings are handled in the else branch
+    if (rating <= 3) {
+      setShowReasons(true);
+      setCountdown(Math.ceil(AUTO_SUBMIT_TIMEOUT / 1000));
+
+      // Set a timeout to auto-submit if no reason is selected
+      const timeout = setTimeout(() => {
+        submitFeedback(rating);
+      }, AUTO_SUBMIT_TIMEOUT);
+
+      setReasonTimeout(timeout);
+
+      // Only scroll to bottom if the user is already at the bottom (autoScroll is true)
+      if (scrollToBottom && autoScroll) {
+        // Small delay to ensure the reasons are fully rendered
+        setTimeout(() => {
+          scrollToBottom();
+        }, 100);
+      }
+    } else {
+      // For ratings > 3 (4 or 5 stars), submit immediately without showing reasons
+      setShowReasons(false);
+      submitFeedback(rating);
+    }
+  };
+
+  // Handle reason selection
+  const handleReasonClick = (reason: string) => {
+    if (selectedRating && reasonTimeout && !isSubmitted) {
+      clearTimeout(reasonTimeout);
+      setCountdown(0);
+      submitFeedback(selectedRating, reason);
+    }
+  };
+
+  // Countdown effect
+  useEffect(() => {
+    if (countdown > 0 && showReasons && !isSubmitted) {
+      const timer = setTimeout(() => {
+        setCountdown(countdown - 1);
+      }, 1000);
+      return () => clearTimeout(timer);
+    }
+    return () => {};
+  }, [countdown, showReasons, isSubmitted]);
+
+  // Clear the pending auto-submit timeout on unmount or when it is replaced
+  useEffect(
+    () => () => {
+      if (reasonTimeout) {
+        clearTimeout(reasonTimeout);
+      }
+    },
+    [reasonTimeout],
+  );
+
+  // Scroll to bottom while feedback is unsubmitted, but only if user is already at the bottom
+  useEffect(() => {
+    if (scrollToBottom && autoScroll && !isSubmitted) {
+      // Small delay to ensure the component is fully rendered
+      setTimeout(() => {
+        scrollToBottom();
+      }, 100);
+    }
+  }, [scrollToBottom, autoScroll, isSubmitted]);
+
+  // Scroll to bottom when reasons are shown, but only if user is already at the bottom
+  useEffect(() => {
+    if (scrollToBottom && autoScroll && showReasons) {
+      // Small delay to ensure the reasons are fully rendered
+      setTimeout(() => {
+        scrollToBottom();
+      }, 100);
+    }
+  }, [scrollToBottom, autoScroll, showReasons]);
+
+  // Helper function to get button class based on state
+  const getButtonClass = (rating: number) => {
+    if (isSubmitted) {
+      return selectedRating && selectedRating >= rating
+        ? "text-yellow-400 cursor-not-allowed"
+        : "text-gray-300 opacity-50 cursor-not-allowed";
+    }
+
+    return selectedRating && selectedRating >= rating
+      ? "text-yellow-400"
+      : "text-gray-300 hover:text-yellow-200";
+  };
+
+  return (
+    <div className="mt-3 flex flex-col gap-1">
+      <div className="text-sm text-gray-500 mb-1">
+        {isSubmitted
+          ? i18n.t("FEEDBACK$THANK_YOU_FOR_FEEDBACK")
+          : i18n.t("FEEDBACK$RATE_AGENT_PERFORMANCE")}
+      </div>
+      <div className="flex flex-col gap-1">
+        <span className="flex gap-2 items-center flex-wrap">
+          {[1, 2, 3, 4, 5].map((rating) => (
+            <button
+              type="button"
+              key={rating}
+              onClick={() => handleRatingClick(rating)}
+              disabled={isSubmitted}
+              className={cn("text-xl transition-all", getButtonClass(rating))}
+              aria-label={`Rate ${rating} stars`}
+            >
+              ★
+            </button>
+          ))}
+          {/* Show selected reason inline with stars when submitted (only for ratings <= 3) */}
+          {isSubmitted &&
+            selectedReason &&
+            selectedRating &&
+            selectedRating <= 3 && (
+              <span className="text-sm text-gray-500 italic">
+                {selectedReason}
+              </span>
+            )}
+        </span>
+      </div>
+
+      {showReasons && !isSubmitted && (
+        <div className="mt-1 flex flex-col gap-1">
+          <div className="text-xs text-gray-500 mb-1">
+            {i18n.t("FEEDBACK$SELECT_REASON")}
+          </div>
+          {countdown > 0 && (
+            <div className="text-xs text-gray-400 mb-1 italic">
+              {i18n.t("FEEDBACK$SELECT_REASON_COUNTDOWN", {
+                countdown,
+              })}
+            </div>
+          )}
+          <div className="flex flex-col gap-0.5">
+            {FEEDBACK_REASONS.map((reason) => (
+              <button
+                type="button"
+                key={reason}
+                onClick={() => handleReasonClick(reason)}
+                className="text-sm text-left py-1 px-2 rounded hover:bg-gray-700 transition-colors"
+              >
+                {reason}
+              </button>
+            ))}
+          </div>
+        </div>
+      )}
+    </div>
+  );
+}
--- a/frontend/src/components/features/home/connect-to-provider-message.tsx
+++ b/frontend/src/components/features/home/connect-to-provider-message.tsx
@@ -10,7 +10,10 @@ export function ConnectToProviderMessage() {
  return (
    <div className="flex flex-col gap-4">
      <p>{t("HOME$CONNECT_PROVIDER_MESSAGE")}</p>
-      <Link data-testid="navigate-to-settings-button" to="/settings/git">
+      <Link
+        data-testid="navigate-to-settings-button"
+        to="/settings/integrations"
+      >
        <BrandButton type="button" variant="primary" isDisabled={isLoading}>
          {!isLoading && t("SETTINGS$TITLE")}
          {isLoading && t("HOME$LOADING")}
--- a/frontend/src/components/features/settings/git-settings/install-slack-app-anchor.tsx
+++ b/frontend/src/components/features/settings/git-settings/install-slack-app-anchor.tsx
@@ -0,0 +1,21 @@
+import { useTranslation } from "react-i18next";
+import { I18nKey } from "#/i18n/declaration";
+import { BrandButton } from "../brand-button";
+
+export function InstallSlackAppAnchor() {
+  const { t } = useTranslation();
+
+  return (
+    <a
+      data-testid="install-slack-app-button"
+      href="https://slack.com/oauth/v2/authorize?client_id=7477886716822.8729519890534&scope=app_mentions:read,chat:write,users:read,channels:history,groups:history,mpim:history,im:history&user_scope=channels:history,groups:history,im:history,mpim:history"
+      target="_blank"
+      rel="noreferrer noopener"
+      className="py-9"
+    >
+      <BrandButton type="button" variant="secondary">
+        {t(I18nKey.SLACK$INSTALL_APP)}
+      </BrandButton>
+    </a>
+  );
+}
--- a/frontend/src/context/scroll-context.tsx
+++ b/frontend/src/context/scroll-context.tsx
@@ -0,0 +1,42 @@
+import React, { createContext, useContext, ReactNode, RefObject } from "react";
+import { useScrollToBottom } from "#/hooks/use-scroll-to-bottom";
+
+interface ScrollContextType {
+  scrollRef: RefObject<HTMLDivElement | null>;
+  autoScroll: boolean;
+  setAutoScroll: (value: boolean) => void;
+  scrollDomToBottom: () => void;
+  hitBottom: boolean;
+  setHitBottom: (value: boolean) => void;
+  onChatBodyScroll: (e: HTMLElement) => void;
+}
+
+export const ScrollContext = createContext<ScrollContextType | undefined>(
+  undefined,
+);
+
+interface ScrollProviderProps {
+  children: ReactNode;
+  value?: ScrollContextType;
+}
+
+export function ScrollProvider({ children, value }: ScrollProviderProps) {
+  const scrollHook = useScrollToBottom(React.useRef<HTMLDivElement>(null));
+
+  // Use provided value or default to the hook
+  const contextValue = value || scrollHook;
+
+  return (
+    <ScrollContext.Provider value={contextValue}>
+      {children}
+    </ScrollContext.Provider>
+  );
+}
+
+export function useScrollContext() {
+  const context = useContext(ScrollContext);
+  if (context === undefined) {
+    throw new Error("useScrollContext must be used within a ScrollProvider");
+  }
+  return context;
+}
--- a/frontend/src/hooks/mutation/use-submit-conversation-feedback.ts
+++ b/frontend/src/hooks/mutation/use-submit-conversation-feedback.ts
@@ -0,0 +1,39 @@
+import { useMutation, useQueryClient } from "@tanstack/react-query";
+import { useTranslation } from "react-i18next";
+import OpenHands from "#/api/open-hands";
+import { useConversationId } from "#/hooks/use-conversation-id";
+
+type SubmitConversationFeedbackArgs = {
+  rating: number;
+  eventId?: number;
+  reason?: string;
+};
+
+export const useSubmitConversationFeedback = () => {
+  const { conversationId } = useConversationId();
+  const queryClient = useQueryClient();
+  const { t } = useTranslation();
+
+  return useMutation({
+    mutationFn: ({ rating, eventId, reason }: SubmitConversationFeedbackArgs) =>
+      OpenHands.submitConversationFeedback(
+        conversationId,
+        rating,
+        eventId,
+        reason,
+      ),
+    onSuccess: (_, { eventId }) => {
+      // Invalidate the feedback existence query to trigger a refetch
+      if (eventId) {
+        queryClient.invalidateQueries({
+          queryKey: ["feedback", "exists", conversationId, eventId],
+        });
+      }
+    },
+    onError: (error) => {
+      // Log error but don't show toast - user will just see the UI stay in unsubmitted state
+      // eslint-disable-next-line no-console
+      console.error(t("FEEDBACK$FAILED_TO_SUBMIT"), error);
+    },
+  });
+};
--- a/frontend/src/hooks/query/use-feedback-exists.ts
+++ b/frontend/src/hooks/query/use-feedback-exists.ts
@@ -0,0 +1,24 @@
+import { useQuery } from "@tanstack/react-query";
+import OpenHands from "#/api/open-hands";
+import { useConversationId } from "#/hooks/use-conversation-id";
+
+export interface FeedbackData {
+  exists: boolean;
+  rating?: number;
+  reason?: string;
+}
+
+export const useFeedbackExists = (eventId?: number) => {
+  const { conversationId } = useConversationId();
+
+  return useQuery<FeedbackData>({
+    queryKey: ["feedback", "exists", conversationId, eventId],
+    queryFn: () => {
+      if (!eventId) return { exists: false };
+      return OpenHands.checkFeedbackExists(conversationId, eventId);
+    },
+    enabled: !!eventId,
+    staleTime: 1000 * 60 * 5, // 5 minutes
+    gcTime: 1000 * 60 * 15, // 15 minutes
+  });
+};
--- a/frontend/src/i18n/declaration.ts
+++ b/frontend/src/i18n/declaration.ts
@@ -80,7 +80,7 @@ export enum I18nKey {
  ANALYTICS$CONFIRM_PREFERENCES = "ANALYTICS$CONFIRM_PREFERENCES",
  SETTINGS$SAVING = "SETTINGS$SAVING",
  SETTINGS$SAVE_CHANGES = "SETTINGS$SAVE_CHANGES",
-  SETTINGS$NAV_GIT = "SETTINGS$NAV_GIT",
+  SETTINGS$NAV_INTEGRATIONS = "SETTINGS$NAV_INTEGRATIONS",
  SETTINGS$NAV_APPLICATION = "SETTINGS$NAV_APPLICATION",
  SETTINGS$NAV_CREDITS = "SETTINGS$NAV_CREDITS",
  SETTINGS$NAV_SECRETS = "SETTINGS$NAV_SECRETS",
@@ -170,10 +170,10 @@ export enum I18nKey {
  GITHUB$TOKEN_LINK_TEXT = "GITHUB$TOKEN_LINK_TEXT",
  GITHUB$INSTRUCTIONS_LINK_TEXT = "GITHUB$INSTRUCTIONS_LINK_TEXT",
  COMMON$HERE = "COMMON$HERE",
-  ANALYTICS$ENABLE = "ANALYTICS$ENABLE",
  GITHUB$TOKEN_INVALID = "GITHUB$TOKEN_INVALID",
  BUTTON$DISCONNECT = "BUTTON$DISCONNECT",
  GITHUB$CONFIGURE_REPOS = "GITHUB$CONFIGURE_REPOS",
+  SLACK$INSTALL_APP = "SLACK$INSTALL_APP",
  COMMON$CLICK_FOR_INSTRUCTIONS = "COMMON$CLICK_FOR_INSTRUCTIONS",
  LLM$SELECT_MODEL_PLACEHOLDER = "LLM$SELECT_MODEL_PLACEHOLDER",
  LLM$MODEL = "LLM$MODEL",
@@ -583,4 +583,13 @@ export enum I18nKey {
  SETTINGS$EMAIL_VERIFICATION_RESTRICTION_MESSAGE = "SETTINGS$EMAIL_VERIFICATION_RESTRICTION_MESSAGE",
  SETTINGS$RESEND_VERIFICATION = "SETTINGS$RESEND_VERIFICATION",
  SETTINGS$FAILED_TO_RESEND_VERIFICATION = "SETTINGS$FAILED_TO_RESEND_VERIFICATION",
+  FEEDBACK$RATE_AGENT_PERFORMANCE = "FEEDBACK$RATE_AGENT_PERFORMANCE",
+  FEEDBACK$SELECT_REASON = "FEEDBACK$SELECT_REASON",
+  FEEDBACK$SELECT_REASON_COUNTDOWN = "FEEDBACK$SELECT_REASON_COUNTDOWN",
+  FEEDBACK$REASON_MISUNDERSTOOD_INSTRUCTION = "FEEDBACK$REASON_MISUNDERSTOOD_INSTRUCTION",
+  FEEDBACK$REASON_FORGOT_CONTEXT = "FEEDBACK$REASON_FORGOT_CONTEXT",
+  FEEDBACK$REASON_UNNECESSARY_CHANGES = "FEEDBACK$REASON_UNNECESSARY_CHANGES",
+  FEEDBACK$REASON_OTHER = "FEEDBACK$REASON_OTHER",
+  FEEDBACK$THANK_YOU_FOR_FEEDBACK = "FEEDBACK$THANK_YOU_FOR_FEEDBACK",
+  FEEDBACK$FAILED_TO_SUBMIT = "FEEDBACK$FAILED_TO_SUBMIT",
 }
--- a/frontend/src/i18n/translation.json
+++ b/frontend/src/i18n/translation.json
@@ -1279,21 +1279,21 @@
        "de": "Änderungen speichern",
        "uk": "Зберегти зміни"
    },
-    "SETTINGS$NAV_GIT": {
-        "en": "Git",
-        "ja": "Git",
-        "zh-CN": "Git",
-        "zh-TW": "Git",
-        "ko-KR": "Git",
-        "no": "Git",
-        "it": "Git",
-        "pt": "Git",
-        "es": "Git",
-        "ar": "Git",
-        "fr": "Git",
-        "tr": "Git",
-        "de": "Git",
-        "uk": "Git"
+    "SETTINGS$NAV_INTEGRATIONS": {
+        "en": "Integrations",
+        "ja": "統合",
+        "zh-CN": "集成",
+        "zh-TW": "整合",
+        "ko-KR": "통합",
+        "no": "Integrasjoner",
+        "it": "Integrazioni",
+        "pt": "Integrações",
+        "es": "Integraciones",
+        "ar": "التكامل",
+        "fr": "Intégrations",
+        "tr": "Entegrasyonlar",
+        "de": "Integrationen",
+        "uk": "Інтеграції"
    },
    "SETTINGS$NAV_APPLICATION": {
        "en": "Application",
@@ -2719,22 +2719,6 @@
        "de": "Hier",
        "uk": "тут"
    },
-    "ANALYTICS$ENABLE": {
-        "en": "Enable analytics",
-        "ja": "アナリティクスを有効にする",
-        "zh-CN": "启用分析",
-        "zh-TW": "啟用分析功能",
-        "ko-KR": "분석 활성화",
-        "no": "Aktiver analyse",
-        "it": "Abilita analisi",
-        "pt": "Ativar análise",
-        "es": "Habilitar análisis",
-        "ar": "تمكين التحليلات",
-        "fr": "Activer les analyses",
-        "tr": "Analitiği etkinleştir",
-        "de": "Analyse aktivieren",
-        "uk": "Увімкнути аналітику"
-    },
    "GITHUB$TOKEN_INVALID": {
        "en": "Invalid GitHub token",
        "ja": "GitHubトークンが無効です",
@@ -2783,6 +2767,22 @@
        "de": "GitHub-Repositories konfigurieren",
        "uk": "Налаштування репозиторіїв Github"
    },
+    "SLACK$INSTALL_APP": {
+        "en": "Install OpenHands Slack App",
+        "ja": "OpenHands Slackアプリをインストール",
+        "zh-CN": "安装 OpenHands Slack 应用",
+        "zh-TW": "安裝 OpenHands Slack 應用程式",
+        "ko-KR": "OpenHands Slack 앱 설치",
+        "no": "Installer OpenHands Slack-app",
+        "it": "Installa l'app Slack di OpenHands",
+        "pt": "Instalar aplicativo Slack do OpenHands",
+        "es": "Instalar aplicación Slack de OpenHands",
+        "ar": "تثبيت تطبيق OpenHands Slack",
+        "fr": "Installer l'application Slack OpenHands",
+        "tr": "OpenHands Slack uygulamasını yükle",
+        "de": "OpenHands Slack-App installieren",
+        "uk": "Встановити додаток OpenHands Slack"
+    },
    "COMMON$CLICK_FOR_INSTRUCTIONS": {
        "en": "Click here for instructions",
        "ja": "手順はこちらをクリック",
@@ -9326,5 +9326,149 @@
        "tr": "Doğrulama e-postası yeniden gönderilemedi",
        "de": "Bestätigungs-E-Mail konnte nicht erneut gesendet werden",
        "uk": "Не вдалося повторно надіслати лист підтвердження"
+    },
+    "FEEDBACK$RATE_AGENT_PERFORMANCE": {
+        "en": "Rate the agent's performance:",
+        "ja": "エージェントのパフォーマンスを評価してください：",
+        "zh-CN": "评价代理的表现：",
+        "zh-TW": "評價代理的表現：",
+        "ko-KR": "에이전트의 성능을 평가하세요:",
+        "no": "Vurder agentens ytelse:",
+        "it": "Valuta le prestazioni dell'agente:",
+        "pt": "Avalie o desempenho do agente:",
+        "es": "Evalúe el rendimiento del agente:",
+        "ar": "قيم أداء الوكيل:",
+        "fr": "Évaluez la performance de l'agent :",
+        "tr": "Ajanın performansını değerlendirin:",
+        "de": "Bewerten Sie die Leistung des Agenten:",
+        "uk": "Оцініть продуктивність агента:"
+    },
+    "FEEDBACK$SELECT_REASON": {
+        "en": "Select a reason (optional):",
+        "ja": "理由を選択してください（任意）：",
+        "zh-CN": "选择原因（可选）：",
+        "zh-TW": "選擇原因（可選）：",
+        "ko-KR": "이유 선택 (선택 사항):",
+        "no": "Velg en grunn (valgfritt):",
+        "it": "Seleziona un motivo (opzionale):",
+        "pt": "Selecione um motivo (opcional):",
+        "es": "Seleccione un motivo (opcional):",
+        "ar": "حدد سببًا (اختياري):",
+        "fr": "Sélectionnez une raison (facultatif) :",
+        "tr": "Bir neden seçin (isteğe bağlı):",
+        "de": "Wählen Sie einen Grund (optional):",
+        "uk": "Виберіть причину (необов'язково):"
+    },
+    "FEEDBACK$SELECT_REASON_COUNTDOWN": {
+        "en": "Auto-submitting in {{countdown}} seconds...",
+        "ja": "{{countdown}}秒後に自動送信されます...",
+        "zh-CN": "{{countdown}}秒后自动提交...",
+        "zh-TW": "{{countdown}}秒後自動提交...",
+        "ko-KR": "{{countdown}}초 후 자동 제출...",
+        "no": "Sender automatisk om {{countdown}} sekunder...",
+        "it": "Invio automatico tra {{countdown}} secondi...",
+        "pt": "Enviando automaticamente em {{countdown}} segundos...",
+        "es": "Enviando automáticamente en {{countdown}} segundos...",
+        "ar": "الإرسال التلقائي خلال {{countdown}} ثانية...",
+        "fr": "Envoi automatique dans {{countdown}} secondes...",
+        "tr": "{{countdown}} saniye içinde otomatik gönderilecek...",
+        "de": "Automatische Übermittlung in {{countdown}} Sekunden...",
+        "uk": "Автоматична відправка через {{countdown}} секунд..."
+    },
+    "FEEDBACK$REASON_MISUNDERSTOOD_INSTRUCTION": {
+        "en": "The agent misunderstood my instruction",
+        "ja": "エージェントは私の指示を誤解しました",
+        "zh-CN": "代理误解了我的指示",
+        "zh-TW": "代理誤解了我的指示",
+        "ko-KR": "에이전트가 내 지시를 잘못 이해했습니다",
+        "no": "Agenten misforsto instruksjonene mine",
+        "it": "L'agente ha frainteso le mie istruzioni",
+        "pt": "O agente não entendeu minhas instruções",
+        "es": "El agente malinterpretó mis instrucciones",
+        "ar": "أساء الوكيل فهم تعليماتي",
+        "fr": "L'agent a mal compris mes instructions",
+        "tr": "Ajan talimatlarımı yanlış anladı",
+        "de": "Der Agent hat meine Anweisungen missverstanden",
+        "uk": "Агент неправильно зрозумів мої інструкції"
+    },
+    "FEEDBACK$REASON_FORGOT_CONTEXT": {
+        "en": "The agent forgot about the earlier context",
+        "ja": "エージェントは以前のコンテキストを忘れました",
+        "zh-CN": "代理忘记了之前的上下文",
+        "zh-TW": "代理忘記了之前的上下文",
+        "ko-KR": "에이전트가 이전 컨텍스트를 잊었습니다",
+        "no": "Agenten glemte den tidligere konteksten",
+        "it": "L'agente ha dimenticato il contesto precedente",
+        "pt": "O agente esqueceu o contexto anterior",
+        "es": "El agente olvidó el contexto anterior",
+        "ar": "نسي الوكيل السياق السابق",
+        "fr": "L'agent a oublié le contexte précédent",
+        "tr": "Ajan önceki bağlamı unuttu",
+        "de": "Der Agent hat den früheren Kontext vergessen",
+        "uk": "Агент забув про попередній контекст"
+    },
+    "FEEDBACK$REASON_UNNECESSARY_CHANGES": {
+        "en": "The agent made unnecessary changes",
+        "ja": "エージェントは不要な変更を行いました",
+        "zh-CN": "代理进行了不必要的更改",
+        "zh-TW": "代理進行了不必要的更改",
+        "ko-KR": "에이전트가 불필요한 변경을 했습니다",
+        "no": "Agenten gjorde unødvendige endringer",
+        "it": "L'agente ha apportato modifiche non necessarie",
+        "pt": "O agente fez alterações desnecessárias",
+        "es": "El agente hizo cambios innecesarios",
+        "ar": "قام الوكيل بتغييرات غير ضرورية",
+        "fr": "L'agent a apporté des modifications inutiles",
+        "tr": "Ajan gereksiz değişiklikler yaptı",
+        "de": "Der Agent hat unnötige Änderungen vorgenommen",
+        "uk": "Агент зробив непотрібні зміни"
+    },
+    "FEEDBACK$REASON_OTHER": {
+        "en": "Other",
+        "ja": "その他",
+        "zh-CN": "其他",
+        "zh-TW": "其他",
+        "ko-KR": "기타",
+        "no": "Annet",
+        "it": "Altro",
+        "pt": "Outro",
+        "es": "Otro",
+        "ar": "أخرى",
+        "fr": "Autre",
+        "tr": "Diğer",
+        "de": "Andere",
+        "uk": "Інше"
+    },
+    "FEEDBACK$THANK_YOU_FOR_FEEDBACK": {
+        "en": "Thank you for your feedback! This will help us improve OpenHands going forward.",
+        "ja": "フィードバックをありがとうございます！これにより、今後OpenHandsを改善していくことができます。",
+        "zh-CN": "感谢您的反馈！这将帮助我们改进OpenHands。",
+        "zh-TW": "感謝您的反饋！這將幫助我們改進OpenHands。",
+        "ko-KR": "피드백 감사합니다! 이를 통해 OpenHands를 개선해 나가겠습니다.",
+        "no": "Takk for tilbakemeldingen! Dette vil hjelpe oss med å forbedre OpenHands fremover.",
+        "it": "Grazie per il tuo feedback! Questo ci aiuterà a migliorare OpenHands in futuro.",
+        "pt": "Obrigado pelo seu feedback! Isso nos ajudará a melhorar o OpenHands no futuro.",
+        "es": "¡Gracias por su comentario! Esto nos ayudará a mejorar OpenHands en el futuro.",
+        "ar": "شكرا على ملاحظاتك! سيساعدنا هذا في تحسين OpenHands في المستقبل.",
+        "fr": "Merci pour votre retour ! Cela nous aidera à améliorer OpenHands à l'avenir.",
+        "tr": "Geri bildiriminiz için teşekkürler! Bu, OpenHands'i ileride geliştirmemize yardımcı olacak.",
+        "de": "Vielen Dank für Ihr Feedback! Das hilft uns, OpenHands in Zukunft zu verbessern.",
+        "uk": "Дякуємо за ваш відгук! Це допоможе нам покращити OpenHands у майбутньому."
+    },
+    "FEEDBACK$FAILED_TO_SUBMIT": {
+        "en": "Failed to submit feedback",
+        "ja": "フィードバックの送信に失敗しました",
+        "zh-CN": "提交反馈失败",
+        "zh-TW": "提交反饋失敗",
+        "ko-KR": "피드백 제출 실패",
+        "no": "Kunne ikke sende tilbakemelding",
+        "it": "Impossibile inviare feedback",
+        "pt": "Falha ao enviar feedback",
+        "es": "Error al enviar comentarios",
+        "ar": "فشل في تقديم التعليقات",
+        "fr": "Échec de l'envoi des commentaires",
+        "tr": "Geri bildirim gönderilemedi",
+        "de": "Feedback konnte nicht gesendet werden",
+        "uk": "Не вдалося надіслати відгук"
    }
 }
--- a/frontend/src/routes.ts
+++ b/frontend/src/routes.ts
@@ -13,7 +13,7 @@ export default [
      index("routes/llm-settings.tsx"),
      route("mcp", "routes/mcp-settings.tsx"),
      route("user", "routes/user-settings.tsx"),
-      route("git", "routes/git-settings.tsx"),
+      route("integrations", "routes/git-settings.tsx"),
      route("app", "routes/app-settings.tsx"),
      route("billing", "routes/billing.tsx"),
      route("secrets", "routes/secrets-settings.tsx"),
--- a/frontend/src/routes/app-settings.tsx
+++ b/frontend/src/routes/app-settings.tsx
@@ -139,7 +139,7 @@ function AppSettingsScreen() {
            defaultIsToggled={!!settings.USER_CONSENTS_TO_ANALYTICS}
            onToggle={checkIfAnalyticsSwitchHasChanged}
          >
-            {t(I18nKey.ANALYTICS$ENABLE)}
+            {t(I18nKey.ANALYTICS$SEND_ANONYMOUS_DATA)}
          </SettingsSwitch>

          <SettingsSwitch
--- a/frontend/src/routes/git-settings.tsx
+++ b/frontend/src/routes/git-settings.tsx
@@ -7,6 +7,7 @@ import { useLogout } from "#/hooks/mutation/use-logout";
 import { GitHubTokenInput } from "#/components/features/settings/git-settings/github-token-input";
 import { GitLabTokenInput } from "#/components/features/settings/git-settings/gitlab-token-input";
 import { ConfigureGitHubRepositoriesAnchor } from "#/components/features/settings/git-settings/configure-github-repositories-anchor";
+import { InstallSlackAppAnchor } from "#/components/features/settings/git-settings/install-slack-app-anchor";
 import { I18nKey } from "#/i18n/declaration";
 import {
  displayErrorToast,
@@ -103,6 +104,10 @@ function GitSettingsScreen() {
            <ConfigureGitHubRepositoriesAnchor slug={config.APP_SLUG!} />
          )}

+          {shouldRenderExternalConfigureButtons && !isLoading && (
+            <InstallSlackAppAnchor />
+          )}
+
          {!isSaas && (
            <GitHubTokenInput
              name="github-token-input"
--- a/frontend/src/routes/secrets-settings.tsx
+++ b/frontend/src/routes/secrets-settings.tsx
@@ -84,7 +84,11 @@ function SecretsSettingsScreen() {
      )}

      {shouldRenderConnectToGitButton && (
-        <Link to="/settings/git" data-testid="connect-git-button" type="button">
+        <Link
+          to="/settings/integrations"
+          data-testid="connect-git-button"
+          type="button"
+        >
          <BrandButton type="button" variant="secondary">
            Connect a Git provider to manage secrets
          </BrandButton>
--- a/frontend/src/routes/settings.tsx
+++ b/frontend/src/routes/settings.tsx
@@ -16,7 +16,7 @@ function SettingsScreen() {

  const saasNavItems = [
    { to: "/settings/user", text: t("SETTINGS$NAV_USER") },
-    { to: "/settings/git", text: t("SETTINGS$NAV_GIT") },
+    { to: "/settings/integrations", text: t("SETTINGS$NAV_INTEGRATIONS") },
    { to: "/settings/app", text: t("SETTINGS$NAV_APPLICATION") },
    { to: "/settings/billing", text: t("SETTINGS$NAV_CREDITS") },
    { to: "/settings/secrets", text: t("SETTINGS$NAV_SECRETS") },
@@ -26,7 +26,7 @@ function SettingsScreen() {
  const ossNavItems = [
    { to: "/settings", text: t("SETTINGS$NAV_LLM") },
    { to: "/settings/mcp", text: t("SETTINGS$NAV_MCP") },
-    { to: "/settings/git", text: t("SETTINGS$NAV_GIT") },
+    { to: "/settings/integrations", text: t("SETTINGS$NAV_INTEGRATIONS") },
    { to: "/settings/app", text: t("SETTINGS$NAV_APPLICATION") },
    { to: "/settings/secrets", text: t("SETTINGS$NAV_SECRETS") },
  ];
--- a/openhands/agenthub/browsing_agent/browsing_agent.py
+++ b/openhands/agenthub/browsing_agent/browsing_agent.py
@@ -125,9 +125,9 @@ class BrowsingAgent(Agent):
        self.reset()

    def reset(self) -> None:
-        """Resets the Browsing Agent."""
+        """Resets the Browsing Agent's internal state."""
        super().reset()
-        self.cost_accumulator = 0
+        # Reset agent-specific counters but not LLM metrics
        self.error_accumulator = 0

    def step(self, state: State) -> Action:
--- a/openhands/agenthub/codeact_agent/codeact_agent.py
+++ b/openhands/agenthub/codeact_agent/codeact_agent.py
@@ -136,8 +136,9 @@ class CodeActAgent(Agent):
        return tools

    def reset(self) -> None:
-        """Resets the CodeAct Agent."""
+        """Resets the CodeAct Agent's internal state."""
        super().reset()
+        # Only clear pending actions, not LLM metrics
        self.pending_actions.clear()

    def step(self, state: State) -> 'Action':
--- a/openhands/agenthub/dummy_agent/agent.py
+++ b/openhands/agenthub/dummy_agent/agent.py
@@ -119,14 +119,14 @@ class DummyAgent(Agent):
        ]

    def step(self, state: State) -> Action:
-        if state.iteration >= len(self.steps):
+        if state.iteration_flag.current_value >= len(self.steps):
            return AgentFinishAction()

-        current_step = self.steps[state.iteration]
+        current_step = self.steps[state.iteration_flag.current_value]
        action = current_step['action']

-        if state.iteration > 0:
-            prev_step = self.steps[state.iteration - 1]
+        if state.iteration_flag.current_value > 0:
+            prev_step = self.steps[state.iteration_flag.current_value - 1]

            if 'observations' in prev_step and prev_step['observations']:
                expected_observations = prev_step['observations']
--- a/openhands/agenthub/visualbrowsing_agent/visualbrowsing_agent.py
+++ b/openhands/agenthub/visualbrowsing_agent/visualbrowsing_agent.py
@@ -176,9 +176,9 @@ Note:
        self.reset()

    def reset(self) -> None:
-        """Resets the VisualBrowsingAgent."""
+        """Resets the VisualBrowsingAgent's internal state."""
        super().reset()
-        self.cost_accumulator = 0
+        # Reset agent-specific counters but not LLM metrics
        self.error_accumulator = 0

    def step(self, state: State) -> Action:
--- a/openhands/cli/main.py
+++ b/openhands/cli/main.py
@@ -8,6 +8,7 @@ from prompt_toolkit.formatted_text import HTML
 from prompt_toolkit.shortcuts import clear

 import openhands.agenthub  # noqa F401 (we import this to get the agents registered)
+import openhands.cli.suppress_warnings  # noqa: F401
 from openhands.cli.commands import (
    check_folder_security_agreement,
    handle_commands,
@@ -273,9 +274,9 @@ async def run_session(
            )
        )

-        config.mcp.stdio_servers.extend(openhands_mcp_stdio_servers)
+        runtime.config.mcp.stdio_servers.extend(openhands_mcp_stdio_servers)

-        await add_mcp_tools_to_agent(agent, runtime, memory, config)
+        await add_mcp_tools_to_agent(agent, runtime, memory)

    # Clear loading animation
    is_loaded.set()
--- a/openhands/cli/suppress_warnings.py
+++ b/openhands/cli/suppress_warnings.py
@@ -0,0 +1,10 @@
+"""Module to suppress common warnings."""
+
+import warnings
+
+# Suppress pydub warning about ffmpeg/avconv
+warnings.filterwarnings(
+    'ignore',
+    message="Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work",
+    category=RuntimeWarning,
+)
--- a/openhands/controller/agent.py
+++ b/openhands/controller/agent.py
@@ -103,16 +103,10 @@ class Agent(ABC):
        pass

    def reset(self) -> None:
-        """Resets the agent's execution status and clears the history. This method can be used
-        to prepare the agent for restarting the instruction or cleaning up before destruction.
-
-        """
-        # TODO clear history
+        """Resets the agent's execution status."""
+        # Only reset the completion status, not the LLM metrics
        self._complete = False

-        if self.llm:
-            self.llm.reset()
-
    @property
    def name(self) -> str:
        return self.__class__.__name__
--- a/openhands/controller/agent_controller.py
+++ b/openhands/controller/agent_controller.py
@@ -7,7 +7,6 @@ import time
 import traceback
 from typing import Callable

-import litellm  # noqa
 from litellm.exceptions import (  # noqa
    APIConnectionError,
    APIError,
@@ -25,7 +24,8 @@ from litellm.exceptions import (  # noqa

 from openhands.controller.agent import Agent
 from openhands.controller.replay import ReplayManager
-from openhands.controller.state.state import State, TrafficControlState
+from openhands.controller.state.state import State
+from openhands.controller.state.state_tracker import StateTracker
 from openhands.controller.stuck import StuckDetector
 from openhands.core.config import AgentConfig, LLMConfig
 from openhands.core.exceptions import (
@@ -61,7 +61,6 @@ from openhands.events.action import (
 )
 from openhands.events.action.agent import CondensationAction, RecallAction
 from openhands.events.event import Event
-from openhands.events.event_filter import EventFilter
 from openhands.events.observation import (
    AgentDelegateObservation,
    AgentStateChangedObservation,
@@ -69,10 +68,11 @@ from openhands.events.observation import (
    NullObservation,
    Observation,
 )
-from openhands.events.serialization.event import event_to_trajectory, truncate_content
+from openhands.events.serialization.event import truncate_content
 from openhands.llm.llm import LLM
 from openhands.llm.metrics import Metrics, TokenUsage
 from openhands.memory.view import View
+from openhands.storage.files import FileStore

 # note: RESUME is only available on web GUI
 TRAFFIC_CONTROL_REMINDER = (
@@ -101,11 +101,13 @@ class AgentController:
        self,
        agent: Agent,
        event_stream: EventStream,
-        max_iterations: int,
-        max_budget_per_task: float | None = None,
+        iteration_delta: int,
+        budget_per_task_delta: float | None = None,
        agent_to_llm_config: dict[str, LLMConfig] | None = None,
        agent_configs: dict[str, AgentConfig] | None = None,
        sid: str | None = None,
+        file_store: FileStore | None = None,
+        user_id: str | None = None,
        confirmation_mode: bool = False,
        initial_state: State | None = None,
        is_delegate: bool = False,
@@ -132,7 +134,10 @@ class AgentController:
            status_callback: Optional callback function to handle status updates.
            replay_events: A list of logs to replay.
        """
+
        self.id = sid or event_stream.sid
+        self.user_id = user_id
+        self.file_store = file_store
        self.agent = agent
        self.headless_mode = headless_mode
        self.is_delegate = is_delegate
@@ -146,29 +151,22 @@ class AgentController:
                EventStreamSubscriber.AGENT_CONTROLLER, self.on_event, self.id
            )

-        # filter out events that are not relevant to the agent
-        # so they will not be included in the agent history
-        self.agent_history_filter = EventFilter(
-            exclude_types=(
-                NullAction,
-                NullObservation,
-                ChangeAgentStateAction,
-                AgentStateChangedObservation,
-            ),
-            exclude_hidden=True,
-        )
+        self.state_tracker = StateTracker(sid, file_store, user_id)

        # state from the previous session, state from a parent agent, or a fresh state
        self.set_initial_state(
            state=initial_state,
-            max_iterations=max_iterations,
+            max_iterations=iteration_delta,
+            max_budget_per_task=budget_per_task_delta,
            confirmation_mode=confirmation_mode,
        )
-        self.max_budget_per_task = max_budget_per_task
+
+        self.state = self.state_tracker.state  # TODO: shared between the state tracker and the controller for backward compatibility; ideally all state-related logic should move to the state tracker
+
        self.agent_to_llm_config = agent_to_llm_config if agent_to_llm_config else {}
        self.agent_configs = agent_configs if agent_configs else {}
-        self._initial_max_iterations = max_iterations
-        self._initial_max_budget_per_task = max_budget_per_task
+        self._initial_max_iterations = iteration_delta
+        self._initial_max_budget_per_task = budget_per_task_delta

        # stuck helper
        self._stuck_detector = StuckDetector(self.state)
@@ -181,7 +179,7 @@ class AgentController:
        self._add_system_message()

    def _add_system_message(self):
-        for event in self.event_stream.get_events(start_id=self.state.start_id):
+        for event in self.event_stream.search_events(start_id=self.state.start_id):
            if isinstance(event, MessageAction) and event.source == EventSource.USER:
                # FIXME: Remove this after 6/1/2025
                # Do not try to add a system message if we first run into
@@ -214,26 +212,7 @@ class AgentController:
        if set_stop_state:
            await self.set_agent_state_to(AgentState.STOPPED)

-        # we made history, now is the time to rewrite it!
-        # the final state.history will be used by external scripts like evals, tests, etc.
-        # history will need to be complete WITH delegates events
-        # like the regular agent history, it does not include:
-        # - 'hidden' events, events with hidden=True
-        # - backend events (the default 'filtered out' types, types in self.filter_out)
-        start_id = self.state.start_id if self.state.start_id >= 0 else 0
-        end_id = (
-            self.state.end_id
-            if self.state.end_id >= 0
-            else self.event_stream.get_latest_event_id()
-        )
-        self.state.history = list(
-            self.event_stream.search_events(
-                start_id=start_id,
-                end_id=end_id,
-                reverse=False,
-                filter=self.agent_history_filter,
-            )
-        )
+        self.state_tracker.close(self.event_stream)

        # unsubscribe from the event stream
        # only the root parent controller subscribes to the event stream
@@ -257,14 +236,6 @@ class AgentController:
        extra_merged = {'session_id': self.id, **extra}
        getattr(logger, level)(message, extra=extra_merged, stacklevel=2)

-    def update_state_before_step(self) -> None:
-        self.state.iteration += 1
-        self.state.local_iteration += 1
-
-    async def update_state_after_step(self) -> None:
-        # update metrics especially for cost. Use deepcopy to avoid it being modified by agent._reset()
-        self.state.local_metrics = copy.deepcopy(self.agent.llm.metrics)
-
    async def _react_to_exception(
        self,
        e: Exception,
@@ -390,10 +361,17 @@ class AgentController:
        # If we have a delegate that is not finished or errored, forward events to it
        if self.delegate is not None:
            delegate_state = self.delegate.get_agent_state()
-            if delegate_state not in (
-                AgentState.FINISHED,
-                AgentState.ERROR,
-                AgentState.REJECTED,
+            if (
+                delegate_state
+                not in (
+                    AgentState.FINISHED,
+                    AgentState.ERROR,
+                    AgentState.REJECTED,
+                )
+                or 'RuntimeError: Agent reached maximum iteration.'
+                in self.delegate.state.last_error
+                or 'RuntimeError:Agent reached maximum budget for conversation'
+                in self.delegate.state.last_error
            ):
                # Forward the event to delegate and skip parent processing
                asyncio.get_event_loop().run_until_complete(
@@ -412,9 +390,7 @@ class AgentController:
        if hasattr(event, 'hidden') and event.hidden:
            return

-        # if the event is not filtered out, add it to the history
-        if self.agent_history_filter.include(event):
-            self.state.history.append(event)
+        self.state_tracker.add_history(event)

        if isinstance(event, Action):
            await self._handle_action(event)
@@ -457,11 +433,9 @@ class AgentController:

        elif isinstance(action, AgentFinishAction):
            self.state.outputs = action.outputs
-            self.state.metrics.merge(self.state.local_metrics)
            await self.set_agent_state_to(AgentState.FINISHED)
        elif isinstance(action, AgentRejectAction):
            self.state.outputs = action.outputs
-            self.state.metrics.merge(self.state.local_metrics)
            await self.set_agent_state_to(AgentState.REJECTED)

    async def _handle_observation(self, observation: Observation) -> None:
@@ -481,8 +455,10 @@ class AgentController:
            log_level, str(observation_to_print), extra={'msg_type': 'OBSERVATION'}
        )

+        # TODO: these metrics come from the draft editor; they get accumulated into the controller's state metrics and the agent's LLM metrics.
+        # In the future, we should have a more principled way of sharing metrics across all LLM instances for a given conversation.
        if observation.llm_metrics is not None:
-            self.agent.llm.metrics.merge(observation.llm_metrics)
+            self.state_tracker.merge_metrics(observation.llm_metrics)

        # this happens for runnable actions and microagent actions
        if self._pending_action and self._pending_action.id == observation.cause:
@@ -496,9 +472,6 @@ class AgentController:
            if self.state.agent_state == AgentState.USER_REJECTED:
                await self.set_agent_state_to(AgentState.AWAITING_USER_INPUT)
            return
-        elif isinstance(observation, ErrorObservation):
-            if self.state.agent_state == AgentState.ERROR:
-                self.state.metrics.merge(self.state.local_metrics)

    async def _handle_message_action(self, action: MessageAction) -> None:
        """Handles message actions from the event stream.
@@ -516,22 +489,6 @@ class AgentController:
                str(action),
                extra={'msg_type': 'ACTION', 'event_source': EventSource.USER},
            )
-            # Extend max iterations when the user sends a message (only in non-headless mode)
-            if self._initial_max_iterations is not None and not self.headless_mode:
-                self.state.max_iterations = (
-                    self.state.iteration + self._initial_max_iterations
-                )
-                if (
-                    self.state.traffic_control_state == TrafficControlState.THROTTLING
-                    or self.state.traffic_control_state == TrafficControlState.PAUSED
-                ):
-                    self.state.traffic_control_state = TrafficControlState.NORMAL
-                self.log(
-                    'debug',
-                    f'Extended max iterations to {self.state.max_iterations} after user message',
-                )
-            # try to retrieve microagents relevant to the user message
-            # set pending_action while we search for information

            # if this is the first user message for this agent, matters for the microagent info type
            first_user_message = self._first_user_message()
@@ -605,36 +562,16 @@ class AgentController:
            return

        if new_state in (AgentState.STOPPED, AgentState.ERROR):
-            # sync existing metrics BEFORE resetting the agent
-            await self.update_state_after_step()
-            self.state.metrics.merge(self.state.local_metrics)
            self._reset()
-        elif (
-            new_state == AgentState.RUNNING
-            and self.state.agent_state == AgentState.PAUSED
-            # TODO: do we really need both THROTTLING and PAUSED states, or can we clean up one of them completely?
-            and self.state.traffic_control_state == TrafficControlState.THROTTLING
-        ):
-            # user intends to interrupt traffic control and let the task resume temporarily
-            self.state.traffic_control_state = TrafficControlState.PAUSED
-            # User has chosen to deliberately continue - lets double the max iterations
-            if (
-                self.state.iteration is not None
-                and self.state.max_iterations is not None
-                and self._initial_max_iterations is not None
-                and not self.headless_mode
-            ):
-                if self.state.iteration >= self.state.max_iterations:
-                    self.state.max_iterations += self._initial_max_iterations

-            if (
-                self.state.metrics.accumulated_cost is not None
-                and self.max_budget_per_task is not None
-                and self._initial_max_budget_per_task is not None
-            ):
-                if self.state.metrics.accumulated_cost >= self.max_budget_per_task:
-                    self.max_budget_per_task += self._initial_max_budget_per_task
-        elif self._pending_action is not None and (
+        # The user is resuming after an error: check the control limits and expand them if applicable
+        if (
+            self.state.agent_state == AgentState.ERROR
+            and new_state == AgentState.RUNNING
+        ):
+            self.state_tracker.maybe_increase_control_flags_limits(self.headless_mode)
+
+        if self._pending_action is not None and (
            new_state in (AgentState.USER_CONFIRMED, AgentState.USER_REJECTED)
        ):
            if hasattr(self._pending_action, 'thought'):
@@ -659,6 +596,10 @@ class AgentController:
            EventSource.ENVIRONMENT,
        )

+        # Save state whenever agent state changes to ensure we don't lose state
+        # in case of crashes or unexpected circumstances
+        self.save_state()
+
    def get_agent_state(self) -> AgentState:
        """Returns the current state of the agent.

@@ -686,19 +627,27 @@ class AgentController:
        agent_cls: type[Agent] = Agent.get_cls(action.agent)
        agent_config = self.agent_configs.get(action.agent, self.agent.config)
        llm_config = self.agent_to_llm_config.get(action.agent, self.agent.llm.config)
-        llm = LLM(config=llm_config, retry_listener=self._notify_on_llm_retry)
+        # Make sure metrics are shared between parent and child for global accumulation
+        llm = LLM(
+            config=llm_config,
+            retry_listener=self.agent.llm.retry_listener,
+            metrics=self.state.metrics,
+        )
        delegate_agent = agent_cls(llm=llm, config=agent_config)
+
+        # Take a snapshot of the current metrics before starting the delegate
        state = State(
            session_id=self.id.removesuffix('-delegate'),
            inputs=action.inputs or {},
-            local_iteration=0,
-            iteration=self.state.iteration,
-            max_iterations=self.state.max_iterations,
+            iteration_flag=self.state.iteration_flag,
+            budget_flag=self.state.budget_flag,
            delegate_level=self.state.delegate_level + 1,
            # global metrics should be shared between parent and child
            metrics=self.state.metrics,
            # start on top of the stream
            start_id=self.event_stream.get_latest_event_id() + 1,
+            parent_metrics_snapshot=self.state_tracker.get_metrics_snapshot(),
+            parent_iteration=self.state.iteration_flag.current_value,
        )
        self.log(
            'debug',
@@ -708,10 +657,12 @@ class AgentController:
        # Create the delegate with is_delegate=True so it does NOT subscribe directly
        self.delegate = AgentController(
            sid=self.id + '-delegate',
+            file_store=self.file_store,
+            user_id=self.user_id,
            agent=delegate_agent,
            event_stream=self.event_stream,
-            max_iterations=self.state.max_iterations,
-            max_budget_per_task=self.max_budget_per_task,
+            iteration_delta=self._initial_max_iterations,
+            budget_per_task_delta=self._initial_max_budget_per_task,
            agent_to_llm_config=self.agent_to_llm_config,
            agent_configs=self.agent_configs,
            initial_state=state,
@@ -730,7 +681,13 @@ class AgentController:
        delegate_state = self.delegate.get_agent_state()

        # update iteration that is shared across agents
-        self.state.iteration = self.delegate.state.iteration
+        self.state.iteration_flag.current_value = (
+            self.delegate.state.iteration_flag.current_value
+        )
+
+        # Calculate delegate-specific metrics before closing the delegate
+        delegate_metrics = self.state.get_local_metrics()
+        logger.info(f'Local metrics for delegate: {delegate_metrics}')

        # close the delegate controller before adding new events
        asyncio.get_event_loop().run_until_complete(self.delegate.close())
@@ -743,8 +700,12 @@ class AgentController:

            # prepare delegate result observation
            # TODO: replace this with AI-generated summary (#2395)
+            # Filter out metrics from the formatted output to avoid clutter
+            display_outputs = {
+                k: v for k, v in delegate_outputs.items() if k != 'metrics'
+            }
            formatted_output = ', '.join(
-                f'{key}: {value}' for key, value in delegate_outputs.items()
+                f'{key}: {value}' for key, value in display_outputs.items()
            )
            content = (
                f'{self.delegate.agent.name} finishes task with {formatted_output}'
@@ -798,24 +759,16 @@ class AgentController:

        self.log(
            'debug',
-            f'LEVEL {self.state.delegate_level} LOCAL STEP {self.state.local_iteration} GLOBAL STEP {self.state.iteration}',
+            f'LEVEL {self.state.delegate_level} LOCAL STEP {self.state.get_local_step()} GLOBAL STEP {self.state.iteration_flag.current_value}',
            extra={'msg_type': 'STEP'},
        )

-        stop_step = False
-        if self.state.iteration >= self.state.max_iterations:
-            stop_step = await self._handle_traffic_control(
-                'iteration', self.state.iteration, self.state.max_iterations
-            )
-        if self.max_budget_per_task is not None:
-            current_cost = self.state.metrics.accumulated_cost
-            if current_cost > self.max_budget_per_task:
-                stop_step = await self._handle_traffic_control(
-                    'budget', current_cost, self.max_budget_per_task
-                )
-        if stop_step:
-            logger.warning('Stopping agent due to traffic control')
-            return
+        # Ensure the budget control flag is synchronized with the latest metrics.
+        # In the future, we should centralize the use of one LLM object per conversation.
+        # This will help us unify the cost for auto-generating titles, running the condenser, etc.
+        # Because many microservices may touch the same LLM cost field, we sync it with the controller's budget flag
+        # and check that we haven't exceeded the budget BEFORE executing an agent step.
+        self.state_tracker.sync_budget_flag_with_metrics()

        if self._is_stuck():
            await self._react_to_exception(
@@ -823,7 +776,13 @@ class AgentController:
            )
            return

-        self.update_state_before_step()
+        try:
+            self.state_tracker.run_control_flags()
+        except Exception as e:
+            logger.warning('Control flag limits hit')
+            await self._react_to_exception(e)
+            return
+
        action: Action = NullAction()

        if self._replay_manager.should_replay():
@@ -894,60 +853,9 @@ class AgentController:

            self.event_stream.add_event(action, action._source)  # type: ignore [attr-defined]

-        await self.update_state_after_step()
-
        log_level = 'info' if LOG_ALL_EVENTS else 'debug'
        self.log(log_level, str(action), extra={'msg_type': 'ACTION'})

-    def _notify_on_llm_retry(self, retries: int, max: int) -> None:
-        if self.status_callback is not None:
-            msg_id = 'STATUS$LLM_RETRY'
-            self.status_callback(
-                'info', msg_id, f'Retrying LLM request, {retries} / {max}'
-            )
-
-    async def _handle_traffic_control(
-        self, limit_type: str, current_value: float, max_value: float
-    ) -> bool:
-        """Handles agent state after hitting the traffic control limit.
-
-        Args:
-            limit_type (str): The type of limit that was hit.
-            current_value (float): The current value of the limit.
-            max_value (float): The maximum value of the limit.
-        """
-        stop_step = False
-        if self.state.traffic_control_state == TrafficControlState.PAUSED:
-            self.log(
-                'debug', 'Hitting traffic control, temporarily resume upon user request'
-            )
-            self.state.traffic_control_state = TrafficControlState.NORMAL
-        else:
-            self.state.traffic_control_state = TrafficControlState.THROTTLING
-            # Format values as integers for iterations, keep decimals for budget
-            if limit_type == 'iteration':
-                current_str = str(int(current_value))
-                max_str = str(int(max_value))
-            else:
-                current_str = f'{current_value:.2f}'
-                max_str = f'{max_value:.2f}'
-
-            if self.headless_mode:
-                e = RuntimeError(
-                    f'Agent reached maximum {limit_type} in headless mode. '
-                    f'Current {limit_type}: {current_str}, max {limit_type}: {max_str}'
-                )
-                await self._react_to_exception(e)
-            else:
-                e = RuntimeError(
-                    f'Agent reached maximum {limit_type}. '
-                    f'Current {limit_type}: {current_str}, max {limit_type}: {max_str}. '
-                )
-                # FIXME: this isn't really an exception--we should have a different path
-                await self._react_to_exception(e)
-            stop_step = True
-        return stop_step
-
    @property
    def _pending_action(self) -> Action | None:
        """Get the current pending action with time tracking.
@@ -1015,150 +923,26 @@ class AgentController:
        self,
        state: State | None,
        max_iterations: int,
+        max_budget_per_task: float | None,
        confirmation_mode: bool = False,
-    ) -> None:
-        """Sets the initial state for the agent, either from the previous session, or from a parent agent, or by creating a new one.
-
-        Args:
-            state: The state to initialize with, or None to create a new state.
-            max_iterations: The maximum number of iterations allowed for the task.
-            confirmation_mode: Whether to enable confirmation mode.
-        """
-        # state can come from:
-        # - the previous session, in which case it has history
-        # - from a parent agent, in which case it has no history
-        # - None / a new state
-
-        # If state is None, we create a brand new state and still load the event stream so we can restore the history
-        if state is None:
-            self.state = State(
-                session_id=self.id.removesuffix('-delegate'),
-                inputs={},
-                max_iterations=max_iterations,
-                confirmation_mode=confirmation_mode,
-            )
-            self.state.start_id = 0
-
-            self.log(
-                'info',
-                f'AgentController {self.id} - created new state. start_id: {self.state.start_id}',
-            )
-        else:
-            self.state = state
-
-            if self.state.start_id <= -1:
-                self.state.start_id = 0
-
-            self.log(
-                'info',
-                f'AgentController {self.id} initializing history from event {self.state.start_id}',
-            )
-
+    ):
+        self.state_tracker.set_initial_state(
+            self.id,
+            self.agent,
+            state,
+            max_iterations,
+            max_budget_per_task,
+            confirmation_mode,
+        )
        # Always load from the event stream to avoid losing history
-        self._init_history()
+        self.state_tracker._init_history(
+            self.event_stream,
+        )

    def get_trajectory(self, include_screenshots: bool = False) -> list[dict]:
        # state history could be partially hidden/truncated before controller is closed
        assert self._closed
-        return [
-            event_to_trajectory(event, include_screenshots)
-            for event in self.state.history
-        ]
-
-    def _init_history(self) -> None:
-        """Initializes the agent's history from the event stream.
-
-        The history is a list of events that:
-        - Excludes events of types listed in self.filter_out
-        - Excludes events with hidden=True attribute
-        - For delegate events (between AgentDelegateAction and AgentDelegateObservation):
-            - Excludes all events between the action and observation
-            - Includes the delegate action and observation themselves
-        """
-        # define range of events to fetch
-        # delegates start with a start_id and initially won't find any events
-        # otherwise we're restoring a previous session
-        start_id = self.state.start_id if self.state.start_id >= 0 else 0
-        end_id = (
-            self.state.end_id
-            if self.state.end_id >= 0
-            else self.event_stream.get_latest_event_id()
-        )
-
-        # sanity check
-        if start_id > end_id + 1:
-            self.log(
-                'warning',
-                f'start_id {start_id} is greater than end_id + 1 ({end_id + 1}). History will be empty.',
-            )
-            self.state.history = []
-            return
-
-        events: list[Event] = []
-
-        # Get rest of history
-        events_to_add = list(
-            self.event_stream.search_events(
-                start_id=start_id,
-                end_id=end_id,
-                reverse=False,
-                filter=self.agent_history_filter,
-            )
-        )
-        events.extend(events_to_add)
-
-        # Find all delegate action/observation pairs
-        delegate_ranges: list[tuple[int, int]] = []
-        delegate_action_ids: list[int] = []  # stack of unmatched delegate action IDs
-
-        for event in events:
-            if isinstance(event, AgentDelegateAction):
-                delegate_action_ids.append(event.id)
-                # Note: we can get agent=event.agent and task=event.inputs.get('task','')
-                # if we need to track these in the future
-
-            elif isinstance(event, AgentDelegateObservation):
-                # Match with most recent unmatched delegate action
-                if not delegate_action_ids:
-                    self.log(
-                        'warning',
-                        f'Found AgentDelegateObservation without matching action at id={event.id}',
-                    )
-                    continue
-
-                action_id = delegate_action_ids.pop()
-                delegate_ranges.append((action_id, event.id))
-
-        # Filter out events between delegate action/observation pairs
-        if delegate_ranges:
-            filtered_events: list[Event] = []
-            current_idx = 0
-
-            for start_id, end_id in sorted(delegate_ranges):
-                # Add events before delegate range
-                filtered_events.extend(
-                    event for event in events[current_idx:] if event.id < start_id
-                )
-
-                # Add delegate action and observation
-                filtered_events.extend(
-                    event for event in events if event.id in (start_id, end_id)
-                )
-
-                # Update index to after delegate range
-                current_idx = next(
-                    (i for i, e in enumerate(events) if e.id > end_id), len(events)
-                )
-
-            # Add any remaining events after last delegate range
-            filtered_events.extend(events[current_idx:])
-
-            self.state.history = filtered_events
-        else:
-            self.state.history = events
-
-        # make sure history is in sync
-        self.state.start_id = start_id
+        return self.state_tracker.get_trajectory(include_screenshots)

    def _handle_long_context_error(self) -> None:
        # When context window is exceeded, keep roughly half of agent interactions
@@ -1359,7 +1143,7 @@ class AgentController:
            action: The action to attach metrics to
        """
        # Get metrics from agent LLM
-        agent_metrics = self.agent.llm.metrics
+        agent_metrics = self.state.metrics

        # Get metrics from condenser LLM if it exists
        condenser_metrics: TokenUsage | None = None
@@ -1390,10 +1174,10 @@ class AgentController:
        # Log the metrics information for debugging
        # Get the latest usage directly from the agent's metrics
        latest_usage = None
-        if self.agent.llm.metrics.token_usages:
-            latest_usage = self.agent.llm.metrics.token_usages[-1]
+        if self.state.metrics.token_usages:
+            latest_usage = self.state.metrics.token_usages[-1]

-        accumulated_usage = self.agent.llm.metrics.accumulated_token_usage
+        accumulated_usage = self.state.metrics.accumulated_token_usage
        self.log(
            'debug',
            f'Action metrics - accumulated_cost: {metrics.accumulated_cost}, '
@@ -1432,7 +1216,7 @@ class AgentController:
        )

    def _is_awaiting_observation(self) -> bool:
-        events = self.event_stream.get_events(reverse=True)
+        events = self.event_stream.search_events(reverse=True)
        for event in events:
            if isinstance(event, AgentStateChangedObservation):
                result = event.agent_state == AgentState.RUNNING
@@ -1473,7 +1257,7 @@ class AgentController:
        self._cached_first_user_message = next(
            (
                e
-                for e in self.event_stream.get_events(
+                for e in self.event_stream.search_events(
                    start_id=self.state.start_id,
                )
                if isinstance(e, MessageAction) and e.source == EventSource.USER
@@ -1481,3 +1265,6 @@ class AgentController:
            None,
        )
        return self._cached_first_user_message
+
+    def save_state(self):
+        self.state_tracker.save_state()
--- a/openhands/controller/state/control_flags.py
+++ b/openhands/controller/state/control_flags.py
@@ -0,0 +1,95 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Generic, TypeVar
+
+T = TypeVar(
+    'T', int, float
+)  # Type for the value (int for iterations, float for budget)
+
+
+@dataclass
+class ControlFlag(Generic[T]):
+    """Base class for control flags that manage limits and state transitions."""
+
+    limit_increase_amount: T
+    current_value: T
+    max_value: T
+    headless_mode: bool = False
+    _hit_limit: bool = False
+
+    def reached_limit(self) -> bool:
+        """Check if the limit has been reached.
+
+        Returns:
+            bool: True if the limit has been reached, False otherwise.
+        """
+        raise NotImplementedError
+
+    def increase_limit(self, headless_mode: bool) -> None:
+        """Expand the limit when needed."""
+        raise NotImplementedError
+
+    def step(self):
+        """Check the limit and, where applicable, advance the current value.
+
+        Raises:
+            RuntimeError: In subclasses, if the limit has been reached.
+        """
+        raise NotImplementedError
+
+
+@dataclass
+class IterationControlFlag(ControlFlag[int]):
+    """Control flag for managing iteration limits."""
+
+    def reached_limit(self) -> bool:
+        """Check if the iteration limit has been reached."""
+        self._hit_limit = self.current_value >= self.max_value
+        return self._hit_limit
+
+    def increase_limit(self, headless_mode: bool) -> None:
+        """Expand the iteration limit by adding the initial value."""
+        if not headless_mode and self._hit_limit:
+            self.max_value += self.limit_increase_amount
+            self._hit_limit = False
+
+    def step(self):
+        if self.reached_limit():
+            raise RuntimeError(
+                f'Agent reached maximum iteration. '
+                f'Current iteration: {self.current_value}, max iteration: {self.max_value}'
+            )
+
+        # Increment the current value
+        self.current_value += 1
+
+
+@dataclass
+class BudgetControlFlag(ControlFlag[float]):
+    """Control flag for managing budget limits."""
+
+    def reached_limit(self) -> bool:
+        """Check if the budget limit has been reached."""
+        self._hit_limit = self.current_value >= self.max_value
+        return self._hit_limit
+
+    def increase_limit(self, headless_mode: bool) -> None:
+        """Expand the budget limit by adding the initial value to the current value."""
+        if self._hit_limit:
+            self.max_value = self.current_value + self.limit_increase_amount
+            self._hit_limit = False
+
+    def step(self):
+        """Check if we've reached the limit and update state accordingly.
+
+        Note: Unlike IterationControlFlag, this doesn't increment the value
+        as the budget is updated externally.
+        """
+        if self.reached_limit():
+            current_str = f'{self.current_value:.2f}'
+            max_str = f'{self.max_value:.2f}'
+            raise RuntimeError(
+                f'Agent reached maximum budget for conversation. '
+                f'Current budget: {current_str}, max budget: {max_str}'
+            )
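
The control flags added above replace the old `_handle_traffic_control` path: `step()` raises once the limit is reached, and `increase_limit()` extends the limit only outside headless mode. The following is a minimal sketch of that interaction. `SketchIterationFlag` and `run_steps` are hypothetical stand-ins written for illustration, not the classes from `control_flags.py`, and the way the real controller calls `increase_limit()` is an assumption here.

```python
from dataclasses import dataclass


@dataclass
class SketchIterationFlag:
    """Simplified stand-in for IterationControlFlag (illustration only)."""

    limit_increase_amount: int
    current_value: int
    max_value: int
    _hit_limit: bool = False

    def reached_limit(self) -> bool:
        self._hit_limit = self.current_value >= self.max_value
        return self._hit_limit

    def increase_limit(self, headless_mode: bool) -> None:
        # Only interactive sessions may extend the limit, and only after a hit.
        if not headless_mode and self._hit_limit:
            self.max_value += self.limit_increase_amount
            self._hit_limit = False

    def step(self) -> None:
        if self.reached_limit():
            raise RuntimeError(
                f'Agent reached maximum iteration. '
                f'Current iteration: {self.current_value}, '
                f'max iteration: {self.max_value}'
            )
        self.current_value += 1


def run_steps(flag: SketchIterationFlag, n: int) -> int:
    """Call step() up to n times and return how many steps completed."""
    completed = 0
    for _ in range(n):
        try:
            flag.step()
        except RuntimeError:
            break
        completed += 1
    return completed


flag = SketchIterationFlag(limit_increase_amount=2, current_value=0, max_value=2)
first_run = run_steps(flag, 5)  # stops as soon as the limit is hit
flag.increase_limit(headless_mode=False)  # assumed: user asks to continue
second_run = run_steps(flag, 5)  # runs until the extended limit is hit
```

In this sketch, each run completes exactly `limit_increase_amount` steps before the flag raises again, which mirrors the "add the initial value" behavior described in the `increase_limit` docstring.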
--- a/openhands/controller/state/state.py
+++ b/openhands/controller/state/state.py
@@ -8,6 +8,10 @@ from enum import Enum
 from typing import Any

 import openhands
+from openhands.controller.state.control_flags import (
+    BudgetControlFlag,
+    IterationControlFlag,
+)
 from openhands.core.logger import openhands_logger as logger
 from openhands.core.schema import AgentState
 from openhands.events.action import (
@@ -20,7 +24,15 @@ from openhands.memory.view import View
 from openhands.storage.files import FileStore
 from openhands.storage.locations import get_conversation_agent_state_filename

+RESUMABLE_STATES = [
+    AgentState.RUNNING,
+    AgentState.PAUSED,
+    AgentState.AWAITING_USER_INPUT,
+    AgentState.FINISHED,
+]

+
+# NOTE: this is deprecated
 class TrafficControlState(str, Enum):
    # default state, no rate limiting
    NORMAL = 'normal'
@@ -32,14 +44,6 @@ class TrafficControlState(str, Enum):
    PAUSED = 'paused'


-RESUMABLE_STATES = [
-    AgentState.RUNNING,
-    AgentState.PAUSED,
-    AgentState.AWAITING_USER_INPUT,
-    AgentState.FINISHED,
-]
-
-
@dataclass
 class State:
    """
@@ -75,35 +79,43 @@ class State:
    """

    session_id: str = ''
-    # global iteration for the current task
-    iteration: int = 0
-    # local iteration for the current subtask
-    local_iteration: int = 0
-    # max number of iterations for the current task
-    max_iterations: int = 100
+    iteration_flag: IterationControlFlag = field(
+        default_factory=lambda: IterationControlFlag(
+            limit_increase_amount=100, current_value=0, max_value=100
+        )
+    )
+    budget_flag: BudgetControlFlag | None = None
    confirmation_mode: bool = False
    history: list[Event] = field(default_factory=list)
    inputs: dict = field(default_factory=dict)
    outputs: dict = field(default_factory=dict)
    agent_state: AgentState = AgentState.LOADING
    resume_state: AgentState | None = None
-    traffic_control_state: TrafficControlState = TrafficControlState.NORMAL
    # global metrics for the current task
    metrics: Metrics = field(default_factory=Metrics)
-    # local metrics for the current subtask
-    local_metrics: Metrics = field(default_factory=Metrics)
    # root agent has level 0, and every delegate increases the level by one
    delegate_level: int = 0
    # start_id and end_id track the range of events in history
    start_id: int = -1
    end_id: int = -1

-    delegates: dict[tuple[int, int], tuple[str, str]] = field(default_factory=dict)
-    # NOTE: This will never be used by the controller, but it can be used by different
+    parent_metrics_snapshot: Metrics | None = None
+    parent_iteration: int = 100
+
+    # NOTE: this is used by the controller to track parent's metrics snapshot before delegation
    # evaluation tasks to store extra data needed to track the progress/state of the task.
    extra_data: dict[str, Any] = field(default_factory=dict)
    last_error: str = ''

+    # NOTE: deprecated args, kept here temporarily for backwards compatibility
+    # Will be removed in 30 days
+    iteration: int | None = None
+    local_iteration: int | None = None
+    max_iterations: int | None = None
+    traffic_control_state: TrafficControlState | None = None
+    local_metrics: Metrics | None = None
+    delegates: dict[tuple[int, int], tuple[str, str]] | None = None
+
    def save_to_session(
        self, sid: str, file_store: FileStore, user_id: str | None
    ) -> None:
@@ -165,6 +177,10 @@ class State:

        # first state after restore
        state.agent_state = AgentState.LOADING
+
+        # We don't need to clean up deprecated fields here
+        # They will be handled by __getstate__ when the state is saved again
+
        return state

    def __getstate__(self) -> dict:
@@ -177,15 +193,52 @@ class State:
        state.pop('_history_checksum', None)
        state.pop('_view', None)

+        # Remove deprecated fields before pickling
+        state.pop('iteration', None)
+        state.pop('local_iteration', None)
+        state.pop('max_iterations', None)
+        state.pop('traffic_control_state', None)
+        state.pop('local_metrics', None)
+        state.pop('delegates', None)
+
        return state

    def __setstate__(self, state: dict) -> None:
+        # Check if we're restoring from an older version (before control flags)
+        is_old_version = 'iteration' in state
+
+        # Convert old iteration tracking to new iteration_flag if needed
+        if is_old_version:
+            # Create iteration_flag from old values
+            max_iterations = state.get('max_iterations', 100)
+            current_iteration = state.get('iteration', 0)
+
+            # Add the iteration_flag to the state
+            state['iteration_flag'] = IterationControlFlag(
+                limit_increase_amount=max_iterations,
+                current_value=current_iteration,
+                max_value=max_iterations,
+            )
+
+        # Update the state
        self.__dict__.update(state)

+        # We keep the deprecated fields for backward compatibility
+        # They will be removed by __getstate__ when the state is saved again
+
        # make sure we always have the attribute history
        if not hasattr(self, 'history'):
            self.history = []

+        # Ensure we have default values for new fields if they're missing
+        if not hasattr(self, 'iteration_flag'):
+            self.iteration_flag = IterationControlFlag(
+                limit_increase_amount=100, current_value=0, max_value=100
+            )
+
+        if not hasattr(self, 'budget_flag'):
+            self.budget_flag = None
+
    def get_current_user_intent(self) -> tuple[str | None, list[str] | None]:
        """Returns the latest user message and image(if provided) that appears after a FinishAction, or the first (the task) if nothing was finished yet."""
        last_user_message = None
@@ -223,6 +276,17 @@ class State:
            ],
        }

+    def get_local_step(self):
+        if not self.parent_iteration:
+            return self.iteration_flag.current_value
+
+        return self.iteration_flag.current_value - self.parent_iteration
+
+    def get_local_metrics(self):
+        if not self.parent_metrics_snapshot:
+            return self.metrics
+        return self.metrics.diff(self.parent_metrics_snapshot)
+
    @property
    def view(self) -> View:
        # Compute a simple checksum from the history to see if we can re-use any
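
The `__setstate__` and `__getstate__` changes to `State` in this file migrate payloads saved before control flags existed: an old payload is detected by its `iteration` key, converted into an `iteration_flag`, and the deprecated keys are dropped the next time the state is saved. The sketch below illustrates that round trip under simplifying assumptions. `SketchFlag` and `SketchState` are hypothetical stand-ins that model only the iteration fields, and the hooks are called directly rather than through `pickle`.

```python
from dataclasses import dataclass


@dataclass
class SketchFlag:
    """Stand-in for IterationControlFlag (illustration only)."""

    limit_increase_amount: int
    current_value: int
    max_value: int


class SketchState:
    """Stand-in for State that models only the iteration fields."""

    DEPRECATED = ('iteration', 'max_iterations')

    def __init__(self) -> None:
        self.iteration_flag = SketchFlag(100, 0, 100)

    def __getstate__(self) -> dict:
        state = self.__dict__.copy()
        # Drop deprecated fields so they are not written out again.
        for name in self.DEPRECATED:
            state.pop(name, None)
        return state

    def __setstate__(self, state: dict) -> None:
        # An old payload is recognised by the presence of 'iteration'.
        if 'iteration' in state:
            max_iterations = state.get('max_iterations', 100)
            state['iteration_flag'] = SketchFlag(
                limit_increase_amount=max_iterations,
                current_value=state.get('iteration', 0),
                max_value=max_iterations,
            )
        self.__dict__.update(state)


# Simulate restoring a payload written before control flags existed.
old_payload = {'iteration': 7, 'max_iterations': 50}
restored = SketchState.__new__(SketchState)
restored.__setstate__(old_payload)

# Saving again goes through __getstate__, which drops the old keys.
saved = restored.__getstate__()
reloaded = SketchState.__new__(SketchState)
reloaded.__setstate__(saved)
```

After the first restore the object still carries the deprecated attributes alongside the new flag; they disappear only once the state has been saved and loaded again, which matches the comments in the diff.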
--- a/openhands/controller/state/state_tracker.py
+++ b/openhands/controller/state/state_tracker.py
@@ -0,0 +1,290 @@
+from openhands.controller.agent import Agent
+from openhands.controller.state.control_flags import (
+    BudgetControlFlag,
+    IterationControlFlag,
+)
+from openhands.controller.state.state import State
+from openhands.core.logger import openhands_logger as logger
+from openhands.events.action.agent import AgentDelegateAction, ChangeAgentStateAction
+from openhands.events.action.empty import NullAction
+from openhands.events.event import Event
+from openhands.events.event_filter import EventFilter
+from openhands.events.observation.agent import AgentStateChangedObservation
+from openhands.events.observation.delegate import AgentDelegateObservation
+from openhands.events.observation.empty import NullObservation
+from openhands.events.serialization.event import event_to_trajectory
+from openhands.events.stream import EventStream
+from openhands.llm.metrics import Metrics
+from openhands.storage.files import FileStore
+
+
+class StateTracker:
+    """Manages and synchronizes the state of an agent throughout its lifecycle.
+
+    It is responsible for:
+    1. Maintaining agent state persistence across sessions
+    2. Managing agent history by filtering and tracking relevant events (previously done in the agent controller)
+    3. Synchronizing metrics between the controller and LLM components
+    4. Updating control flags for budget and iteration limits
+
+    """
+
+    def __init__(
+        self, sid: str | None, file_store: FileStore | None, user_id: str | None
+    ):
+        self.sid = sid
+        self.file_store = file_store
+        self.user_id = user_id
+
+        # filter out events that are not relevant to the agent
+        # so they will not be included in the agent history
+        self.agent_history_filter = EventFilter(
+            exclude_types=(
+                NullAction,
+                NullObservation,
+                ChangeAgentStateAction,
+                AgentStateChangedObservation,
+            ),
+            exclude_hidden=True,
+        )
+
+    def set_initial_state(
+        self,
+        id: str,
+        agent: Agent,
+        state: State | None,
+        max_iterations: int,
+        max_budget_per_task: float | None,
+        confirmation_mode: bool = False,
+    ) -> None:
+        """Sets the initial state for the agent, either from the previous session, or from a parent agent, or by creating a new one.
+
+        Args:
+            state: The state to initialize with, or None to create a new state.
+            max_iterations: The maximum number of iterations allowed for the task.
+            confirmation_mode: Whether to enable confirmation mode.
+        """
+        # state can come from:
+        # - the previous session, in which case it has history
+        # - from a parent agent, in which case it has no history
+        # - None / a new state
+
+        # If state is None, we create a brand new state and still load the event stream so we can restore the history
+        if state is None:
+            self.state = State(
+                session_id=id.removesuffix('-delegate'),
+                inputs={},
+                iteration_flag=IterationControlFlag(
+                    limit_increase_amount=max_iterations,
+                    current_value=0,
+                    max_value=max_iterations,
+                ),
+                budget_flag=None
+                if not max_budget_per_task
+                else BudgetControlFlag(
+                    limit_increase_amount=max_budget_per_task,
+                    current_value=0,
+                    max_value=max_budget_per_task,
+                ),
+                confirmation_mode=confirmation_mode,
+            )
+            self.state.start_id = 0
+
+            logger.info(
+                f'AgentController {id} - created new state. start_id: {self.state.start_id}'
+            )
+        else:
+            self.state = state
+            if self.state.start_id <= -1:
+                self.state.start_id = 0
+
+            logger.info(
+                f'AgentController {id} initializing history from event {self.state.start_id}',
+            )
+
+        # Share the state metrics with the agent's LLM metrics
+        # This ensures that all accumulated metrics are always in sync between controller and llm
+        agent.llm.metrics = self.state.metrics
+
+    def _init_history(self, event_stream: EventStream) -> None:
+        """Initializes the agent's history from the event stream.
+
+        The history is a list of events that:
+        - Excludes events of types listed in self.agent_history_filter
+        - Excludes events with hidden=True attribute
+        - For delegate events (between AgentDelegateAction and AgentDelegateObservation):
+            - Excludes all events between the action and observation
+            - Includes the delegate action and observation themselves
+        """
+        # define range of events to fetch
+        # delegates start with a start_id and initially won't find any events
+        # otherwise we're restoring a previous session
+        start_id = self.state.start_id if self.state.start_id >= 0 else 0
+        end_id = (
+            self.state.end_id
+            if self.state.end_id >= 0
+            else event_stream.get_latest_event_id()
+        )
+
+        # sanity check
+        if start_id > end_id + 1:
+            logger.warning(
+                f'start_id {start_id} is greater than end_id + 1 ({end_id + 1}). History will be empty.',
+            )
+            self.state.history = []
+            return
+
+        events: list[Event] = []
+
+        # Get rest of history
+        events_to_add = list(
+            event_stream.search_events(
+                start_id=start_id,
+                end_id=end_id,
+                reverse=False,
+                filter=self.agent_history_filter,
+            )
+        )
+        events.extend(events_to_add)
+
+        # Find all delegate action/observation pairs
+        delegate_ranges: list[tuple[int, int]] = []
+        delegate_action_ids: list[int] = []  # stack of unmatched delegate action IDs
+
+        for event in events:
+            if isinstance(event, AgentDelegateAction):
+                delegate_action_ids.append(event.id)
+                # Note: we can get agent=event.agent and task=event.inputs.get('task','')
+                # if we need to track these in the future
+
+            elif isinstance(event, AgentDelegateObservation):
+                # Match with most recent unmatched delegate action
+                if not delegate_action_ids:
+                    logger.warning(
+                        f'Found AgentDelegateObservation without matching action at id={event.id}',
+                    )
+                    continue
+
+                action_id = delegate_action_ids.pop()
+                delegate_ranges.append((action_id, event.id))
+
+        # Filter out events between delegate action/observation pairs
+        if delegate_ranges:
+            filtered_events: list[Event] = []
+            current_idx = 0
+
+            for start_id, end_id in sorted(delegate_ranges):
+                # Add events before delegate range
+                filtered_events.extend(
+                    event for event in events[current_idx:] if event.id < start_id
+                )
+
+                # Add delegate action and observation
+                filtered_events.extend(
+                    event for event in events if event.id in (start_id, end_id)
+                )
+
+                # Update index to after delegate range
+                current_idx = next(
+                    (i for i, e in enumerate(events) if e.id > end_id), len(events)
+                )
+
+            # Add any remaining events after last delegate range
+            filtered_events.extend(events[current_idx:])
+
+            self.state.history = filtered_events
+        else:
+            self.state.history = events
+
+        # make sure history is in sync
+        self.state.start_id = start_id
+
+    def close(self, event_stream: EventStream):
+        # we made history, now is the time to rewrite it!
+        # the final state.history will be used by external scripts like evals, tests, etc.
+        # history will need to be complete WITH delegate events
+        # like the regular agent history, it does not include:
+        # - 'hidden' events, events with hidden=True
+        # - backend events (the default 'filtered out' types, types in self.filter_out)
+        start_id = self.state.start_id if self.state.start_id >= 0 else 0
+        end_id = (
+            self.state.end_id
+            if self.state.end_id >= 0
+            else event_stream.get_latest_event_id()
+        )
+
+        self.state.history = list(
+            event_stream.search_events(
+                start_id=start_id,
+                end_id=end_id,
+                reverse=False,
+                filter=self.agent_history_filter,
+            )
+        )
+
+    def add_history(self, event: Event):
+        # if the event is not filtered out, add it to the history
+        if self.agent_history_filter.include(event):
+            self.state.history.append(event)
+
+    def get_trajectory(self, include_screenshots: bool = False) -> list[dict]:
+        return [
+            event_to_trajectory(event, include_screenshots)
+            for event in self.state.history
+        ]
+
+    def maybe_increase_control_flags_limits(self, headless_mode: bool):
+        # Iteration and budget extensions are independent of each other
+        # An error is raised if any one of the control flags has reached or exceeded its limit
+        self.state.iteration_flag.increase_limit(headless_mode)
+        if self.state.budget_flag:
+            self.state.budget_flag.increase_limit(headless_mode)
+
+    def get_metrics_snapshot(self):
+        """
+        Deep copy of metrics
+        This serves as a snapshot for the parent's metrics at the time a delegate is created
+        It will be stored and used to compute local metrics for the delegate
+        (since a delegate now accumulates metrics from where its parent left off)
+        """
+
+        return self.state.metrics.copy()
+
+    def save_state(self):
+        """
+        Saves the current state to the persistent store
+        """
+        if self.sid and self.file_store:
+            self.state.save_to_session(self.sid, self.file_store, self.user_id)
+
+    def run_control_flags(self):
+        """
+        Performs one step of the control flags
+        """
+        self.state.iteration_flag.step()
+        if self.state.budget_flag:
+            self.state.budget_flag.step()
+
+    def sync_budget_flag_with_metrics(self):
+        """
+        Ensures that budget flag is up to date with accumulated costs from llm completions
+        Budget flag will monitor for when budget is exceeded
+        """
+        if self.state.budget_flag:
+            self.state.budget_flag.current_value = self.state.metrics.accumulated_cost
+
+    def merge_metrics(self, metrics: Metrics):
+        """
+        Merges metrics with the state metrics
+
+        NOTE: this should be refactored in the future. We should have services (draft llm, title autocomplete, condenser, etc)
+        use their own LLMs, but the metrics object should be shared. This way we have one source of truth for accumulated costs from
+        all services
+
+        This would prevent having fragmented stores for metrics, and we don't have the burden of deciding where and how to store them
+        if we decide to introduce more specialized services that require llm completions
+
+        """
+        self.state.metrics.merge(metrics)
+        if self.state.budget_flag:
+            self.state.budget_flag.current_value = self.state.metrics.accumulated_cost
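Reviewer note: the delegate filtering in the history-loading hunk above can be hard to follow from the diff alone. The following is a minimal sketch of the same idea, not the code in this change. `Ev` and `filter_delegate_events` are hypothetical stand-ins for the real event classes, and the sketch assumes event IDs increase in stream order.

```python
from dataclasses import dataclass


@dataclass
class Ev:
    """Hypothetical stand-in for an event; kind marks delegate boundaries."""

    id: int
    kind: str = 'other'  # 'delegate_action', 'delegate_obs', or 'other'


def filter_delegate_events(events: list[Ev]) -> list[Ev]:
    """Drop events that fall strictly between a matched delegate action/observation pair."""
    ranges: list[tuple[int, int]] = []
    unmatched: list[int] = []  # stack of delegate action IDs awaiting an observation

    for ev in events:
        if ev.kind == 'delegate_action':
            unmatched.append(ev.id)
        elif ev.kind == 'delegate_obs' and unmatched:
            # Pair the observation with the most recent unmatched action
            ranges.append((unmatched.pop(), ev.id))

    def inside_delegate(ev_id: int) -> bool:
        return any(start < ev_id < end for start, end in ranges)

    # The boundary events themselves are kept; only the interior is removed
    return [ev for ev in events if not inside_delegate(ev.id)]


events = [Ev(0), Ev(1), Ev(2, 'delegate_action'), Ev(3), Ev(4), Ev(5, 'delegate_obs'), Ev(6)]
kept_ids = [ev.id for ev in filter_delegate_events(events)]
```

With the sample input, the delegate's internal events (IDs 3 and 4) are dropped while the action and observation that bracket them remain in the parent's history.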
--- a/openhands/core/config/utils.py
+++ b/openhands/core/config/utils.py
@@ -744,27 +744,6 @@ def get_parser() -> argparse.ArgumentParser:
        type=bool,
        default=False,
    )
-
-    # LLM configuration arguments for local models
-    parser.add_argument(
-        '--llm-model',
-        help='LLM model to use (e.g., "lm_studio/devstral", "openai/gpt-4")',
-        type=str,
-        default=None,
-    )
-    parser.add_argument(
-        '--llm-base-url',
-        help='Base URL for LLM API (required for local models, e.g., "http://localhost:1234/v1")',
-        type=str,
-        default=None,
-    )
-    parser.add_argument(
-        '--llm-api-key',
-        help='API key for LLM (use "dummy" for local models)',
-        type=str,
-        default=None,
-    )
-
    return parser


@@ -842,21 +821,6 @@ def setup_config_from_args(args: argparse.Namespace) -> OpenHandsConfig:
            raise ValueError(f'Invalid toml file, cannot read {args.llm_config}')
        config.set_llm_config(llm_config)

-    # Override LLM settings with direct CLI arguments
-    if args.llm_model or args.llm_base_url or args.llm_api_key:
-        from pydantic import SecretStr
-
-        llm_config = config.get_llm_config()
-
-        if args.llm_model:
-            llm_config.model = args.llm_model
-        if args.llm_base_url:
-            llm_config.base_url = args.llm_base_url
-        if args.llm_api_key:
-            llm_config.api_key = SecretStr(args.llm_api_key)
-
-        config.set_llm_config(llm_config)
-
    # Override default agent if provided
    if args.agent_cls:
        config.default_agent = args.agent_cls
--- a/openhands/core/main.py
+++ b/openhands/core/main.py
@@ -5,6 +5,7 @@ from pathlib import Path
 from typing import Callable, Protocol

 import openhands.agenthub  # noqa F401 (we import this to get the agents registered)
+import openhands.cli.suppress_warnings  # noqa: F401
 from openhands.controller.agent import Agent
 from openhands.controller.replay import ReplayManager
 from openhands.controller.state.state import State
@@ -139,9 +140,9 @@ async def run_controller(
                config.mcp_host, config, None
            )
        )
-        config.mcp.stdio_servers.extend(openhands_mcp_stdio_servers)
+        runtime.config.mcp.stdio_servers.extend(openhands_mcp_stdio_servers)

-        await add_mcp_tools_to_agent(agent, runtime, memory, config)
+        await add_mcp_tools_to_agent(agent, runtime, memory)

    replay_events: list[Event] | None = None
    if config.replay_trajectory_path:
--- a/openhands/core/setup.py
+++ b/openhands/core/setup.py
@@ -206,8 +206,8 @@ def create_controller(

    controller = AgentController(
        agent=agent,
-        max_iterations=config.max_iterations,
-        max_budget_per_task=config.max_budget_per_task,
+        iteration_delta=config.max_iterations,
+        budget_per_task_delta=config.max_budget_per_task,
        agent_to_llm_config=config.get_agent_to_llm_config_map(),
        event_stream=event_stream,
        initial_state=initial_state,
--- a/openhands/events/async_event_store_wrapper.py
+++ b/openhands/events/async_event_store_wrapper.py
@@ -15,8 +15,8 @@ class AsyncEventStoreWrapper:
        loop = asyncio.get_running_loop()

        # Create an async generator that yields events
-        for event in self.event_store.get_events(*self.args, **self.kwargs):
-            # Run the blocking get_events() in a thread pool
+        for event in self.event_store.search_events(*self.args, **self.kwargs):
+            # Run the blocking search_events() in a thread pool
            def get_event(e: Event = event) -> Event:
                return e

--- a/openhands/events/event_store.py
+++ b/openhands/events/event_store.py
@@ -140,7 +140,7 @@ class EventStore(EventStoreABC):
        return self.cur_id - 1

    def filtered_events_by_source(self, source: EventSource) -> Iterable[Event]:
-        for event in self.get_events():
+        for event in self.search_events():
            if event.source == source:
                yield event

--- a/openhands/integrations/provider.py
+++ b/openhands/integrations/provider.py
@@ -321,7 +321,7 @@ class ProviderHandler:

    async def verify_repo_provider(
        self, repository: str, specified_provider: ProviderType | None = None
-    ):
+    ) -> Repository:
        if specified_provider:
            try:
                service = self._get_service(specified_provider)
--- a/openhands/llm/llm.py
+++ b/openhands/llm/llm.py
@@ -773,9 +773,6 @@ class LLM(RetryMixin, DebugMixin):
    def __repr__(self) -> str:
        return str(self)

-    def reset(self) -> None:
-        self.metrics.reset()
-
    def format_messages_for_llm(self, messages: Message | list[Message]) -> list[dict]:
        if isinstance(messages, Message):
            messages = [messages]
--- a/openhands/llm/metrics.py
+++ b/openhands/llm/metrics.py
@@ -193,22 +193,6 @@ class Metrics:
            'token_usages': [usage.model_dump() for usage in self._token_usages],
        }

-    def reset(self) -> None:
-        self._accumulated_cost = 0.0
-        self._costs = []
-        self._response_latencies = []
-        self._token_usages = []
-        # Reset accumulated token usage with a new instance
-        self._accumulated_token_usage = TokenUsage(
-            model=self.model_name,
-            prompt_tokens=0,
-            completion_tokens=0,
-            cache_read_tokens=0,
-            cache_write_tokens=0,
-            context_window=0,
-            response_id='',
-        )
-
    def log(self) -> str:
        """Log the metrics."""
        metrics = self.get()
@@ -221,5 +205,58 @@ class Metrics:
        """Create a deep copy of the Metrics object."""
        return copy.deepcopy(self)

+    def diff(self, baseline: 'Metrics') -> 'Metrics':
+        """Calculate the difference between current metrics and a baseline.
+
+        This is useful for tracking metrics for specific operations like delegates.
+
+        Args:
+            baseline: A metrics object representing the baseline state
+
+        Returns:
+            A new Metrics object containing only the differences since the baseline
+        """
+        result = Metrics(self.model_name)
+
+        # Calculate cost difference
+        result._accumulated_cost = self._accumulated_cost - baseline._accumulated_cost
+
+        # Include only costs that were added after the baseline
+        if baseline._costs:
+            last_baseline_timestamp = baseline._costs[-1].timestamp
+            result._costs = [
+                cost for cost in self._costs if cost.timestamp > last_baseline_timestamp
+            ]
+        else:
+            result._costs = self._costs.copy()
+
+        # Include only response latencies that were added after the baseline
+        result._response_latencies = self._response_latencies[
+            len(baseline._response_latencies) :
+        ]
+
+        # Include only token usages that were added after the baseline
+        result._token_usages = self._token_usages[len(baseline._token_usages) :]
+
+        # Calculate accumulated token usage difference
+        base_usage = baseline.accumulated_token_usage
+        current_usage = self.accumulated_token_usage
+
+        result._accumulated_token_usage = TokenUsage(
+            model=self.model_name,
+            prompt_tokens=current_usage.prompt_tokens - base_usage.prompt_tokens,
+            completion_tokens=current_usage.completion_tokens
+            - base_usage.completion_tokens,
+            cache_read_tokens=current_usage.cache_read_tokens
+            - base_usage.cache_read_tokens,
+            cache_write_tokens=current_usage.cache_write_tokens
+            - base_usage.cache_write_tokens,
+            context_window=current_usage.context_window,
+            per_turn_token=0,
+            response_id='',
+        )
+
+        return result
+
    def __repr__(self) -> str:
        return f'Metrics({self.get()}'
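Reviewer note: the snapshot-then-diff pattern introduced by `Metrics.copy()` and `Metrics.diff()` can be illustrated with a much smaller class. This is a sketch under assumptions: `SimpleMetrics` is hypothetical and tracks only a cost and a latency list, whereas the real `Metrics` class also handles per-call costs and token usage.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class SimpleMetrics:
    """Hypothetical, reduced stand-in for a metrics accumulator."""

    accumulated_cost: float = 0.0
    latencies: list[float] = field(default_factory=list)

    def copy(self) -> 'SimpleMetrics':
        # Deep copy so later mutations do not leak into the snapshot
        return copy.deepcopy(self)

    def diff(self, baseline: 'SimpleMetrics') -> 'SimpleMetrics':
        # Scalars are subtracted; append-only lists are sliced past the baseline length
        return SimpleMetrics(
            accumulated_cost=self.accumulated_cost - baseline.accumulated_cost,
            latencies=self.latencies[len(baseline.latencies):],
        )


shared = SimpleMetrics(accumulated_cost=1.5, latencies=[0.2])
snapshot = shared.copy()  # taken when the delegate starts

# The delegate keeps accumulating into the shared object
shared.accumulated_cost += 0.5
shared.latencies.append(0.4)

local = shared.diff(snapshot)  # only what the delegate added
```

Slicing by length relies on the lists being append-only between the snapshot and the diff, which is the same assumption the latency and token-usage slices in the hunk above make.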
--- a/openhands/mcp/utils.py
+++ b/openhands/mcp/utils.py
@@ -10,7 +10,6 @@ from openhands.core.config.mcp_config import (
    MCPSHTTPServerConfig,
    MCPSSEServerConfig,
 )
-from openhands.core.config.openhands_config import OpenHandsConfig
 from openhands.core.logger import openhands_logger as logger
 from openhands.events.action.mcp import MCPAction
 from openhands.events.observation.mcp import MCPObservation
@@ -187,9 +186,7 @@ async def call_tool_mcp(mcp_clients: list[MCPClient], action: MCPAction) -> Obse
    )


-async def add_mcp_tools_to_agent(
-    agent: 'Agent', runtime: Runtime, memory: 'Memory', app_config: OpenHandsConfig
-):
+async def add_mcp_tools_to_agent(agent: 'Agent', runtime: Runtime, memory: 'Memory'):
    """
    Add MCP tools to an agent.
    """
@@ -208,7 +205,6 @@ async def add_mcp_tools_to_agent(
    extra_stdio_servers = []

    # Add microagent MCP tools if available
-    mcp_config: MCPConfig = app_config.mcp
    microagent_mcp_configs = memory.get_microagent_mcp_tools()
    for mcp_config in microagent_mcp_configs:
        if mcp_config.sse_servers:
--- a/openhands/runtime/init.py
+++ b/openhands/runtime/init.py
@@ -1,6 +1,13 @@
 from openhands.runtime.base import Runtime
 from openhands.runtime.impl.cli.cli_runtime import CLIRuntime
-from openhands.runtime.impl.daytona.daytona_runtime import DaytonaRuntime
+
+try:
+    from openhands.runtime.impl.daytona.daytona_runtime import DaytonaRuntime
+
+    _DAYTONA_AVAILABLE = True
+except ImportError:
+    _DAYTONA_AVAILABLE = False
+    DaytonaRuntime = None  # type: ignore
 from openhands.runtime.impl.docker.docker_runtime import (
    DockerRuntime,
 )
@@ -20,7 +27,7 @@ _DEFAULT_RUNTIME_CLASSES: dict[str, type[Runtime]] = {
    'modal': ModalRuntime,
    'runloop': RunloopRuntime,
    'local': LocalRuntime,
-    'daytona': DaytonaRuntime,
+    **({'daytona': DaytonaRuntime} if _DAYTONA_AVAILABLE else {}),
    'cli': CLIRuntime,
 }

@@ -49,7 +56,9 @@ __all__ = [
    'ModalRuntime',
    'RunloopRuntime',
    'DockerRuntime',
-    'DaytonaRuntime',
    'CLIRuntime',
    'get_runtime_cls',
 ]
+
+if _DAYTONA_AVAILABLE:
+    __all__.append('DaytonaRuntime')
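Reviewer note: the optional-dependency guard used for `DaytonaRuntime` above generalizes to any registry keyed by name. Below is a minimal sketch of that pattern using only the standard library. `optional_import` and the module name `hypothetical_missing_sdk` are illustrative assumptions, not part of this change.

```python
import importlib
from typing import Any, Optional


def optional_import(module_name: str, attr: str) -> Optional[Any]:
    """Return module.attr, or None when the module is not installed."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr, None)


# 'json' ships with Python; 'hypothetical_missing_sdk' is assumed not to exist
available = optional_import('json', 'JSONDecoder')
missing = optional_import('hypothetical_missing_sdk', 'Client')

# Only register entries whose import succeeded, mirroring the dict-unpacking in the diff
registry = {
    **({'json': available} if available is not None else {}),
    **({'missing': missing} if missing is not None else {}),
}
```

Leaving the key out of the registry, rather than mapping it to `None`, means a lookup for an unavailable runtime fails the same way as a lookup for an unknown name.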
--- a/openhands/runtime/base.py
+++ b/openhands/runtime/base.py
@@ -372,20 +372,6 @@ class Runtime(FileEditRuntimeMixin):
        selected_repository: str | None,
        selected_branch: str | None,
    ) -> str:
-        repository = None
-        if selected_repository:  # Determine provider from repo name
-            try:
-                provider_handler = ProviderHandler(
-                    git_provider_tokens or MappingProxyType({})
-                )
-                repository = await provider_handler.verify_repo_provider(
-                    selected_repository
-                )
-            except AuthenticationError:
-                raise RuntimeError(
-                    'Git provider authentication issue when cloning repo'
-                )
-
        if not selected_repository:
            # In SaaS mode (indicated by user_id being set), always run git init
            # In OSS mode, only run git init if workspace_base is not set
@@ -403,34 +389,9 @@ class Runtime(FileEditRuntimeMixin):
                )
            return ''

-        # This satisfies mypy because param is optional, but `verify_repo_provider` guarentees this gets populated
-        if not repository:
-            return ''
-
-        provider = repository.git_provider
-        provider_domains = {
-            ProviderType.GITHUB: 'github.com',
-            ProviderType.GITLAB: 'gitlab.com',
-        }
-
-        domain = provider_domains[provider]
-
-        # If git_provider_tokens is provided, use the host from the token if available
-        if git_provider_tokens and provider in git_provider_tokens:
-            domain = git_provider_tokens[provider].host or domain
-
-        # Try to use token if available, otherwise use public URL
-        if git_provider_tokens and provider in git_provider_tokens:
-            git_token = git_provider_tokens[provider].token
-            if git_token:
-                if provider == ProviderType.GITLAB:
-                    remote_repo_url = f'https://oauth2:{git_token.get_secret_value()}@{domain}/{selected_repository}.git'
-                else:
-                    remote_repo_url = f'https://{git_token.get_secret_value()}@{domain}/{selected_repository}.git'
-            else:
-                remote_repo_url = f'https://{domain}/{selected_repository}.git'
-        else:
-            remote_repo_url = f'https://{domain}/{selected_repository}.git'
+        remote_repo_url = await self._get_authenticated_git_url(
+            selected_repository, git_provider_tokens
+        )

        if not remote_repo_url:
            raise ValueError('Missing either Git token or valid repository')
@@ -630,36 +591,52 @@ fi

        return loaded_microagents

-    def _get_authenticated_git_url(self, repo_path: str) -> str:
+    async def _get_authenticated_git_url(
+        self, repo_name: str, git_provider_tokens: PROVIDER_TOKEN_TYPE | None
+    ) -> str:
        """Get an authenticated git URL for a repository.

        Args:
-            repo_path: Repository path (e.g., "github.com/acme-co/api")
+            repo_name: Repository name (owner/repo)

        Returns:
            Authenticated git URL if credentials are available, otherwise regular HTTPS URL
        """
-        remote_url = f'https://{repo_path}.git'

-        # Determine provider from repo path
-        provider = None
-        if 'github.com' in repo_path:
-            provider = ProviderType.GITHUB
-        elif 'gitlab.com' in repo_path:
-            provider = ProviderType.GITLAB
+        try:
+            provider_handler = ProviderHandler(
+                git_provider_tokens or MappingProxyType({})
+            )
+            repository = await provider_handler.verify_repo_provider(repo_name)
+        except AuthenticationError:
+            raise Exception('Git provider authentication issue when getting remote URL')

-        # Add authentication if available
-        if (
-            provider
-            and self.git_provider_tokens
-            and provider in self.git_provider_tokens
-        ):
-            git_token = self.git_provider_tokens[provider].token
+        provider = repository.git_provider
+        repo_name = repository.full_name
+
+        provider_domains = {
+            ProviderType.GITHUB: 'github.com',
+            ProviderType.GITLAB: 'gitlab.com',
+        }
+
+        domain = provider_domains[provider]
+
+        # If git_provider_tokens is provided, use the host from the token if available
+        if git_provider_tokens and provider in git_provider_tokens:
+            domain = git_provider_tokens[provider].host or domain
+
+        # Try to use token if available, otherwise use public URL
+        if git_provider_tokens and provider in git_provider_tokens:
+            git_token = git_provider_tokens[provider].token
            if git_token:
                if provider == ProviderType.GITLAB:
-                    remote_url = f'https://oauth2:{git_token.get_secret_value()}@{repo_path.replace("gitlab.com/", "")}.git'
+                    remote_url = f'https://oauth2:{git_token.get_secret_value()}@{domain}/{repo_name}.git'
                else:
-                    remote_url = f'https://{git_token.get_secret_value()}@{repo_path.replace("github.com/", "")}.git'
+                    remote_url = f'https://{git_token.get_secret_value()}@{domain}/{repo_name}.git'
+            else:
+                remote_url = f'https://{domain}/{repo_name}.git'
+        else:
+            remote_url = f'https://{domain}/{repo_name}.git'

        return remote_url

@@ -685,13 +662,10 @@ fi
            return loaded_microagents

        # Extract the domain and org/user name
-        domain = repo_parts[0] if len(repo_parts) > 2 else 'github.com'
        org_name = repo_parts[-2]

        # Construct the org-level .openhands repo path
-        org_openhands_repo = f'{domain}/{org_name}/.openhands'
-        if domain not in org_openhands_repo:
-            org_openhands_repo = f'github.com/{org_openhands_repo}'
+        org_openhands_repo = f'{org_name}/.openhands'

        self.log(
            'info',
@@ -704,9 +678,18 @@ fi
            org_repo_dir = self.workspace_root / f'org_openhands_{org_name}'

            # Get authenticated URL and do a shallow clone (--depth 1) for efficiency
-            remote_url = self._get_authenticated_git_url(org_openhands_repo)
-
-            clone_cmd = f'git clone --depth 1 {remote_url} {org_repo_dir}'
+            try:
+                remote_url = call_async_from_sync(
+                    self._get_authenticated_git_url,
+                    GENERAL_TIMEOUT,
+                    org_openhands_repo,
+                    self.git_provider_tokens,
+                )
+            except Exception as e:
+                raise Exception(str(e))
+            clone_cmd = (
+                f'GIT_TERMINAL_PROMPT=0 git clone --depth 1 {remote_url} {org_repo_dir}'
+            )

            action = CmdRunAction(command=clone_cmd)
            obs = self.run_action(action)
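Reviewer note: the URL construction at the end of the refactored `_get_authenticated_git_url` reduces to three cases. The sketch below isolates that string logic only. `build_remote_url` and its parameters are hypothetical, and provider lookup and token handling are deliberately left out.

```python
from typing import Optional


def build_remote_url(
    domain: str,
    repo_name: str,
    token: Optional[str] = None,
    oauth2_prefix: bool = False,
) -> str:
    """Build an HTTPS clone URL, embedding the token when one is available."""
    if not token:
        # No credentials: fall back to the public URL
        return f'https://{domain}/{repo_name}.git'
    # The diff uses an 'oauth2:' user prefix for GitLab and the bare token otherwise
    credentials = f'oauth2:{token}' if oauth2_prefix else token
    return f'https://{credentials}@{domain}/{repo_name}.git'


public_url = build_remote_url('github.com', 'acme-co/api')
gitlab_url = build_remote_url('gitlab.com', 'acme-co/api', token='TOKEN', oauth2_prefix=True)
github_url = build_remote_url('github.com', 'acme-co/api', token='TOKEN')
```

Because the token ends up inside the URL, any command string built from it should be kept out of logs.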
--- a/openhands/runtime/impl/init.py
+++ b/openhands/runtime/impl/init.py
@@ -6,7 +6,14 @@ from openhands.runtime.impl.action_execution.action_execution_client import (
    ActionExecutionClient,
 )
 from openhands.runtime.impl.cli import CLIRuntime
-from openhands.runtime.impl.daytona.daytona_runtime import DaytonaRuntime
+
+try:
+    from openhands.runtime.impl.daytona.daytona_runtime import DaytonaRuntime
+
+    _DAYTONA_AVAILABLE = True
+except ImportError:
+    _DAYTONA_AVAILABLE = False
+    DaytonaRuntime = None  # type: ignore
 from openhands.runtime.impl.docker.docker_runtime import DockerRuntime
 from openhands.runtime.impl.e2b.e2b_runtime import E2BRuntime
 from openhands.runtime.impl.local.local_runtime import LocalRuntime
@@ -17,7 +24,6 @@ from openhands.runtime.impl.runloop.runloop_runtime import RunloopRuntime
 __all__ = [
    'ActionExecutionClient',
    'CLIRuntime',
-    'DaytonaRuntime',
    'DockerRuntime',
    'E2BRuntime',
    'LocalRuntime',
@@ -25,3 +31,6 @@ __all__ = [
    'RemoteRuntime',
    'RunloopRuntime',
 ]
+
+if _DAYTONA_AVAILABLE:
+    __all__.append('DaytonaRuntime')
--- a/openhands/runtime/impl/cli/cli_runtime.py
+++ b/openhands/runtime/impl/cli/cli_runtime.py
@@ -5,6 +5,7 @@ It does not implement browser functionality.

 import asyncio
 import os
+import re
 import select
 import shutil
 import signal
@@ -50,6 +51,7 @@ from openhands.integrations.provider import PROVIDER_TOKEN_TYPE
 from openhands.runtime.base import Runtime
 from openhands.runtime.plugins import PluginRequirement
 from openhands.runtime.runtime_status import RuntimeStatus
+from openhands.runtime.utils.bash import SubprocessBashSession


 class CLIRuntime(Runtime):
@@ -119,6 +121,13 @@ class CLIRuntime(Runtime):
        self.file_editor = OHEditor(workspace_root=self._workspace_path)
        self._shell_stream_callback: Callable[[str], None] | None = None

+        # Initialize bash session
+        self.bash_session = SubprocessBashSession(
+            work_dir=self._workspace_path,
+            username=None,
+            no_change_timeout_seconds=30,
+        )
+
        logger.warning(
            'Initializing CLIRuntime. WARNING: NO SANDBOX IS USED. '
            'This runtime executes commands directly on the local system. '
@@ -138,6 +147,9 @@ class CLIRuntime(Runtime):
        if not self.attach_to_existing:
            await asyncio.to_thread(self.setup_initial_env)

+        # Initialize bash session
+        self.bash_session.initialize()
+
        self._runtime_initialized = True
        self.set_runtime_status(RuntimeStatus.RUNTIME_STARTED)
        logger.info(f'CLIRuntime initialized with workspace at {self._workspace_path}')
@@ -351,7 +363,7 @@ class CLIRuntime(Runtime):
        )

    def run(self, action: CmdRunAction) -> Observation:
-        """Run a command using subprocess."""
+        """Run a command using the bash session."""
        if not self._runtime_initialized:
            return ErrorObservation(
                f'Runtime not initialized for command: {action.command}'
@@ -369,18 +381,36 @@ class CLIRuntime(Runtime):
            )

        try:
-            effective_timeout = (
-                action.timeout
-                if action.timeout is not None
-                else self.config.sandbox.timeout
-            )
+            # Set effective timeout if not already set
+            if action.timeout is None:
+                action.set_hard_timeout(self.config.sandbox.timeout)

            logger.debug(
-                f'Running command in CLIRuntime: "{action.command}" with effective timeout: {effective_timeout}s'
-            )
-            return self._execute_shell_command(
-                action.command, timeout=effective_timeout
+                f'Running command in CLIRuntime: "{action.command}" with effective timeout: {action.timeout}s'
            )
+
+            # Use the bash session to execute the command
+            obs = self.bash_session.execute(action)
+
+            # For CLIRuntime, we need to adjust the timeout message format and working directory
+            if isinstance(obs, CmdOutputObservation):
+                # Fix timeout message format for CLIRuntime
+                if obs.metadata.suffix and 'timed out after' in obs.metadata.suffix:
+                    # Extract timeout duration from the suffix
+                    match = re.search(
+                        r'timed out after ([\d.]+) seconds', obs.metadata.suffix
+                    )
+                    if match:
+                        timeout_duration = match.group(1)
+                        obs.metadata.suffix = (
+                            f'[The command timed out after {timeout_duration} seconds.]'
+                        )
+
+                # Fix working directory for CLIRuntime
+                obs.metadata.working_dir = self._workspace_path
+
+            return obs
+
        except Exception as e:
            logger.error(
                f'Error in CLIRuntime.run for command "{action.command}": {str(e)}'
@@ -737,6 +767,10 @@ class CLIRuntime(Runtime):
            raise RuntimeError(f'Error creating zip file: {str(e)}')

    def close(self) -> None:
+        # Clean up bash session
+        if hasattr(self, 'bash_session'):
+            self.bash_session.close()
+
        self._runtime_initialized = False
        super().close()

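Reviewer note: the timeout-suffix rewrite in `CLIRuntime.run` above is easier to check in isolation. The sketch below extracts the same regular expression into a hypothetical helper, `normalize_timeout_suffix`. The sample suffix text is an assumption about what the bash session emits, not copied from it.

```python
import re


def normalize_timeout_suffix(suffix: str) -> str:
    """Collapse a verbose timeout suffix to the short CLIRuntime form."""
    match = re.search(r'timed out after ([\d.]+) seconds', suffix)
    if match is None:
        # Not a timeout message: leave the suffix untouched
        return suffix
    return f'[The command timed out after {match.group(1)} seconds.]'


# Assumed example of a longer suffix carrying extra guidance text
verbose = '\n[The command timed out after 10.0 seconds. You may retry with a longer timeout.]'
short = normalize_timeout_suffix(verbose)
unchanged = normalize_timeout_suffix('[The command completed with exit code 0.]')
```

Returning the input unchanged when the pattern does not match keeps the helper safe to call on every observation.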
--- a/openhands/runtime/impl/daytona/README.md
+++ b/openhands/runtime/impl/daytona/README.md
@@ -89,12 +89,14 @@ docker run -it --rm --pull=always \
    -e LOG_ALL_EVENTS=true \
    -e RUNTIME=daytona \
    -e DAYTONA_API_KEY=${DAYTONA_API_KEY} \
-    -v ~/.openhands-state:/.openhands-state \
+    -v ~/.openhands:/.openhands \
    -p 3000:3000 \
    --name openhands-app \
    docker.all-hands.dev/all-hands-ai/openhands:${OPENHANDS_VERSION}
 ```

+> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
+
 #### Windows:
 ```powershell
 docker run -it --rm --pull=always `
@@ -102,12 +104,14 @@ docker run -it --rm --pull=always `
    -e LOG_ALL_EVENTS=true `
    -e RUNTIME=daytona `
    -e DAYTONA_API_KEY=${env:DAYTONA_API_KEY} `
-    -v ~/.openhands-state:/.openhands-state `
+    -v ~/.openhands:/.openhands `
    -p 3000:3000 `
    --name openhands-app `
    docker.all-hands.dev/all-hands-ai/openhands:${env:OPENHANDS_VERSION}
 ```

+> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
+
 > **Tip:** If you don't want your sandboxes to default to the EU region, you can set the `DAYTONA_TARGET` environment variable to `us`

 ### Running OpenHands Locally Without Docker
--- a/openhands/runtime/impl/daytona/daytona_runtime.py
+++ b/openhands/runtime/impl/daytona/daytona_runtime.py
@@ -1,18 +1,18 @@
-import json
 from typing import Callable

 import httpx
 import tenacity
-from daytona_sdk import (
-    CreateWorkspaceParams,
+from daytona import (
+    CreateSandboxFromSnapshotParams,
    Daytona,
    DaytonaConfig,
+    Sandbox,
    SessionExecuteRequest,
-    Workspace,
 )

 from openhands.core.config.openhands_config import OpenHandsConfig
 from openhands.events.stream import EventStream
+from openhands.integrations.provider import PROVIDER_TOKEN_TYPE
 from openhands.runtime.impl.action_execution.action_execution_client import (
    ActionExecutionClient,
 )
@@ -23,11 +23,11 @@ from openhands.runtime.utils.request import RequestHTTPError
 from openhands.utils.async_utils import call_sync_from_async
 from openhands.utils.tenacity_stop import stop_if_should_exit

-WORKSPACE_PREFIX = 'openhands-sandbox-'
+OPENHANDS_SID_LABEL = 'OpenHands_SID'


 class DaytonaRuntime(ActionExecutionClient):
-    """The DaytonaRuntime class is a DockerRuntime that utilizes Daytona workspace as a runtime environment."""
+    """The DaytonaRuntime class is an ActionExecutionClient that uses Daytona Sandboxes as runtime environments."""

    _sandbox_port: int = 4444
    _vscode_port: int = 4445
@@ -42,13 +42,14 @@ class DaytonaRuntime(ActionExecutionClient):
        status_callback: Callable | None = None,
        attach_to_existing: bool = False,
        headless_mode: bool = True,
+        user_id: str | None = None,
+        git_provider_tokens: PROVIDER_TOKEN_TYPE | None = None,
    ):
        assert config.daytona_api_key, 'Daytona API key is required'

        self.config = config
        self.sid = sid
-        self.workspace_id = WORKSPACE_PREFIX + sid
-        self.workspace: Workspace | None = None
+        self.sandbox: Sandbox | None = None
        self._vscode_url: str | None = None

        daytona_config = DaytonaConfig(
@@ -74,22 +75,28 @@ class DaytonaRuntime(ActionExecutionClient):
            status_callback,
            attach_to_existing,
            headless_mode,
+            user_id,
+            git_provider_tokens,
        )

-    def _get_workspace(self) -> Workspace | None:
+    def _get_sandbox(self) -> Sandbox | None:
        try:
-            workspace = self.daytona.get_current_workspace(self.workspace_id)
-            self.log(
-                'info', f'Attached to existing workspace with id: {self.workspace_id}'
-            )
+            sandboxes = self.daytona.list({OPENHANDS_SID_LABEL: self.sid})
+            if len(sandboxes) == 0:
+                return None
+            assert len(sandboxes) == 1, 'Multiple sandboxes found for SID'
+
+            sandbox = sandboxes[0]
+
+            self.log('info', f'Attached to existing sandbox with id: {self.sid}')
        except Exception:
            self.log(
                'warning',
-                f'Failed to attach to existing workspace with id: {self.workspace_id}',
+                f'Failed to attach to existing sandbox with id: {self.sid}',
            )
-            workspace = None
+            sandbox = None

-        return workspace
+        return sandbox

    def _get_creation_env_vars(self) -> dict[str, str]:
        env_vars: dict[str, str] = {
@@ -103,37 +110,28 @@ class DaytonaRuntime(ActionExecutionClient):

        return env_vars

-    def _create_workspace(self) -> Workspace:
-        workspace_params = CreateWorkspaceParams(
-            id=self.workspace_id,
+    def _create_sandbox(self) -> Sandbox:
+        sandbox_params = CreateSandboxFromSnapshotParams(
            language='python',
-            image=self.config.sandbox.runtime_container_image,
+            snapshot=self.config.sandbox.runtime_container_image,
            public=True,
            env_vars=self._get_creation_env_vars(),
+            labels={OPENHANDS_SID_LABEL: self.sid},
        )
-        workspace = self.daytona.create(workspace_params)
-        return workspace
+        return self.daytona.create(sandbox_params)

    def _construct_api_url(self, port: int) -> str:
-        assert self.workspace is not None, 'Workspace is not initialized'
-        assert self.workspace.instance.info is not None, (
-            'Workspace info is not available'
-        )
-        assert self.workspace.instance.info.provider_metadata is not None, (
-            'Provider metadata is not available'
-        )
+        assert self.sandbox is not None, 'Sandbox is not initialized'
+        assert self.sandbox.runner_domain is not None, 'Runner domain is not available'

-        node_domain = json.loads(self.workspace.instance.info.provider_metadata)[
-            'nodeDomain'
-        ]
-        return f'https://{port}-{self.workspace.id}.{node_domain}'
+        return f'https://{port}-{self.sandbox.id}.{self.sandbox.runner_domain}'

    @property
    def action_execution_server_url(self) -> str:
        return self.api_url

    def _start_action_execution_server(self) -> None:
-        assert self.workspace is not None, 'Workspace is not initialized'
+        assert self.sandbox is not None, 'Sandbox is not initialized'

        start_command: list[str] = get_action_execution_server_startup_command(
            server_port=self._sandbox_port,
@@ -153,9 +151,9 @@ class DaytonaRuntime(ActionExecutionClient):
        )

        exec_session_id = 'action-execution-server'
-        self.workspace.process.create_session(exec_session_id)
+        self.sandbox.process.create_session(exec_session_id)

-        exec_command = self.workspace.process.execute_session_command(
+        exec_command = self.sandbox.process.execute_session_command(
            exec_session_id,
            SessionExecuteRequest(command=start_command_str, var_async=True),
        )
@@ -175,27 +173,27 @@ class DaytonaRuntime(ActionExecutionClient):
        should_start_action_execution_server = False

        if self.attach_to_existing:
-            self.workspace = await call_sync_from_async(self._get_workspace)
+            self.sandbox = await call_sync_from_async(self._get_sandbox)
        else:
            should_start_action_execution_server = True

-        if self.workspace is None:
+        if self.sandbox is None:
            self.set_runtime_status(RuntimeStatus.BUILDING_RUNTIME)
-            self.workspace = await call_sync_from_async(self._create_workspace)
-            self.log('info', f'Created new workspace with id: {self.workspace_id}')
+            self.sandbox = await call_sync_from_async(self._create_sandbox)
+            self.log('info', f'Created a new sandbox with id: {self.sid}')

        self.api_url = self._construct_api_url(self._sandbox_port)

-        state = self.workspace.instance.state
+        state = self.sandbox.state

        if state == 'stopping':
-            self.log('info', 'Waiting for Daytona workspace to stop...')
-            await call_sync_from_async(self.workspace.wait_for_workspace_stop)
+            self.log('info', 'Waiting for the Daytona sandbox to stop...')
+            await call_sync_from_async(self.sandbox.wait_for_sandbox_stop)
            state = 'stopped'

        if state == 'stopped':
-            self.log('info', 'Starting Daytona workspace...')
-            await call_sync_from_async(self.workspace.start)
+            self.log('info', 'Starting the Daytona sandbox...')
+            await call_sync_from_async(self.sandbox.start)
            should_start_action_execution_server = True

        if should_start_action_execution_server:
@@ -242,8 +240,8 @@ class DaytonaRuntime(ActionExecutionClient):
        if self.attach_to_existing:
            return

-        if self.workspace:
-            self.daytona.remove(self.workspace)
+        if self.sandbox:
+            self.sandbox.delete()

    @property
    def vscode_url(self) -> str | None:
@@ -255,9 +253,9 @@ class DaytonaRuntime(ActionExecutionClient):
                'warning', 'Failed to get VSCode token while trying to get VSCode URL'
            )
            return None
-        if not self.workspace:
+        if not self.sandbox:
            self.log(
-                'warning', 'Workspace is not initialized while trying to get VSCode URL'
+                'warning', 'Sandbox is not initialized while trying to get VSCode URL'
            )
            return None
        self._vscode_url = (
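
One caveat about `_get_sandbox` above: the `assert len(sandboxes) == 1` check sits inside the `try` block, and `AssertionError` is a subclass of `Exception`, so a duplicate match appears to be logged as a failed attach and returned as `None` rather than raised. A minimal sketch of that control flow, using plain strings as stand-ins for sandboxes (the names `find_single` and `find_single_strict` are hypothetical and not part of the PR or the Daytona SDK):

```python
from __future__ import annotations


def find_single(matches: list[str]) -> str | None:
    """Mirror the try/except shape of _get_sandbox with stand-in data."""
    try:
        if len(matches) == 0:
            return None
        assert len(matches) == 1, 'Multiple sandboxes found for SID'
        return matches[0]
    except Exception:
        # AssertionError is caught here too, so duplicates look like "not found".
        return None


def find_single_strict(matches: list[str]) -> str | None:
    """Hypothetical variant: surface duplicates instead of swallowing them."""
    if len(matches) > 1:
        raise RuntimeError('Multiple sandboxes found for SID')
    return matches[0] if matches else None


print(find_single(['a', 'b']))     # duplicates are reported as None
print(find_single_strict(['a']))   # a single match is returned as-is
```

Whether a duplicate should abort or fall through to creating a new sandbox is a design choice for the PR authors; the sketch only shows that the two shapes behave differently.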
--- a/openhands/runtime/impl/docker/docker_runtime.py
+++ b/openhands/runtime/impl/docker/docker_runtime.py
@@ -17,6 +17,7 @@ from openhands.core.exceptions import (
 from openhands.core.logger import DEBUG, DEBUG_RUNTIME
 from openhands.core.logger import openhands_logger as logger
 from openhands.events import EventStream
+from openhands.integrations.provider import PROVIDER_TOKEN_TYPE
 from openhands.runtime.builder import DockerRuntimeBuilder
 from openhands.runtime.impl.action_execution.action_execution_client import (
    ActionExecutionClient,
@@ -86,6 +87,8 @@ class DockerRuntime(ActionExecutionClient):
        status_callback: Callable | None = None,
        attach_to_existing: bool = False,
        headless_mode: bool = True,
+        user_id: str | None = None,
+        git_provider_tokens: PROVIDER_TOKEN_TYPE | None = None,
        main_module: str = DEFAULT_MAIN_MODULE,
    ):
        if not DockerRuntime._shutdown_listener_id:
@@ -132,6 +135,8 @@ class DockerRuntime(ActionExecutionClient):
            status_callback,
            attach_to_existing,
            headless_mode,
+            user_id,
+            git_provider_tokens,
        )

        # Log runtime_extra_deps after base class initialization so self.sid is available
--- a/openhands/runtime/impl/e2b/e2b_runtime.py
+++ b/openhands/runtime/impl/e2b/e2b_runtime.py
@@ -12,29 +12,42 @@ from openhands.events.observation import (
    Observation,
 )
 from openhands.events.stream import EventStream
-from openhands.runtime.base import Runtime
+from openhands.integrations.provider import PROVIDER_TOKEN_TYPE
+from openhands.runtime.impl.action_execution.action_execution_client import (
+    ActionExecutionClient,
+)
 from openhands.runtime.impl.e2b.filestore import E2BFileStore
 from openhands.runtime.impl.e2b.sandbox import E2BSandbox
 from openhands.runtime.plugins import PluginRequirement
 from openhands.runtime.utils.files import insert_lines, read_lines


-class E2BRuntime(Runtime):
+class E2BRuntime(ActionExecutionClient):
    def __init__(
        self,
        config: OpenHandsConfig,
        event_stream: EventStream,
        sid: str = 'default',
        plugins: list[PluginRequirement] | None = None,
-        sandbox: E2BSandbox | None = None,
+        env_vars: dict[str, str] | None = None,
        status_callback: Callable | None = None,
+        attach_to_existing: bool = False,
+        headless_mode: bool = True,
+        user_id: str | None = None,
+        git_provider_tokens: PROVIDER_TOKEN_TYPE | None = None,
+        sandbox: E2BSandbox | None = None,
    ):
        super().__init__(
            config,
            event_stream,
            sid,
            plugins,
-            status_callback=status_callback,
+            env_vars,
+            status_callback,
+            attach_to_existing,
+            headless_mode,
+            user_id,
+            git_provider_tokens,
        )
        if sandbox is None:
            self.sandbox = E2BSandbox()
--- a/openhands/runtime/impl/local/local_runtime.py
+++ b/openhands/runtime/impl/local/local_runtime.py
@@ -25,6 +25,7 @@ from openhands.events.observation import (
    Observation,
 )
 from openhands.events.serialization import event_to_dict, observation_from_dict
+from openhands.integrations.provider import PROVIDER_TOKEN_TYPE
 from openhands.runtime.impl.action_execution.action_execution_client import (
    ActionExecutionClient,
 )
@@ -145,6 +146,8 @@ class LocalRuntime(ActionExecutionClient):
        status_callback: Callable[[str, str, str], None] | None = None,
        attach_to_existing: bool = False,
        headless_mode: bool = True,
+        user_id: str | None = None,
+        git_provider_tokens: PROVIDER_TOKEN_TYPE | None = None,
    ) -> None:
        self.is_windows = sys.platform == 'win32'
        if self.is_windows:
@@ -194,6 +197,8 @@ class LocalRuntime(ActionExecutionClient):
            status_callback,
            attach_to_existing,
            headless_mode,
+            user_id,
+            git_provider_tokens,
        )

        # If there is an API key in the environment we use this in requests to the runtime
--- a/openhands/runtime/impl/modal/modal_runtime.py
+++ b/openhands/runtime/impl/modal/modal_runtime.py
@@ -9,6 +9,7 @@ import tenacity

 from openhands.core.config import OpenHandsConfig
 from openhands.events import EventStream
+from openhands.integrations.provider import PROVIDER_TOKEN_TYPE
 from openhands.runtime.impl.action_execution.action_execution_client import (
    ActionExecutionClient,
 )
@@ -53,6 +54,8 @@ class ModalRuntime(ActionExecutionClient):
        status_callback: Callable | None = None,
        attach_to_existing: bool = False,
        headless_mode: bool = True,
+        user_id: str | None = None,
+        git_provider_tokens: PROVIDER_TOKEN_TYPE | None = None,
    ):
        assert config.modal_api_token_id, 'Modal API token id is required'
        assert config.modal_api_token_secret, 'Modal API token secret is required'
@@ -100,6 +103,8 @@ class ModalRuntime(ActionExecutionClient):
            status_callback,
            attach_to_existing,
            headless_mode,
+            user_id,
+            git_provider_tokens,
        )

    async def connect(self):
--- a/openhands/runtime/impl/remote/remote_runtime.py
+++ b/openhands/runtime/impl/remote/remote_runtime.py
@@ -140,7 +140,6 @@ class RemoteRuntime(ActionExecutionClient):
            )
        else:
            self.log('info', 'No existing runtime found, starting a new one')
-            self.set_runtime_status(RuntimeStatus.BUILDING_RUNTIME)
            if self.config.sandbox.runtime_container_image is None:
                self.log(
                    'info',
@@ -160,7 +159,6 @@ class RemoteRuntime(ActionExecutionClient):
        assert self.runtime_url is not None, (
            'Runtime URL is not set. This should never happen.'
        )
-        self.set_runtime_status(RuntimeStatus.STARTING_RUNTIME)
        if not self.attach_to_existing:
            self.log('info', 'Waiting for runtime to be alive...')
        self._wait_until_alive()
@@ -221,6 +219,7 @@ class RemoteRuntime(ActionExecutionClient):

    def _build_runtime(self) -> None:
        self.log('debug', f'Building RemoteRuntime config:\n{self.config}')
+        self.set_runtime_status(RuntimeStatus.BUILDING_RUNTIME)
        response = self._send_runtime_api_request(
            'GET',
            f'{self.config.sandbox.remote_runtime_api_url}/registry_prefix',
@@ -265,6 +264,7 @@ class RemoteRuntime(ActionExecutionClient):

    def _start_runtime(self) -> None:
        # Prepare the request body for the /start endpoint
+        self.set_runtime_status(RuntimeStatus.STARTING_RUNTIME)
        command = self.get_action_execution_server_startup_command()
        environment: dict[str, str] = {}
        if self.config.debug or os.environ.get('DEBUG', 'false').lower() == 'true':
--- a/openhands/runtime/impl/runloop/runloop_runtime.py
+++ b/openhands/runtime/impl/runloop/runloop_runtime.py
@@ -9,6 +9,7 @@ from runloop_api_client.types.shared_params import LaunchParameters
 from openhands.core.config import OpenHandsConfig
 from openhands.core.logger import openhands_logger as logger
 from openhands.events import EventStream
+from openhands.integrations.provider import PROVIDER_TOKEN_TYPE
 from openhands.runtime.impl.action_execution.action_execution_client import (
    ActionExecutionClient,
 )
@@ -36,6 +37,8 @@ class RunloopRuntime(ActionExecutionClient):
        status_callback: Callable | None = None,
        attach_to_existing: bool = False,
        headless_mode: bool = True,
+        user_id: str | None = None,
+        git_provider_tokens: PROVIDER_TOKEN_TYPE | None = None,
    ):
        assert config.runloop_api_key is not None, 'Runloop API key is required'
        self.devbox: DevboxView | None = None
@@ -53,6 +56,8 @@ class RunloopRuntime(ActionExecutionClient):
            status_callback,
            attach_to_existing,
            headless_mode,
+            user_id,
+            git_provider_tokens,
        )
        # Buffer for container logs
        self._vscode_url: str | None = None
--- a/openhands/runtime/utils/bash.py
+++ b/openhands/runtime/utils/bash.py
@@ -1,5 +1,6 @@
 import os
 import re
+import subprocess
 import time
 import traceback
 import uuid
@@ -167,6 +168,7 @@ class BashCommandStatus(Enum):
    COMPLETED = 'completed'
    NO_CHANGE_TIMEOUT = 'no_change_timeout'
    HARD_TIMEOUT = 'hard_timeout'
+    INTERRUPTED = 'interrupted'


 def _remove_command_prefix(command_output: str, command: str) -> str:
@@ -654,3 +656,247 @@ class BashSession:
            logger.debug(f'SLEEPING for {self.POLL_INTERVAL} seconds for next poll')
            time.sleep(self.POLL_INTERVAL)
        raise RuntimeError('Bash session was likely interrupted...')
+
+
+class SubprocessBashSession(BashSession):
+    """
+    A bash session implementation using individual subprocess calls
+    instead of tmux, while maintaining the same interface as BashSession.
+    """
+
+    def __init__(
+        self,
+        work_dir: str,
+        username: str | None = None,
+        no_change_timeout_seconds: int = 30,
+        max_memory_mb: int | None = None,
+        allow_multiple_commands: bool = True,
+    ):
+        # Initialize parent class attributes
+        self.work_dir = work_dir
+        self.username = username
+        self.no_change_timeout_seconds = no_change_timeout_seconds
+        self.max_memory_mb = max_memory_mb
+        self.allow_multiple_commands = allow_multiple_commands
+        self._initialized = False
+
+        # Set initial state
+        self.prev_status: BashCommandStatus | None = None
+        self.prev_output: str = ''
+        self._closed: bool = False
+        self._cwd = os.path.abspath(self.work_dir)
+        self._current_process: subprocess.Popen | None = None
+
+    def initialize(self) -> None:
+        """Initialize the bash session."""
+        logger.debug(
+            f'Initializing subprocess bash session with work dir: {self.work_dir}'
+        )
+
+        # Set initial state
+        self._initialized = True
+
+        logger.debug(
+            f'Subprocess bash session initialized with work dir: {self.work_dir}'
+        )
+
+    def close(self) -> None:
+        """Clean up the session."""
+        if self._current_process and self._current_process.poll() is None:
+            self._current_process.terminate()
+            try:
+                self._current_process.wait(timeout=5)
+            except subprocess.TimeoutExpired:
+                self._current_process.kill()
+        self._closed = True
+
+    def interrupt(self) -> None:
+        """Terminate the running command (SIGTERM on POSIX; not a true Ctrl+C/SIGINT)."""
+        if self._current_process and self._current_process.poll() is None:
+            logger.debug('Interrupting current command')
+            self._current_process.terminate()
+            self.prev_status = BashCommandStatus.INTERRUPTED
+
+    def get_status(self) -> BashCommandStatus | None:
+        """Get the status of the last command."""
+        return self.prev_status
+
+    def is_running(self) -> bool:
+        """Check if a command is currently running."""
+        return (
+            self._current_process is not None and self._current_process.poll() is None
+        )
+
+    def execute(self, action: CmdRunAction) -> CmdOutputObservation | ErrorObservation:
+        """Execute a command in the bash session using subprocess."""
+        from openhands.events.observation.commands import CmdOutputMetadata
+
+        if not self._initialized:
+            return ErrorObservation(content='Subprocess bash session not initialized')
+
+        command = action.command
+
+        # Handle interactive input (not supported in subprocess mode)
+        if action.is_input:
+            return ErrorObservation(
+                content=f"Subprocess bash session does not support interactive input. The command '{command}' was not sent to any process."
+            )
+
+        # Handle empty commands
+        if command == '':
+            return CmdOutputObservation(
+                content='ERROR: No command provided.',
+                command='',
+                metadata=CmdOutputMetadata(),
+            )
+
+        # Check for multiple commands based on configuration
+        if not self.allow_multiple_commands:
+            split_commands = split_bash_commands(command)
+            if len(split_commands) > 1:
+                return ErrorObservation(
+                    content=(
+                        'ERROR: Cannot execute multiple commands at once.\n'
+                        'Please run each command separately OR chain them into a single command via && or ;\n'
+                        f'Provided commands:\n{"\n".join(f"({i + 1}) {cmd}" for i, cmd in enumerate(split_commands))}'
+                    )
+                )
+
+        start_time = time.time()
+
+        try:
+            # Prepare the command
+            escaped_command = escape_bash_special_chars(command)
+            logger.debug(f'EXECUTING COMMAND: {escaped_command!r}')
+
+            # Set effective timeout
+            effective_timeout = action.timeout if action.timeout else 30.0
+
+            # Check if this is a background command (ends with &)
+            is_background = command.strip().endswith('&')
+
+            # Execute the command using subprocess
+            self._current_process = subprocess.Popen(
+                ['bash', '-c', escaped_command],
+                stdout=subprocess.PIPE,
+                stderr=subprocess.PIPE,
+                text=True,
+                cwd=self._cwd,
+            )
+
+            try:
+                if is_background:
+                    # For background commands, wait a short time to see if bash exits quickly
+                    # Background commands should cause bash to return immediately with exit code 0
+                    try:
+                        stdout, stderr = self._current_process.communicate(timeout=0.5)
+                        exit_code = self._current_process.returncode
+                    except subprocess.TimeoutExpired:
+                        # If bash doesn't exit quickly, it means the command is still running
+                        # This shouldn't happen for proper background commands, but handle it
+                        self._current_process.kill()
+                        stdout, stderr = self._current_process.communicate()
+                        exit_code = 0  # Treat as successful background launch
+                else:
+                    stdout, stderr = self._current_process.communicate(
+                        timeout=effective_timeout
+                    )
+                    exit_code = self._current_process.returncode
+
+                # Check if process was interrupted (negative exit codes indicate signals)
+                if exit_code < 0:
+                    self.prev_status = BashCommandStatus.INTERRUPTED
+                else:
+                    self.prev_status = BashCommandStatus.COMPLETED
+
+                # Combine output and error
+                combined_output = stdout
+                if stderr:
+                    combined_output += f'\n{stderr}'
+
+                # Track cwd after a cd command. NOTE: this re-runs the full command, repeating its side effects.
+                if command.strip().startswith('cd '):
+                    try:
+                        # Try to get the new working directory
+                        pwd_process = subprocess.run(
+                            ['bash', '-c', f'{escaped_command}; pwd'],
+                            capture_output=True,
+                            text=True,
+                            cwd=self._cwd,
+                            timeout=5,
+                        )
+                        if pwd_process.returncode == 0:
+                            new_cwd = pwd_process.stdout.strip()
+                            if os.path.isdir(new_cwd):
+                                self._cwd = new_cwd
+                    except Exception as e:
+                        logger.debug(f'Failed to update working directory: {e}')
+
+                # Create metadata
+                metadata = CmdOutputMetadata()
+                metadata.exit_code = exit_code
+                metadata.working_dir = self._cwd
+
+                self.prev_output = ''
+
+                return CmdOutputObservation(
+                    content=combined_output.rstrip() if combined_output else '',
+                    command=command,
+                    metadata=metadata,
+                )
+
+            except subprocess.TimeoutExpired:
+                # Handle timeout
+                self._current_process.kill()
+                elapsed_time = time.time() - start_time
+
+                # Try to get partial output
+                try:
+                    stdout, stderr = self._current_process.communicate(timeout=1.0)
+                    partial_output = stdout
+                    if stderr:
+                        partial_output += f'\n{stderr}'
+                except subprocess.TimeoutExpired:
+                    partial_output = ''
+
+                metadata = CmdOutputMetadata()
+                metadata.suffix = (
+                    f'\n[The command timed out after {elapsed_time:.1f} seconds. '
+                    f'{TIMEOUT_MESSAGE_TEMPLATE}]'
+                )
+
+                self.prev_status = BashCommandStatus.HARD_TIMEOUT
+
+                return CmdOutputObservation(
+                    content=partial_output.rstrip() if partial_output else '',
+                    command=command,
+                    metadata=metadata,
+                )
+
+            finally:
+                # Clear current process reference
+                self._current_process = None
+
+        except Exception as e:
+            logger.error(f'Error executing command "{command}": {e}')
+            return ErrorObservation(
+                content=f'Error executing command "{command}": {str(e)}'
+            )
+
+    def _ready_for_next_command(self) -> None:
+        """Reset state for next command."""
+        pass
+
+    def _get_pane_content(self) -> str:
+        """Get current output."""
+        return ''
+
+    @property
+    def cwd(self) -> str:
+        """Get current working directory."""
+        return self._cwd
+
+    @property
+    def initialized(self) -> bool:
+        """Check if the session is initialized."""
+        return self._initialized
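
The `cd` handling in `SubprocessBashSession.execute` runs the command a second time (`{escaped_command}; pwd`) to learn the new directory, so a command such as `cd build && make install` would execute twice. A minimal sketch of one alternative, assuming `bash` is on `PATH`: run the command once and have the same shell print its final `$PWD` after a sentinel. The helper `run_and_track_cwd` and the marker `__OH_CWD__` are hypothetical names for illustration, not part of the PR.

```python
from __future__ import annotations

import os
import subprocess
import tempfile

CWD_MARKER = '__OH_CWD__'  # hypothetical sentinel, not part of the PR


def run_and_track_cwd(
    command: str, cwd: str, timeout: float = 30.0
) -> tuple[str, int, str]:
    """Run `command` once in bash; return (stdout, exit_code, final_cwd)."""
    # Keep the command's exit status, then report the shell's final directory.
    wrapped = (
        f'{command}\n'
        '__rc=$?\n'
        f'printf "\\n%s%s" "{CWD_MARKER}" "$PWD"\n'
        'exit $__rc'
    )
    proc = subprocess.run(
        ['bash', '-c', wrapped],
        capture_output=True,
        text=True,
        cwd=cwd,
        timeout=timeout,
    )
    output, sep, tail = proc.stdout.rpartition('\n' + CWD_MARKER)
    if not sep:
        # Marker missing (for example the command called `exit`): keep the old cwd.
        return proc.stdout, proc.returncode, cwd
    new_cwd = tail.strip()
    return output, proc.returncode, new_cwd if os.path.isdir(new_cwd) else cwd


base = tempfile.mkdtemp()
out, rc, new_cwd = run_and_track_cwd('mkdir sub && cd sub && echo hi', base)
print(rc, repr(out), new_cwd)
```

Because `mkdir sub` would fail on a second run, the example only succeeds when the command is executed exactly once. The sketch ignores stderr, background commands, and output that happens to contain the marker; those would need separate handling.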
--- a/openhands/runtime/utils/edit.py
+++ b/openhands/runtime/utils/edit.py
@@ -305,7 +305,6 @@ class FileEditRuntimeMixin(FileEditRuntimeInterface):
            return ErrorObservation(error_msg)

        content_to_edit = '\n'.join(old_file_lines[start_idx:end_idx])
-        self.draft_editor_llm.reset()
        _edited_content = get_new_file_contents(
            self.draft_editor_llm, content_to_edit, action.content
        )
--- a/openhands/server/session/agent_session.py
+++ b/openhands/server/session/agent_session.py
@@ -158,7 +158,7 @@ class AgentSession:
            # NOTE: this needs to happen before controller is created
            # so MCP tools can be included into the SystemMessageAction
            if self.runtime and runtime_connected and agent.config.enable_mcp:
-                await add_mcp_tools_to_agent(agent, self.runtime, self.memory, config)
+                await add_mcp_tools_to_agent(agent, self.runtime, self.memory)

            if replay_json:
                initial_message = self._run_replay(
@@ -232,8 +232,7 @@ class AgentSession:
        if self.event_stream is not None:
            self.event_stream.close()
        if self.controller is not None:
-            end_state = self.controller.get_state()
-            end_state.save_to_session(self.sid, self.file_store, self.user_id)
+            self.controller.save_state()
            await self.controller.close()
        if self.runtime is not None:
            EXECUTOR.submit(self.runtime.close)
@@ -366,6 +365,7 @@ class AgentSession:
                headless_mode=False,
                attach_to_existing=False,
                env_vars=env_vars,
+                git_provider_tokens=git_provider_tokens,
            )

        # FIXME: this sleep is a terrible hack.
@@ -438,10 +438,12 @@ class AgentSession:
        initial_state = self._maybe_restore_state()
        controller = AgentController(
            sid=self.sid,
+            user_id=self.user_id,
+            file_store=self.file_store,
            event_stream=self.event_stream,
            agent=agent,
-            max_iterations=int(max_iterations),
-            max_budget_per_task=max_budget_per_task,
+            iteration_delta=int(max_iterations),
+            budget_per_task_delta=max_budget_per_task,
            agent_to_llm_config=agent_to_llm_config,
            agent_configs=agent_configs,
            confirmation_mode=confirmation_mode,
--- a/openhands/utils/conversation_summary.py
+++ b/openhands/utils/conversation_summary.py
@@ -95,7 +95,7 @@ async def auto_generate_title(

        # Find the first user message
        first_user_message = None
-        for event in event_stream.get_events():
+        for event in event_stream.search_events():
            if (
                event.source == EventSource.USER
                and isinstance(event, MessageAction)
--- a/openhands/utils/prompt.py
+++ b/openhands/utils/prompt.py
@@ -127,5 +127,5 @@ class PromptManager:
            None,
        )
        if latest_user_message:
-            reminder_text = f'\n\nENVIRONMENT REMINDER: You have {state.max_iterations - state.iteration} turns left to complete the task. When finished reply with <finish></finish>.'
+            reminder_text = f'\n\nENVIRONMENT REMINDER: You have {state.iteration_flag.max_value - state.iteration_flag.current_value} turns left to complete the task. When finished reply with <finish></finish>.'
            latest_user_message.content.append(TextContent(text=reminder_text))
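
The reminder text now derives the remaining turns from `state.iteration_flag` rather than from `state.max_iterations` and `state.iteration`. A minimal sketch of that arithmetic, using a hypothetical stand-in class that carries only the two fields the f-string reads (the real flag type in the PR may hold more):

```python
from dataclasses import dataclass


@dataclass
class IterationFlag:
    """Hypothetical stand-in exposing only current_value and max_value."""

    current_value: int
    max_value: int


def turns_left(flag: IterationFlag) -> int:
    # Same arithmetic as the reminder text: max_value - current_value.
    return flag.max_value - flag.current_value


print(turns_left(IterationFlag(current_value=3, max_value=10)))  # 7
```

The subtraction is not clamped, so the result would be negative if `current_value` ever exceeded `max_value`; the sketch mirrors the f-string in that respect.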
--- a/poetry.lock
+++ b/poetry.lock
@@ -1,4 +1,50 @@
-# This file is automatically @generated by Poetry 2.1.1 and should not be changed by hand.
+# This file is automatically @generated by Poetry 2.1.3 and should not be changed by hand.
+
+[[package]]
+name = "aioboto3"
+version = "14.3.0"
+description = "Async boto3 wrapper"
+optional = false
+python-versions = "<4.0,>=3.8"
+groups = ["main"]
+files = [
+    {file = "aioboto3-14.3.0-py3-none-any.whl", hash = "sha256:aec5de94e9edc1ffbdd58eead38a37f00ddac59a519db749a910c20b7b81bca7"},
+    {file = "aioboto3-14.3.0.tar.gz", hash = "sha256:1d18f88bb56835c607b62bb6cb907754d717bedde3ddfff6935727cb48a80135"},
+]
+
+[package.dependencies]
+aiobotocore = {version = "2.22.0", extras = ["boto3"]}
+aiofiles = ">=23.2.1"
+
+[package.extras]
+chalice = ["chalice (>=1.24.0)"]
+s3cse = ["cryptography (>=44.0.1)"]
+
+[[package]]
+name = "aiobotocore"
+version = "2.22.0"
+description = "Async client for aws services using botocore and aiohttp"
+optional = false
+python-versions = ">=3.8"
+groups = ["main"]
+files = [
+    {file = "aiobotocore-2.22.0-py3-none-any.whl", hash = "sha256:b4e6306f79df9d81daff1f9d63189a2dbee4b77ce3ab937304834e35eaaeeccf"},
+    {file = "aiobotocore-2.22.0.tar.gz", hash = "sha256:11091477266b75c2b5d28421c1f2bc9a87d175d0b8619cb830805e7a113a170b"},
+]
+
+[package.dependencies]
+aiohttp = ">=3.9.2,<4.0.0"
+aioitertools = ">=0.5.1,<1.0.0"
+boto3 = {version = ">=1.37.2,<1.37.4", optional = true, markers = "extra == \"boto3\""}
+botocore = ">=1.37.2,<1.37.4"
+jmespath = ">=0.7.1,<2.0.0"
+multidict = ">=6.0.0,<7.0.0"
+python-dateutil = ">=2.1,<3.0.0"
+wrapt = ">=1.10.10,<2.0.0"
+
+[package.extras]
+awscli = ["awscli (>=1.38.2,<1.38.4)"]
+boto3 = ["boto3 (>=1.37.2,<1.37.4)"]

 [[package]]
 name = "aiofiles"
@@ -147,6 +193,22 @@ files = [
 [package.dependencies]
 aiohttp = "*"

+[[package]]
+name = "aioitertools"
+version = "0.12.0"
+description = "itertools and builtins for AsyncIO and mixed iterables"
+optional = false
+python-versions = ">=3.8"
+groups = ["main"]
+files = [
+    {file = "aioitertools-0.12.0-py3-none-any.whl", hash = "sha256:fc1f5fac3d737354de8831cbba3eb04f79dd649d8f3afb4c5b114925e662a796"},
+    {file = "aioitertools-0.12.0.tar.gz", hash = "sha256:c2a9055b4fbb7705f561b9d86053e8af5d10cc845d22c32008c43490b2d8dd6b"},
+]
+
+[package.extras]
+dev = ["attribution (==1.8.0)", "black (==24.8.0)", "build (>=1.2)", "coverage (==7.6.1)", "flake8 (==7.1.1)", "flit (==3.9.0)", "mypy (==1.11.2)", "ufmt (==2.7.1)", "usort (==1.0.8.post1)"]
+docs = ["sphinx (==8.0.2)", "sphinx-mdinclude (==0.6.2)"]
+
 [[package]]
 name = "aiolimiter"
 version = "1.2.1"
@@ -400,7 +462,7 @@ description = "LTS Port of Python audioop"
 optional = false
 python-versions = ">=3.13"
 groups = ["main"]
-markers = "python_version >= \"3.13\""
+markers = "python_version == \"3.13\""
 files = [
    {file = "audioop_lts-0.2.1-cp313-abi3-macosx_10_13_universal2.whl", hash = "sha256:fd1345ae99e17e6910f47ce7d52673c6a1a70820d78b67de1b7abb3af29c426a"},
    {file = "audioop_lts-0.2.1-cp313-abi3-macosx_10_13_x86_64.whl", hash = "sha256:e175350da05d2087e12cea8e72a70a1a8b14a17e92ed2022952a4419689ede5e"},
@@ -581,20 +643,20 @@ files = [

 [[package]]
 name = "boto3"
-version = "1.38.36"
+version = "1.37.3"
 description = "The AWS SDK for Python"
 optional = false
-python-versions = ">=3.9"
+python-versions = ">=3.8"
 groups = ["main"]
 files = [
-    {file = "boto3-1.38.36-py3-none-any.whl", hash = "sha256:34c27d7317cadb62c0e9856e5d5aa0271ef47202d340584831048bc7ac904136"},
-    {file = "boto3-1.38.36.tar.gz", hash = "sha256:efe0aaa060f8fedd76e5c942055f051aee0432fc722d79d8830a9fd9db83593e"},
+    {file = "boto3-1.37.3-py3-none-any.whl", hash = "sha256:2063b40af99fd02f6228ff52397b552ff3353831edaf8d25cc04801827ab9794"},
+    {file = "boto3-1.37.3.tar.gz", hash = "sha256:21f3ce0ef111297e63a6eb998a25197b8c10982970c320d4c6e8db08be2157be"},
 ]

 [package.dependencies]
-botocore = ">=1.38.36,<1.39.0"
+botocore = ">=1.37.3,<1.38.0"
 jmespath = ">=0.7.1,<2.0.0"
-s3transfer = ">=0.13.0,<0.14.0"
+s3transfer = ">=0.11.0,<0.12.0"

 [package.extras]
 crt = ["botocore[crt] (>=1.21.0,<2.0a0)"]
@@ -1028,14 +1090,14 @@ xray = ["mypy-boto3-xray (>=1.38.0,<1.39.0)"]

 [[package]]
 name = "botocore"
-version = "1.38.36"
+version = "1.37.3"
 description = "Low-level, data-driven core of boto 3."
 optional = false
-python-versions = ">=3.9"
+python-versions = ">=3.8"
 groups = ["main"]
 files = [
-    {file = "botocore-1.38.36-py3-none-any.whl", hash = "sha256:b6a50b853f6d23af9edfed89a59800c6bc1687a947cdd3492879f7d64e002d30"},
-    {file = "botocore-1.38.36.tar.gz", hash = "sha256:4a1ced1a4218bdff0ed5b46abb54570d473154ddefafa5d121a8d96e4b76ebc1"},
+    {file = "botocore-1.37.3-py3-none-any.whl", hash = "sha256:d01bd3bf4c80e61fa88d636ad9f5c9f60a551d71549b481386c6b4efe0bb2b2e"},
+    {file = "botocore-1.37.3.tar.gz", hash = "sha256:fe8403eb55a88faf9b0f9da6615e5bee7be056d75e17af66c3c8f0a3b0648da4"},
 ]

 [package.dependencies]
@@ -1580,7 +1642,7 @@ files = [
    {file = "colorama-0.4.6-py2.py3-none-any.whl", hash = "sha256:4f1d9991f5acc0ca119f9d443620b77f9d6b33703e51011c16baf57afb285fc6"},
    {file = "colorama-0.4.6.tar.gz", hash = "sha256:08695f5cb7ed6e0531a20572697297273c47b8cae5a63ffc6d6ed5c201be6e44"},
 ]
-markers = {main = "platform_system == \"Windows\" or sys_platform == \"win32\" or os_name == \"nt\"", dev = "os_name == \"nt\" or sys_platform == \"win32\"", runtime = "sys_platform == \"win32\"", test = "platform_system == \"Windows\" or sys_platform == \"win32\""}
+markers = {main = "platform_system == \"Windows\" or os_name == \"nt\" or sys_platform == \"win32\"", dev = "os_name == \"nt\" or sys_platform == \"win32\"", runtime = "sys_platform == \"win32\"", test = "platform_system == \"Windows\" or sys_platform == \"win32\""}

 [[package]]
 name = "comm"
@@ -1932,16 +1994,48 @@ tests-numpy2 = ["Pillow (>=9.4.0)", "absl-py", "decorator", "elasticsearch (<8.0
 torch = ["torch"]
 vision = ["Pillow (>=9.4.0)"]

+[[package]]
+name = "daytona"
+version = "0.21.1"
+description = "Python SDK for Daytona"
+optional = false
+python-versions = ">=3.7"
+groups = ["main"]
+files = [
+    {file = "daytona-0.21.1-py3-none-any.whl", hash = "sha256:1ce6b352f52ef92e667098b7bdaa60c22ffbfb8e686a8cbd12418bf7698ac834"},
+    {file = "daytona-0.21.1.tar.gz", hash = "sha256:01d83dd2b627f87e82491fb97f41845768d75c33f0767eaa44f6e8378bd58e60"},
+]
+
+[package.dependencies]
+aioboto3 = ">=14.0.0,<15.0.0"
+aiofiles = ">=24.1.0,<24.2.0"
+aiohttp = ">=3.12.0,<4.0.0"
+aiohttp_retry = ">=2.9.0,<3.0.0"
+boto3 = ">=1.0.0,<2.0.0"
+daytona_api_client = ">=0.21.0,<0.22.0"
+daytona_api_client_async = ">=0.21.0,<0.22.0"
+Deprecated = ">=1.2.18,<2.0.0"
+environs = ">=9.5.0,<10.0.0"
+httpx = ">=0.28.0,<0.29.0"
+marshmallow = ">=3.19.0,<4.0.0"
+pydantic = ">=2.4.2,<3.0.0"
+python-dateutil = ">=2.8.2,<3.0.0"
+toml = ">=0.10.0,<0.11.0"
+urllib3 = ">=2.0.7,<3.0.0"
+
+[package.extras]
+dev = ["black[jupyter] (>=23.1.0,<24.0.0)", "build (>=1.0.3)", "isort (>=5.10.0,<6.0.0)", "matplotlib (>=3.10.0,<3.11.0)", "nbqa (>=1.9.1,<2.0.0)", "pydoc-markdown (>=4.8.2)", "pylint (>=3.3.4,<4.0.0)", "setuptools (>=68.0.0)", "twine (>=4.0.2)", "unasync (>=0.6.0,<0.7.0)", "wheel (>=0.41.2)"]
+
 [[package]]
 name = "daytona-api-client"
-version = "0.20.1"
+version = "0.21.0"
 description = "Daytona"
 optional = false
 python-versions = "*"
 groups = ["main"]
 files = [
-    {file = "daytona_api_client-0.20.1-py3-none-any.whl", hash = "sha256:4d5023108013365eba76bd0bd4704f30dee54c13e2ac5b62e8c88bcd4af5db92"},
-    {file = "daytona_api_client-0.20.1.tar.gz", hash = "sha256:ff2061f7e7dc9c935a9087216600be277cb9cf6b8c1eecdfe333ef30d6b208fd"},
+    {file = "daytona_api_client-0.21.0-py3-none-any.whl", hash = "sha256:a8ff1f0fb397368dbd6ddb224c28d679e599c657eab2ec5821cf0c972a60229a"},
+    {file = "daytona_api_client-0.21.0.tar.gz", hash = "sha256:92d591c5a1750a827b5850425ce483441609b72b05d35a618d5353fbbba50bca"},
 ]

 [package.dependencies]
@@ -1952,14 +2046,14 @@ urllib3 = ">=1.25.3,<3.0.0"

 [[package]]
 name = "daytona-api-client-async"
-version = "0.20.1"
+version = "0.21.0"
 description = "Daytona"
 optional = false
 python-versions = "*"
 groups = ["main"]
 files = [
-    {file = "daytona_api_client_async-0.20.1-py3-none-any.whl", hash = "sha256:f24e06e3ab6e554214ed064f1b4c8723356c76c14c69de9a73a6cad60a386127"},
-    {file = "daytona_api_client_async-0.20.1.tar.gz", hash = "sha256:043045cb173b0b53416c19a9e276124a5c4fe14209f409a8572ef1975240e53f"},
+    {file = "daytona_api_client_async-0.21.0-py3-none-any.whl", hash = "sha256:f5731963d0dd6c1e207b92bdc7f5b59952d3365444bc9dc8b013d77a4dddf377"},
+    {file = "daytona_api_client_async-0.21.0.tar.gz", hash = "sha256:08a22c0d1616f82efa8d157d7be6c432554fd43d75560725c4e0cef0228607d6"},
 ]

 [package.dependencies]
@@ -1970,35 +2064,6 @@ python-dateutil = ">=2.8.2"
 typing-extensions = ">=4.7.1"
 urllib3 = ">=1.25.3,<3.0.0"

-[[package]]
-name = "daytona-sdk"
-version = "0.20.0"
-description = "Python SDK for Daytona"
-optional = false
-python-versions = ">=3.7"
-groups = ["main"]
-files = [
-    {file = "daytona_sdk-0.20.0-py3-none-any.whl", hash = "sha256:7919acfff21c072a0ea826a3b250c0d9c5765e58c054d2bd5b91ea76f0df4709"},
-    {file = "daytona_sdk-0.20.0.tar.gz", hash = "sha256:b5c13b999fcce1e6460974dbbb0dd336d8ca1e96d6a25afe705f476fba4e6f11"},
-]
-
-[package.dependencies]
-aiofiles = ">=24.1.0,<24.2.0"
-aiohttp = ">=3.12.0,<4.0.0"
-aiohttp_retry = ">=2.9.0,<3.0.0"
-daytona_api_client = ">=0.20.0,<0.21.0"
-daytona_api_client_async = ">=0.20.0,<0.21.0"
-Deprecated = ">=1.2.18,<2.0.0"
-environs = ">=9.5.0,<10.0.0"
-httpx = ">=0.28.0,<0.29.0"
-marshmallow = ">=3.19.0,<4.0.0"
-pydantic = ">=2.4.2,<3.0.0"
-python-dateutil = ">=2.8.2,<3.0.0"
-urllib3 = ">=2.0.7,<3.0.0"
-
-[package.extras]
-dev = ["black[jupyter] (>=23.1.0,<24.0.0)", "build (>=1.0.3)", "isort (>=5.10.0,<6.0.0)", "matplotlib (>=3.10.0,<3.11.0)", "nbqa (>=1.9.1,<2.0.0)", "pydoc-markdown (>=4.8.2)", "pylint (>=3.3.4,<4.0.0)", "setuptools (>=68.0.0)", "twine (>=4.0.2)", "unasync (>=0.6.0,<0.7.0)", "wheel (>=0.41.2)"]
-
 [[package]]
 name = "debugpy"
 version = "1.8.14"
@@ -2974,8 +3039,8 @@ files = [
 google-api-core = {version = ">=1.34.1,<2.0.dev0 || >=2.11.dev0,<3.0.0dev", extras = ["grpc"]}
 google-auth = ">=2.14.1,<2.24.0 || >2.24.0,<2.25.0 || >2.25.0,<3.0.0dev"
 proto-plus = [
-    {version = ">=1.22.3,<2.0.0dev"},
    {version = ">=1.25.0,<2.0.0dev", markers = "python_version >= \"3.13\""},
+    {version = ">=1.22.3,<2.0.0dev"},
 ]
 protobuf = ">=3.20.2,<4.21.0 || >4.21.0,<4.21.1 || >4.21.1,<4.21.2 || >4.21.2,<4.21.3 || >4.21.3,<4.21.4 || >4.21.4,<4.21.5 || >4.21.5,<6.0.0dev"

@@ -2997,8 +3062,8 @@ googleapis-common-protos = ">=1.56.2,<2.0.0"
 grpcio = {version = ">=1.49.1,<2.0.0", optional = true, markers = "python_version >= \"3.11\" and extra == \"grpc\""}
 grpcio-status = {version = ">=1.49.1,<2.0.0", optional = true, markers = "python_version >= \"3.11\" and extra == \"grpc\""}
 proto-plus = [
-    {version = ">=1.22.3,<2.0.0"},
    {version = ">=1.25.0,<2.0.0", markers = "python_version >= \"3.13\""},
+    {version = ">=1.22.3,<2.0.0"},
 ]
 protobuf = ">=3.19.5,<3.20.0 || >3.20.0,<3.20.1 || >3.20.1,<4.21.0 || >4.21.0,<4.21.1 || >4.21.1,<4.21.2 || >4.21.2,<4.21.3 || >4.21.3,<4.21.4 || >4.21.4,<4.21.5 || >4.21.5,<7.0.0"
 requests = ">=2.18.0,<3.0.0"
@@ -3216,8 +3281,8 @@ google-api-core = {version = ">=1.34.1,<2.0.dev0 || >=2.11.dev0,<3.0.0", extras
 google-auth = ">=2.14.1,<2.24.0 || >2.24.0,<2.25.0 || >2.25.0,<3.0.0"
 grpc-google-iam-v1 = ">=0.14.0,<1.0.0"
 proto-plus = [
-    {version = ">=1.22.3,<2.0.0"},
    {version = ">=1.25.0,<2.0.0", markers = "python_version >= \"3.13\""},
+    {version = ">=1.22.3,<2.0.0"},
 ]
 protobuf = ">=3.20.2,<4.21.0 || >4.21.0,<4.21.1 || >4.21.1,<4.21.2 || >4.21.2,<4.21.3 || >4.21.3,<4.21.4 || >4.21.4,<4.21.5 || >4.21.5,<7.0.0"

@@ -6479,8 +6544,8 @@ files = [
 [package.dependencies]
 googleapis-common-protos = ">=1.52,<2.0"
 grpcio = [
-    {version = ">=1.63.2,<2.0.0", markers = "python_version < \"3.13\""},
    {version = ">=1.66.2,<2.0.0", markers = "python_version >= \"3.13\""},
+    {version = ">=1.63.2,<2.0.0", markers = "python_version < \"3.13\""},
 ]
 opentelemetry-api = ">=1.15,<2.0"
 opentelemetry-exporter-otlp-proto-common = "1.34.1"
@@ -8967,21 +9032,21 @@ typing-extensions = ">=4.10,<5"

 [[package]]
 name = "s3transfer"
-version = "0.13.0"
+version = "0.11.3"
 description = "An Amazon S3 Transfer Manager"
 optional = false
-python-versions = ">=3.9"
+python-versions = ">=3.8"
 groups = ["main"]
 files = [
-    {file = "s3transfer-0.13.0-py3-none-any.whl", hash = "sha256:0148ef34d6dd964d0d8cf4311b2b21c474693e57c2e069ec708ce043d2b527be"},
-    {file = "s3transfer-0.13.0.tar.gz", hash = "sha256:f5e6db74eb7776a37208001113ea7aa97695368242b364d73e91c981ac522177"},
+    {file = "s3transfer-0.11.3-py3-none-any.whl", hash = "sha256:ca855bdeb885174b5ffa95b9913622459d4ad8e331fc98eb01e6d5eb6a30655d"},
+    {file = "s3transfer-0.11.3.tar.gz", hash = "sha256:edae4977e3a122445660c7c114bba949f9d191bae3b34a096f18a1c8c354527a"},
 ]

 [package.dependencies]
-botocore = ">=1.37.4,<2.0a.0"
+botocore = ">=1.36.0,<2.0a.0"

 [package.extras]
-crt = ["botocore[crt] (>=1.37.4,<2.0a.0)"]
+crt = ["botocore[crt] (>=1.36.0,<2.0a.0)"]

 [[package]]
 name = "sacrebleu"
@@ -9243,7 +9308,6 @@ files = [
    {file = "setuptools-80.9.0-py3-none-any.whl", hash = "sha256:062d34222ad13e0cc312a4c02d73f059e86a4acbfbdea8f8f76b28c99f306922"},
    {file = "setuptools-80.9.0.tar.gz", hash = "sha256:f36b47402ecde768dbfafc46e8e4207b4360c654f1f3bb84475f0a28628fb19c"},
 ]
-markers = {evaluation = "platform_system == \"Linux\" and platform_machine == \"x86_64\""}

 [package.extras]
 check = ["pytest-checkdocs (>=2.4)", "pytest-ruff (>=0.2.1) ; sys_platform != \"cygwin\"", "ruff (>=0.8.0) ; sys_platform != \"cygwin\""]
@@ -9486,7 +9550,7 @@ description = "Standard library aifc redistribution. \"dead battery\"."
 optional = false
 python-versions = "*"
 groups = ["main"]
-markers = "python_version >= \"3.13\""
+markers = "python_version == \"3.13\""
 files = [
    {file = "standard_aifc-3.13.0-py3-none-any.whl", hash = "sha256:f7ae09cc57de1224a0dd8e3eb8f73830be7c3d0bc485de4c1f82b4a7f645ac66"},
    {file = "standard_aifc-3.13.0.tar.gz", hash = "sha256:64e249c7cb4b3daf2fdba4e95721f811bde8bdfc43ad9f936589b7bb2fae2e43"},
@@ -9503,7 +9567,7 @@ description = "Standard library chunk redistribution. \"dead battery\"."
 optional = false
 python-versions = "*"
 groups = ["main"]
-markers = "python_version >= \"3.13\""
+markers = "python_version == \"3.13\""
 files = [
    {file = "standard_chunk-3.13.0-py3-none-any.whl", hash = "sha256:17880a26c285189c644bd5bd8f8ed2bdb795d216e3293e6dbe55bbd848e2982c"},
    {file = "standard_chunk-3.13.0.tar.gz", hash = "sha256:4ac345d37d7e686d2755e01836b8d98eda0d1a3ee90375e597ae43aaf064d654"},
@@ -11665,4 +11729,4 @@ cffi = ["cffi (>=1.11)"]
 [metadata]
 lock-version = "2.1"
 python-versions = "^3.12,<3.14"
-content-hash = "47df4fc76b97147ff31169028edafaf35c1f4e661c7ab74bad48cb0ceea06aba"
+content-hash = "df8217d9808a5a1f5886e0328cbeb5032b20c28a677154888bd010f7bc945cb2"
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -6,7 +6,7 @@ requires = [

 [tool.poetry]
 name = "openhands-ai"
-version = "0.43.0"
+version = "0.44.0"
 description = "OpenHands: Code Less, Make More"
 authors = [ "OpenHands" ]
 license = "MIT"
@@ -80,7 +80,7 @@ bashlex = "^0.18"
 # TODO: These are integrations that should probably be optional
 redis = ">=5.2,<7.0"
 minio = "^7.2.8"
-daytona-sdk = "0.20.0"
+daytona = "0.21.1"
 stripe = ">=11.5,<13.0"
 google-cloud-aiplatform = "*"
 anthropic = { extras = [ "vertex" ], version = "*" }
--- a/tests/__init__.py
+++ b/tests/__init__.py
--- a/tests/runtime/__init__.py
+++ b/tests/runtime/__init__.py
--- a/tests/runtime/test_microagent.py
+++ b/tests/runtime/test_microagent.py
@@ -385,7 +385,6 @@ async def test_add_mcp_tools_from_microagents():
    """Test that add_mcp_tools_to_agent adds tools from microagents."""
    # Import ActionExecutionClient for mocking

-    from openhands.core.config.openhands_config import OpenHandsConfig
    from openhands.runtime.impl.action_execution.action_execution_client import (
        ActionExecutionClient,
    )
@@ -394,10 +393,6 @@ async def test_add_mcp_tools_from_microagents():
    mock_agent = MagicMock()
    mock_runtime = MagicMock(spec=ActionExecutionClient)
    mock_memory = MagicMock()
-    mock_mcp_config = MCPConfig()
-
-    # Create a mock OpenHandsConfig with the MCP config
-    mock_app_config = OpenHandsConfig(mcp=mock_mcp_config, search_api_key=None)

    # Configure the mock memory to return a microagent MCP config
    mock_stdio_server = MCPStdioServerConfig(
@@ -425,9 +420,7 @@ async def test_add_mcp_tools_from_microagents():
        new=AsyncMock(return_value=[mock_tool]),
    ):
        # Call the function with the OpenHandsConfig instead of MCPConfig
-        await add_mcp_tools_to_agent(
-            mock_agent, mock_runtime, mock_memory, mock_app_config
-        )
+        await add_mcp_tools_to_agent(mock_agent, mock_runtime, mock_memory)

        # Verify that the memory's get_microagent_mcp_tools was called
        mock_memory.get_microagent_mcp_tools.assert_called_once()
--- a/tests/unit/test_agent_controller.py
+++ b/tests/unit/test_agent_controller.py
@@ -1,4 +1,5 @@
 import asyncio
+import copy
 from unittest.mock import ANY, AsyncMock, MagicMock, patch
 from uuid import uuid4

@@ -11,7 +12,10 @@ from litellm import (

 from openhands.controller.agent import Agent
 from openhands.controller.agent_controller import AgentController
-from openhands.controller.state.state import State, TrafficControlState
+from openhands.controller.state.control_flags import (
+    BudgetControlFlag,
+)
+from openhands.controller.state.state import State
 from openhands.core.config import OpenHandsConfig
 from openhands.core.config.agent_config import AgentConfig
 from openhands.core.main import run_controller
@@ -128,7 +132,7 @@ async def test_set_agent_state(mock_agent, mock_event_stream):
    controller = AgentController(
        agent=mock_agent,
        event_stream=mock_event_stream,
-        max_iterations=10,
+        iteration_delta=10,
        sid='test',
        confirmation_mode=False,
        headless_mode=True,
@@ -146,7 +150,7 @@ async def test_on_event_message_action(mock_agent, mock_event_stream):
    controller = AgentController(
        agent=mock_agent,
        event_stream=mock_event_stream,
-        max_iterations=10,
+        iteration_delta=10,
        sid='test',
        confirmation_mode=False,
        headless_mode=True,
@@ -163,7 +167,7 @@ async def test_on_event_change_agent_state_action(mock_agent, mock_event_stream)
    controller = AgentController(
        agent=mock_agent,
        event_stream=mock_event_stream,
-        max_iterations=10,
+        iteration_delta=10,
        sid='test',
        confirmation_mode=False,
        headless_mode=True,
@@ -181,7 +185,7 @@ async def test_react_to_exception(mock_agent, mock_event_stream, mock_status_cal
        agent=mock_agent,
        event_stream=mock_event_stream,
        status_callback=mock_status_callback,
-        max_iterations=10,
+        iteration_delta=10,
        sid='test',
        confirmation_mode=False,
        headless_mode=True,
@@ -201,7 +205,7 @@ async def test_react_to_content_policy_violation(
        agent=mock_agent,
        event_stream=mock_event_stream,
        status_callback=mock_status_callback,
-        max_iterations=10,
+        iteration_delta=10,
        sid='test',
        confirmation_mode=False,
        headless_mode=True,
@@ -256,6 +260,7 @@ async def test_run_controller_with_fatal_error(

    test_event_stream.subscribe(EventStreamSubscriber.RUNTIME, on_event, str(uuid4()))
    runtime.event_stream = test_event_stream
+    runtime.config = copy.deepcopy(config)

    def on_event_memory(event: Event):
        if isinstance(event, RecallAction):
@@ -287,7 +292,7 @@ async def test_run_controller_with_fatal_error(
    )
    assert len(error_observations) == 1
    error_observation = error_observations[0]
-    assert state.iteration == 3
+    assert state.iteration_flag.current_value == 3
    assert state.agent_state == AgentState.ERROR
    assert state.last_error == 'AgentStuckInLoopError: Agent got stuck in a loop'
    assert (
@@ -323,6 +328,7 @@ async def test_run_controller_stop_with_stuck(

    test_event_stream.subscribe(EventStreamSubscriber.RUNTIME, on_event, str(uuid4()))
    runtime.event_stream = test_event_stream
+    runtime.config = copy.deepcopy(config)

    def on_event_memory(event: Event):
        if isinstance(event, RecallAction):
@@ -351,7 +357,7 @@ async def test_run_controller_stop_with_stuck(
    for i, event in enumerate(events):
        print(f'event {i}: {event_to_dict(event)}')

-    assert state.iteration == 3
+    assert state.iteration_flag.current_value == 3
    assert len(events) == 12
    # check the eventstream have 4 pairs of repeated actions and observations
    # With the refactored system message handling, we need to adjust the range
@@ -378,24 +384,19 @@ async def test_run_controller_stop_with_stuck(
 @pytest.mark.asyncio
 async def test_max_iterations_extension(mock_agent, mock_event_stream):
    # Test with headless_mode=False - should extend max_iterations
-    initial_state = State(max_iterations=10)
-
    controller = AgentController(
        agent=mock_agent,
        event_stream=mock_event_stream,
-        max_iterations=10,
+        iteration_delta=10,
        sid='test',
        confirmation_mode=False,
        headless_mode=False,
-        initial_state=initial_state,
    )
    controller.state.agent_state = AgentState.RUNNING
-    controller.state.iteration = 10
-    assert controller.state.traffic_control_state == TrafficControlState.NORMAL
+    controller.state.iteration_flag.current_value = 10

    # Trigger throttling by calling _step() when we hit max_iterations
    await controller._step()
-    assert controller.state.traffic_control_state == TrafficControlState.THROTTLING
    assert controller.state.agent_state == AgentState.ERROR

    # Simulate a new user message
@@ -405,28 +406,24 @@ async def test_max_iterations_extension(mock_agent, mock_event_stream):

    # Max iterations should be extended to current iteration + initial max_iterations
    assert (
-        controller.state.max_iterations == 20
+        controller.state.iteration_flag.max_value == 20
    )  # Current iteration (10 initial because _step() should not have been executed) + initial max_iterations (10)
-    assert controller.state.traffic_control_state == TrafficControlState.NORMAL
    assert controller.state.agent_state == AgentState.RUNNING

    # Close the controller to clean up
    await controller.close()

    # Test with headless_mode=True - should NOT extend max_iterations
-    initial_state = State(max_iterations=10)
    controller = AgentController(
        agent=mock_agent,
        event_stream=mock_event_stream,
-        max_iterations=10,
+        iteration_delta=10,
        sid='test',
        confirmation_mode=False,
        headless_mode=True,
-        initial_state=initial_state,
    )
    controller.state.agent_state = AgentState.RUNNING
-    controller.state.iteration = 10
-    assert controller.state.traffic_control_state == TrafficControlState.NORMAL
+    controller.state.iteration_flag.current_value = 10

    # Simulate a new user message
    message_action = MessageAction(content='Test message')
@@ -434,64 +431,143 @@ async def test_max_iterations_extension(mock_agent, mock_event_stream):
    await send_event_to_controller(controller, message_action)

    # Max iterations should NOT be extended in headless mode
-    assert controller.state.max_iterations == 10  # Original value unchanged
+    assert controller.state.iteration_flag.max_value == 10  # Original value unchanged

    # Trigger throttling by calling _step() when we hit max_iterations
    await controller._step()

-    assert controller.state.traffic_control_state == TrafficControlState.THROTTLING
    assert controller.state.agent_state == AgentState.ERROR
    await controller.close()


 @pytest.mark.asyncio
 async def test_step_max_budget(mock_agent, mock_event_stream):
+    # Pre-sync metrics with the budget flag: both start at 10.1, above the cap of 10
+    metrics = Metrics()
+    metrics.accumulated_cost = 10.1
+    budget_flag = BudgetControlFlag(
+        limit_increase_amount=10, current_value=10.1, max_value=10
+    )
+
    controller = AgentController(
        agent=mock_agent,
        event_stream=mock_event_stream,
-        max_iterations=10,
-        max_budget_per_task=10,
+        iteration_delta=10,
+        budget_per_task_delta=10,
        sid='test',
        confirmation_mode=False,
        headless_mode=False,
+        initial_state=State(budget_flag=budget_flag, metrics=metrics),
    )
    controller.state.agent_state = AgentState.RUNNING
-    controller.state.metrics.accumulated_cost = 10.1
-    assert controller.state.traffic_control_state == TrafficControlState.NORMAL
    await controller._step()
-    assert controller.state.traffic_control_state == TrafficControlState.THROTTLING
    assert controller.state.agent_state == AgentState.ERROR
    await controller.close()


 @pytest.mark.asyncio
 async def test_step_max_budget_headless(mock_agent, mock_event_stream):
+    # Pre-sync metrics with the budget flag: both start at 10.1, above the cap of 10
+    metrics = Metrics()
+    metrics.accumulated_cost = 10.1
+    budget_flag = BudgetControlFlag(
+        limit_increase_amount=10, current_value=10.1, max_value=10
+    )
+
    controller = AgentController(
        agent=mock_agent,
        event_stream=mock_event_stream,
-        max_iterations=10,
-        max_budget_per_task=10,
+        iteration_delta=10,
+        budget_per_task_delta=10,
        sid='test',
        confirmation_mode=False,
        headless_mode=True,
+        initial_state=State(budget_flag=budget_flag, metrics=metrics),
    )
    controller.state.agent_state = AgentState.RUNNING
-    controller.state.metrics.accumulated_cost = 10.1
-    assert controller.state.traffic_control_state == TrafficControlState.NORMAL
    await controller._step()
-    assert controller.state.traffic_control_state == TrafficControlState.THROTTLING
-    # In headless mode, throttling results in an error
    assert controller.state.agent_state == AgentState.ERROR
    await controller.close()


+@pytest.mark.asyncio
+async def test_budget_reset_on_continue(mock_agent, mock_event_stream):
+    """Test that when a user continues after hitting the budget limit:
+    1. An error is raised when the budget cap is exceeded
+    2. The LLM's accumulated cost is not reset when the user continues
+    3. The budget cap is extended to the current accumulated cost plus the initial budget cap
+    """
+
+    # Create a real Metrics instance shared between controller state and llm
+    metrics = Metrics()
+    metrics.accumulated_cost = 6.0
+
+    initial_budget = 5.0
+
+    initial_state = State(
+        metrics=metrics,
+        budget_flag=BudgetControlFlag(
+            limit_increase_amount=initial_budget,
+            current_value=6.0,
+            max_value=initial_budget,
+        ),
+    )
+
+    # Create controller with budget cap
+    controller = AgentController(
+        agent=mock_agent,
+        event_stream=mock_event_stream,
+        iteration_delta=10,
+        budget_per_task_delta=initial_budget,
+        sid='test',
+        confirmation_mode=False,
+        headless_mode=False,
+        initial_state=initial_state,
+    )
+
+    # Set up initial state
+    controller.state.agent_state = AgentState.RUNNING
+
+    # Check that the initial state reflects having spent more than the budget
+    assert controller.state.budget_flag.current_value == 6.0
+    assert controller.agent.llm.metrics.accumulated_cost == 6.0
+
+    # Trigger budget limit
+    await controller._step()
+
+    # Verify budget limit was hit and error was thrown
+    assert controller.state.agent_state == AgentState.ERROR
+    assert 'budget' in controller.state.last_error.lower()
+
+    # Now set the agent state to RUNNING (simulating user clicking "continue")
+    await controller.set_agent_state_to(AgentState.RUNNING)
+
+    # Now simulate user sending a message
+    message_action = MessageAction(content='Please continue')
+    message_action._source = EventSource.USER
+    await controller._on_event(message_action)
+
+    # Verify budget cap was extended by adding initial budget to current accumulated cost
+    # accumulated cost (6.0) + initial budget (5.0) = 11.0
+    assert controller.state.budget_flag.max_value == 11.0
+
+    # Verify LLM metrics were NOT reset - they should still be 6.0
+    assert controller.agent.llm.metrics.accumulated_cost == 6.0
+
+    # The controller state metrics are the same as the LLM metrics
+    assert controller.state.metrics.accumulated_cost == 6.0
+
+    # Close the controller to clean up
+    await controller.close()
+
+
 @pytest.mark.asyncio
 async def test_reset_with_pending_action_no_observation(mock_agent, mock_event_stream):
    """Test reset() when there's a pending action with tool call metadata but no observation."""
    controller = AgentController(
        agent=mock_agent,
        event_stream=mock_event_stream,
-        max_iterations=10,
+        iteration_delta=10,
        sid='test',
        confirmation_mode=False,
        headless_mode=True,
@@ -540,7 +616,7 @@ async def test_reset_with_pending_action_existing_observation(
    controller = AgentController(
        agent=mock_agent,
        event_stream=mock_event_stream,
-        max_iterations=10,
+        iteration_delta=10,
        sid='test',
        confirmation_mode=False,
        headless_mode=True,
@@ -582,7 +658,7 @@ async def test_reset_without_pending_action(mock_agent, mock_event_stream):
    controller = AgentController(
        agent=mock_agent,
        event_stream=mock_event_stream,
-        max_iterations=10,
+        iteration_delta=10,
        sid='test',
        confirmation_mode=False,
        headless_mode=True,
@@ -613,7 +689,7 @@ async def test_reset_with_pending_action_no_metadata(
    controller = AgentController(
        agent=mock_agent,
        event_stream=mock_event_stream,
-        max_iterations=10,
+        iteration_delta=10,
        sid='test',
        confirmation_mode=False,
        headless_mode=True,
@@ -662,6 +738,8 @@ async def test_run_controller_max_iterations_has_metrics(
    mock_agent.llm.metrics = Metrics()
    mock_agent.llm.config = config.get_llm_config()

+    step_count = 0
+
    def agent_step_fn(state):
        print(f'agent_step_fn received state: {state}')
        # Mock the cost of the LLM
@@ -669,7 +747,9 @@ async def test_run_controller_max_iterations_has_metrics(
        print(
            f'mock_agent.llm.metrics.accumulated_cost: {mock_agent.llm.metrics.accumulated_cost}'
        )
-        return CmdRunAction(command='ls')
+        nonlocal step_count
+        step_count += 1
+        return CmdRunAction(command=f'ls {step_count}')

    mock_agent.step = agent_step_fn

@@ -685,6 +765,7 @@ async def test_run_controller_max_iterations_has_metrics(

    event_stream.subscribe(EventStreamSubscriber.RUNTIME, on_event, str(uuid4()))
    runtime.event_stream = event_stream
+    runtime.config = copy.deepcopy(config)

    def on_event_memory(event: Event):
        if isinstance(event, RecallAction):
@@ -706,11 +787,13 @@ async def test_run_controller_max_iterations_has_metrics(
        fake_user_response_fn=lambda _: 'repeat',
        memory=mock_memory,
    )
-    assert state.iteration == 3
+
+    state.metrics = mock_agent.llm.metrics
+    assert state.iteration_flag.current_value == 3
    assert state.agent_state == AgentState.ERROR
    assert (
        state.last_error
-        == 'RuntimeError: Agent reached maximum iteration in headless mode. Current iteration: 3, max iteration: 3'
+        == 'RuntimeError: Agent reached maximum iteration. Current iteration: 3, max iteration: 3'
    )
    error_observations = test_event_stream.get_matching_events(
        reverse=True, limit=1, event_types=(AgentStateChangedObservation)
@@ -720,7 +803,7 @@ async def test_run_controller_max_iterations_has_metrics(

    assert (
        error_observation.reason
-        == 'RuntimeError: Agent reached maximum iteration in headless mode. Current iteration: 3, max iteration: 3'
+        == 'RuntimeError: Agent reached maximum iteration. Current iteration: 3, max iteration: 3'
    )

    assert state.metrics.accumulated_cost == 10.0 * 3, (
@@ -734,12 +817,19 @@ async def test_notify_on_llm_retry(mock_agent, mock_event_stream, mock_status_ca
        agent=mock_agent,
        event_stream=mock_event_stream,
        status_callback=mock_status_callback,
-        max_iterations=10,
+        iteration_delta=10,
        sid='test',
        confirmation_mode=False,
        headless_mode=True,
    )
-    controller._notify_on_llm_retry(1, 2)
+
+    def notify_on_llm_retry(attempt, max_attempts):
+        controller.status_callback('info', 'STATUS$LLM_RETRY', ANY)
+
+    # Attach the retry listener to the agent's LLM
+    controller.agent.llm.retry_listener = notify_on_llm_retry
+
+    controller.agent.llm.retry_listener(1, 2)
    controller.status_callback.assert_called_once_with('info', 'STATUS$LLM_RETRY', ANY)
    await controller.close()

@@ -797,7 +887,9 @@ async def test_context_window_exceeded_error_handling(
    test_event_stream.subscribe(
        EventStreamSubscriber.MEMORY, on_event_memory, str(uuid4())
    )
+    config = OpenHandsConfig(max_iterations=max_iterations)
    mock_runtime.event_stream = test_event_stream
+    mock_runtime.config = copy.deepcopy(config)

    # Now we can run the controller for a fixed number of steps. Since the step
    # state is set to error out before then, if this terminates and we have a
@@ -805,7 +897,7 @@ async def test_context_window_exceeded_error_handling(
    # handles the truncation correctly.
    final_state = await asyncio.wait_for(
        run_controller(
-            config=OpenHandsConfig(max_iterations=max_iterations),
+            config=config,
            initial_user_action=MessageAction(content='INITIAL'),
            runtime=mock_runtime,
            sid='test',
@@ -941,11 +1033,13 @@ async def test_run_controller_with_context_window_exceeded_with_truncation(
        EventStreamSubscriber.MEMORY, on_event_memory, str(uuid4())
    )
    mock_runtime.event_stream = test_event_stream
+    config = OpenHandsConfig(max_iterations=5)
+    mock_runtime.config = copy.deepcopy(config)

    try:
        state = await asyncio.wait_for(
            run_controller(
-                config=OpenHandsConfig(max_iterations=5),
+                config=config,
                initial_user_action=MessageAction(content='INITIAL'),
                runtime=mock_runtime,
                sid='test',
@@ -965,11 +1059,11 @@ async def test_run_controller_with_context_window_exceeded_with_truncation(

    # Hitting the iteration limit indicates the controller is failing for the
    # expected reason
-    assert state.iteration == 5
+    assert state.iteration_flag.current_value == 5
    assert state.agent_state == AgentState.ERROR
    assert (
        state.last_error
-        == 'RuntimeError: Agent reached maximum iteration in headless mode. Current iteration: 5, max iteration: 5'
+        == 'RuntimeError: Agent reached maximum iteration. Current iteration: 5, max iteration: 5'
    )

    # Check that the context window exceeded error was raised during the run
@@ -1018,10 +1112,12 @@ async def test_run_controller_with_context_window_exceeded_without_truncation(
        EventStreamSubscriber.MEMORY, on_event_memory, str(uuid4())
    )
    mock_runtime.event_stream = test_event_stream
+    config = OpenHandsConfig(max_iterations=3)
+    mock_runtime.config = copy.deepcopy(config)
    try:
        state = await asyncio.wait_for(
            run_controller(
-                config=OpenHandsConfig(max_iterations=3),
+                config=config,
                initial_user_action=MessageAction(content='INITIAL'),
                runtime=mock_runtime,
                sid='test',
@@ -1042,7 +1138,7 @@ async def test_run_controller_with_context_window_exceeded_without_truncation(
    # Hitting the iteration limit indicates the controller is failing for the
    # expected reason
    # With the refactored system message handling, the iteration count is different
-    assert state.iteration == 1
+    assert state.iteration_flag.current_value == 1
    assert state.agent_state == AgentState.ERROR
    assert (
        state.last_error
@@ -1081,6 +1177,7 @@ async def test_run_controller_with_memory_error(test_event_stream, mock_agent):

    runtime = MagicMock(spec=ActionExecutionClient)
    runtime.event_stream = event_stream
+    runtime.config = copy.deepcopy(config)

    # Create a real Memory instance
    memory = Memory(event_stream=event_stream, sid='test-memory')
@@ -1102,7 +1199,7 @@ async def test_run_controller_with_memory_error(test_event_stream, mock_agent):
            memory=memory,
        )

-    assert state.iteration == 0
+    assert state.iteration_flag.current_value == 0
    assert state.agent_state == AgentState.ERROR
    assert state.last_error == 'Error: RuntimeError'

@@ -1113,11 +1210,14 @@ async def test_action_metrics_copy(mock_agent):
    file_store = InMemoryFileStore({})
    event_stream = EventStream(sid='test', file_store=file_store)

-    # Create agent with metrics
-    mock_agent.llm = MagicMock(spec=LLM)
    metrics = Metrics(model_name='test-model')
    metrics.accumulated_cost = 0.05

+    initial_state = State(metrics=metrics, budget_flag=None)
+
+    # Create agent with metrics
+    mock_agent.llm = MagicMock(spec=LLM)
+
    # Add multiple token usages - we should get the last one in the action
    usage1 = TokenUsage(
        model='test-model',
@@ -1170,10 +1270,11 @@ async def test_action_metrics_copy(mock_agent):
    controller = AgentController(
        agent=mock_agent,
        event_stream=event_stream,
-        max_iterations=10,
+        iteration_delta=10,
        sid='test',
        confirmation_mode=False,
        headless_mode=True,
+        initial_state=initial_state,
    )

    # Execute one step
@@ -1240,7 +1341,7 @@ async def test_condenser_metrics_included(mock_agent, test_event_stream):
        cache_write_tokens=10,
        response_id='agent-accumulated',
    )
-    mock_agent.llm.metrics = agent_metrics
+    # mock_agent.llm.metrics = agent_metrics
    mock_agent.name = 'TestAgent'

    # Create condenser with its own metrics
@@ -1279,10 +1380,11 @@ async def test_condenser_metrics_included(mock_agent, test_event_stream):
    controller = AgentController(
        agent=mock_agent,
        event_stream=test_event_stream,
-        max_iterations=10,
+        iteration_delta=10,
        sid='test',
        confirmation_mode=False,
        headless_mode=True,
+        initial_state=State(metrics=agent_metrics, budget_flag=None),
    )

    # Execute one step
@@ -1337,7 +1439,7 @@ async def test_first_user_message_with_identical_content(test_event_stream, mock
    controller = AgentController(
        agent=mock_agent,
        event_stream=test_event_stream,
-        max_iterations=10,
+        iteration_delta=10,
        sid='test',
        confirmation_mode=False,
        headless_mode=True,
@@ -1409,7 +1511,7 @@ async def test_agent_controller_processes_null_observation_with_cause():
    controller = AgentController(
        agent=mock_agent,
        event_stream=event_stream,
-        max_iterations=10,
+        iteration_delta=10,
        sid='test-session',
    )

@@ -1480,7 +1582,7 @@ def test_agent_controller_should_step_with_null_observation_cause_zero(mock_agen
    controller = AgentController(
        agent=mock_agent,
        event_stream=event_stream,
-        max_iterations=10,
+        iteration_delta=10,
        sid='test-session',
    )

@@ -1501,7 +1603,7 @@ def test_agent_controller_should_step_with_null_observation_cause_zero(mock_agen
 def test_system_message_in_event_stream(mock_agent, test_event_stream):
    """Test that SystemMessageAction is added to event stream in AgentController."""
    _ = AgentController(
-        agent=mock_agent, event_stream=test_event_stream, max_iterations=10
+        agent=mock_agent, event_stream=test_event_stream, iteration_delta=10
    )

    # Get events from the event stream
@@ -1553,7 +1655,7 @@ async def test_openrouter_context_window_exceeded_error(
    controller = AgentController(
        agent=mock_agent,
        event_stream=test_event_stream,
-        max_iterations=max_iterations,
+        iteration_delta=max_iterations,
        sid='test',
        confirmation_mode=False,
        headless_mode=True,
--- a/tests/unit/test_agent_delegation.py
+++ b/tests/unit/test_agent_delegation.py
@@ -7,6 +7,10 @@ import pytest

 from openhands.controller.agent import Agent
 from openhands.controller.agent_controller import AgentController
+from openhands.controller.state.control_flags import (
+    BudgetControlFlag,
+    IterationControlFlag,
+)
 from openhands.controller.state.state import State
 from openhands.core.config import LLMConfig
 from openhands.core.config.agent_config import AgentConfig
@@ -18,6 +22,8 @@ from openhands.events.action import (
    MessageAction,
 )
 from openhands.events.action.agent import RecallAction
+from openhands.events.action.commands import CmdRunAction
+from openhands.events.action.message import SystemMessageAction
 from openhands.events.event import Event, RecallType
 from openhands.events.observation.agent import RecallObservation
 from openhands.events.stream import EventStreamSubscriber
@@ -43,16 +49,14 @@ def mock_parent_agent():
    agent.llm = MagicMock(spec=LLM)
    agent.llm.metrics = Metrics()
    agent.llm.config = LLMConfig()
+    agent.llm.retry_listener = None  # Add retry_listener attribute
    agent.config = AgentConfig()

    # Add a proper system message mock
-    from openhands.events.action.message import SystemMessageAction
-
    system_message = SystemMessageAction(content='Test system message')
    system_message._source = EventSource.AGENT
    system_message._id = -1  # Set invalid ID to avoid the ID check
    agent.get_system_message.return_value = system_message
-
    return agent


@@ -64,34 +68,54 @@ def mock_child_agent():
    agent.llm = MagicMock(spec=LLM)
    agent.llm.metrics = Metrics()
    agent.llm.config = LLMConfig()
+    agent.llm.retry_listener = None  # Add retry_listener attribute
    agent.config = AgentConfig()

-    # Add a proper system message mock
-    from openhands.events.action.message import SystemMessageAction
-
    system_message = SystemMessageAction(content='Test system message')
    system_message._source = EventSource.AGENT
    system_message._id = -1  # Set invalid ID to avoid the ID check
    agent.get_system_message.return_value = system_message
-
    return agent


@pytest.mark.asyncio
 async def test_delegation_flow(mock_parent_agent, mock_child_agent, mock_event_stream):
    """
-    Test that when the parent agent delegates to a child, the parent's delegate
-    is set, and once the child finishes, the parent is cleaned up properly.
+    Test that when the parent agent delegates to a child
+     1. the parent's delegate is set, and once the child finishes, the parent is cleaned up properly.
+     2. metrics are accumulated globally (the delegate adds to the parent's metrics)
+     3. local metrics for the delegate are still accessible
    """
    # Mock the agent class resolution so that AgentController can instantiate mock_child_agent
    Agent.get_cls = Mock(return_value=lambda llm, config: mock_child_agent)

+    step_count = 0
+
+    def agent_step_fn(state):
+        nonlocal step_count
+        step_count += 1
+        return CmdRunAction(command=f'ls {step_count}')
+
+    mock_child_agent.step = agent_step_fn
+
+    parent_metrics = Metrics()
+    parent_metrics.accumulated_cost = 2
    # Create parent controller
-    parent_state = State(max_iterations=10)
+    parent_state = State(
+        inputs={},
+        metrics=parent_metrics,
+        budget_flag=BudgetControlFlag(
+            current_value=2, limit_increase_amount=10, max_value=10
+        ),
+        iteration_flag=IterationControlFlag(
+            current_value=1, limit_increase_amount=10, max_value=10
+        ),
+    )
+
    parent_controller = AgentController(
        agent=mock_parent_agent,
        event_stream=mock_event_stream,
-        max_iterations=10,
+        iteration_delta=1,  # Add the required iteration_delta parameter
        sid='parent',
        confirmation_mode=False,
        headless_mode=True,
@@ -132,8 +156,9 @@ async def test_delegation_flow(mock_parent_agent, mock_child_agent, mock_event_s
    # Verify that a RecallObservation was added to the event stream
    events = list(mock_event_stream.get_events())

-    # SystemMessageAction, RecallAction, AgentChangeState, AgentDelegateAction, SystemMessageAction (for child)
-    assert mock_event_stream.get_latest_event_id() == 5
+    # The exact number of events might vary depending on implementation details
+    # Just verify that we have at least a few events
+    assert mock_event_stream.get_latest_event_id() >= 3

    # a RecallObservation and an AgentDelegateAction should be in the list
    assert any(isinstance(event, RecallObservation) for event in events)
@@ -145,13 +170,33 @@ async def test_delegation_flow(mock_parent_agent, mock_child_agent, mock_event_s
    )

    # The parent's iteration should have incremented
-    assert parent_controller.state.iteration == 1, (
+    assert parent_controller.state.iteration_flag.current_value == 2, (
        'Parent iteration should be incremented after step.'
    )

    # Now simulate that the child increments local iteration and finishes its subtask
    delegate_controller = parent_controller.delegate
-    delegate_controller.state.iteration = 5  # child had some steps
+
+    # Take four delegate steps; mock cost per step
+    for _ in range(4):
+        delegate_controller.state.iteration_flag.step()
+        delegate_controller.agent.step(delegate_controller.state)
+        delegate_controller.agent.llm.metrics.add_cost(1.0)
+
+    assert (
+        delegate_controller.state.get_local_step() == 4
+    )  # verify local metrics are accessible via snapshot
+
+    assert (
+        delegate_controller.state.metrics.accumulated_cost
+        == 6  # Make sure delegate tracks global cost
+    )
+
+    assert (
+        delegate_controller.state.get_local_metrics().accumulated_cost
+        == 4  # Delegate spent one dollar per step
+    )
+
    delegate_controller.state.outputs = {'delegate_result': 'done'}

    # The child is done, so we simulate it finishing:
@@ -165,7 +210,7 @@ async def test_delegation_flow(mock_parent_agent, mock_child_agent, mock_event_s
    )

    # Parent's global iteration is updated from the child
-    assert parent_controller.state.iteration == 6, (
+    assert parent_controller.state.iteration_flag.current_value == 7, (
        "Parent iteration should be the child's iteration + 1 after child is done."
    )

@@ -187,19 +232,24 @@ async def test_delegate_step_different_states(
    mock_parent_agent, mock_event_stream, delegate_state
 ):
    """Ensure that delegate is closed or remains open based on the delegate's state."""
+    # Create a state with iteration_flag.max_value set to 10
+    state = State(inputs={})
+    state.iteration_flag.max_value = 10
    controller = AgentController(
        agent=mock_parent_agent,
        event_stream=mock_event_stream,
-        max_iterations=10,
+        iteration_delta=1,  # Add the required iteration_delta parameter
        sid='test',
        confirmation_mode=False,
        headless_mode=True,
+        initial_state=state,
    )

    mock_delegate = AsyncMock()
    controller.delegate = mock_delegate

-    mock_delegate.state.iteration = 5
+    mock_delegate.state.iteration_flag = MagicMock()
+    mock_delegate.state.iteration_flag.current_value = 5
    mock_delegate.state.outputs = {'result': 'test'}
    mock_delegate.agent.name = 'TestDelegate'

@@ -207,7 +257,7 @@ async def test_delegate_step_different_states(
    mock_delegate._step = AsyncMock()
    mock_delegate.close = AsyncMock()

-    def call_on_event_with_new_loop():
+    async def call_on_event_with_new_loop():
        """
        In this thread, create and set a fresh event loop, so that the run_until_complete()
        calls inside controller.on_event(...) find a valid loop.
@@ -226,14 +276,135 @@ async def test_delegate_step_different_states(
        future = loop.run_in_executor(executor, call_on_event_with_new_loop)
        await future

+    # Give time for the event loop to process events
+    await asyncio.sleep(0.5)
+
    if delegate_state == AgentState.RUNNING:
        assert controller.delegate is not None
-        assert controller.state.iteration == 0
+        assert controller.state.iteration_flag.current_value == 0
        mock_delegate.close.assert_not_called()
    else:
        assert controller.delegate is None
-        assert controller.state.iteration == 5
+        assert controller.state.iteration_flag.current_value == 5
        # The close method is called once in end_delegate
        assert mock_delegate.close.call_count == 1

    await controller.close()
+
+
+@pytest.mark.asyncio
+async def test_delegate_hits_global_limits(
+    mock_child_agent, mock_event_stream, mock_parent_agent
+):
+    """
+    Global limits from control flags should apply to delegates
+    """
+    # Mock the agent class resolution so that AgentController can instantiate mock_child_agent
+    Agent.get_cls = Mock(return_value=lambda llm, config: mock_child_agent)
+
+    parent_metrics = Metrics()
+    parent_metrics.accumulated_cost = 2
+    # Create parent controller
+    parent_state = State(
+        inputs={},
+        metrics=parent_metrics,
+        budget_flag=BudgetControlFlag(
+            current_value=2, limit_increase_amount=10, max_value=10
+        ),
+        iteration_flag=IterationControlFlag(
+            current_value=2, limit_increase_amount=3, max_value=3
+        ),
+    )
+
+    parent_controller = AgentController(
+        agent=mock_parent_agent,
+        event_stream=mock_event_stream,
+        iteration_delta=1,  # Add the required iteration_delta parameter
+        sid='parent',
+        confirmation_mode=False,
+        headless_mode=False,
+        initial_state=parent_state,
+    )
+
+    # Setup Memory to catch RecallActions
+    mock_memory = MagicMock(spec=Memory)
+    mock_memory.event_stream = mock_event_stream
+
+    def on_event(event: Event):
+        if isinstance(event, RecallAction):
+            # create a RecallObservation
+            microagent_observation = RecallObservation(
+                recall_type=RecallType.KNOWLEDGE,
+                content='Found info',
+            )
+            microagent_observation._cause = event.id  # ignore attr-defined warning
+            mock_event_stream.add_event(microagent_observation, EventSource.ENVIRONMENT)
+
+    mock_memory.on_event = on_event
+    mock_event_stream.subscribe(
+        EventStreamSubscriber.MEMORY, mock_memory.on_event, mock_memory
+    )
+
+    # Setup a delegate action from the parent
+    delegate_action = AgentDelegateAction(agent='ChildAgent', inputs={'test': True})
+    mock_parent_agent.step.return_value = delegate_action
+
+    # Simulate a user message event to cause parent.step() to run
+    message_action = MessageAction(content='please delegate now')
+    message_action._source = EventSource.USER
+    await parent_controller._on_event(message_action)
+
+    # Give time for the async step() to execute
+    await asyncio.sleep(1)
+
+    # Verify that a RecallObservation was added to the event stream
+    events = list(mock_event_stream.get_events())
+
+    # The exact number of events might vary depending on implementation details
+    # Just verify that we have at least a few events
+    assert mock_event_stream.get_latest_event_id() >= 3
+
+    # a RecallObservation and an AgentDelegateAction should be in the list
+    assert any(isinstance(event, RecallObservation) for event in events)
+    assert any(isinstance(event, AgentDelegateAction) for event in events)
+
+    # Verify that a delegate agent controller is created
+    assert parent_controller.delegate is not None, (
+        "Parent's delegate controller was not set."
+    )
+
+    delegate_controller = parent_controller.delegate
+    await delegate_controller.set_agent_state_to(AgentState.RUNNING)
+
+    # Step should hit the global iteration limit (current_value 3 == max_value 3)
+    message_action = MessageAction(content='Test message')
+    message_action._source = EventSource.USER
+
+    await delegate_controller._on_event(message_action)
+    await asyncio.sleep(0.1)
+
+    assert delegate_controller.state.agent_state == AgentState.ERROR
+    assert (
+        delegate_controller.state.last_error
+        == 'RuntimeError: Agent reached maximum iteration. Current iteration: 3, max iteration: 3'
+    )
+
+    await delegate_controller.set_agent_state_to(AgentState.RUNNING)
+    await asyncio.sleep(0.1)
+
+    assert delegate_controller.state.iteration_flag.max_value == 6
+    assert (
+        delegate_controller.state.iteration_flag.max_value
+        == parent_controller.state.iteration_flag.max_value
+    )
+
+    message_action = MessageAction(content='Test message 2')
+    message_action._source = EventSource.USER
+    await delegate_controller._on_event(message_action)
+    await asyncio.sleep(0.1)
+
+    assert delegate_controller.state.iteration_flag.current_value == 4
+    assert (
+        delegate_controller.state.iteration_flag.current_value
+        == parent_controller.state.iteration_flag.current_value
+    )
--- a/tests/unit/test_agent_history.py
+++ b/tests/unit/test_agent_history.py
@@ -99,13 +99,17 @@ def controller_fixture():
    # Ensure get_latest_event_id returns an integer
    mock_event_stream.get_latest_event_id.return_value = -1

+    # Create a state with iteration_flag.max_value set to 10
+    state = State(inputs={}, session_id='test_sid')
+    state.iteration_flag.max_value = 10
+
    controller = AgentController(
        agent=mock_agent,
        event_stream=mock_event_stream,
-        max_iterations=10,
+        iteration_delta=1,  # Add the required iteration_delta parameter
        sid='test_sid',
+        initial_state=state,
    )
-    controller.state = State(session_id='test_sid')

    # Don't mock _first_user_message anymore since we need it to work with history
    return controller
--- a/tests/unit/test_agent_session.py
+++ b/tests/unit/test_agent_session.py
@@ -17,6 +17,8 @@ from openhands.runtime.impl.action_execution.action_execution_client import (
 from openhands.server.session.agent_session import AgentSession
 from openhands.storage.memory import InMemoryFileStore

+# We'll use the DeprecatedState class from the main codebase
+

@pytest.fixture
 def mock_agent():
@@ -131,7 +133,7 @@ async def test_agent_session_start_with_no_state(mock_agent):
        # Verify set_initial_state was called once with None as state
        assert session.controller.set_initial_state_call_count == 1
        assert session.controller.test_initial_state is None
-        assert session.controller.state.max_iterations == 10
+        assert session.controller.state.iteration_flag.max_value == 10
        assert session.controller.agent.name == 'test-agent'
        assert session.controller.state.start_id == 0
        assert session.controller.state.end_id == -1
@@ -171,7 +173,11 @@ async def test_agent_session_start_with_restored_state(mock_agent):
    mock_restored_state = MagicMock(spec=State)
    mock_restored_state.start_id = -1
    mock_restored_state.end_id = -1
-    mock_restored_state.max_iterations = 5
+    # Use iteration_flag instead of max_iterations
+    mock_restored_state.iteration_flag = MagicMock()
+    mock_restored_state.iteration_flag.max_value = 5
+    # Add metrics attribute
+    mock_restored_state.metrics = MagicMock(spec=Metrics)

    # Create a spy on set_initial_state by subclassing AgentController
    class SpyAgentController(AgentController):
@@ -219,6 +225,180 @@ async def test_agent_session_start_with_restored_state(mock_agent):
        )
        assert session.controller.test_initial_state is mock_restored_state
        assert session.controller.state is mock_restored_state
-        assert session.controller.state.max_iterations == 5
+        assert session.controller.state.iteration_flag.max_value == 5
        assert session.controller.state.start_id == 0
        assert session.controller.state.end_id == -1
+
+
+@pytest.mark.asyncio
+async def test_metrics_centralization_and_sharing(mock_agent):
+    """Test that metrics are centralized and shared between controller and agent."""
+
+    # Setup
+    file_store = InMemoryFileStore({})
+    session = AgentSession(
+        sid='test-session',
+        file_store=file_store,
+    )
+
+    # Create a mock runtime and set it up
+    mock_runtime = MagicMock(spec=ActionExecutionClient)
+
+    # Mock the runtime creation to set up the runtime attribute
+    async def mock_create_runtime(*args, **kwargs):
+        session.runtime = mock_runtime
+        return True
+
+    session._create_runtime = AsyncMock(side_effect=mock_create_runtime)
+
+    # Create a mock EventStream with no events
+    mock_event_stream = MagicMock(spec=EventStream)
+    mock_event_stream.get_events.return_value = []
+    mock_event_stream.subscribe = MagicMock()
+    mock_event_stream.get_latest_event_id.return_value = 0
+
+    # Inject the mock event stream into the session
+    session.event_stream = mock_event_stream
+
+    # Create a real Memory instance with the mock event stream
+    memory = Memory(event_stream=mock_event_stream, sid='test-session')
+    memory.microagents_dir = 'test-dir'
+
+    # Patch necessary components
+    with (
+        patch(
+            'openhands.server.session.agent_session.EventStream',
+            return_value=mock_event_stream,
+        ),
+        patch(
+            'openhands.controller.state.state.State.restore_from_session',
+            side_effect=Exception('No state found'),
+        ),
+        patch('openhands.server.session.agent_session.Memory', return_value=memory),
+    ):
+        await session.start(
+            runtime_name='test-runtime',
+            config=OpenHandsConfig(),
+            agent=mock_agent,
+            max_iterations=10,
+        )
+
+        # Verify that the agent's LLM metrics and controller's state metrics are the same object
+        assert session.controller.agent.llm.metrics is session.controller.state.metrics
+
+        # Add some metrics to the agent's LLM
+        test_cost = 0.05
+        session.controller.agent.llm.metrics.add_cost(test_cost)
+
+        # Verify that the cost is reflected in the controller's state metrics
+        assert session.controller.state.metrics.accumulated_cost == test_cost
+
+        # Create a test metrics object to simulate an observation with metrics
+        test_observation_metrics = Metrics()
+        test_observation_metrics.add_cost(0.1)
+
+        # Get the current accumulated cost before merging
+        current_cost = session.controller.state.metrics.accumulated_cost
+
+        # Simulate merging metrics from an observation
+        session.controller.state_tracker.merge_metrics(test_observation_metrics)
+
+        # Verify that the merged metrics are reflected in both agent and controller
+        assert session.controller.state.metrics.accumulated_cost == current_cost + 0.1
+        assert (
+            session.controller.agent.llm.metrics.accumulated_cost == current_cost + 0.1
+        )
+
+        # Reset the agent and verify that metrics are not reset
+        session.controller.agent.reset()
+
+        # Metrics should still be the same after reset
+        assert session.controller.state.metrics.accumulated_cost == test_cost + 0.1
+        assert session.controller.agent.llm.metrics.accumulated_cost == test_cost + 0.1
+        assert session.controller.agent.llm.metrics is session.controller.state.metrics
+
+
+@pytest.mark.asyncio
+async def test_budget_control_flag_syncs_with_metrics(mock_agent):
+    """Test that BudgetControlFlag's current value matches the accumulated costs."""
+
+    # Setup
+    file_store = InMemoryFileStore({})
+    session = AgentSession(
+        sid='test-session',
+        file_store=file_store,
+    )
+
+    # Create a mock runtime and set it up
+    mock_runtime = MagicMock(spec=ActionExecutionClient)
+
+    # Mock the runtime creation to set up the runtime attribute
+    async def mock_create_runtime(*args, **kwargs):
+        session.runtime = mock_runtime
+        return True
+
+    session._create_runtime = AsyncMock(side_effect=mock_create_runtime)
+
+    # Create a mock EventStream with no events
+    mock_event_stream = MagicMock(spec=EventStream)
+    mock_event_stream.get_events.return_value = []
+    mock_event_stream.subscribe = MagicMock()
+    mock_event_stream.get_latest_event_id.return_value = 0
+
+    # Inject the mock event stream into the session
+    session.event_stream = mock_event_stream
+
+    # Create a real Memory instance with the mock event stream
+    memory = Memory(event_stream=mock_event_stream, sid='test-session')
+    memory.microagents_dir = 'test-dir'
+
+    # Patch necessary components
+    with (
+        patch(
+            'openhands.server.session.agent_session.EventStream',
+            return_value=mock_event_stream,
+        ),
+        patch(
+            'openhands.controller.state.state.State.restore_from_session',
+            side_effect=Exception('No state found'),
+        ),
+        patch('openhands.server.session.agent_session.Memory', return_value=memory),
+    ):
+        # Start the session with a budget limit
+        await session.start(
+            runtime_name='test-runtime',
+            config=OpenHandsConfig(),
+            agent=mock_agent,
+            max_iterations=10,
+            max_budget_per_task=1.0,  # Set a budget limit
+        )
+
+        # Verify that the budget control flag was created
+        assert session.controller.state.budget_flag is not None
+        assert session.controller.state.budget_flag.max_value == 1.0
+        assert session.controller.state.budget_flag.current_value == 0.0
+
+        # Add some metrics to the agent's LLM
+        test_cost = 0.05
+        session.controller.agent.llm.metrics.add_cost(test_cost)
+
+        # Verify that the budget control flag's current value is updated
+        # This happens through the state_tracker.sync_budget_flag_with_metrics method
+        session.controller.state_tracker.sync_budget_flag_with_metrics()
+        assert session.controller.state.budget_flag.current_value == test_cost
+
+        # Create a test metrics object to simulate an observation with metrics
+        test_observation_metrics = Metrics()
+        test_observation_metrics.add_cost(0.1)
+
+        # Simulate merging metrics from an observation
+        session.controller.state_tracker.merge_metrics(test_observation_metrics)
+
+        # Verify that the budget control flag's current value is updated to match the new accumulated cost
+        assert session.controller.state.budget_flag.current_value == test_cost + 0.1
+
+        # Reset the agent and verify that metrics and budget flag are not reset
+        session.controller.agent.reset()
+
+        # Budget control flag should still reflect the accumulated cost after reset
+        assert session.controller.state.budget_flag.current_value == test_cost + 0.1
--- a/tests/unit/test_arg_parser.py
+++ b/tests/unit/test_arg_parser.py
@@ -21,9 +21,6 @@ def test_parser_default_values():
    assert args.name == ''
    assert not args.no_auto_continue
    assert args.selected_repo is None
-    assert args.llm_model is None
-    assert args.llm_base_url is None
-    assert args.llm_api_key is None


 def test_parser_custom_values():
@@ -58,12 +55,6 @@ def test_parser_custom_values():
            '--no-auto-continue',
            '--selected-repo',
            'owner/repo',
-            '--llm-model',
-            'openai/gpt-4',
-            '--llm-base-url',
-            'http://localhost:1234/v1',
-            '--llm-api-key',
-            'test-api-key',
        ]
    )

@@ -82,9 +73,6 @@ def test_parser_custom_values():
    assert args.no_auto_continue
    assert args.version
    assert args.selected_repo == 'owner/repo'
-    assert args.llm_model == 'openai/gpt-4'
-    assert args.llm_base_url == 'http://localhost:1234/v1'
-    assert args.llm_api_key == 'test-api-key'


 def test_parser_file_overrides_task():
@@ -150,16 +138,13 @@ def test_help_message(capsys):
        '--no-auto-continue',
        '--selected-repo SELECTED_REPO',
        '--override-cli-mode OVERRIDE_CLI_MODE',
-        '--llm-model LLM_MODEL',
-        '--llm-base-url LLM_BASE_URL',
-        '--llm-api-key LLM_API_KEY',
    ]

    for element in expected_elements:
        assert element in help_output, f"Expected '{element}' to be in the help message"

    option_count = help_output.count('  -')
-    assert option_count == 23, f'Expected 23 options, found {option_count}'
+    assert option_count == 20, f'Expected 20 options, found {option_count}'


 def test_selected_repo_format():
--- a/tests/unit/test_auto_generate_title.py
+++ b/tests/unit/test_auto_generate_title.py
@@ -43,7 +43,7 @@ async def test_auto_generate_title_with_llm():
    ) as mock_event_stream_cls:
        # Configure the mock event stream to return our test message
        mock_event_stream = MagicMock(spec=EventStream)
-        mock_event_stream.get_events.return_value = [user_message]
+        mock_event_stream.search_events.return_value = [user_message]
        mock_event_stream_cls.return_value = mock_event_stream

        # Mock the LLM response
@@ -108,7 +108,7 @@ async def test_auto_generate_title_fallback():
    ) as mock_event_stream_cls:
        # Configure the mock event stream to return our test message
        mock_event_stream = MagicMock(spec=EventStream)
-        mock_event_stream.get_events.return_value = [user_message]
+        mock_event_stream.search_events.return_value = [user_message]
        mock_event_stream_cls.return_value = mock_event_stream

        # Mock the LLM to raise an exception
@@ -154,7 +154,7 @@ async def test_auto_generate_title_no_messages():
    ) as mock_event_stream_cls:
        # Configure the mock event stream to return no events
        mock_event_stream = MagicMock(spec=EventStream)
-        mock_event_stream.get_events.return_value = []
+        mock_event_stream.search_events.return_value = []
        mock_event_stream_cls.return_value = mock_event_stream

        # Create test settings
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -208,9 +208,7 @@ async def test_run_session_without_initial_action(
    mock_display_runtime_init.assert_called_once_with('local')
    mock_display_animation.assert_called_once()
    mock_create_agent.assert_called_once_with(mock_config)
-    mock_add_mcp_tools.assert_called_once_with(
-        mock_agent, mock_runtime, mock_memory, mock_config
-    )
+    mock_add_mcp_tools.assert_called_once_with(mock_agent, mock_runtime, mock_memory)
    mock_create_runtime.assert_called_once()
    mock_create_controller.assert_called_once()
    mock_create_memory.assert_called_once()
--- a/tests/unit/test_control_flags.py
+++ b/tests/unit/test_control_flags.py
@@ -0,0 +1,139 @@
+import pytest
+
+from openhands.controller.state.control_flags import (
+    BudgetControlFlag,
+    IterationControlFlag,
+)
+
+
+def test_iteration_control_flag_reaches_limit_and_increases():
+    flag = IterationControlFlag(limit_increase_amount=5, current_value=5, max_value=5)
+
+    # Should be at limit
+    assert flag.reached_limit() is True
+    assert flag._hit_limit is True
+
+    # Increase limit in non-headless mode
+    flag.increase_limit(headless_mode=False)
+    assert flag.max_value == 10  # increased by limit_increase_amount
+
+    # After increase, we should no longer be at limit
+    flag._hit_limit = False  # simulate reset
+    assert flag.reached_limit() is False
+
+
+def test_iteration_control_flag_does_not_increase_in_headless():
+    flag = IterationControlFlag(limit_increase_amount=5, current_value=5, max_value=5)
+
+    assert flag.reached_limit() is True
+    assert flag._hit_limit is True
+
+    # Should NOT increase max_value in headless mode
+    flag.increase_limit(headless_mode=True)
+    assert flag.max_value == 5
+
+
+def test_iteration_control_flag_step_behavior():
+    flag = IterationControlFlag(limit_increase_amount=2, current_value=0, max_value=2)
+
+    # First step
+    flag.step()
+    assert flag.current_value == 1
+    assert not flag.reached_limit()
+
+    # Second step
+    flag.step()
+    assert flag.current_value == 2
+    assert flag.reached_limit()
+
+    # Stepping again should raise error
+    with pytest.raises(RuntimeError, match='Agent reached maximum iteration'):
+        flag.step()
+
+
+# ----- BudgetControlFlag Tests -----
+
+
+def test_budget_control_flag_reaches_limit_and_increases():
+    flag = BudgetControlFlag(
+        limit_increase_amount=10.0, current_value=50.0, max_value=50.0
+    )
+
+    # Should be at limit
+    assert flag.reached_limit() is True
+    assert flag._hit_limit is True
+
+    # Increase budget — allowed only if _hit_limit == True
+    flag.increase_limit(headless_mode=False)
+    assert flag.max_value == 60.0  # current_value + limit_increase_amount
+
+    # Explicitly clear _hit_limit and set current_value below the new max
+    flag._hit_limit = False
+    flag.current_value = 55.0
+    assert flag.reached_limit() is False
+
+
+def test_budget_control_flag_does_not_increase_if_not_hit_limit():
+    flag = BudgetControlFlag(
+        limit_increase_amount=10.0, current_value=40.0, max_value=50.0
+    )
+
+    # Not at limit yet
+    assert flag.reached_limit() is False
+    assert flag._hit_limit is False
+
+    # Try to increase — should do nothing
+    old_max_value = flag.max_value
+    flag.increase_limit(headless_mode=False)
+    assert flag.max_value == old_max_value
+
+
+def test_budget_control_flag_does_not_increase_in_headless():
+    flag = BudgetControlFlag(
+        limit_increase_amount=10.0, current_value=50.0, max_value=50.0
+    )
+
+    assert flag.reached_limit() is True
+    assert flag._hit_limit is True
+
+    # Increase limit in headless mode — should still increase since BudgetControlFlag ignores headless param
+    flag.increase_limit(headless_mode=True)
+    assert flag.max_value == 60.0
+
+
+def test_budget_control_flag_step_raises_on_limit():
+    flag = BudgetControlFlag(
+        limit_increase_amount=5.0, current_value=55.0, max_value=50.0
+    )
+
+    # Should raise RuntimeError
+    with pytest.raises(RuntimeError, match='Agent reached maximum budget'):
+        flag.step()
+
+    # After increasing limit, step should not raise
+    flag.max_value = 60.0
+    flag._hit_limit = False
+    flag.step()  # Should not raise
+
+
+def test_budget_control_flag_hit_limit_resets_after_increase():
+    flag = BudgetControlFlag(
+        limit_increase_amount=10.0, current_value=50.0, max_value=50.0
+    )
+
+    # Initially should hit limit
+    assert flag.reached_limit() is True
+    assert flag._hit_limit is True
+
+    # Increase limit
+    flag.increase_limit(headless_mode=False)
+
+    # After increasing, _hit_limit should be reset
+    assert flag._hit_limit is False
+
+    # Should no longer report reaching limit unless value exceeds new max
+    assert flag.reached_limit() is False
+
+    # If we push current_value over new max_value:
+    flag.current_value = flag.max_value + 1.0
+    assert flag.reached_limit() is True
--- a/tests/unit/test_is_stuck.py
+++ b/tests/unit/test_is_stuck.py
@@ -55,7 +55,9 @@ def event_stream(temp_dir):
 class TestStuckDetector:
    @pytest.fixture
    def stuck_detector(self):
-        state = State(inputs={}, max_iterations=50)
+        state = State(inputs={})
+        # Set the iteration flag's max_value to 50 (equivalent to the old max_iterations)
+        state.iteration_flag.max_value = 50
        state.history = []  # Initialize history as an empty list
        return StuckDetector(state)

--- a/tests/unit/test_iteration_limit.py
+++ b/tests/unit/test_iteration_limit.py
@@ -1,76 +0,0 @@
-import asyncio
-
-import pytest
-
-from openhands.controller.agent_controller import AgentController
-from openhands.core.schema import AgentState
-from openhands.events import EventStream
-from openhands.events.action import MessageAction
-from openhands.events.event import EventSource
-from openhands.llm.metrics import Metrics
-
-
-class DummyAgent:
-    def __init__(self):
-        self.name = 'dummy'
-        self.llm = type(
-            'DummyLLM',
-            (),
-            {
-                'metrics': Metrics(),
-                'config': type('DummyConfig', (), {'max_message_chars': 10000})(),
-            },
-        )()
-
-    def reset(self):
-        pass
-
-    def get_system_message(self):
-        # Return a proper SystemMessageAction for the refactored system message handling
-        from openhands.events.action.message import SystemMessageAction
-        from openhands.events.event import EventSource
-
-        system_message = SystemMessageAction(content='This is a dummy system message')
-        system_message._source = EventSource.AGENT
-        system_message._id = -1  # Set invalid ID to avoid the ID check
-        return system_message
-
-
-@pytest.mark.asyncio
-async def test_iteration_limit_extends_on_user_message():
-    # Initialize test components
-    from openhands.storage.memory import InMemoryFileStore
-
-    file_store = InMemoryFileStore()
-    event_stream = EventStream(sid='test', file_store=file_store)
-    agent = DummyAgent()
-    initial_max_iterations = 100
-    controller = AgentController(
-        agent=agent,
-        event_stream=event_stream,
-        max_iterations=initial_max_iterations,
-        sid='test',
-        headless_mode=False,
-    )
-
-    # Set initial state
-    await controller.set_agent_state_to(AgentState.RUNNING)
-    controller.state.iteration = 90  # Close to the limit
-    assert controller.state.max_iterations == initial_max_iterations
-
-    # Simulate user message
-    user_message = MessageAction('test message', EventSource.USER)
-    event_stream.add_event(user_message, EventSource.USER)
-    await asyncio.sleep(0.1)  # Give time for event to be processed
-
-    # Verify max_iterations was extended
-    assert controller.state.max_iterations == 90 + initial_max_iterations
-
-    # Simulate more iterations and another user message
-    controller.state.iteration = 180  # Close to new limit
-    user_message2 = MessageAction('another message', EventSource.USER)
-    event_stream.add_event(user_message2, EventSource.USER)
-    await asyncio.sleep(0.1)  # Give time for event to be processed
-
-    # Verify max_iterations was extended again
-    assert controller.state.max_iterations == 180 + initial_max_iterations
--- a/tests/unit/test_llm.py
+++ b/tests/unit/test_llm.py
@@ -250,28 +250,6 @@ def test_response_latency_tracking(mock_time, mock_litellm_completion):
    assert latency_record.latency == 0.0  # Should be lifted to 0 instead of being -1!


-def test_llm_reset():
-    llm = LLM(LLMConfig(model='gpt-4o-mini', api_key='test_key'))
-    initial_metrics = copy.deepcopy(llm.metrics)
-    initial_metrics.add_cost(1.0)
-    initial_metrics.add_response_latency(0.5, 'test-id')
-    initial_metrics.add_token_usage(10, 5, 3, 2, 1000, 'test-id')
-    llm.reset()
-    assert llm.metrics.accumulated_cost != initial_metrics.accumulated_cost
-    assert llm.metrics.costs != initial_metrics.costs
-    assert llm.metrics.response_latencies != initial_metrics.response_latencies
-    assert llm.metrics.token_usages != initial_metrics.token_usages
-    assert isinstance(llm.metrics, Metrics)
-
-    # Check that accumulated token usage is reset
-    metrics_data = llm.metrics.get()
-    accumulated_usage = metrics_data['accumulated_token_usage']
-    assert accumulated_usage['prompt_tokens'] == 0
-    assert accumulated_usage['completion_tokens'] == 0
-    assert accumulated_usage['cache_read_tokens'] == 0
-    assert accumulated_usage['cache_write_tokens'] == 0
-
-
@patch('openhands.llm.llm.litellm.get_model_info')
 def test_llm_init_with_openrouter_model(mock_get_model_info, default_config):
    default_config.model = 'openrouter:gpt-4o-mini'
--- a/tests/unit/test_memory.py
+++ b/tests/unit/test_memory.py
@@ -111,7 +111,7 @@ async def test_memory_on_event_exception_handling(memory, event_stream, mock_age
        )

        # Verify that the controller's last error was set
-        assert state.iteration == 0
+        assert state.iteration_flag.current_value == 0
        assert state.agent_state == AgentState.ERROR
        assert state.last_error == 'Error: Exception'

@@ -142,7 +142,7 @@ async def test_memory_on_workspace_context_recall_exception_handling(
        )

        # Verify that the controller's last error was set
-        assert state.iteration == 0
+        assert state.iteration_flag.current_value == 0
        assert state.agent_state == AgentState.ERROR
        assert state.last_error == 'Error: Exception'

--- a/tests/unit/test_prompt_manager.py
+++ b/tests/unit/test_prompt_manager.py
@@ -3,6 +3,7 @@ import shutil

 import pytest

+from openhands.controller.state.control_flags import IterationControlFlag
 from openhands.controller.state.state import State
 from openhands.core.message import Message, TextContent
 from openhands.events.observation.agent import MicroagentKnowledge
@@ -161,9 +162,11 @@ def test_add_turns_left_reminder(prompt_dir):
    manager = PromptManager(prompt_dir=prompt_dir)

    # Create a State object with specific iteration values
-    state = State()
-    state.iteration = 3
-    state.max_iterations = 10
+    state = State(
+        iteration_flag=IterationControlFlag(
+            current_value=3, max_value=10, limit_increase_amount=10
+        )
+    )

    # Create a list of messages with a user message
    user_message = Message(role='user', content=[TextContent(text='User content')])
--- a/tests/unit/test_runtime_git_tokens.py
+++ b/tests/unit/test_runtime_git_tokens.py
@@ -301,11 +301,11 @@ async def test_clone_or_init_repo_auth_error(temp_dir):
        side_effect=AuthenticationError('Auth failed'),
    ):
        # Call the function with a repository
-        with pytest.raises(RuntimeError) as excinfo:
+        with pytest.raises(Exception) as excinfo:
            await runtime.clone_or_init_repo(None, 'owner/repo', None)

        # Verify the error message
-        assert 'Git provider authentication issue when cloning repo' in str(
+        assert 'Git provider authentication issue when getting remote URL' in str(
            excinfo.value
        )

--- a/tests/unit/test_state.py
+++ b/tests/unit/test_state.py
@@ -1,5 +1,9 @@
-from openhands.controller.state.state import State
+from unittest.mock import patch
+
+from openhands.controller.state.state import State, TrafficControlState
+from openhands.core.schema import AgentState
 from openhands.events.event import Event
+from openhands.llm.metrics import Metrics
 from openhands.storage.memory import InMemoryFileStore


@@ -56,3 +60,66 @@ def test_state_view_cache_not_serialized():
    # be structurally identical but _not_ the same object.
    assert id(restored_view) != id(view)
    assert restored_view.events == view.events
+
+
+def test_restore_older_state_version():
+    """Test that we can restore from an older state version (before control flags)."""
+    # Create a State that mimics the old state format (before control flags)
+    state = State(
+        session_id='test_old_session',
+        iteration=42,
+        local_iteration=42,
+        max_iterations=100,
+        agent_state=AgentState.RUNNING,
+        traffic_control_state=TrafficControlState.NORMAL,
+        metrics=Metrics(),
+        confirmation_mode=False,
+    )
+
+    def no_op_getstate(self):
+        return self.__dict__
+
+    store = InMemoryFileStore()
+
+    with patch.object(State, '__getstate__', no_op_getstate):
+        state.save_to_session('test_old_session', store, None)
+
+    # Now restore it
+    restored_state = State.restore_from_session('test_old_session', store, None)
+
+    # Verify that on restore, the active fields are populated with the values from the deprecated fields
+    assert restored_state.session_id == 'test_old_session'
+    assert restored_state.agent_state == AgentState.LOADING
+    assert restored_state.resume_state == AgentState.RUNNING
+    assert restored_state.iteration_flag.current_value == 42
+    assert restored_state.iteration_flag.max_value == 100
+
+
+def test_save_without_deprecated_fields():
+    """Test that we can save state without deprecated fields"""
+    # Create a State populated with the deprecated fields (iteration, local_iteration, max_iterations)
+    state = State(
+        session_id='test_old_session',
+        iteration=42,
+        local_iteration=42,
+        max_iterations=100,
+        agent_state=AgentState.RUNNING,
+        traffic_control_state=TrafficControlState.NORMAL,
+        metrics=Metrics(),
+        confirmation_mode=False,
+    )
+
+    store = InMemoryFileStore()
+
+    state.save_to_session('test_state', store, None)
+    restored_state = State.restore_from_session('test_state', store, None)
+
+    # Verify that when we save and restore, the deprecated fields are removed
+    # but the new fields maintain the correct values
+    assert restored_state.session_id == 'test_old_session'
+    assert restored_state.agent_state == AgentState.LOADING
+    assert restored_state.resume_state == AgentState.RUNNING
+    assert (
+        restored_state.iteration_flag.current_value == 0
+    )  # The deprecated attribute was not stored, so it did not override existing values on restore
+    assert restored_state.iteration_flag.max_value == 100
--- a/tests/unit/test_traffic_control.py
+++ b/tests/unit/test_traffic_control.py
@@ -1,91 +0,0 @@
-from unittest.mock import MagicMock
-
-import pytest
-
-from openhands.controller.agent_controller import AgentController
-from openhands.core.config import AgentConfig, LLMConfig
-from openhands.events import EventStream
-from openhands.llm.llm import LLM
-from openhands.storage import InMemoryFileStore
-
-
-@pytest.fixture
-def agent_controller():
-    llm = LLM(config=LLMConfig())
-    agent = MagicMock()
-    agent.name = 'test_agent'
-    agent.llm = llm
-    agent.config = AgentConfig()
-
-    # Add a proper system message mock
-    from openhands.events import EventSource
-    from openhands.events.action.message import SystemMessageAction
-
-    system_message = SystemMessageAction(content='Test system message')
-    system_message._source = EventSource.AGENT
-    system_message._id = -1  # Set invalid ID to avoid the ID check
-    agent.get_system_message.return_value = system_message
-
-    event_stream = EventStream(sid='test', file_store=InMemoryFileStore())
-    controller = AgentController(
-        agent=agent,
-        event_stream=event_stream,
-        max_iterations=100,
-        max_budget_per_task=10.0,
-        sid='test',
-        headless_mode=False,
-    )
-    return controller
-
-
-@pytest.mark.asyncio
-async def test_traffic_control_iteration_message(agent_controller):
-    """Test that iteration messages are formatted as integers."""
-    # Mock _react_to_exception to capture the error
-    error = None
-
-    async def mock_react_to_exception(e):
-        nonlocal error
-        error = e
-
-    agent_controller._react_to_exception = mock_react_to_exception
-
-    await agent_controller._handle_traffic_control('iteration', 200.0, 100.0)
-    assert error is not None
-    assert 'Current iteration: 200, max iteration: 100' in str(error)
-
-
-@pytest.mark.asyncio
-async def test_traffic_control_budget_message(agent_controller):
-    """Test that budget messages keep decimal points."""
-    # Mock _react_to_exception to capture the error
-    error = None
-
-    async def mock_react_to_exception(e):
-        nonlocal error
-        error = e
-
-    agent_controller._react_to_exception = mock_react_to_exception
-
-    await agent_controller._handle_traffic_control('budget', 15.75, 10.0)
-    assert error is not None
-    assert 'Current budget: 15.75, max budget: 10.00' in str(error)
-
-
-@pytest.mark.asyncio
-async def test_traffic_control_headless_mode(agent_controller):
-    """Test that headless mode messages are formatted correctly."""
-    # Mock _react_to_exception to capture the error
-    error = None
-
-    async def mock_react_to_exception(e):
-        nonlocal error
-        error = e
-
-    agent_controller._react_to_exception = mock_react_to_exception
-
-    agent_controller.headless_mode = True
-    await agent_controller._handle_traffic_control('iteration', 200.0, 100.0)
-    assert error is not None
-    assert 'in headless mode' in str(error)
-    assert 'Current iteration: 200, max iteration: 100' in str(error)
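
The new tests in tests/unit/test_control_flags.py pin down the behaviour expected from the two control flags. As a reading aid, here is a minimal sketch that satisfies those assertions. It is not the code in openhands/controller/state/control_flags.py: the class names `SketchIterationFlag` and `SketchBudgetFlag` are hypothetical, and the error text after the prefixes matched by the tests is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SketchIterationFlag:
    """Hypothetical stand-in for IterationControlFlag, inferred from the tests."""

    limit_increase_amount: int
    current_value: int
    max_value: int
    _hit_limit: bool = False

    def reached_limit(self) -> bool:
        # Record whether the limit is hit so increase_limit() can act on it.
        self._hit_limit = self.current_value >= self.max_value
        return self._hit_limit

    def increase_limit(self, headless_mode: bool) -> None:
        # The tests expect no extension in headless mode.
        if not headless_mode and self._hit_limit:
            self.max_value += self.limit_increase_amount
            self._hit_limit = False

    def step(self) -> None:
        if self.reached_limit():
            # Only the prefix of this message is checked by the tests.
            raise RuntimeError(
                f'Agent reached maximum iteration: {self.current_value}/{self.max_value}'
            )
        self.current_value += 1


@dataclass
class SketchBudgetFlag:
    """Hypothetical stand-in for BudgetControlFlag, inferred from the tests."""

    limit_increase_amount: float
    current_value: float
    max_value: float
    _hit_limit: bool = False

    def reached_limit(self) -> bool:
        self._hit_limit = self.current_value >= self.max_value
        return self._hit_limit

    def increase_limit(self, headless_mode: bool) -> None:
        # headless_mode is ignored here, matching the budget test that passes True.
        if self._hit_limit:
            # The new ceiling is relative to what has already been spent.
            self.max_value = self.current_value + self.limit_increase_amount
            self._hit_limit = False

    def step(self) -> None:
        # Budget is accumulated elsewhere; step() only enforces the ceiling.
        if self.reached_limit():
            raise RuntimeError(
                f'Agent reached maximum budget: {self.current_value:.2f}/{self.max_value:.2f}'
            )
```

The asymmetry is the point worth reviewing. In this sketch the iteration flag extends `max_value` by a fixed amount and only outside headless mode, while the budget flag resets the ceiling to `current_value + limit_increase_amount` regardless of mode.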
Author	SHA1	Message	Date
openhands	adbfae2600	Fix SubprocessBashSession to allow multiple commands by default - Add allow_multiple_commands parameter to SubprocessBashSession constructor - Default to True to maintain compatibility with original CLIRuntime behavior - When False, rejects multiple commands separated by newlines for security - Fixes test_cliruntime_multiple_newline_commands test failure - Maintains security by allowing fine-grained control over command execution	2025-06-18 18:32:29 +00:00
openhands	213d2dc056	Fix CLI runtime tests: handle interactive input and background processes - Move interactive input handling back to CLIRuntime.run() method - Return ErrorObservation for is_input=True actions with proper error message - Fix timeout message format for CLIRuntime (simpler format) - Add background process handling in SubprocessBashSession - Detect commands ending with '&' and handle them appropriately - Fix working directory metadata for CLIRuntime observations - All CLI runtime tests now pass	2025-06-18 00:54:09 +00:00
openhands	da38890aaf	Fix linting issues: trailing whitespace and formatting - Remove trailing whitespace from CLI runtime and bash session files - Fix string quote consistency and line formatting - All pre-commit hooks now pass successfully	2025-06-18 00:46:43 +00:00
openhands	edb373cea8	Simplify implementation: Add SubprocessBashSession directly to bash.py - Add INTERRUPTED status to BashCommandStatus enum - Add SubprocessBashSession class inheriting from BashSession - Update CLIRuntime to use SubprocessBashSession instead of subprocess directly - Maintain all original BashSession (tmux) functionality - Clean implementation with minimal diff changes - Remove complex inheritance hierarchy files This approach minimizes negative diffs by keeping original code in place and adding new functionality alongside existing implementation.	2025-06-18 00:10:23 +00:00
openhands	f8c5be917c	Fix pre-commit issues - Fix import order in bash.py - Make cwd property abstract in base class to match implementations - Address trailing whitespace and formatting issues	2025-06-17 21:59:13 +00:00
Ray Myers	b7efeb11d9	Bump version to 0.44.0 (#9163) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2025-06-17 21:13:17 +00:00
Graham Neubig	7d0aadf8ed	Rename ~/.openhands-state to ~/.openhands (#9135) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2025-06-17 20:44:52 +00:00
Mislav Lukach	78af1de870	chore(analytics): improve label clarity (#9161) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2025-06-17 20:33:52 +00:00
llamantino	6a9065960d	fix(devcontainer): mark workspace as safe dir (#9136)	2025-06-18 04:22:42 +08:00
Maxim Evtush	653a8a7ce2	Refactor: Improve Consistency in Function Signatures and Regex Usage in compute_ism_pm_score.py (#9145)	2025-06-18 04:22:16 +08:00
Graham Neubig	3591c7a79f	Add uvx installation option to CLI documentation (#9186) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-18 04:19:18 +08:00
Ivan Dagelic	bae6bd77f4	fix: daytona runtime sandbox handling (#9187) Signed-off-by: Ivan Dagelic <dagelic.ivan@gmail.com>	2025-06-18 04:18:46 +08:00
Rohit Malhotra	30c71776e7	[Fix]: Loading microagents for integrations (#9189) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-17 16:16:19 -04:00
Robert Brennan	147ffb7e42	Suppress pydub warning about ffmpeg/avconv not found (#8940) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-17 14:44:32 -04:00
Tim O'Farrell	237037cee9	Fix remote runtime status (#9190)	2025-06-18 02:34:41 +08:00
Xingyao Wang	567af43a71	Fix deprecation warning: Replace get_events with search_events (#9188) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-18 00:54:29 +08:00
Rohit Malhotra	65071550b6	Fix grammar issues in Slack documentation (#9180) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-17 23:53:55 +08:00
Alexander	d81d2f62cb	docs: local serving with ollama documented (#8807) Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2025-06-17 07:18:18 -04:00
Ryan H. Tran	ddaa186971	[GAIA] Add prompt improvement to alleviate solution parsing issue & support Tavily search tools (#9057)	2025-06-17 13:16:50 +07:00
Graham Neubig	e6e0f4673f	docs: Add "Running OpenHands with OpenHands" section for recursive development (#9146) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-16 20:57:52 -04:00
Graham Neubig	7d78b65a1a	docs: Add Python version requirement to CLI documentation (#9164) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-16 20:14:10 +00:00
Rohit Malhotra	1f90086030	(Hotfix): Slack app installation flow (#9162) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-16 19:33:43 +00:00
Xingyao Wang	2c4ecd02f7	feat(frontend): add user feedback Likert scale for agent performance rating (only on OH Cloud) (#8992) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>	2025-06-16 19:26:24 +00:00
Rohit Malhotra	2fd1fdcd7e	[Refactor, Fix]: Agent controller state/metrics management (#9012) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-16 11:24:13 -04:00
Graham Neubig	cbe32a1a12	Fix bash timeout issue caused by interactive git clone prompts (#9148) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-16 08:39:28 -04:00