Add debug logging to runtime_build.py to help diagnose hash calculation issues

Fix JSON logging tests to use environment-configured level key
The tests failed because they expected 'level' as the JSON log level key, while the environment was configured to use 'severity' via LOG_JSON_LEVEL_KEY. All TestJsonOutput tests now read the key from LOG_JSON_LEVEL_KEY instead of hardcoding 'level', so they pass regardless of how that environment variable is set.
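A minimal sketch of the test pattern described in this commit message, assuming the key name is read from the `LOG_JSON_LEVEL_KEY` environment variable with a default of `level`. `JsonFormatter` and `emit_json` are hypothetical stand-ins for illustration, not the project's actual classes:

```python
import io
import json
import logging
import os

# Assumption: the key name comes from the environment, defaulting to 'level'.
LOG_JSON_LEVEL_KEY = os.environ.get('LOG_JSON_LEVEL_KEY', 'level')


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter that emits one JSON object per record."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {'message': record.getMessage(), LOG_JSON_LEVEL_KEY: record.levelname}
        )


def emit_json(message: str) -> dict:
    """Log one message through the formatter and parse the emitted JSON."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger('json_level_key_sketch')
    logger.handlers = [handler]
    logger.propagate = False
    logger.setLevel(logging.INFO)
    logger.info(message)
    return json.loads(stream.getvalue())


# Environment-agnostic assertion: look up the configured key, not 'level'.
output = emit_json('Test message')
assert output[LOG_JSON_LEVEL_KEY] == 'INFO'
```

Because the assertion indexes with `LOG_JSON_LEVEL_KEY`, the same test holds whether the variable is unset or set to `severity`.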
2026-04-29 03:00:45 -04:00 · 2025-07-09 15:27:43 +00:00 · 2025-06-26 23:28:49 +00:00 · 2025-06-17 18:59:22 +00:00 · 2025-06-17 14:44:32 -04:00 · 2025-06-18 02:34:41 +08:00
124 changed files with 3562 additions and 3960 deletions
--- a/.openhands/microagents/repo.md
+++ b/.openhands/microagents/repo.md
@@ -5,6 +5,18 @@ This repository contains the code for OpenHands, an automated AI software engine
 To set up the entire repo, including frontend and backend, run `make build`.
 You don't need to do this unless the user asks you to, or if you're trying to run the entire application.

+## Running OpenHands with OpenHands
+To run the full application for development or self-improvement:
+```bash
+export INSTALL_DOCKER=0
+export RUNTIME=local
+make build && make run
+```
+For external access (cloud environments), use:
+```bash
+make run FRONTEND_PORT=12000 FRONTEND_HOST=0.0.0.0 BACKEND_HOST=0.0.0.0
+```
+
 IMPORTANT: Before making any changes to the codebase, ALWAYS run `make install-pre-commit-hooks` to ensure pre-commit hooks are properly installed.

 Before pushing any changes, you MUST ensure that any lint errors or simple test errors have been fixed.
--- a/Development.md
+++ b/Development.md
@@ -103,6 +103,29 @@ components or interface enhancements.
  make start-frontend
  ```

+### 5. Running OpenHands with OpenHands
+
+You can use OpenHands to develop and improve OpenHands itself! This is a powerful way to leverage AI assistance for contributing to the project.
+
+#### Quick Start
+
+1. **Build and run OpenHands:**
+   ```bash
+   export INSTALL_DOCKER=0
+   export RUNTIME=local
+   make build && make run
+   ```
+
+2. **Access the interface:**
+   - Local development: http://localhost:3001
+   - Remote/cloud environments: Use the appropriate external URL
+
+3. **Configure for external access (if needed):**
+   ```bash
+   # For external access (e.g., cloud environments)
+   make run FRONTEND_PORT=12000 FRONTEND_HOST=0.0.0.0 BACKEND_HOST=0.0.0.0
+   ```
+
 ### 6. LLM Debugging

 If you encounter any issues with the Language Model (LM) or you're simply curious, export DEBUG=1 in the environment and restart the backend.
--- a/LOGGING_IMPROVEMENTS.md
+++ b/LOGGING_IMPROVEMENTS.md
@@ -0,0 +1,131 @@
+# Action Execution Server Logging Improvements
+
+## Overview
+
+This document describes the comprehensive logging improvements added to the Action Execution Server to help debug issues like files disappearing and provide better observability into action execution.
+
+## Changes Made
+
+### 1. Enhanced Action Execution Logging
+
+Added structured logging to the main action execution flow in `ActionExecutor.run_action()`:
+
+- **Action Start Logging**: Logs when each action begins execution with metadata
+- **Action Success Logging**: Logs successful completion with execution time and observation metadata
+- **Action Failure Logging**: Logs failures with error details and execution time
+- **Execution Timing**: Tracks and logs execution time in milliseconds for performance monitoring
+
+### 2. Metadata Extraction Functions
+
+Added two new helper methods to extract relevant metadata while excluding large content:
+
+#### `_extract_action_metadata(action: Action) -> dict[str, Any]`
+Extracts metadata from actions including:
+- **File Operations**: Path, line ranges, content lengths (not actual content)
+- **Commands**: Command text (truncated if >200 chars), blocking status, working directory
+- **IPython**: Code length and preview (truncated if >100 chars)
+- **Browser Actions**: URLs and action counts
+- **Common**: Timeout values, action IDs
+
+#### `_extract_observation_metadata(observation) -> dict[str, Any]`
+Extracts metadata from observations including:
+- **Common**: Observation type, error status, content lengths
+- **File Operations**: File paths, content previews (truncated)
+- **Commands**: Exit codes, output lengths
+- **Errors**: Error messages (truncated to 200 chars)
+- **File Edits**: Diff information and content lengths
+
+### 3. HTTP Endpoint Logging
+
+Enhanced the `/execute_action` endpoint with:
+- **Request Logging**: Logs incoming action requests with action type and ID
+- **Response Logging**: Logs completed requests with total request time
+- **Error Logging**: Logs HTTP-level errors with timing information
+
+### 4. Operation-Specific Logging
+
+Added detailed logging to individual action handlers:
+
+#### File Operations
+- **Read Operations**: Logs file read attempts, success/failure, file types, sizes
+- **Write Operations**: Logs file write attempts, directory creation, file existence checks
+- **Edit Operations**: Logs edit attempts, success/failure, diff information
+- **Error Handling**: Logs specific error types (file not found, permission errors, etc.)
+
+#### Command Execution
+- **Command Logging**: Logs command execution with previews and parameters
+- **Result Logging**: Logs exit codes, output lengths, success status
+- **Error Logging**: Logs command execution failures with error details
+
+#### IPython Execution
+- **Code Logging**: Logs IPython code execution with code previews
+- **Result Logging**: Logs execution results and output lengths
+- **Working Directory**: Logs directory changes and synchronization
+
+#### File Management Endpoints
+- **File Listing**: Logs directory listing operations with entry counts
+- **File Upload**: Logs file upload operations with file details and types
+- **File Download**: Logs download operations with file counts and zip creation
+
+### 5. Structured Logging Format
+
+All logs use structured logging with the `extra` parameter to include:
+- **Operation Type**: Identifies the type of operation being performed
+- **Metadata**: Relevant metadata specific to each operation
+- **Timing**: Execution times where applicable
+- **Success/Failure**: Clear indication of operation outcomes
+- **Error Details**: Comprehensive error information when failures occur
+
+## Benefits for Debugging
+
+### File Disappearance Issues
+The enhanced logging will help debug file disappearance by:
+- Tracking all file operations (read, write, edit, delete)
+- Logging file existence checks and directory operations
+- Recording file sizes and modification details
+- Capturing permission and ownership changes
+- Logging file upload/download operations
+
+### Performance Monitoring
+- Execution timing for all actions
+- Request processing times
+- File operation performance
+- Command execution duration
+
+### Error Tracking
+- Comprehensive error logging with context
+- Error categorization (file not found, permission errors, etc.)
+- Stack traces for unexpected failures
+- Request-level error tracking
+
+### Operational Visibility
+- Action execution patterns
+- File system activity
+- Command execution frequency
+- Resource usage patterns
+
+## Log Levels Used
+
+- **INFO**: Action execution start/completion, successful operations
+- **DEBUG**: Detailed operation logging, file system operations, request/response details
+- **WARNING**: Non-fatal errors, permission issues, missing files
+- **ERROR**: Action failures, HTTP errors, unexpected exceptions
+
+## Content Exclusion
+
+To prevent log bloat, the following content is excluded or truncated:
+- File contents (only lengths and previews logged)
+- Large command outputs (only lengths logged)
+- Long error messages (truncated to 200 characters)
+- Code content (only lengths and previews logged)
+
+## Example Log Entries
+
+```
+INFO - Executing action: read - action_type=read, action_id=123, action_metadata={'path': '/workspace/file.txt', 'start': 1, 'end': 10}
+DEBUG - Attempting to read file: /workspace/file.txt - operation=file_read, path=/workspace/file.txt, working_dir=/workspace
+DEBUG - Successfully read text file: /workspace/file.txt - operation=file_read, path=/workspace/file.txt, file_type=text, lines_read=10
+INFO - Action completed successfully: read - action_type=read, execution_time_ms=45.2, observation_type=FileReadObservation, success=true
+```
+
+This comprehensive logging will provide the visibility needed to debug complex issues like file disappearance while maintaining reasonable log sizes and performance.
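The approach in LOGGING_IMPROVEMENTS.md (metadata without large content, truncation, timing, and the `extra` parameter) could be sketched roughly as follows. This is an illustration only: actions are modeled as plain dicts, and `truncate`, `extract_metadata`, and `run_with_logging` are hypothetical names, not the methods added to `ActionExecutor`.

```python
import logging
import time
from typing import Any, Callable

logger = logging.getLogger('action_execution_sketch')

MAX_COMMAND_CHARS = 200  # truncation limit quoted in the document


def truncate(text: str, limit: int) -> str:
    """Return text unchanged if short enough, else cut it and mark the cut."""
    return text if len(text) <= limit else text[:limit] + '...'


def extract_metadata(action: dict[str, Any]) -> dict[str, Any]:
    """Keep small fields; replace large content with its length."""
    metadata: dict[str, Any] = {'action_type': action.get('type')}
    if 'path' in action:
        metadata['path'] = action['path']
    if 'content' in action:
        metadata['content_length'] = len(action['content'])
    if 'command' in action:
        metadata['command'] = truncate(action['command'], MAX_COMMAND_CHARS)
    return metadata


def run_with_logging(
    action: dict[str, Any], handler: Callable[[dict[str, Any]], Any]
) -> Any:
    """Run handler(action), logging start, outcome, and elapsed milliseconds."""
    metadata = extract_metadata(action)
    logger.info(
        'Executing action: %s',
        metadata['action_type'],
        extra={'action_metadata': metadata},
    )
    start = time.perf_counter()
    try:
        result = handler(action)
    except Exception:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.error(
            'Action failed',
            extra={'execution_time_ms': elapsed_ms, 'success': False},
            exc_info=True,
        )
        raise
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info(
        'Action completed',
        extra={'execution_time_ms': elapsed_ms, 'success': True},
    )
    return result
```

Logging only `content_length` rather than the content itself is what keeps log volume bounded when an action writes a large file.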
--- a/docs/usage/cloud/slack-installation.mdx
+++ b/docs/usage/cloud/slack-installation.mdx
@@ -5,15 +5,38 @@ description: This guide walks you through installing the OpenHands Slack app.

 ## Prerequisites

- You are a slack workspace admin
 - Access to OpenHands Cloud

 ## Installation Steps

-1. Log in to [OpenHands Cloud](https://app.all-hands.dev)
-2. Click the button below to OpenHands Slack App <a target="_blank" href="https://slack.com/oauth/v2/authorize?client_id=7477886716822.8729519890534&scope=app_mentions:read,chat:write,users:read,channels:history,groups:history,mpim:history,im:history&user_scope=channels:history,groups:history,im:history,mpim:history"><img alt="Add to Slack" height="40" width="139" src="https://platform.slack-edge.com/img/add_to_slack.png" srcSet="https://platform.slack-edge.com/img/add_to_slack.png 1x, https://platform.slack-edge.com/img/add_to_slack@2x.png 2x" /></a>
-3. In the top right corner, select the workspace to install the OpenHands Slack app.
-4. Review permissions and click allow
+<AccordionGroup>
+<Accordion title="Install Slack App (only for Slack admins/owners)">
+
+  **This step is for Slack admins/owners**
+
+  1. Make sure you have permissions to install Apps to your workspace.
+  2. Click the button below to install OpenHands Slack App <a target="_blank" href="https://slack.com/oauth/v2/authorize?client_id=7477886716822.8729519890534&scope=app_mentions:read,chat:write,users:read,channels:history,groups:history,mpim:history,im:history&user_scope=channels:history,groups:history,im:history,mpim:history"><img alt="Add to Slack" height="40" width="139" src="https://platform.slack-edge.com/img/add_to_slack.png" srcSet="https://platform.slack-edge.com/img/add_to_slack.png 1x, https://platform.slack-edge.com/img/add_to_slack@2x.png 2x" /></a>
+  3. In the top right corner, select the workspace to install the OpenHands Slack app.
+  4. Review permissions and click allow.
+
+</Accordion>
+
+<Accordion title="Authorize Slack App (for all Slack workspace members)">
+
+  **Make sure your Slack workspace admin/owner has installed OpenHands Slack App first**
+
+  Every user in the Slack workspace (including admins/owners) must link their OpenHands Cloud account to the OpenHands Slack App. To do this:
+  1. Visit [integrations settings](https://app.all-hands.dev/settings/integrations) in OpenHands Cloud.
+  2. Click the button "Install Slack App".
+  3. In the top right corner, select the workspace to install the OpenHands Slack app.
+  4. Review permissions and click allow.
+
+  Depending on the workspace settings, you may need approval from your Slack admin to authorize the Slack App.
+
+</Accordion>
+
+</AccordionGroup>
+

 ## Working With the Slack App

@@ -45,6 +68,6 @@ You can mention a repo name when starting a new conversation in the following fo
 2. "All-Hands-AI/OpenHands" (e.g `@openhands in All-Hands-AI/OpenHands ...`)

 The repo match is case insensitive. If a repo name match is made, it will kick off the conversation.
-If the repo name partially matches against, multiple repos, you'll be asked to select a repo from the filtered list.
+If the repo name partially matches against multiple repos, you'll be asked to select a repo from the filtered list.

 ![slack-pro-tip.png](/static/img/slack-pro-tip.png)
--- a/docs/usage/how-to/cli-mode.mdx
+++ b/docs/usage/how-to/cli-mode.mdx
@@ -11,19 +11,22 @@ for scripting.

 ### Running with Python

+**Note:** OpenHands requires Python 3.12 or higher (Python 3.14 is not currently supported).
+
 1. Install OpenHands using pip:

 ```bash
 pip install openhands-ai
 ```

-2. Set your model, API key, and other preferences using environment variables or with the [`config.toml`](https://github.com/All-Hands-AI/OpenHands/blob/main/config.template.toml) file.
-3. Launch an interactive OpenHands conversation from the command line:
+2. Launch an interactive OpenHands conversation from the command line:

 ```bash
 openhands
 ```

+3. Set your model, API key, and other preferences using the UI (or alternatively environment variables, below).
+
 This command opens an interactive prompt where you can type tasks or commands and get responses from OpenHands.

 #### For Developers
--- a/docs/usage/llms/local-llms.mdx
+++ b/docs/usage/llms/local-llms.mdx
@@ -126,6 +126,18 @@ vllm serve all-hands/openhands-lm-32b-v0.1 \
    --enable-prefix-caching
 ```

+### Create an OpenAI-Compatible Endpoint with Ollama
+
+- Install Ollama following [the official documentation](https://ollama.com/download).
+- For Ollama configuration, use `ollama/<modelname>` as the custom model in the web UI. The API key can also be set to `ollama`.
+- Example launch command for Devstral LM 24B:
+
+```bash
+OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_KEEP_ALIVE=-1 nohup ollama serve&
+# The minimum context size is ~8196; even the system prompt won't fit in anything smaller
+ollama pull devstral:latest
+```
+
 ## Advanced: Run and Configure OpenHands

 ### Run OpenHands
--- a/evaluation/benchmarks/browsing_delegation/run_infer.py
+++ b/evaluation/benchmarks/browsing_delegation/run_infer.py
@@ -144,7 +144,7 @@ if __name__ == '__main__':
    llm_config = None
    if args.llm_config:
        llm_config = get_llm_config_arg(args.llm_config)
-        # modify_params must be False for evaluation purpose, for reproducibility and accurancy of results
+        # modify_params must be False for evaluation purpose, for reproducibility and accuracy of results
        llm_config.modify_params = False

    if llm_config is None:
--- a/evaluation/benchmarks/gaia/.gitignore
+++ b/evaluation/benchmarks/gaia/.gitignore
@@ -0,0 +1 @@
+data/
--- a/evaluation/benchmarks/gaia/README.md
+++ b/evaluation/benchmarks/gaia/README.md
@@ -6,6 +6,13 @@ This folder contains evaluation harness for evaluating agents on the [GAIA bench

 Please follow instruction [here](../../README.md#setup) to setup your local development environment and LLM.

+To enable the Tavily MCP Server, you can add the Tavily API key under the `core` section of your `config.toml` file, like below:
+
+```toml
+[core]
+search_api_key = "tvly-******"
+```
+
 ## Run the evaluation

 We are using the GAIA dataset hosted on [Hugging Face](https://huggingface.co/datasets/gaia-benchmark/GAIA).
--- a/evaluation/benchmarks/gaia/run_infer.py
+++ b/evaluation/benchmarks/gaia/run_infer.py
@@ -1,4 +1,5 @@
 import asyncio
+import copy
 import functools
 import os
 import re
@@ -6,6 +7,7 @@ import re
 import huggingface_hub
 import pandas as pd
 from datasets import load_dataset
+from pydantic import SecretStr

 from evaluation.benchmarks.gaia.scorer import question_scorer
 from evaluation.utils.shared import (
@@ -24,6 +26,7 @@ from openhands.core.config import (
    OpenHandsConfig,
    get_llm_config_arg,
    get_parser,
+    load_from_toml,
 )
 from openhands.core.config.utils import get_agent_config_arg
 from openhands.core.logger import openhands_logger as logger
@@ -41,7 +44,7 @@ AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {
 }

 AGENT_CLS_TO_INST_SUFFIX = {
-    'CodeActAgent': 'When you think you have solved the question, please first send your answer to user through message and then exit.\n'
+    'CodeActAgent': 'When you think you have solved the question, please use the finish tool and include your final answer in the message parameter of the finish tool. Your final answer MUST be encapsulated within <solution> and </solution>.\n'
 }


@@ -49,7 +52,7 @@ def get_config(
    metadata: EvalMetadata,
 ) -> OpenHandsConfig:
    sandbox_config = get_default_sandbox_config_for_eval()
-    sandbox_config.base_container_image = 'python:3.12-bookworm'
+    sandbox_config.base_container_image = 'nikolaik/python-nodejs:python3.12-nodejs22'
    config = OpenHandsConfig(
        default_agent=metadata.agent_class,
        run_as_openhands=False,
@@ -67,6 +70,11 @@ def get_config(
        logger.info('Agent config not provided, using default settings')
        agent_config = config.get_agent_config(metadata.agent_class)
        agent_config.enable_prompt_extensions = False
+
+    config_copy = copy.deepcopy(config)
+    load_from_toml(config_copy)
+    if config_copy.search_api_key:
+        config.search_api_key = SecretStr(config_copy.search_api_key)
    return config


@@ -134,16 +142,26 @@ def process_instance(
        dest_file = None

    # Prepare instruction
-    instruction = f'{instance["Question"]}\n'
+    instruction = """You have one question to answer. It is paramount that you provide a correct answer.
+Give it all you can: I know for a fact that you have access to all the relevant tools to solve it and find the correct answer (the answer does exist). Failure or 'I cannot answer' or 'None found' will not be tolerated, success will be rewarded.
+You must make sure you find the correct answer! You MUST strictly follow the task-specific formatting instructions for your final answer.
+Here is the task:
+{task_question}
+""".format(
+        task_question=instance['Question'],
+    )
    logger.info(f'Instruction: {instruction}')
    if dest_file:
        instruction += f'\n\nThe mentioned file is provided in the workspace at: {dest_file.split("/")[-1]}'

-    instruction += 'IMPORTANT: You should ONLY interact with the environment provided to you AND NEVER ASK FOR HUMAN HELP.\n'
-    instruction += 'Please encapsulate your final answer (answer ONLY) within <solution> and </solution>.\n'
+    instruction += """IMPORTANT: When seeking information from a website, REFRAIN from arbitrary URL navigation. You should utilize the designated search engine tool with precise keywords to obtain relevant URLs or use the specific website's search interface. DO NOT navigate directly to specific URLs as they may not exist.\n\nFor example: if you want to search for a research paper on Arxiv, either use the search engine tool with specific keywords or navigate to arxiv.org and then use its interface.\n"""
+    instruction += 'IMPORTANT: You should NEVER ask for Human Help.\n'
+    instruction += 'IMPORTANT: Please encapsulate your final answer (answer ONLY) within <solution> and </solution>. Your answer will be evaluated using string matching approaches so it is important that you STRICTLY adhere to the output formatting instructions specified in the task (e.g., alphabetization, sequencing, units, rounding, decimal places, etc.)\n'
    instruction += (
        'For example: The answer to the question is <solution> 42 </solution>.\n'
    )
+    instruction += "IMPORTANT: Your final answer should be a number OR as few words as possible OR a comma separated list of numbers and/or strings. If you are asked for a number, express it numerically (i.e., with digits rather than words), do not use commas, and do not include units such as $ or percent signs unless specified otherwise. If you are asked for a string, don't use articles, neither abbreviations (e.g. for cities). If you are asked for a comma separated list, apply the above rules depending of whether the element to be put in the list is a number or a string.\n"
+
    # NOTE: You can actually set slightly different instruction for different agents
    instruction += AGENT_CLS_TO_INST_SUFFIX.get(metadata.agent_class, '')
    logger.info(f'Instruction:\n{instruction}', extra={'msg_type': 'OBSERVATION'})
@@ -175,7 +193,7 @@ def process_instance(
    for event in reversed(state.history):
        if event.source == 'agent':
            if isinstance(event, AgentFinishAction):
-                model_answer_raw = event.thought
+                model_answer_raw = event.final_thought
                break
            elif isinstance(event, CmdRunAction):
                model_answer_raw = event.thought
@@ -222,6 +240,7 @@ def process_instance(
        error=state.last_error if state and state.last_error else None,
        test_result=test_result,
    )
+    runtime.close()
    return output


@@ -253,6 +272,8 @@ if __name__ == '__main__':
    if llm_config is None:
        raise ValueError(f'Could not find LLM config: --llm_config {args.llm_config}')

+    toml_config = OpenHandsConfig()
+    load_from_toml(toml_config)
    metadata = make_metadata(
        llm_config=llm_config,
        dataset_name='gaia',
@@ -261,7 +282,10 @@ if __name__ == '__main__':
        eval_note=args.eval_note,
        eval_output_dir=args.eval_output_dir,
        data_split=args.data_split,
-        details={'gaia-level': args.level},
+        details={
+            'gaia-level': args.level,
+            'mcp-servers': ['tavily'] if toml_config.search_api_key else [],
+        },
        agent_config=agent_config,
    )

--- a/evaluation/benchmarks/gaia/scripts/run_infer.sh
+++ b/evaluation/benchmarks/gaia/scripts/run_infer.sh
@@ -39,7 +39,7 @@ echo "LEVELS: $LEVELS"
 COMMAND="poetry run python ./evaluation/benchmarks/gaia/run_infer.py \
  --agent-cls $AGENT \
  --llm-config $MODEL_CONFIG \
-  --max-iterations 30 \
+  --max-iterations 60 \
  --level $LEVELS \
  --data-split validation \
  --eval-num-workers $NUM_WORKERS \
--- a/evaluation/benchmarks/miniwob/run_infer.py
+++ b/evaluation/benchmarks/miniwob/run_infer.py
@@ -223,7 +223,7 @@ if __name__ == '__main__':
    llm_config = None
    if args.llm_config:
        llm_config = get_llm_config_arg(args.llm_config)
-        # modify_params must be False for evaluation purpose, for reproducibility and accurancy of results
+        # modify_params must be False for evaluation purpose, for reproducibility and accuracy of results
        llm_config.modify_params = False
    if llm_config is None:
        raise ValueError(f'Could not find LLM config: --llm_config {args.llm_config}')
--- a/evaluation/benchmarks/swe_bench/README.md
+++ b/evaluation/benchmarks/swe_bench/README.md
@@ -2,6 +2,8 @@

 This folder contains the evaluation harness that we built on top of the original [SWE-Bench benchmark](https://www.swebench.com/) ([paper](https://arxiv.org/abs/2310.06770)).

+**UPDATE (6/15/2025): We now support running SWE-bench-Live evaluation (see the paper [here](https://arxiv.org/abs/2505.23419))! For how to run it, check out [this README](./SWE-bench-Live.md).**
+
 **UPDATE (5/26/2025): We now support running interactive SWE-Bench evaluation (see the paper [here](https://arxiv.org/abs/2502.13069))! For how to run it, checkout [this README](./SWE-Interact.md).**

 **UPDATE (4/8/2025): We now support running SWT-Bench evaluation! For more details, checkout [the corresponding section](#SWT-Bench-Evaluation).**
--- a/evaluation/benchmarks/swe_bench/SWE-bench-Live.md
+++ b/evaluation/benchmarks/swe_bench/SWE-bench-Live.md
@@ -0,0 +1,65 @@
+# SWE-bench-Live
+
+<p align="center">
+<a href="https://arxiv.org/abs/2505.23419">📃 Paper</a>
+•
+<a href="https://huggingface.co/SWE-bench-Live" >🤗 HuggingFace</a>
+•
+<a href="https://SWE-bench-Live.github.io" >📊 Leaderboard</a>
+</p>
+
+SWE-bench-Live is a live benchmark for issue resolving, providing a dataset that contains the latest issue tasks. This document explains how to run the evaluation of OpenHands on SWE-bench-Live.
+
+Since SWE-bench-Live has an almost identical setting to SWE-bench, you only need to change the dataset name to `SWE-bench-Live/SWE-bench-Live`; everything else is the same as running on SWE-bench.
+
+## Setting Up
+
+Set up the development environment and configure your LLM provider by following the [README](README.md).
+
+## Running Inference
+
+Use the same script, but change the dataset name to `SWE-bench-Live/SWE-bench-Live` and select the split (either `lite` or `full`). The lite split contains 300 instances from the past six months, while the full split includes 1,319 instances created after 2024.
+
+```shell
+./evaluation/benchmarks/swe_bench/scripts/run_infer.sh [model_config] [git-version] [agent] [eval_limit] [max_iter] [num_workers] [dataset] [dataset_split]
+```
+
+In the original SWE-bench-Live paper, max_iterations is set to 100.
+
+```shell
+./evaluation/benchmarks/swe_bench/scripts/run_infer.sh llm.your_llm HEAD CodeActAgent 300 100 3 SWE-bench-Live/SWE-bench-Live lite
+```
+
+## Evaluating Results
+
+After OpenHands generates patch results for each issue, we evaluate the results using the [SWE-bench-Live evaluation harness](https://github.com/microsoft/SWE-bench-Live).
+
+Convert to the format of predictions for SWE benchmarks:
+
+```shell
+# You can find output.jsonl in evaluation/evaluation_outputs
+python evaluation/benchmarks/swe_bench/scripts/live/convert.py --output_jsonl [path/to/evaluation/output.jsonl] > preds.jsonl
+```
+
+Please refer to the original [SWE-bench-Live repository](https://github.com/microsoft/SWE-bench-Live) to set up the evaluation harness and use the provided scripts to generate the evaluation report:
+
+```shell
+python -m swebench.harness.run_evaluation \
+    --dataset_name SWE-bench-Live/SWE-bench-Live \
+    --split lite \
+    --namespace starryzhang \
+    --predictions_path preds.jsonl \
+    --max_workers 10 \
+    --run_id openhands
+```
+
+## Citation
+
+```bibtex
+@article{zhang2025swebenchgoeslive,
+  title={SWE-bench Goes Live!},
+  author={Linghao Zhang and Shilin He and Chaoyun Zhang and Yu Kang and Bowen Li and Chengxing Xie and Junhao Wang and Maoquan Wang and Yufan Huang and Shengyu Fu and Elsie Nallipogu and Qingwei Lin and Yingnong Dang and Saravan Rajmohan and Dongmei Zhang},
+  journal={arXiv preprint arXiv:2505.23419},
+  year={2025}
+}
+```
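The conversion step in SWE-bench-Live.md could be sketched as below. This is not the contents of `convert.py`; it assumes each line of `output.jsonl` carries `instance_id` and `test_result.git_patch`, and that the harness expects the usual SWE-bench prediction keys (`instance_id`, `model_name_or_path`, `model_patch`). `to_predictions` and the default model name are hypothetical.

```python
import json
from typing import Iterable, Iterator


def to_predictions(
    lines: Iterable[str], model_name: str = 'openhands'
) -> Iterator[str]:
    """Yield one prediction JSON line per non-empty output line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # A missing or null patch becomes an empty string.
        patch = (record.get('test_result') or {}).get('git_patch') or ''
        yield json.dumps(
            {
                'instance_id': record['instance_id'],
                'model_name_or_path': model_name,
                'model_patch': patch,
            }
        )


sample = ['{"instance_id": "org__repo-1", "test_result": {"git_patch": "diff --git"}}']
for row in to_predictions(sample):
    print(row)
```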
--- a/evaluation/benchmarks/swe_bench/live_utils.py
+++ b/evaluation/benchmarks/swe_bench/live_utils.py
@@ -0,0 +1,80 @@
+from typing import Any
+
+import pandas as pd
+
+from evaluation.utils.shared import assert_and_raise
+from openhands.core.logger import openhands_logger as logger
+from openhands.events.action import CmdRunAction
+from openhands.events.observation import (
+    CmdOutputObservation,
+    ErrorObservation,
+)
+from openhands.runtime.base import Runtime
+from openhands.utils.shutdown_listener import sleep_if_should_continue
+
+
+def complete_runtime(
+    runtime: Runtime,
+    instance: pd.Series,
+) -> dict[str, Any]:
+    """Complete the runtime and export the git patch for SWE-bench-Live."""
+    logger.info('-' * 30)
+    logger.info('BEGIN Runtime Completion Fn')
+    logger.info('-' * 30)
+    obs: CmdOutputObservation
+    workspace_dir_name = instance.instance_id
+    action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
+        f'Failed to cd to /workspace/{workspace_dir_name}: {str(obs)}',
+    )
+    action = CmdRunAction(command='git config --global core.pager ""')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
+        f'Failed to git config --global core.pager "": {str(obs)}',
+    )
+    action = CmdRunAction(command='git add -A')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
+        f'Failed to git add -A: {str(obs)}',
+    )
+    n_retries = 0
+    git_patch = None
+    while n_retries < 5:
+        action = CmdRunAction(
+            command=f'git diff --no-color --cached {instance["base_commit"]}',
+        )
+        action.set_hard_timeout(100 + 10 * n_retries)
+        logger.info(action, extra={'msg_type': 'ACTION'})
+        obs = runtime.run_action(action)
+        logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+        n_retries += 1
+        if isinstance(obs, CmdOutputObservation):
+            if obs.exit_code == 0:
+                git_patch = obs.content.strip()
+                break
+            else:
+                logger.info('Failed to get git diff, retrying...')
+                sleep_if_should_continue(10)
+        elif isinstance(obs, ErrorObservation):
+            logger.error(f'Error occurred: {obs.content}. Retrying...')
+            sleep_if_should_continue(10)
+        else:
+            assert_and_raise(False, f'Unexpected observation type: {str(obs)}')
+    assert_and_raise(git_patch is not None, 'Failed to get git diff (None)')
+    logger.info('-' * 30)
+    logger.info('END Runtime Completion Fn')
+    logger.info('-' * 30)
+    return {'git_patch': git_patch}
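The retry loop in `complete_runtime` grows the hard timeout by 10 seconds per attempt and pauses between attempts. The same pattern, detached from the runtime, might be sketched as follows; `retry_with_growing_timeout` and its parameters are hypothetical, and `attempt` stands in for running the `git diff` action.

```python
import time
from typing import Callable, Optional


def retry_with_growing_timeout(
    attempt: Callable[[int], Optional[str]],
    max_retries: int = 5,
    base_timeout: int = 100,
    timeout_step: int = 10,
    pause_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call attempt(timeout) until it returns a value, raising the timeout each try."""
    for n in range(max_retries):
        result = attempt(base_timeout + timeout_step * n)
        if result is not None:
            return result
        sleep(pause_seconds)  # back off before the next try
    raise RuntimeError(f'No result after {max_retries} attempts')
```

Passing `sleep` as a parameter keeps the helper testable without real delays, for example `retry_with_growing_timeout(fn, sleep=lambda s: None)`.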
--- a/evaluation/benchmarks/swe_bench/run_infer.py
+++ b/evaluation/benchmarks/swe_bench/run_infer.py
@@ -66,6 +66,26 @@ RUN_WITH_BROWSING = os.environ.get('RUN_WITH_BROWSING', 'false').lower() == 'tru
 ENABLE_LLM_EDITOR = os.environ.get('ENABLE_LLM_EDITOR', 'false').lower() == 'true'
 BenchMode = Literal['swe', 'swt', 'swt-ci']

+# Global variable to track dataset type
+DATASET_TYPE = 'SWE-bench'
+
+
+def set_dataset_type(dataset_name: str) -> None:
+    """Set dataset type based on dataset name."""
+    global DATASET_TYPE
+    name_lower = dataset_name.lower()
+
+    if 'swe-gym' in name_lower:
+        DATASET_TYPE = 'SWE-Gym'
+    elif 'swe-bench-live' in name_lower:
+        DATASET_TYPE = 'SWE-bench-Live'
+    elif 'multimodal' in name_lower:
+        DATASET_TYPE = 'Multimodal'
+    else:
+        DATASET_TYPE = 'SWE-bench'
+
+    logger.info(f'Dataset type set to: {DATASET_TYPE}')
+

 AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {
    'CodeActAgent': codeact_user_response,
@@ -73,7 +93,10 @@ AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {


 def _get_swebench_workspace_dir_name(instance: pd.Series) -> str:
-    return f'{instance.repo}__{instance.version}'.replace('/', '__')
+    if DATASET_TYPE == 'SWE-bench-Live':
+        return instance.instance_id
+    else:
+        return f'{instance.repo}__{instance.version}'.replace('/', '__')


 def get_instruction(instance: pd.Series, metadata: EvalMetadata) -> MessageAction:
@@ -153,9 +176,13 @@ def get_instance_docker_image(
    if swebench_official_image:
        # Official SWE-Bench image
        # swebench/sweb.eval.x86_64.django_1776_django-11333:v1
-        docker_image_prefix = 'docker.io/swebench/'
+        # SWE-bench-Live uses the same naming convention as SWE-Bench
+        if DATASET_TYPE == 'SWE-bench-Live':
+            docker_image_prefix = 'docker.io/starryzhang/'
+        else:
+            docker_image_prefix = 'docker.io/swebench/'
        repo, name = instance_id.split('__')
-        image_name = f'swebench/sweb.eval.x86_64.{repo}_1776_{name}:latest'.lower()
+        image_name = f'{docker_image_prefix.rstrip("/")}/sweb.eval.x86_64.{repo}_1776_{name}:latest'.lower()
        logger.debug(f'Using official SWE-Bench image: {image_name}')
        return image_name
    else:
@@ -173,7 +200,8 @@ def get_config(
    metadata: EvalMetadata,
 ) -> OpenHandsConfig:
    # We use a different instance image for the each instance of swe-bench eval
-    use_swebench_official_image = 'swe-gym' not in metadata.dataset.lower()
+    use_swebench_official_image = DATASET_TYPE != 'SWE-Gym'
+
    base_container_image = get_instance_docker_image(
        instance['instance_id'],
        swebench_official_image=use_swebench_official_image,
@@ -290,8 +318,12 @@ def initialize_runtime(
        runtime.copy_to(temp_file_path, '/swe_util/eval_data/instances/')

        # inject the instance swe entry
+        if DATASET_TYPE == 'SWE-bench-Live':
+            entry_script_path = 'instance_swe_entry_live.sh'
+        else:
+            entry_script_path = 'instance_swe_entry.sh'
        runtime.copy_to(
-            str(os.path.join(script_dir, 'scripts/setup/instance_swe_entry.sh')),
+            str(os.path.join(script_dir, f'scripts/setup/{entry_script_path}')),
            '/swe_util/',
        )

@@ -311,14 +343,14 @@ def initialize_runtime(
        logger.error(f'Failed to source ~/.bashrc: {str(obs)}')
    assert_and_raise(obs.exit_code == 0, f'Failed to source ~/.bashrc: {str(obs)}')

-    action = CmdRunAction(command='source /swe_util/instance_swe_entry.sh')
+    action = CmdRunAction(command=f'source /swe_util/{entry_script_path}')
    action.set_hard_timeout(600)
    logger.info(action, extra={'msg_type': 'ACTION'})
    obs = runtime.run_action(action)
    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
    assert_and_raise(
        obs.exit_code == 0,
-        f'Failed to source /swe_util/instance_swe_entry.sh: {str(obs)}',
+        f'Failed to source /swe_util/{entry_script_path}: {str(obs)}',
    )

    action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
@@ -371,9 +403,9 @@ def initialize_runtime(
            obs = runtime.run_action(action)
            logger.info(obs, extra={'msg_type': 'OBSERVATION'})

-    if 'multimodal' not in metadata.dataset.lower():
+    if DATASET_TYPE != 'Multimodal' and DATASET_TYPE != 'SWE-bench-Live':
        # Only for non-multimodal datasets, we need to activate the testbed environment for Python
-        # SWE-Bench multimodal datasets are not using the testbed environment
+        # SWE-Bench multimodal datasets and SWE-bench-Live are not using the testbed environment
        action = CmdRunAction(command='which python')
        action.set_hard_timeout(600)
        logger.info(action, extra={'msg_type': 'ACTION'})
@@ -615,7 +647,13 @@ def process_instance(

        # ======= THIS IS SWE-Bench specific =======
        # Get git patch
-        return_val = complete_runtime(runtime, instance)
+        if DATASET_TYPE == 'SWE-bench-Live':
+            from evaluation.benchmarks.swe_bench.live_utils import (
+                complete_runtime as complete_runtime_fn,
+            )
+        else:
+            complete_runtime_fn = complete_runtime
+        return_val = complete_runtime_fn(runtime, instance)
        git_patch = return_val['git_patch']
        logger.info(
            f'Got git diff for instance {instance.instance_id}:\n--------\n{git_patch}\n--------'
@@ -720,11 +758,15 @@ if __name__ == '__main__':
    # NOTE: It is preferable to load datasets from huggingface datasets and perform post-processing
    # so we don't need to manage file uploading to OpenHands's repo
    dataset = load_dataset(args.dataset, split=args.split)
+
+    # Set the global dataset type based on dataset name
+    set_dataset_type(args.dataset)
+
    swe_bench_tests = filter_dataset(dataset.to_pandas(), 'instance_id')
    logger.info(
        f'Loaded dataset {args.dataset} with split {args.split}: {len(swe_bench_tests)} tasks'
    )
-    if 'SWE-Gym' in args.dataset:
+    if DATASET_TYPE == 'SWE-Gym':
        with open(
            os.path.join(
                os.path.dirname(os.path.abspath(__file__)),
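The dataset-type dispatch in the run_infer.py hunks above can be sketched in isolation as follows. This is a minimal illustration, not the PR's code: `classify_dataset` and `official_image_name` are hypothetical helper names, and returning values instead of mutating the module-level `DATASET_TYPE` is an assumption made so the logic can be tested on its own. The branch order and the image-name format mirror the hunks above.

```python
def classify_dataset(dataset_name: str) -> str:
    """Map a dataset name to a dataset type (hypothetical pure helper)."""
    name_lower = dataset_name.lower()
    # Order matters: the more specific substrings are checked first.
    if 'swe-gym' in name_lower:
        return 'SWE-Gym'
    if 'swe-bench-live' in name_lower:
        return 'SWE-bench-Live'
    if 'multimodal' in name_lower:
        return 'Multimodal'
    return 'SWE-bench'


def official_image_name(instance_id: str, dataset_type: str) -> str:
    """Build the official image name using the format shown in the diff."""
    if dataset_type == 'SWE-bench-Live':
        prefix = 'docker.io/starryzhang/'
    else:
        # Assumption: every other official-image dataset uses the swebench prefix.
        prefix = 'docker.io/swebench/'
    repo, name = instance_id.split('__')
    return f'{prefix.rstrip("/")}/sweb.eval.x86_64.{repo}_1776_{name}:latest'.lower()
```

Keeping the classification pure makes it easy to unit-test the substring ordering without loading a dataset; the dataset names used to exercise it are illustrative only.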
--- a/evaluation/benchmarks/swe_bench/run_localize.py
+++ b/evaluation/benchmarks/swe_bench/run_localize.py
@@ -192,6 +192,8 @@ def get_config(
        dataset_name=metadata.dataset,
        instance_id=instance['instance_id'],
    )
+    oh_aci_li_cmd = '/openhands/micromamba/bin/micromamba run -n openhands poetry run pip install openhands-aci[llama]'
+    sandbox_config.runtime_extra_deps = oh_aci_li_cmd
    workspace_dir_name = _get_swebench_workspace_dir_name(instance)
    sandbox_config.runtime_startup_env_vars = {
        'REPO_PATH': f'/workspace/{workspace_dir_name}/',
@@ -216,6 +218,7 @@ def get_config(
        enable_jupyter=False,
        enable_browsing=RUN_WITH_BROWSING,
        enable_llm_editor=False,
+        enable_mcp=os.environ.get('ENABLE_MCP', 'false').lower() == 'true',
        condenser=metadata.condenser_config,
        enable_prompt_extensions=False,
    )
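`os.environ.get` returns a string whenever the variable is set, so a value such as `false` or `0` is truthy in Python. One way to read boolean flags like `ENABLE_MCP` is sketched below; `env_flag` is a hypothetical helper, and the set of accepted spellings is an assumption rather than a project convention.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret an environment variable as a boolean (hypothetical helper)."""
    raw = os.environ.get(name)
    if raw is None:
        # Variable not set at all: fall back to the caller's default.
        return default
    # Assumed truthy spellings; anything else counts as False.
    return raw.strip().lower() in {'1', 'true', 'yes', 'on'}
```

With this helper, `ENABLE_MCP=false` and an unset `ENABLE_MCP` both disable the feature, while `ENABLE_MCP=1` or `ENABLE_MCP=true` enable it.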
--- a/evaluation/benchmarks/swe_bench/scripts/live/convert.py
+++ b/evaluation/benchmarks/swe_bench/scripts/live/convert.py
@@ -0,0 +1,34 @@
+import argparse
+import json
+import sys
+
+
+def main(output_jsonl: str):
+    with open(output_jsonl, 'r') as f:
+        for line_no, line in enumerate(f, start=1):
+            try:
+                output = json.loads(line)
+                pred = {
+                    'instance_id': output['instance_id'],
+                    'model_name_or_path': output['metadata']['llm_config']['model'],
+                    'model_patch': output['test_result']['git_patch'],
+                }
+            except Exception as e:
+                # `output` may be unbound or stale here; report the line number on stderr
+                print(f'Error while reading line {line_no}: {e}', file=sys.stderr)
+                continue
+
+            print(json.dumps(pred))
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        '--output_jsonl',
+        type=str,
+        required=True,
+        help='Path to the prediction file (.../outputs.jsonl)',
+    )
+    args = parser.parse_args()
+
+    main(args.output_jsonl)
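The per-line conversion performed by convert.py can be sketched as a standalone function. The field names below come from the script itself; `to_prediction` is a hypothetical name, and the sample record is made up purely for illustration.

```python
import json
from typing import Optional


def to_prediction(line: str) -> Optional[dict]:
    """Convert one outputs.jsonl line to a prediction dict, or None if malformed."""
    try:
        output = json.loads(line)
        return {
            'instance_id': output['instance_id'],
            'model_name_or_path': output['metadata']['llm_config']['model'],
            'model_patch': output['test_result']['git_patch'],
        }
    except (json.JSONDecodeError, KeyError, TypeError):
        # Malformed JSON or a missing field: signal the caller to skip this line.
        return None


# Made-up record shaped like the fields the script reads.
sample = json.dumps({
    'instance_id': 'example__repo-1',
    'metadata': {'llm_config': {'model': 'example-model'}},
    'test_result': {'git_patch': 'diff --git a/x b/x\n'},
})
prediction = to_prediction(sample)
```

Returning `None` for bad lines lets the caller skip them explicitly, so a failed record never gets written out as a prediction.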
--- a/evaluation/benchmarks/swe_bench/scripts/setup/instance_swe_entry_live.sh
+++ b/evaluation/benchmarks/swe_bench/scripts/setup/instance_swe_entry_live.sh
@@ -0,0 +1,41 @@
+#!/usr/bin/env bash
+
+source ~/.bashrc
+SWEUTIL_DIR=/swe_util
+
+# FIXME: Cannot read SWE_INSTANCE_ID from the environment variable
+# SWE_INSTANCE_ID=django__django-11099
+if [ -z "$SWE_INSTANCE_ID" ]; then
+    echo "Error: SWE_INSTANCE_ID is not set." >&2
+    exit 1
+fi
+
+# Read swe-bench-instance.json and extract the item whose instance_id matches SWE_INSTANCE_ID
+item=$(jq --arg INSTANCE_ID "$SWE_INSTANCE_ID" '.[] | select(.instance_id == $INSTANCE_ID)' $SWEUTIL_DIR/eval_data/instances/swe-bench-instance.json)
+
+if [[ -z "$item" ]]; then
+  echo "No item found for the provided instance ID." >&2
+  exit 1
+fi
+
+
+echo "WORKSPACE_NAME: $SWE_INSTANCE_ID"
+
+# Clear the workspace
+if [ -d /workspace ]; then
+    rm -rf /workspace/*
+else
+    mkdir /workspace
+fi
+# Copy repo to workspace
+if [ -d /workspace/$SWE_INSTANCE_ID ]; then
+    rm -rf /workspace/$SWE_INSTANCE_ID
+fi
+mkdir -p /workspace
+cp -r /testbed /workspace/$SWE_INSTANCE_ID
+
+# SWE-bench-Live does not use conda to manage Python
+# if [ -d /opt/miniconda3 ]; then
+#     . /opt/miniconda3/etc/profile.d/conda.sh
+#     conda activate testbed
+# fi
--- a/evaluation/benchmarks/webarena/run_infer.py
+++ b/evaluation/benchmarks/webarena/run_infer.py
@@ -212,7 +212,7 @@ if __name__ == '__main__':
    llm_config = None
    if args.llm_config:
        llm_config = get_llm_config_arg(args.llm_config)
-        # modify_params must be False for evaluation purpose, for reproducibility and accurancy of results
+        # modify_params must be False for evaluation purpose, for reproducibility and accuracy of results
        llm_config.modify_params = False
    if llm_config is None:
        raise ValueError(f'Could not find LLM config: --llm_config {args.llm_config}')
--- a/evaluation/utils/shared.py
+++ b/evaluation/utils/shared.py
@@ -263,8 +263,19 @@ def prepare_dataset(
            f'Randomly sampling {eval_n_limit} unique instances with random seed 42.'
        )

+    def make_serializable(instance: pd.Series) -> dict:
+        import numpy as np
+
+        instance_dict = instance.to_dict()
+        for k, v in instance_dict.items():
+            if isinstance(v, np.ndarray):
+                instance_dict[k] = v.tolist()
+            elif isinstance(v, pd.Timestamp):
+                instance_dict[k] = str(v)
+        return instance_dict
+
    new_dataset = [
-        instance
+        make_serializable(instance)
        for _, instance in dataset.iterrows()
        if str(instance[id_column]) not in finished_ids
    ]
--- a/frontend/tests/components/features/home/repo-connector.test.tsx
+++ b/frontend/tests/components/features/home/repo-connector.test.tsx
@@ -31,7 +31,7 @@ const renderRepoConnector = () => {
        },
        {
          Component: () => <div data-testid="git-settings-screen" />,
-          path: "/settings/git",
+          path: "/settings/integrations",
        },
      ],
    },
--- a/frontend/tests/routes/git-settings.test.tsx
+++ b/frontend/tests/routes/git-settings.test.tsx
@@ -35,13 +35,13 @@ const queryClient = new QueryClient();
 const GitSettingsRouterStub = createRoutesStub([
  {
    Component: GitSettingsScreen,
-    path: "/settings/github",
+    path: "/settings/integrations",
  },
 ]);

 const renderGitSettingsScreen = () => {
  const { rerender, ...rest } = render(
-    <GitSettingsRouterStub initialEntries={["/settings/github"]} />,
+    <GitSettingsRouterStub initialEntries={["/settings/integrations"]} />,
    {
      wrapper: ({ children }) => (
        <QueryClientProvider client={queryClient}>
@@ -54,7 +54,7 @@ const renderGitSettingsScreen = () => {
  const rerenderGitSettingsScreen = () =>
    rerender(
      <QueryClientProvider client={queryClient}>
-        <GitSettingsRouterStub initialEntries={["/settings/github"]} />
+        <GitSettingsRouterStub initialEntries={["/settings/integrations"]} />
      </QueryClientProvider>,
    );

@@ -89,9 +89,6 @@ describe("Content", () => {
    await screen.findByTestId("gitlab-token-input");
    await screen.findByTestId("gitlab-token-help-anchor");

-    await screen.findByTestId("azure-devops-token-input");
-    await screen.findByTestId("azure-devops-token-help-anchor");
-
    getConfigSpy.mockResolvedValue(VALID_SAAS_CONFIG);
    queryClient.invalidateQueries();
    rerender();
@@ -110,13 +107,6 @@ describe("Content", () => {
      expect(
        screen.queryByTestId("gitlab-token-help-anchor"),
      ).not.toBeInTheDocument();
-
-      expect(
-        screen.queryByTestId("azure-devops-token-input"),
-      ).not.toBeInTheDocument();
-      expect(
-        screen.queryByTestId("azure-devops-token-help-anchor"),
-      ).not.toBeInTheDocument();
    });
  });

@@ -143,12 +133,6 @@ describe("Content", () => {
      expect(
        screen.queryByTestId("gl-set-token-indicator"),
      ).not.toBeInTheDocument();
-
-      const azureDevOpsInput = screen.getByTestId("azure-devops-token-input");
-      expect(azureDevOpsInput).toHaveProperty("placeholder", "");
-      expect(
-        screen.queryByTestId("ado-set-token-indicator"),
-      ).not.toBeInTheDocument();
    });

    getSettingsSpy.mockResolvedValue({
@@ -156,7 +140,6 @@ describe("Content", () => {
      provider_tokens_set: {
        github: null,
        gitlab: null,
-        azure_devops: null,
      },
    });
    queryClient.invalidateQueries();
@@ -175,19 +158,12 @@ describe("Content", () => {
      expect(
        screen.queryByTestId("gl-set-token-indicator"),
      ).toBeInTheDocument();
-
-      const azureDevOpsInput = screen.getByTestId("azure-devops-token-input");
-      expect(azureDevOpsInput).toHaveProperty("placeholder", "<hidden>");
-      expect(
-        screen.queryByTestId("ado-set-token-indicator"),
-      ).toBeInTheDocument();
    });

    getSettingsSpy.mockResolvedValue({
      ...MOCK_DEFAULT_USER_SETTINGS,
      provider_tokens_set: {
        gitlab: null,
-        azure_devops: null,
      },
    });
    queryClient.invalidateQueries();
@@ -206,12 +182,6 @@ describe("Content", () => {
      expect(
        screen.queryByTestId("gl-set-token-indicator"),
      ).toBeInTheDocument();
-
-      const azureDevOpsInput = screen.getByTestId("azure-devops-token-input");
-      expect(azureDevOpsInput).toHaveProperty("placeholder", "<hidden>");
-      expect(
-        screen.queryByTestId("ado-set-token-indicator"),
-      ).toBeInTheDocument();
    });
  });

@@ -273,49 +243,15 @@ describe("Form submission", () => {
    expect(saveProvidersSpy).toHaveBeenCalledWith({
      github: { token: "test-token", host: "" },
      gitlab: { token: "", host: "" },
-      azure_devops: { token: "", host: "" },
    });
-  });
-
-  it("should save the GitLab token", async () => {
-    const saveProvidersSpy = vi.spyOn(SecretsService, "addGitProvider");
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
-    getConfigSpy.mockResolvedValue(VALID_OSS_CONFIG);
-
-    renderGitSettingsScreen();

    const gitlabInput = await screen.findByTestId("gitlab-token-input");
-    const submit = await screen.findByTestId("submit-button");
-
    await userEvent.type(gitlabInput, "test-token");
    await userEvent.click(submit);

    expect(saveProvidersSpy).toHaveBeenCalledWith({
-      github: { token: "", host: "" },
-      gitlab: { token: "test-token", host: "" },
-      azure_devops: { token: "", host: "" },
-    });
-  });
-
-  it("should save the Azure DevOps token", async () => {
-    const saveProvidersSpy = vi.spyOn(SecretsService, "addGitProvider");
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
-    getConfigSpy.mockResolvedValue(VALID_OSS_CONFIG);
-
-    renderGitSettingsScreen();
-
-    const azureDevOpsInput = await screen.findByTestId("azure-devops-token-input");
-    const azureDevOpsHostInput = await screen.findByTestId("azure-devops-host-input");
-    const submit = await screen.findByTestId("submit-button");
-
-    await userEvent.type(azureDevOpsInput, "test-token");
-    await userEvent.type(azureDevOpsHostInput, "https://dev.azure.com/test-org");
-    await userEvent.click(submit);
-
-    expect(saveProvidersSpy).toHaveBeenCalledWith({
-      github: { token: "", host: "" },
+      github: { token: "test-token", host: "" },
      gitlab: { token: "", host: "" },
-      azure_devops: { token: "test-token", host: "https://dev.azure.com/test-org" },
    });
  });

@@ -343,14 +279,6 @@ describe("Form submission", () => {

    await userEvent.clear(gitlabInput);
    expect(submit).toBeDisabled();
-
-    const azureDevOpsInput = await screen.findByTestId("azure-devops-token-input");
-    await userEvent.type(azureDevOpsInput, "test-token");
-
-    expect(submit).not.toBeDisabled();
-
-    await userEvent.clear(azureDevOpsInput);
-    expect(submit).toBeDisabled();
  });

  it("should enable a disconnect tokens button if there is at least one token set", async () => {
@@ -363,7 +291,6 @@ describe("Form submission", () => {
      provider_tokens_set: {
        github: null,
        gitlab: null,
-        azure_devops: null,
      },
    });

@@ -395,7 +322,6 @@ describe("Form submission", () => {
      provider_tokens_set: {
        github: null,
        gitlab: null,
-        azure_devops: null,
      },
    });

--- a/frontend/tests/routes/secrets-settings.test.tsx
+++ b/frontend/tests/routes/secrets-settings.test.tsx
@@ -31,7 +31,7 @@ const RouterStub = createRoutesStub([
      },
      {
        Component: () => <div data-testid="git-settings-screen" />,
-        path: "/settings/git",
+        path: "/settings/integrations",
      },
    ],
  },
--- a/frontend/tests/routes/settings-with-payment.test.tsx
+++ b/frontend/tests/routes/settings-with-payment.test.tsx
@@ -30,7 +30,7 @@ vi.mock("react-i18next", async () => {
    useTranslation: () => ({
      t: (key: string) => {
        const translations: Record<string, string> = {
-          "SETTINGS$NAV_GIT": "Git",
+          "SETTINGS$NAV_INTEGRATIONS": "Integrations",
          "SETTINGS$NAV_APPLICATION": "Application",
          "SETTINGS$NAV_CREDITS": "Credits",
          "SETTINGS$NAV_API_KEYS": "API Keys",
@@ -61,7 +61,7 @@ describe("Settings Billing", () => {
        },
        {
          Component: () => <div data-testid="git-settings-screen" />,
-          path: "/settings/git",
+          path: "/settings/integrations",
        },
        {
          Component: () => <div data-testid="user-settings-screen" />,
--- a/frontend/tests/routes/settings.test.tsx
+++ b/frontend/tests/routes/settings.test.tsx
@@ -14,7 +14,7 @@ vi.mock("react-i18next", async () => {
    useTranslation: () => ({
      t: (key: string) => {
        const translations: Record<string, string> = {
-          SETTINGS$NAV_GIT: "Git",
+          SETTINGS$NAV_INTEGRATIONS: "Integrations",
          SETTINGS$NAV_APPLICATION: "Application",
          SETTINGS$NAV_CREDITS: "Credits",
          SETTINGS$NAV_API_KEYS: "API Keys",
@@ -49,7 +49,7 @@ describe("Settings Screen", () => {
        },
        {
          Component: () => <div data-testid="git-settings-screen" />,
-          path: "/settings/git",
+          path: "/settings/integrations",
        },
        {
          Component: () => <div data-testid="application-settings-screen" />,
@@ -79,7 +79,7 @@ describe("Settings Screen", () => {
  };

  it("should render the navbar", async () => {
-    const sectionsToInclude = ["llm", "git", "application", "secrets"];
+    const sectionsToInclude = ["llm", "integrations", "application", "secrets"];
    const sectionsToExclude = ["api keys", "credits"];
    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
    // @ts-expect-error - only return app mode
@@ -111,7 +111,7 @@ describe("Settings Screen", () => {
      APP_MODE: "saas",
    });
    const sectionsToInclude = [
-      "git",
+      "integrations",
      "application",
      "credits",
      "secrets",
--- a/frontend/src/api/open-hands.ts
+++ b/frontend/src/api/open-hands.ts
@@ -111,6 +111,59 @@ class OpenHands {
    return data;
  }

+  /**
+   * Submit conversation feedback with rating
+   * @param conversationId The conversation ID
+   * @param rating The rating (1-5)
+   * @param eventId Optional event ID this feedback corresponds to
+   * @param reason Optional reason for the rating
+   * @returns Response from the feedback endpoint
+   */
+  static async submitConversationFeedback(
+    conversationId: string,
+    rating: number,
+    eventId?: number,
+    reason?: string,
+  ): Promise<{ status: string; message: string }> {
+    const url = `/feedback/conversation`;
+    const payload = {
+      conversation_id: conversationId,
+      event_id: eventId,
+      rating,
+      reason,
+      metadata: { source: "likert-scale" },
+    };
+    const { data } = await openHands.post<{ status: string; message: string }>(
+      url,
+      payload,
+    );
+    return data;
+  }
+
+  /**
+   * Check if feedback exists for a specific conversation and event
+   * @param conversationId The conversation ID
+   * @param eventId The event ID to check
+   * @returns Feedback data including existence, rating, and reason
+   */
+  static async checkFeedbackExists(
+    conversationId: string,
+    eventId: number,
+  ): Promise<{ exists: boolean; rating?: number; reason?: string }> {
+    try {
+      const url = `/feedback/conversation/${conversationId}/${eventId}`;
+      const { data } = await openHands.get<{
+        exists: boolean;
+        rating?: number;
+        reason?: string;
+      }>(url);
+      return data;
+    } catch (error) {
+      // Error checking if feedback exists
+      return { exists: false };
+    }
+  }
+
  /**
   * Authenticate with GitHub token
   * @returns Response with authentication status and user info if successful
--- a/frontend/src/assets/branding/azure-devops-logo.svg
+++ b/frontend/src/assets/branding/azure-devops-logo.svg
@@ -1 +0,0 @@
-<svg height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="m22 18-5 4-8-3v3l-4.19-5.75 12.91 1.05v-10.96l4.28-.69zm-17.19-1.75v-7.29l12.91-2.62-7.12-4.34v2.84l-6.63 1.92-1.97 2.62v5.69z"/></svg>
--- a/frontend/src/components/features/chat/action-suggestions.tsx
+++ b/frontend/src/components/features/chat/action-suggestions.tsx
@@ -20,31 +20,19 @@ export function ActionSuggestions({

  const providersAreSet = providers.length > 0;
  const isGitLab = providers.includes("gitlab");
-  const isAzureDevOps = providers.includes("azure_devops");

-  // Determine the correct terminology based on the provider
-  let pr;
-  let prShort;
-  let providerName;
-  if (isGitLab) {
-    pr = "merge request";
-    prShort = "MR";
-    providerName = "GitLab";
-  } else if (isAzureDevOps) {
-    pr = "pull request";
-    prShort = "PR";
-    providerName = "Azure DevOps";
-  } else {
-    pr = "pull request";
-    prShort = "PR";
-    providerName = "GitHub";
-  }
+  const pr = isGitLab ? "merge request" : "pull request";
+  const prShort = isGitLab ? "MR" : "PR";

  const terms = {
    pr,
    prShort,
-    pushToBranch: `Please push the changes to a remote branch on ${providerName}, but do NOT create a ${pr}. Please use the exact SAME branch name as the one you are currently on.`,
-    createPR: `Please push the changes to ${providerName} and open a ${pr}. Please create a meaningful branch name that describes the changes. If a ${pr} template exists in the repository, please follow it when creating the ${prShort} description.`,
+    pushToBranch: `Please push the changes to a remote branch on ${
+      isGitLab ? "GitLab" : "GitHub"
+    }, but do NOT create a ${pr}. Please use the exact SAME branch name as the one you are currently on.`,
+    createPR: `Please push the changes to ${
+      isGitLab ? "GitLab" : "GitHub"
+    } and open a ${pr}. Please create a meaningful branch name that describes the changes. If a ${pr} template exists in the repository, please follow it when creating the ${prShort} description.`,
    pushToPR: `Please push the latest changes to the existing ${pr}.`,
  };

--- a/frontend/src/components/features/chat/chat-interface.tsx
+++ b/frontend/src/components/features/chat/chat-interface.tsx
@@ -18,6 +18,7 @@ import { useWsClient } from "#/context/ws-client-provider";
 import { Messages } from "./messages";
 import { ChatSuggestions } from "./chat-suggestions";
 import { ActionSuggestions } from "./action-suggestions";
+import { ScrollProvider } from "#/context/scroll-context";

 import { ScrollToBottomButton } from "#/components/shared/buttons/scroll-to-bottom-button";
 import { LoadingSpinner } from "#/components/shared/loading-spinner";
@@ -28,6 +29,7 @@ import { useOptimisticUserMessage } from "#/hooks/use-optimistic-user-message";
 import { useWSErrorMessage } from "#/hooks/use-ws-error-message";
 import { ErrorMessageBanner } from "./error-message-banner";
 import { shouldRenderEvent } from "./event-content-helpers/should-render-event";
+import { useConfig } from "#/hooks/query/use-config";

 function getEntryPoint(
  hasRepository: boolean | null,
@@ -45,8 +47,15 @@ export function ChatInterface() {
    useOptimisticUserMessage();
  const { t } = useTranslation();
  const scrollRef = React.useRef<HTMLDivElement>(null);
-  const { scrollDomToBottom, onChatBodyScroll, hitBottom } =
-    useScrollToBottom(scrollRef);
+  const {
+    scrollDomToBottom,
+    onChatBodyScroll,
+    hitBottom,
+    autoScroll,
+    setAutoScroll,
+    setHitBottom,
+  } = useScrollToBottom(scrollRef);
+  const { data: config } = useConfig();

  const { curAgentState } = useSelector((state: RootState) => state.agent);

@@ -126,80 +135,97 @@ export function ChatInterface() {
    curAgentState === AgentState.AWAITING_USER_INPUT ||
    curAgentState === AgentState.FINISHED;

+  // Value passed to ScrollProvider, built from the scroll hook's return values
+  const scrollProviderValue = {
+    scrollRef,
+    autoScroll,
+    setAutoScroll,
+    scrollDomToBottom,
+    hitBottom,
+    setHitBottom,
+    onChatBodyScroll,
+  };
+
  return (
-    <div className="h-full flex flex-col justify-between">
-      {events.length === 0 && !optimisticUserMessage && (
-        <ChatSuggestions onSuggestionsClick={setMessageToSend} />
-      )}
-
-      <div
-        ref={scrollRef}
-        onScroll={(e) => onChatBodyScroll(e.currentTarget)}
-        className="scrollbar scrollbar-thin scrollbar-thumb-gray-400 scrollbar-thumb-rounded-full scrollbar-track-gray-800 hover:scrollbar-thumb-gray-300 flex flex-col grow overflow-y-auto overflow-x-hidden px-4 pt-4 gap-2 fast-smooth-scroll"
-      >
-        {isLoadingMessages && (
-          <div className="flex justify-center">
-            <LoadingSpinner size="small" />
-          </div>
+    <ScrollProvider value={scrollProviderValue}>
+      <div className="h-full flex flex-col justify-between">
+        {events.length === 0 && !optimisticUserMessage && (
+          <ChatSuggestions onSuggestionsClick={setMessageToSend} />
        )}

-        {!isLoadingMessages && (
-          <Messages
-            messages={events}
-            isAwaitingUserConfirmation={
-              curAgentState === AgentState.AWAITING_USER_CONFIRMATION
-            }
-          />
-        )}
+        <div
+          ref={scrollRef}
+          onScroll={(e) => onChatBodyScroll(e.currentTarget)}
+          className="scrollbar scrollbar-thin scrollbar-thumb-gray-400 scrollbar-thumb-rounded-full scrollbar-track-gray-800 hover:scrollbar-thumb-gray-300 flex flex-col grow overflow-y-auto overflow-x-hidden px-4 pt-4 gap-2 fast-smooth-scroll"
+        >
+          {isLoadingMessages && (
+            <div className="flex justify-center">
+              <LoadingSpinner size="small" />
+            </div>
+          )}

-        {isWaitingForUserInput &&
-          events.length > 0 &&
-          !optimisticUserMessage && (
-            <ActionSuggestions
-              onSuggestionsClick={(value) => handleSendMessage(value, [])}
+          {!isLoadingMessages && (
+            <Messages
+              messages={events}
+              isAwaitingUserConfirmation={
+                curAgentState === AgentState.AWAITING_USER_CONFIRMATION
+              }
            />
          )}
-      </div>

-      <div className="flex flex-col gap-[6px] px-4 pb-4">
-        <div className="flex justify-between relative">
-          <TrajectoryActions
-            onPositiveFeedback={() =>
-              onClickShareFeedbackActionButton("positive")
-            }
-            onNegativeFeedback={() =>
-              onClickShareFeedbackActionButton("negative")
-            }
-            onExportTrajectory={() => onClickExportTrajectoryButton()}
-          />
-
-          <div className="absolute left-1/2 transform -translate-x-1/2 bottom-0">
-            {curAgentState === AgentState.RUNNING && <TypingIndicator />}
-          </div>
-
-          {!hitBottom && <ScrollToBottomButton onClick={scrollDomToBottom} />}
+          {isWaitingForUserInput &&
+            events.length > 0 &&
+            !optimisticUserMessage && (
+              <ActionSuggestions
+                onSuggestionsClick={(value) => handleSendMessage(value, [])}
+              />
+            )}
        </div>

-        {errorMessage && <ErrorMessageBanner message={errorMessage} />}
+        <div className="flex flex-col gap-[6px] px-4 pb-4">
+          <div className="flex justify-between relative">
+            {config?.APP_MODE !== "saas" && (
+              <TrajectoryActions
+                onPositiveFeedback={() =>
+                  onClickShareFeedbackActionButton("positive")
+                }
+                onNegativeFeedback={() =>
+                  onClickShareFeedbackActionButton("negative")
+                }
+                onExportTrajectory={() => onClickExportTrajectoryButton()}
+              />
+            )}

-        <InteractiveChatBox
-          onSubmit={handleSendMessage}
-          onStop={handleStop}
-          isDisabled={
-            curAgentState === AgentState.LOADING ||
-            curAgentState === AgentState.AWAITING_USER_CONFIRMATION
-          }
-          mode={curAgentState === AgentState.RUNNING ? "stop" : "submit"}
-          value={messageToSend ?? undefined}
-          onChange={setMessageToSend}
-        />
+            <div className="absolute left-1/2 transform -translate-x-1/2 bottom-0">
+              {curAgentState === AgentState.RUNNING && <TypingIndicator />}
+            </div>
+
+            {!hitBottom && <ScrollToBottomButton onClick={scrollDomToBottom} />}
+          </div>
+
+          {errorMessage && <ErrorMessageBanner message={errorMessage} />}
+
+          <InteractiveChatBox
+            onSubmit={handleSendMessage}
+            onStop={handleStop}
+            isDisabled={
+              curAgentState === AgentState.LOADING ||
+              curAgentState === AgentState.AWAITING_USER_CONFIRMATION
+            }
+            mode={curAgentState === AgentState.RUNNING ? "stop" : "submit"}
+            value={messageToSend ?? undefined}
+            onChange={setMessageToSend}
+          />
+        </div>
+
+        {config?.APP_MODE !== "saas" && (
+          <FeedbackModal
+            isOpen={feedbackModalIsOpen}
+            onClose={() => setFeedbackModalIsOpen(false)}
+            polarity={feedbackPolarity}
+          />
+        )}
      </div>
-
-      <FeedbackModal
-        isOpen={feedbackModalIsOpen}
-        onClose={() => setFeedbackModalIsOpen(false)}
-        polarity={feedbackPolarity}
-      />
-    </div>
+    </ScrollProvider>
  );
 }
--- a/frontend/src/components/features/chat/event-message.tsx
+++ b/frontend/src/components/features/chat/event-message.tsx
@@ -1,3 +1,4 @@
+import React from "react";
 import { ConfirmationButtons } from "#/components/shared/buttons/confirmation-buttons";
 import { OpenHandsAction } from "#/types/core/actions";
 import {
@@ -18,6 +19,10 @@ import { MCPObservationContent } from "./mcp-observation-content";
 import { getObservationResult } from "./event-content-helpers/get-observation-result";
 import { getEventContent } from "./event-content-helpers/get-event-content";
 import { GenericEventMessage } from "./generic-event-message";
+import { LikertScale } from "../feedback/likert-scale";
+
+import { useConfig } from "#/hooks/query/use-config";
+import { useFeedbackExists } from "#/hooks/query/use-feedback-exists";

 const hasThoughtProperty = (
  obj: Record<string, unknown>,
@@ -39,6 +44,14 @@ export function EventMessage({
  const shouldShowConfirmationButtons =
    isLastMessage && event.source === "agent" && isAwaitingUserConfirmation;

+  const { data: config } = useConfig();
+
+  // Use our query hook to check if feedback exists and get rating/reason
+  const {
+    data: feedbackData = { exists: false },
+    isLoading: isCheckingFeedback,
+  } = useFeedbackExists(isFinishAction(event) ? event.id : undefined);
+
  if (isErrorObservation(event)) {
    return (
      <ErrorMessage
@@ -55,9 +68,25 @@ export function EventMessage({
    return null;
  }

+  const showLikertScale =
+    config?.APP_MODE === "saas" &&
+    isFinishAction(event) &&
+    isLastMessage &&
+    !isCheckingFeedback;
+
  if (isFinishAction(event)) {
    return (
-      <ChatMessage type="agent" message={getEventContent(event).details} />
+      <>
+        <ChatMessage type="agent" message={getEventContent(event).details} />
+        {showLikertScale && (
+          <LikertScale
+            eventId={event.id}
+            initiallySubmitted={feedbackData.exists}
+            initialRating={feedbackData.rating}
+            initialReason={feedbackData.reason}
+          />
+        )}
+      </>
    );
  }

--- a/frontend/src/components/features/feedback/likert-scale.tsx
+++ b/frontend/src/components/features/feedback/likert-scale.tsx
@@ -0,0 +1,248 @@
+import React, { useState, useEffect, useContext } from "react";
+import { cn } from "#/utils/utils";
+import i18n from "#/i18n";
+import { useSubmitConversationFeedback } from "#/hooks/mutation/use-submit-conversation-feedback";
+import { ScrollContext } from "#/context/scroll-context";
+
+// Global timeout duration in milliseconds
+const AUTO_SUBMIT_TIMEOUT = 10000;
+
+interface LikertScaleProps {
+  eventId?: number;
+  initiallySubmitted?: boolean;
+  initialRating?: number;
+  initialReason?: string;
+}
+
+const FEEDBACK_REASONS = [
+  i18n.t("FEEDBACK$REASON_MISUNDERSTOOD_INSTRUCTION"),
+  i18n.t("FEEDBACK$REASON_FORGOT_CONTEXT"),
+  i18n.t("FEEDBACK$REASON_UNNECESSARY_CHANGES"),
+  i18n.t("FEEDBACK$REASON_OTHER"),
+];
+
+export function LikertScale({
+  eventId,
+  initiallySubmitted = false,
+  initialRating,
+  initialReason,
+}: LikertScaleProps) {
+  const [selectedRating, setSelectedRating] = useState<number | null>(
+    initialRating || null,
+  );
+  const [selectedReason, setSelectedReason] = useState<string | null>(
+    initialReason || null,
+  );
+  const [showReasons, setShowReasons] = useState(false);
+  const [reasonTimeout, setReasonTimeout] = useState<NodeJS.Timeout | null>(
+    null,
+  );
+  const [isSubmitted, setIsSubmitted] = useState(initiallySubmitted);
+  const [countdown, setCountdown] = useState<number>(0);
+
+  // Get scroll context
+  const scrollContext = useContext(ScrollContext);
+
+  // If scrollContext is undefined, we're not inside a ScrollProvider
+  const scrollToBottom = scrollContext?.scrollDomToBottom;
+  const autoScroll = scrollContext?.autoScroll;
+
+  // Use our mutation hook
+  const { mutate: submitConversationFeedback } =
+    useSubmitConversationFeedback();
+
+  // Update isSubmitted if initiallySubmitted changes
+  useEffect(() => {
+    setIsSubmitted(initiallySubmitted);
+  }, [initiallySubmitted]);
+
+  // Update selectedRating if initialRating changes
+  useEffect(() => {
+    if (initialRating) {
+      setSelectedRating(initialRating);
+    }
+  }, [initialRating]);
+
+  // Update selectedReason if initialReason changes
+  useEffect(() => {
+    if (initialReason) {
+      setSelectedReason(initialReason);
+    }
+  }, [initialReason]);
+
+  // Submit feedback and disable the component
+  const submitFeedback = (rating: number, reason?: string) => {
+    submitConversationFeedback(
+      {
+        rating,
+        eventId,
+        reason,
+      },
+      {
+        onSuccess: () => {
+          setSelectedReason(reason || null);
+          setShowReasons(false);
+          setIsSubmitted(true);
+        },
+      },
+    );
+  };
+
+  // Handle star rating selection
+  const handleRatingClick = (rating: number) => {
+    if (isSubmitted) return; // Prevent changes after submission
+
+    setSelectedRating(rating);
+
+    // Ratings of 3 stars or fewer prompt for a reason, with an auto-submit timeout.
+    // Higher ratings (4 or 5 stars) are handled in the else branch below.
+    if (rating <= 3) {
+      setShowReasons(true);
+      setCountdown(Math.ceil(AUTO_SUBMIT_TIMEOUT / 1000));
+
+      // Set a timeout to auto-submit if no reason is selected
+      const timeout = setTimeout(() => {
+        submitFeedback(rating);
+      }, AUTO_SUBMIT_TIMEOUT);
+
+      setReasonTimeout(timeout);
+
+      // Only scroll to bottom if the user is already at the bottom (autoScroll is true)
+      if (scrollToBottom && autoScroll) {
+        // Small delay to ensure the reasons are fully rendered
+        setTimeout(() => {
+          scrollToBottom();
+        }, 100);
+      }
+    } else {
+      // For ratings > 3 (4 or 5 stars), submit immediately without showing reasons
+      setShowReasons(false);
+      submitFeedback(rating);
+    }
+  };
+
+  // Handle reason selection
+  const handleReasonClick = (reason: string) => {
+    if (selectedRating && reasonTimeout && !isSubmitted) {
+      clearTimeout(reasonTimeout);
+      setCountdown(0);
+      submitFeedback(selectedRating, reason);
+    }
+  };
+
+  // Countdown effect
+  useEffect(() => {
+    if (countdown > 0 && showReasons && !isSubmitted) {
+      const timer = setTimeout(() => {
+        setCountdown(countdown - 1);
+      }, 1000);
+      return () => clearTimeout(timer);
+    }
+    return () => {};
+  }, [countdown, showReasons, isSubmitted]);
+
+  // Clear the pending auto-submit timeout on unmount or whenever it is replaced
+  useEffect(
+    () => () => {
+      if (reasonTimeout) {
+        clearTimeout(reasonTimeout);
+      }
+    },
+    [reasonTimeout],
+  );
+
+  // Scroll to bottom when component mounts, but only if user is already at the bottom
+  useEffect(() => {
+    if (scrollToBottom && autoScroll && !isSubmitted) {
+      // Small delay to ensure the component is fully rendered
+      setTimeout(() => {
+        scrollToBottom();
+      }, 100);
+    }
+  }, [scrollToBottom, autoScroll, isSubmitted]);
+
+  // Scroll to bottom when reasons are shown, but only if user is already at the bottom
+  useEffect(() => {
+    if (scrollToBottom && autoScroll && showReasons) {
+      // Small delay to ensure the reasons are fully rendered
+      setTimeout(() => {
+        scrollToBottom();
+      }, 100);
+    }
+  }, [scrollToBottom, autoScroll, showReasons]);
+
+  // Helper function to get button class based on state
+  const getButtonClass = (rating: number) => {
+    if (isSubmitted) {
+      return selectedRating && selectedRating >= rating
+        ? "text-yellow-400 cursor-not-allowed"
+        : "text-gray-300 opacity-50 cursor-not-allowed";
+    }
+
+    return selectedRating && selectedRating >= rating
+      ? "text-yellow-400"
+      : "text-gray-300 hover:text-yellow-200";
+  };
+
+  return (
+    <div className="mt-3 flex flex-col gap-1">
+      <div className="text-sm text-gray-500 mb-1">
+        {isSubmitted
+          ? i18n.t("FEEDBACK$THANK_YOU_FOR_FEEDBACK")
+          : i18n.t("FEEDBACK$RATE_AGENT_PERFORMANCE")}
+      </div>
+      <div className="flex flex-col gap-1">
+        <span className="flex gap-2 items-center flex-wrap">
+          {[1, 2, 3, 4, 5].map((rating) => (
+            <button
+              type="button"
+              key={rating}
+              onClick={() => handleRatingClick(rating)}
+              disabled={isSubmitted}
+              className={cn("text-xl transition-all", getButtonClass(rating))}
+              aria-label={`Rate ${rating} stars`}
+            >
+              ★
+            </button>
+          ))}
+          {/* Show selected reason inline with stars when submitted (only for ratings <= 3) */}
+          {isSubmitted &&
+            selectedReason &&
+            selectedRating &&
+            selectedRating <= 3 && (
+              <span className="text-sm text-gray-500 italic">
+                {selectedReason}
+              </span>
+            )}
+        </span>
+      </div>
+
+      {showReasons && !isSubmitted && (
+        <div className="mt-1 flex flex-col gap-1">
+          <div className="text-xs text-gray-500 mb-1">
+            {i18n.t("FEEDBACK$SELECT_REASON")}
+          </div>
+          {countdown > 0 && (
+            <div className="text-xs text-gray-400 mb-1 italic">
+              {i18n.t("FEEDBACK$SELECT_REASON_COUNTDOWN", {
+                countdown,
+              })}
+            </div>
+          )}
+          <div className="flex flex-col gap-0.5">
+            {FEEDBACK_REASONS.map((reason) => (
+              <button
+                type="button"
+                key={reason}
+                onClick={() => handleReasonClick(reason)}
+                className="text-sm text-left py-1 px-2 rounded hover:bg-gray-700 transition-colors"
+              >
+                {reason}
+              </button>
+            ))}
+          </div>
+        </div>
+      )}
+    </div>
+  );
+}
--- a/frontend/src/components/features/home/connect-to-provider-message.tsx
+++ b/frontend/src/components/features/home/connect-to-provider-message.tsx
@@ -10,7 +10,10 @@ export function ConnectToProviderMessage() {
  return (
    <div className="flex flex-col gap-4">
      <p>{t("HOME$CONNECT_PROVIDER_MESSAGE")}</p>
-      <Link data-testid="navigate-to-settings-button" to="/settings/git">
+      <Link
+        data-testid="navigate-to-settings-button"
+        to="/settings/integrations"
+      >
        <BrandButton type="button" variant="primary" isDisabled={isLoading}>
          {!isLoading && t("SETTINGS$TITLE")}
          {isLoading && t("HOME$LOADING")}
--- a/frontend/src/components/features/home/tasks/task-card.tsx
+++ b/frontend/src/components/features/home/tasks/task-card.tsx
@@ -54,17 +54,7 @@ export function TaskCard({ task }: TaskCardProps) {
    const issueType =
      task.task_type === "OPEN_ISSUE" ? "issues" : "merge_requests";
    href = `https://gitlab.com/${task.repo}/-/${issueType}/${task.issue_number}`;
-  } else if (task.git_provider === "azure_devops") {
-    // Azure DevOps URLs format: https://dev.azure.com/{organization}/{project}/_workitems/edit/{id}
-    // For pull requests: https://dev.azure.com/{organization}/{project}/_git/{repository}/pullrequest/{id}
-    const [project, repository] = task.repo.split("/");
-    if (task.task_type === "OPEN_ISSUE") {
-      href = `https://dev.azure.com/${project}/_workitems/edit/${task.issue_number}`;
-    } else {
-      href = `https://dev.azure.com/${project}/_git/${repository}/pullrequest/${task.issue_number}`;
-    }
  } else {
-    // Default to GitHub
    const hrefType = task.task_type === "OPEN_ISSUE" ? "issues" : "pull";
    href = `https://github.com/${task.repo}/${hrefType}/${task.issue_number}`;
  }
--- a/frontend/src/components/features/settings/git-settings/azure-devops-token-help-anchor.tsx
+++ b/frontend/src/components/features/settings/git-settings/azure-devops-token-help-anchor.tsx
@@ -1,30 +0,0 @@
-import { Trans } from "react-i18next";
-import { I18nKey } from "#/i18n/declaration";
-
-export function AzureDevOpsTokenHelpAnchor() {
-  return (
-    <p data-testid="azure-devops-token-help-anchor" className="text-xs">
-      <Trans
-        i18nKey={I18nKey.AZURE_DEVOPS$TOKEN_HELP_TEXT}
-        components={[
-          <a
-            key="azure-devops-token-help-anchor-link"
-            aria-label="Azure DevOps token help link"
-            href="https://dev.azure.com/_usersSettings/tokens"
-            target="_blank"
-            className="underline underline-offset-2"
-            rel="noopener noreferrer"
-          />,
-          <a
-            key="azure-devops-token-help-anchor-link-2"
-            aria-label="Azure DevOps token see more link"
-            href="https://learn.microsoft.com/en-us/azure/devops/organizations/accounts/use-personal-access-tokens-to-authenticate"
-            target="_blank"
-            className="underline underline-offset-2"
-            rel="noopener noreferrer"
-          />,
-        ]}
-      />
-    </p>
-  );
-}
--- a/frontend/src/components/features/settings/git-settings/azure-devops-token-input.tsx
+++ b/frontend/src/components/features/settings/git-settings/azure-devops-token-input.tsx
@@ -1,64 +0,0 @@
-import { useTranslation } from "react-i18next";
-import { I18nKey } from "#/i18n/declaration";
-import { SettingsInput } from "../settings-input";
-import { AzureDevOpsTokenHelpAnchor } from "./azure-devops-token-help-anchor";
-import { KeyStatusIcon } from "../key-status-icon";
-
-interface AzureDevOpsTokenInputProps {
-  onChange: (value: string) => void;
-  onAzureDevOpsHostChange: (value: string) => void;
-  isAzureDevOpsTokenSet: boolean;
-  name: string;
-  azureDevOpsHostSet: string | null | undefined;
-}
-
-export function AzureDevOpsTokenInput({
-  onChange,
-  onAzureDevOpsHostChange,
-  isAzureDevOpsTokenSet,
-  name,
-  azureDevOpsHostSet,
-}: AzureDevOpsTokenInputProps) {
-  const { t } = useTranslation();
-
-  return (
-    <div className="flex flex-col gap-6">
-      <SettingsInput
-        testId={name}
-        name={name}
-        onChange={onChange}
-        label={t(I18nKey.AZURE_DEVOPS$TOKEN_LABEL)}
-        type="password"
-        className="w-[680px]"
-        placeholder={isAzureDevOpsTokenSet ? "<hidden>" : ""}
-        startContent={
-          isAzureDevOpsTokenSet && (
-            <KeyStatusIcon
-              testId="ado-set-token-indicator"
-              isSet={isAzureDevOpsTokenSet}
-            />
-          )
-        }
-      />
-
-      <SettingsInput
-        onChange={onAzureDevOpsHostChange || (() => {})}
-        name="azure-devops-host-input"
-        testId="azure-devops-host-input"
-        label={t(I18nKey.AZURE_DEVOPS$HOST_LABEL)}
-        type="text"
-        className="w-[680px]"
-        placeholder="https://dev.azure.com/{your-org-name}"
-        defaultValue={azureDevOpsHostSet || undefined}
-        startContent={
-          azureDevOpsHostSet &&
-          azureDevOpsHostSet.trim() !== "" && (
-            <KeyStatusIcon testId="ado-set-host-indicator" isSet />
-          )
-        }
-      />
-
-      <AzureDevOpsTokenHelpAnchor />
-    </div>
-  );
-}
--- a/frontend/src/components/features/settings/git-settings/install-slack-app-anchor.tsx
+++ b/frontend/src/components/features/settings/git-settings/install-slack-app-anchor.tsx
@@ -0,0 +1,21 @@
+import { useTranslation } from "react-i18next";
+import { I18nKey } from "#/i18n/declaration";
+import { BrandButton } from "../brand-button";
+
+export function InstallSlackAppAnchor() {
+  const { t } = useTranslation();
+
+  return (
+    <a
+      data-testid="install-slack-app-button"
+      href="https://slack.com/oauth/v2/authorize?client_id=7477886716822.8729519890534&scope=app_mentions:read,chat:write,users:read,channels:history,groups:history,mpim:history,im:history&user_scope=channels:history,groups:history,im:history,mpim:history"
+      target="_blank"
+      rel="noreferrer noopener"
+      className="py-9"
+    >
+      <BrandButton type="button" variant="secondary">
+        {t(I18nKey.SLACK$INSTALL_APP)}
+      </BrandButton>
+    </a>
+  );
+}
--- a/frontend/src/components/features/waitlist/auth-modal.tsx
+++ b/frontend/src/components/features/waitlist/auth-modal.tsx
@@ -7,7 +7,6 @@ import { ModalBody } from "#/components/shared/modals/modal-body";
 import { BrandButton } from "../settings/brand-button";
 import GitHubLogo from "#/assets/branding/github-logo.svg?react";
 import GitLabLogo from "#/assets/branding/gitlab-logo.svg?react";
-import AzureDevOpsLogo from "#/assets/branding/azure-devops-logo.svg?react";
 import { useAuthUrl } from "#/hooks/use-auth-url";
 import { GetConfigResponse } from "#/api/open-hands.types";

@@ -24,11 +23,6 @@ export function AuthModal({ githubAuthUrl, appMode }: AuthModalProps) {
    identityProvider: "gitlab",
  });

-  const azureDevOpsAuthUrl = useAuthUrl({
-    appMode: appMode || null,
-    identityProvider: "azure_devops",
-  });
-
  const handleGitHubAuth = () => {
    if (githubAuthUrl) {
      // Always start the OIDC flow, let the backend handle TOS check
@@ -43,13 +37,6 @@ export function AuthModal({ githubAuthUrl, appMode }: AuthModalProps) {
    }
  };

-  const handleAzureDevOpsAuth = () => {
-    if (azureDevOpsAuthUrl) {
-      // Always start the OIDC flow, let the backend handle TOS check
-      window.location.href = azureDevOpsAuthUrl;
-    }
-  };
-
  return (
    <ModalBackdrop>
      <ModalBody className="border border-tertiary">
@@ -80,17 +67,6 @@ export function AuthModal({ githubAuthUrl, appMode }: AuthModalProps) {
          >
            {t(I18nKey.GITLAB$CONNECT_TO_GITLAB)}
          </BrandButton>
-
-          <BrandButton
-            type="button"
-            variant="primary"
-            onClick={handleAzureDevOpsAuth}
-            className="w-full"
-            startContent={<AzureDevOpsLogo width={20} height={20} />}
-          >
-            {t(I18nKey.AZURE_DEVOPS$CONNECT_TO_AZURE_DEVOPS) ||
-              "Connect to Azure DevOps"}
-          </BrandButton>
        </div>
      </ModalBody>
    </ModalBackdrop>
--- a/frontend/src/context/scroll-context.tsx
+++ b/frontend/src/context/scroll-context.tsx
@@ -0,0 +1,42 @@
+import React, { createContext, useContext, ReactNode, RefObject } from "react";
+import { useScrollToBottom } from "#/hooks/use-scroll-to-bottom";
+
+interface ScrollContextType {
+  scrollRef: RefObject<HTMLDivElement | null>;
+  autoScroll: boolean;
+  setAutoScroll: (value: boolean) => void;
+  scrollDomToBottom: () => void;
+  hitBottom: boolean;
+  setHitBottom: (value: boolean) => void;
+  onChatBodyScroll: (e: HTMLElement) => void;
+}
+
+export const ScrollContext = createContext<ScrollContextType | undefined>(
+  undefined,
+);
+
+interface ScrollProviderProps {
+  children: ReactNode;
+  value?: ScrollContextType;
+}
+
+export function ScrollProvider({ children, value }: ScrollProviderProps) {
+  const scrollHook = useScrollToBottom(React.useRef<HTMLDivElement>(null));
+
+  // Use provided value or default to the hook
+  const contextValue = value || scrollHook;
+
+  return (
+    <ScrollContext.Provider value={contextValue}>
+      {children}
+    </ScrollContext.Provider>
+  );
+}
+
+export function useScrollContext() {
+  const context = useContext(ScrollContext);
+  if (context === undefined) {
+    throw new Error("useScrollContext must be used within a ScrollProvider");
+  }
+  return context;
+}
--- a/frontend/src/hooks/mutation/use-submit-conversation-feedback.ts
+++ b/frontend/src/hooks/mutation/use-submit-conversation-feedback.ts
@@ -0,0 +1,39 @@
+import { useMutation, useQueryClient } from "@tanstack/react-query";
+import { useTranslation } from "react-i18next";
+import OpenHands from "#/api/open-hands";
+import { useConversationId } from "#/hooks/use-conversation-id";
+
+type SubmitConversationFeedbackArgs = {
+  rating: number;
+  eventId?: number;
+  reason?: string;
+};
+
+export const useSubmitConversationFeedback = () => {
+  const { conversationId } = useConversationId();
+  const queryClient = useQueryClient();
+  const { t } = useTranslation();
+
+  return useMutation({
+    mutationFn: ({ rating, eventId, reason }: SubmitConversationFeedbackArgs) =>
+      OpenHands.submitConversationFeedback(
+        conversationId,
+        rating,
+        eventId,
+        reason,
+      ),
+    onSuccess: (_, { eventId }) => {
+      // Invalidate the feedback existence query to trigger a refetch
+      if (eventId) {
+        queryClient.invalidateQueries({
+          queryKey: ["feedback", "exists", conversationId, eventId],
+        });
+      }
+    },
+    onError: (error) => {
+      // Log error but don't show toast - user will just see the UI stay in unsubmitted state
+      // eslint-disable-next-line no-console
+      console.error(t("FEEDBACK$FAILED_TO_SUBMIT"), error);
+    },
+  });
+};
--- a/frontend/src/hooks/query/use-feedback-exists.ts
+++ b/frontend/src/hooks/query/use-feedback-exists.ts
@@ -0,0 +1,24 @@
+import { useQuery } from "@tanstack/react-query";
+import OpenHands from "#/api/open-hands";
+import { useConversationId } from "#/hooks/use-conversation-id";
+
+export interface FeedbackData {
+  exists: boolean;
+  rating?: number;
+  reason?: string;
+}
+
+export const useFeedbackExists = (eventId?: number) => {
+  const { conversationId } = useConversationId();
+
+  return useQuery<FeedbackData>({
+    queryKey: ["feedback", "exists", conversationId, eventId],
+    queryFn: () => {
+      if (!eventId) return { exists: false };
+      return OpenHands.checkFeedbackExists(conversationId, eventId);
+    },
+    enabled: !!eventId,
+    staleTime: 1000 * 60 * 5, // 5 minutes
+    gcTime: 1000 * 60 * 15, // 15 minutes
+  });
+};
--- a/frontend/src/hooks/use-auto-login.ts
+++ b/frontend/src/hooks/use-auto-login.ts
@@ -15,7 +15,7 @@ export const useAutoLogin = () => {
  // Get the stored login method
  const loginMethod = getLoginMethod();

-  // Get the auth URLs for all providers
+  // Get the auth URLs for both providers
  const githubAuthUrl = useAuthUrl({
    appMode: config?.APP_MODE || null,
    identityProvider: "github",
@@ -26,11 +26,6 @@ export const useAutoLogin = () => {
    identityProvider: "gitlab",
  });

-  const azureDevOpsAuthUrl = useAuthUrl({
-    appMode: config?.APP_MODE || null,
-    identityProvider: "azure_devops",
-  });
-
  useEffect(() => {
    // Only auto-login in SAAS mode
    if (config?.APP_MODE !== "saas") {
@@ -53,14 +48,8 @@ export const useAutoLogin = () => {
    }

    // Get the appropriate auth URL based on the stored login method
-    let authUrl: string | null = null;
-    if (loginMethod === LoginMethod.GITHUB) {
-      authUrl = githubAuthUrl;
-    } else if (loginMethod === LoginMethod.GITLAB) {
-      authUrl = gitlabAuthUrl;
-    } else if (loginMethod === LoginMethod.AZURE_DEVOPS) {
-      authUrl = azureDevOpsAuthUrl;
-    }
+    const authUrl =
+      loginMethod === LoginMethod.GITHUB ? githubAuthUrl : gitlabAuthUrl;

    // If we have an auth URL, redirect to it
    if (authUrl) {
@@ -79,6 +68,5 @@ export const useAutoLogin = () => {
    loginMethod,
    githubAuthUrl,
    gitlabAuthUrl,
-    azureDevOpsAuthUrl,
  ]);
 };
--- a/frontend/src/i18n/declaration.ts
+++ b/frontend/src/i18n/declaration.ts
@@ -80,7 +80,7 @@ export enum I18nKey {
  ANALYTICS$CONFIRM_PREFERENCES = "ANALYTICS$CONFIRM_PREFERENCES",
  SETTINGS$SAVING = "SETTINGS$SAVING",
  SETTINGS$SAVE_CHANGES = "SETTINGS$SAVE_CHANGES",
-  SETTINGS$NAV_GIT = "SETTINGS$NAV_GIT",
+  SETTINGS$NAV_INTEGRATIONS = "SETTINGS$NAV_INTEGRATIONS",
  SETTINGS$NAV_APPLICATION = "SETTINGS$NAV_APPLICATION",
  SETTINGS$NAV_CREDITS = "SETTINGS$NAV_CREDITS",
  SETTINGS$NAV_SECRETS = "SETTINGS$NAV_SECRETS",
@@ -174,6 +174,7 @@ export enum I18nKey {
  GITHUB$TOKEN_INVALID = "GITHUB$TOKEN_INVALID",
  BUTTON$DISCONNECT = "BUTTON$DISCONNECT",
  GITHUB$CONFIGURE_REPOS = "GITHUB$CONFIGURE_REPOS",
+  SLACK$INSTALL_APP = "SLACK$INSTALL_APP",
  COMMON$CLICK_FOR_INSTRUCTIONS = "COMMON$CLICK_FOR_INSTRUCTIONS",
  LLM$SELECT_MODEL_PLACEHOLDER = "LLM$SELECT_MODEL_PLACEHOLDER",
  LLM$MODEL = "LLM$MODEL",
@@ -508,16 +509,6 @@ export enum I18nKey {
  SETTINGS_FORM$BASE_URL = "SETTINGS_FORM$BASE_URL",
  GITHUB$CONNECT_TO_GITHUB = "GITHUB$CONNECT_TO_GITHUB",
  GITLAB$CONNECT_TO_GITLAB = "GITLAB$CONNECT_TO_GITLAB",
-  AZURE_DEVOPS$CONNECT_TO_AZURE_DEVOPS = "AZURE_DEVOPS$CONNECT_TO_AZURE_DEVOPS",
-  AZURE_DEVOPS$TOKEN_LABEL = "AZURE_DEVOPS$TOKEN_LABEL",
-  AZURE_DEVOPS$HOST_LABEL = "AZURE_DEVOPS$HOST_LABEL",
-  AZURE_DEVOPS$HOST_HELP_TEXT = "AZURE_DEVOPS$HOST_HELP_TEXT",
-  AZURE_DEVOPS$HOST_REQUIRED_ERROR = "AZURE_DEVOPS$HOST_REQUIRED_ERROR",
-  AZURE_DEVOPS$TOKEN_REQUIRED_ERROR = "AZURE_DEVOPS$TOKEN_REQUIRED_ERROR",
-  AZURE_DEVOPS$GET_TOKEN = "AZURE_DEVOPS$GET_TOKEN",
-  AZURE_DEVOPS$TOKEN_HELP_TEXT = "AZURE_DEVOPS$TOKEN_HELP_TEXT",
-  AZURE_DEVOPS$TOKEN_LINK_TEXT = "AZURE_DEVOPS$TOKEN_LINK_TEXT",
-  AZURE_DEVOPS$INSTRUCTIONS_LINK_TEXT = "AZURE_DEVOPS$INSTRUCTIONS_LINK_TEXT",
  AUTH$SIGN_IN_WITH_IDENTITY_PROVIDER = "AUTH$SIGN_IN_WITH_IDENTITY_PROVIDER",
  WAITLIST$JOIN_WAITLIST = "WAITLIST$JOIN_WAITLIST",
  ACCOUNT_SETTINGS$ADDITIONAL_SETTINGS = "ACCOUNT_SETTINGS$ADDITIONAL_SETTINGS",
@@ -593,4 +584,13 @@ export enum I18nKey {
  SETTINGS$EMAIL_VERIFICATION_RESTRICTION_MESSAGE = "SETTINGS$EMAIL_VERIFICATION_RESTRICTION_MESSAGE",
  SETTINGS$RESEND_VERIFICATION = "SETTINGS$RESEND_VERIFICATION",
  SETTINGS$FAILED_TO_RESEND_VERIFICATION = "SETTINGS$FAILED_TO_RESEND_VERIFICATION",
+  FEEDBACK$RATE_AGENT_PERFORMANCE = "FEEDBACK$RATE_AGENT_PERFORMANCE",
+  FEEDBACK$SELECT_REASON = "FEEDBACK$SELECT_REASON",
+  FEEDBACK$SELECT_REASON_COUNTDOWN = "FEEDBACK$SELECT_REASON_COUNTDOWN",
+  FEEDBACK$REASON_MISUNDERSTOOD_INSTRUCTION = "FEEDBACK$REASON_MISUNDERSTOOD_INSTRUCTION",
+  FEEDBACK$REASON_FORGOT_CONTEXT = "FEEDBACK$REASON_FORGOT_CONTEXT",
+  FEEDBACK$REASON_UNNECESSARY_CHANGES = "FEEDBACK$REASON_UNNECESSARY_CHANGES",
+  FEEDBACK$REASON_OTHER = "FEEDBACK$REASON_OTHER",
+  FEEDBACK$THANK_YOU_FOR_FEEDBACK = "FEEDBACK$THANK_YOU_FOR_FEEDBACK",
+  FEEDBACK$FAILED_TO_SUBMIT = "FEEDBACK$FAILED_TO_SUBMIT",
 }
--- a/frontend/src/i18n/translation.json
+++ b/frontend/src/i18n/translation.json
@@ -1279,21 +1279,21 @@
        "de": "Änderungen speichern",
        "uk": "Зберегти зміни"
    },
-    "SETTINGS$NAV_GIT": {
-        "en": "Git",
-        "ja": "Git",
-        "zh-CN": "Git",
-        "zh-TW": "Git",
-        "ko-KR": "Git",
-        "no": "Git",
-        "it": "Git",
-        "pt": "Git",
-        "es": "Git",
-        "ar": "Git",
-        "fr": "Git",
-        "tr": "Git",
-        "de": "Git",
-        "uk": "Git"
+    "SETTINGS$NAV_INTEGRATIONS": {
+        "en": "Integrations",
+        "ja": "統合",
+        "zh-CN": "集成",
+        "zh-TW": "整合",
+        "ko-KR": "통합",
+        "no": "Integrasjoner",
+        "it": "Integrazioni",
+        "pt": "Integrações",
+        "es": "Integraciones",
+        "ar": "التكامل",
+        "fr": "Intégrations",
+        "tr": "Entegrasyonlar",
+        "de": "Integrationen",
+        "uk": "Інтеграції"
    },
    "SETTINGS$NAV_APPLICATION": {
        "en": "Application",
@@ -2783,6 +2783,22 @@
        "de": "GitHub-Repositories konfigurieren",
        "uk": "Налаштування репозиторіїв Github"
    },
+    "SLACK$INSTALL_APP": {
+        "en": "Install OpenHands Slack App",
+        "ja": "OpenHands Slackアプリをインストール",
+        "zh-CN": "安装 OpenHands Slack 应用",
+        "zh-TW": "安裝 OpenHands Slack 應用程式",
+        "ko-KR": "OpenHands Slack 앱 설치",
+        "no": "Installer OpenHands Slack-app",
+        "it": "Installa l'app Slack di OpenHands",
+        "pt": "Instalar aplicativo Slack do OpenHands",
+        "es": "Instalar aplicación Slack de OpenHands",
+        "ar": "تثبيت تطبيق OpenHands Slack",
+        "fr": "Installer l'application Slack OpenHands",
+        "tr": "OpenHands Slack uygulamasını yükle",
+        "de": "OpenHands Slack-App installieren",
+        "uk": "Встановити додаток OpenHands Slack"
+    },
    "COMMON$CLICK_FOR_INSTRUCTIONS": {
        "en": "Click here for instructions",
        "ja": "手順はこちらをクリック",
@@ -8127,166 +8143,6 @@
        "tr": "GitLab'a bağlan",
        "uk": "Увійти за допомогою GitLab"
    },
-    "AZURE_DEVOPS$CONNECT_TO_AZURE_DEVOPS": {
-        "en": "Log in with Azure DevOps",
-        "ja": "Azure DevOpsに接続",
-        "zh-CN": "连接到Azure DevOps",
-        "zh-TW": "連接到Azure DevOps",
-        "ko-KR": "Azure DevOps에 연결",
-        "de": "Mit Azure DevOps verbinden",
-        "no": "Koble til Azure DevOps",
-        "it": "Connetti a Azure DevOps",
-        "pt": "Conectar ao Azure DevOps",
-        "es": "Conectar a Azure DevOps",
-        "ar": "الاتصال بـ Azure DevOps",
-        "fr": "Se connecter à Azure DevOps",
-        "tr": "Azure DevOps'a bağlan",
-        "uk": "Увійти за допомогою Azure DevOps"
-    },
-    "AZURE_DEVOPS$TOKEN_LABEL": {
-        "en": "Azure DevOps Token",
-        "ja": "Azure DevOpsトークン",
-        "zh-CN": "Azure DevOps令牌",
-        "zh-TW": "Azure DevOps權杖",
-        "ko-KR": "Azure DevOps 토큰",
-        "no": "Azure DevOps-token",
-        "it": "Token Azure DevOps",
-        "pt": "Token do Azure DevOps",
-        "es": "Token de Azure DevOps",
-        "ar": "رمز Azure DevOps",
-        "fr": "Jeton Azure DevOps",
-        "tr": "Azure DevOps Token",
-        "de": "Azure DevOps-Token",
-        "uk": "Токен Azure DevOps"
-    },
-    "AZURE_DEVOPS$HOST_LABEL": {
-        "en": "Azure DevOps Organization URL (Required)",
-        "ja": "Azure DevOps組織URL（必須）",
-        "zh-CN": "Azure DevOps组织URL（必需）",
-        "zh-TW": "Azure DevOps組織URL（必需）",
-        "ko-KR": "Azure DevOps 조직 URL (필수)",
-        "no": "Azure DevOps organisasjons-URL (påkrevd)",
-        "it": "URL organizzazione Azure DevOps (obbligatorio)",
-        "pt": "URL da organização Azure DevOps (obrigatório)",
-        "es": "URL de organización de Azure DevOps (requerido)",
-        "ar": "رابط منظمة Azure DevOps (مطلوب)",
-        "fr": "URL d'organisation Azure DevOps (requis)",
-        "tr": "Azure DevOps Organizasyon URL'si (gerekli)",
-        "de": "Azure DevOps-Organisations-URL (erforderlich)",
-        "uk": "URL організації Azure DevOps (обов'язково)"
-    },
-    "AZURE_DEVOPS$HOST_HELP_TEXT": {
-        "en": "Enter your organization URL (e.g., dev.azure.com/your-org). This is required because Azure DevOps tokens are organization-scoped.",
-        "ja": "組織URL（例：dev.azure.com/your-org）を入力してください。Azure DevOpsトークンは組織スコープのため、これは必須です。",
-        "zh-CN": "输入您的组织URL（例如：dev.azure.com/your-org）。这是必需的，因为Azure DevOps令牌是组织范围的。",
-        "zh-TW": "輸入您的組織URL（例如：dev.azure.com/your-org）。這是必需的，因為Azure DevOps權杖是組織範圍的。",
-        "ko-KR": "조직 URL을 입력하세요 (예: dev.azure.com/your-org). Azure DevOps 토큰이 조직 범위이므로 필수입니다.",
-        "no": "Skriv inn organisasjons-URL (f.eks. dev.azure.com/your-org). Dette er påkrevd fordi Azure DevOps-tokens er organisasjonsbegrenset.",
-        "it": "Inserisci l'URL della tua organizzazione (es. dev.azure.com/your-org). Questo è obbligatorio perché i token Azure DevOps sono limitati all'organizzazione.",
-        "pt": "Digite a URL da sua organização (ex: dev.azure.com/your-org). Isso é obrigatório porque os tokens do Azure DevOps são limitados à organização.",
-        "es": "Ingrese la URL de su organización (ej: dev.azure.com/your-org). Esto es requerido porque los tokens de Azure DevOps están limitados a la organización.",
-        "ar": "أدخل رابط منظمتك (مثل: dev.azure.com/your-org). هذا مطلوب لأن رموز Azure DevOps محدودة النطاق للمنظمة.",
-        "fr": "Entrez l'URL de votre organisation (ex: dev.azure.com/your-org). Ceci est requis car les jetons Azure DevOps sont limités à l'organisation.",
-        "tr": "Organizasyon URL'nizi girin (örn: dev.azure.com/your-org). Azure DevOps tokenları organizasyon kapsamlı olduğu için bu gereklidir.",
-        "de": "Geben Sie Ihre Organisations-URL ein (z.B. dev.azure.com/your-org). Dies ist erforderlich, da Azure DevOps-Token organisationsbezogen sind.",
-        "uk": "Введіть URL вашої організації (наприклад: dev.azure.com/your-org). Це обов'язково, оскільки токени Azure DevOps обмежені організацією."
-    },
-    "AZURE_DEVOPS$HOST_REQUIRED_ERROR": {
-        "en": "Organization URL is required when Azure DevOps token is provided.",
-        "ja": "Azure DevOpsトークンが提供されている場合、組織URLが必要です。",
-        "zh-CN": "提供Azure DevOps令牌时需要组织URL。",
-        "zh-TW": "提供Azure DevOps權杖時需要組織URL。",
-        "ko-KR": "Azure DevOps 토큰이 제공될 때 조직 URL이 필요합니다.",
-        "no": "Organisasjons-URL kreves når Azure DevOps-token er oppgitt.",
-        "it": "L'URL dell'organizzazione è richiesto quando viene fornito il token Azure DevOps.",
-        "pt": "A URL da organização é necessária quando o token do Azure DevOps é fornecido.",
-        "es": "Se requiere la URL de la organización cuando se proporciona el token de Azure DevOps.",
-        "ar": "رابط المنظمة مطلوب عند توفير رمز Azure DevOps.",
-        "fr": "L'URL d'organisation est requise lorsque le jeton Azure DevOps est fourni.",
-        "tr": "Azure DevOps jetonu sağlandığında organizasyon URL'si gereklidir.",
-        "de": "Organisations-URL ist erforderlich, wenn Azure DevOps-Token bereitgestellt wird.",
-        "uk": "URL організації потрібен, коли надається токен Azure DevOps."
-    },
-    "AZURE_DEVOPS$TOKEN_REQUIRED_ERROR": {
-        "en": "Azure DevOps token is required when organization URL is provided.",
-        "ja": "組織URLが提供されている場合、Azure DevOpsトークンが必要です。",
-        "zh-CN": "提供组织URL时需要Azure DevOps令牌。",
-        "zh-TW": "提供組織URL時需要Azure DevOps權杖。",
-        "ko-KR": "조직 URL이 제공될 때 Azure DevOps 토큰이 필요합니다.",
-        "no": "Azure DevOps-token kreves når organisasjons-URL er oppgitt.",
-        "it": "Il token Azure DevOps è richiesto quando viene fornito l'URL dell'organizzazione.",
-        "pt": "O token do Azure DevOps é necessário quando a URL da organização é fornecida.",
-        "es": "Se requiere el token de Azure DevOps cuando se proporciona la URL de la organización.",
-        "ar": "رمز Azure DevOps مطلوب عند توفير رابط المنظمة.",
-        "fr": "Le jeton Azure DevOps est requis lorsque l'URL d'organisation est fournie.",
-        "tr": "Organizasyon URL'si sağlandığında Azure DevOps jetonu gereklidir.",
-        "de": "Azure DevOps-Token ist erforderlich, wenn Organisations-URL bereitgestellt wird.",
-        "uk": "Токен Azure DevOps потрібен, коли надається URL організації."
-    },
-    "AZURE_DEVOPS$GET_TOKEN": {
-        "en": "Get an Azure DevOps token",
-        "ja": "Azure DevOpsトークンを取得",
-        "zh-CN": "获取Azure DevOps令牌",
-        "zh-TW": "獲取Azure DevOps權杖",
-        "ko-KR": "Azure DevOps 토큰 받기",
-        "no": "Få et Azure DevOps-token",
-        "it": "Ottieni un token Azure DevOps",
-        "pt": "Obter um token do Azure DevOps",
-        "es": "Obtener un token de Azure DevOps",
-        "ar": "الحصول على رمز Azure DevOps",
-        "fr": "Obtenir un jeton Azure DevOps",
-        "tr": "Azure DevOps token al",
-        "de": "Azure DevOps-Token erhalten",
-        "uk": "Отримати токен Azure DevOps"
-    },
-    "AZURE_DEVOPS$TOKEN_HELP_TEXT": {
-        "en": "Get your <0>Azure DevOps personal access token</0> or <1>click here for instructions</1>.",
-        "ja": "<0>Azure DevOps個人アクセストークン</0>を取得するか、<1>手順についてはここをクリック</1>。",
-        "zh-CN": "获取您的<0>Azure DevOps个人访问令牌</0>或<1>点击此处获取说明</1>。",
-        "zh-TW": "取得您的<0>Azure DevOps個人存取權杖</0>或<1>點擊此處獲取說明</1>。",
-        "ko-KR": "<0>Azure DevOps 개인 액세스 토큰</0>을 받거나 <1>지침을 보려면 여기를 클릭</1>하세요.",
-        "no": "Få ditt <0>Azure DevOps personlige tilgangstoken</0> eller <1>klikk her for instruksjoner</1>.",
-        "it": "Ottieni il tuo <0>token di accesso personale Azure DevOps</0> o <1>clicca qui per istruzioni</1>.",
-        "pt": "Obtenha seu <0>token de acesso pessoal do Azure DevOps</0> ou <1>clique aqui para instruções</1>.",
-        "es": "Obtenga su <0>token de acceso personal de Azure DevOps</0> o <1>haga clic aquí para obtener instrucciones</1>.",
-        "ar": "احصل على <0>رمز الوصول الشخصي Azure DevOps</0> الخاص بك أو <1>انقر هنا للحصول على تعليمات</1>.",
-        "fr": "Obtenez votre <0>jeton d'accès personnel Azure DevOps</0> ou <1>cliquez ici pour les instructions</1>.",
-        "tr": "<0>Azure DevOps kişisel erişim jetonunuzu</0> alın veya <1>talimatlar için buraya tıklayın</1>.",
-        "de": "Holen Sie sich Ihr <0>Azure DevOps Personal Access Token</0> oder <1>klicken Sie hier für Anweisungen</1>.",
-        "uk": "Отримайте свій <0>особистий токен доступу Azure DevOps</0> або <1>натисніть тут, щоб отримати інструкції</1>."
-    },
-    "AZURE_DEVOPS$TOKEN_LINK_TEXT": {
-        "en": "Azure DevOps personal access token",
-        "ja": "Azure DevOps個人アクセストークン",
-        "zh-CN": "Azure DevOps个人访问令牌",
-        "zh-TW": "Azure DevOps個人存取權杖",
-        "ko-KR": "Azure DevOps 개인 액세스 토큰",
-        "no": "Azure DevOps personlige tilgangstoken",
-        "it": "token di accesso personale Azure DevOps",
-        "pt": "token de acesso pessoal do Azure DevOps",
-        "es": "token de acceso personal de Azure DevOps",
-        "ar": "رمز الوصول الشخصي Azure DevOps",
-        "fr": "jeton d'accès personnel Azure DevOps",
-        "tr": "Azure DevOps kişisel erişim jetonu",
-        "de": "Azure DevOps Personal Access Token",
-        "uk": "особистий токен доступу Azure DevOps"
-    },
-    "AZURE_DEVOPS$INSTRUCTIONS_LINK_TEXT": {
-        "en": "click here for instructions",
-        "ja": "手順についてはここをクリック",
-        "zh-CN": "点击此处获取说明",
-        "zh-TW": "點擊此處獲取說明",
-        "ko-KR": "지침을 보려면 여기를 클릭",
-        "no": "klikk her for instruksjoner",
-        "it": "clicca qui per istruzioni",
-        "pt": "clique aqui para instruções",
-        "es": "haga clic aquí para obtener instrucciones",
-        "ar": "انقر هنا للحصول على تعليمات",
-        "fr": "cliquez ici pour les instructions",
-        "tr": "talimatlar için buraya tıklayın",
-        "de": "klicken Sie hier für Anweisungen",
-        "uk": "натисніть тут, щоб отримати інструкції"
-    },
    "AUTH$SIGN_IN_WITH_IDENTITY_PROVIDER": {
        "en": "Log in to OpenHands",
        "ja": "IDプロバイダーでサインイン",
@@ -9486,5 +9342,149 @@
        "tr": "Doğrulama e-postası yeniden gönderilemedi",
        "de": "Bestätigungs-E-Mail konnte nicht erneut gesendet werden",
        "uk": "Не вдалося повторно надіслати лист підтвердження"
+    },
+    "FEEDBACK$RATE_AGENT_PERFORMANCE": {
+        "en": "Rate the agent's performance:",
+        "ja": "エージェントのパフォーマンスを評価してください：",
+        "zh-CN": "评价代理的表现：",
+        "zh-TW": "評價代理的表現：",
+        "ko-KR": "에이전트의 성능을 평가하세요:",
+        "no": "Vurder agentens ytelse:",
+        "it": "Valuta le prestazioni dell'agente:",
+        "pt": "Avalie o desempenho do agente:",
+        "es": "Evalúe el rendimiento del agente:",
+        "ar": "قيم أداء الوكيل:",
+        "fr": "Évaluez la performance de l'agent :",
+        "tr": "Ajanın performansını değerlendirin:",
+        "de": "Bewerten Sie die Leistung des Agenten:",
+        "uk": "Оцініть продуктивність агента:"
+    },
+    "FEEDBACK$SELECT_REASON": {
+        "en": "Select a reason (optional):",
+        "ja": "理由を選択してください（任意）：",
+        "zh-CN": "选择原因（可选）：",
+        "zh-TW": "選擇原因（可選）：",
+        "ko-KR": "이유 선택 (선택 사항):",
+        "no": "Velg en grunn (valgfritt):",
+        "it": "Seleziona un motivo (opzionale):",
+        "pt": "Selecione um motivo (opcional):",
+        "es": "Seleccione un motivo (opcional):",
+        "ar": "حدد سببًا (اختياري):",
+        "fr": "Sélectionnez une raison (facultatif) :",
+        "tr": "Bir neden seçin (isteğe bağlı):",
+        "de": "Wählen Sie einen Grund (optional):",
+        "uk": "Виберіть причину (необов'язково):"
+    },
+    "FEEDBACK$SELECT_REASON_COUNTDOWN": {
+        "en": "Auto-submitting in {{countdown}} seconds...",
+        "ja": "{{countdown}}秒後に自動送信されます...",
+        "zh-CN": "{{countdown}}秒后自动提交...",
+        "zh-TW": "{{countdown}}秒後自動提交...",
+        "ko-KR": "{{countdown}}초 후 자동 제출...",
+        "no": "Sender automatisk om {{countdown}} sekunder...",
+        "it": "Invio automatico tra {{countdown}} secondi...",
+        "pt": "Enviando automaticamente em {{countdown}} segundos...",
+        "es": "Enviando automáticamente en {{countdown}} segundos...",
+        "ar": "الإرسال التلقائي خلال {{countdown}} ثانية...",
+        "fr": "Envoi automatique dans {{countdown}} secondes...",
+        "tr": "{{countdown}} saniye içinde otomatik gönderilecek...",
+        "de": "Automatische Übermittlung in {{countdown}} Sekunden...",
+        "uk": "Автоматична відправка через {{countdown}} секунд..."
+    },
+    "FEEDBACK$REASON_MISUNDERSTOOD_INSTRUCTION": {
+        "en": "The agent misunderstood my instruction",
+        "ja": "エージェントは私の指示を誤解しました",
+        "zh-CN": "代理误解了我的指示",
+        "zh-TW": "代理誤解了我的指示",
+        "ko-KR": "에이전트가 내 지시를 잘못 이해했습니다",
+        "no": "Agenten misforsto instruksjonene mine",
+        "it": "L'agente ha frainteso le mie istruzioni",
+        "pt": "O agente não entendeu minhas instruções",
+        "es": "El agente malinterpretó mis instrucciones",
+        "ar": "أساء الوكيل فهم تعليماتي",
+        "fr": "L'agent a mal compris mes instructions",
+        "tr": "Ajan talimatlarımı yanlış anladı",
+        "de": "Der Agent hat meine Anweisungen missverstanden",
+        "uk": "Агент неправильно зрозумів мої інструкції"
+    },
+    "FEEDBACK$REASON_FORGOT_CONTEXT": {
+        "en": "The agent forgot about the earlier context",
+        "ja": "エージェントは以前のコンテキストを忘れました",
+        "zh-CN": "代理忘记了之前的上下文",
+        "zh-TW": "代理忘記了之前的上下文",
+        "ko-KR": "에이전트가 이전 컨텍스트를 잊었습니다",
+        "no": "Agenten glemte den tidligere konteksten",
+        "it": "L'agente ha dimenticato il contesto precedente",
+        "pt": "O agente esqueceu o contexto anterior",
+        "es": "El agente olvidó el contexto anterior",
+        "ar": "نسي الوكيل السياق السابق",
+        "fr": "L'agent a oublié le contexte précédent",
+        "tr": "Ajan önceki bağlamı unuttu",
+        "de": "Der Agent hat den früheren Kontext vergessen",
+        "uk": "Агент забув про попередній контекст"
+    },
+    "FEEDBACK$REASON_UNNECESSARY_CHANGES": {
+        "en": "The agent made unnecessary changes",
+        "ja": "エージェントは不要な変更を行いました",
+        "zh-CN": "代理进行了不必要的更改",
+        "zh-TW": "代理進行了不必要的更改",
+        "ko-KR": "에이전트가 불필요한 변경을 했습니다",
+        "no": "Agenten gjorde unødvendige endringer",
+        "it": "L'agente ha apportato modifiche non necessarie",
+        "pt": "O agente fez alterações desnecessárias",
+        "es": "El agente hizo cambios innecesarios",
+        "ar": "قام الوكيل بتغييرات غير ضرورية",
+        "fr": "L'agent a apporté des modifications inutiles",
+        "tr": "Ajan gereksiz değişiklikler yaptı",
+        "de": "Der Agent hat unnötige Änderungen vorgenommen",
+        "uk": "Агент зробив непотрібні зміни"
+    },
+    "FEEDBACK$REASON_OTHER": {
+        "en": "Other",
+        "ja": "その他",
+        "zh-CN": "其他",
+        "zh-TW": "其他",
+        "ko-KR": "기타",
+        "no": "Annet",
+        "it": "Altro",
+        "pt": "Outro",
+        "es": "Otro",
+        "ar": "أخرى",
+        "fr": "Autre",
+        "tr": "Diğer",
+        "de": "Andere",
+        "uk": "Інше"
+    },
+    "FEEDBACK$THANK_YOU_FOR_FEEDBACK": {
+        "en": "Thank you for your feedback! This will help us improve OpenHands going forward.",
+        "ja": "フィードバックをありがとうございます！これにより、今後OpenHandsを改善していくことができます。",
+        "zh-CN": "感谢您的反馈！这将帮助我们改进OpenHands。",
+        "zh-TW": "感謝您的反饋！這將幫助我們改進OpenHands。",
+        "ko-KR": "피드백 감사합니다! 이를 통해 OpenHands를 개선해 나가겠습니다.",
+        "no": "Takk for tilbakemeldingen! Dette vil hjelpe oss med å forbedre OpenHands fremover.",
+        "it": "Grazie per il tuo feedback! Questo ci aiuterà a migliorare OpenHands in futuro.",
+        "pt": "Obrigado pelo seu feedback! Isso nos ajudará a melhorar o OpenHands no futuro.",
+        "es": "¡Gracias por su comentario! Esto nos ayudará a mejorar OpenHands en el futuro.",
+        "ar": "شكرا على ملاحظاتك! سيساعدنا هذا في تحسين OpenHands في المستقبل.",
+        "fr": "Merci pour votre retour ! Cela nous aidera à améliorer OpenHands à l'avenir.",
+        "tr": "Geri bildiriminiz için teşekkürler! Bu, OpenHands'i ileride geliştirmemize yardımcı olacak.",
+        "de": "Vielen Dank für Ihr Feedback! Das hilft uns, OpenHands in Zukunft zu verbessern.",
+        "uk": "Дякуємо за ваш відгук! Це допоможе нам покращити OpenHands у майбутньому."
+    },
+    "FEEDBACK$FAILED_TO_SUBMIT": {
+        "en": "Failed to submit feedback",
+        "ja": "フィードバックの送信に失敗しました",
+        "zh-CN": "提交反馈失败",
+        "zh-TW": "提交反饋失敗",
+        "ko-KR": "피드백 제출 실패",
+        "no": "Kunne ikke sende tilbakemelding",
+        "it": "Impossibile inviare feedback",
+        "pt": "Falha ao enviar feedback",
+        "es": "Error al enviar comentarios",
+        "ar": "فشل في تقديم التعليقات",
+        "fr": "Échec de l'envoi des commentaires",
+        "tr": "Geri bildirim gönderilemedi",
+        "de": "Feedback konnte nicht gesendet werden",
+        "uk": "Не вдалося надіслати відгук"
    }
 }
--- a/frontend/src/routes.ts
+++ b/frontend/src/routes.ts
@@ -13,7 +13,7 @@ export default [
      index("routes/llm-settings.tsx"),
      route("mcp", "routes/mcp-settings.tsx"),
      route("user", "routes/user-settings.tsx"),
-      route("git", "routes/git-settings.tsx"),
+      route("integrations", "routes/git-settings.tsx"),
      route("app", "routes/app-settings.tsx"),
      route("billing", "routes/billing.tsx"),
      route("secrets", "routes/secrets-settings.tsx"),
--- a/frontend/src/routes/git-settings.tsx
+++ b/frontend/src/routes/git-settings.tsx
@@ -6,8 +6,8 @@ import { BrandButton } from "#/components/features/settings/brand-button";
 import { useLogout } from "#/hooks/mutation/use-logout";
 import { GitHubTokenInput } from "#/components/features/settings/git-settings/github-token-input";
 import { GitLabTokenInput } from "#/components/features/settings/git-settings/gitlab-token-input";
-import { AzureDevOpsTokenInput } from "#/components/features/settings/git-settings/azure-devops-token-input";
 import { ConfigureGitHubRepositoriesAnchor } from "#/components/features/settings/git-settings/configure-github-repositories-anchor";
+import { InstallSlackAppAnchor } from "#/components/features/settings/git-settings/install-slack-app-anchor";
 import { I18nKey } from "#/i18n/declaration";
 import {
  displayErrorToast,
@@ -33,24 +33,18 @@ function GitSettingsScreen() {
    React.useState(false);
  const [gitlabTokenInputHasValue, setGitlabTokenInputHasValue] =
    React.useState(false);
-  const [azureDevOpsTokenInputHasValue, setAzureDevOpsTokenInputHasValue] =
-    React.useState(false);

  const [githubHostInputHasValue, setGithubHostInputHasValue] =
    React.useState(false);
  const [gitlabHostInputHasValue, setGitlabHostInputHasValue] =
    React.useState(false);
-  const [azureDevOpsHostInputHasValue, setAzureDevOpsHostInputHasValue] =
-    React.useState(false);

  const existingGithubHost = settings?.PROVIDER_TOKENS_SET.github;
  const existingGitlabHost = settings?.PROVIDER_TOKENS_SET.gitlab;
-  const existingAzureDevOpsHost = settings?.PROVIDER_TOKENS_SET.azure_devops;

  const isSaas = config?.APP_MODE === "saas";
  const isGitHubTokenSet = providers.includes("github");
  const isGitLabTokenSet = providers.includes("gitlab");
-  const isAzureDevOpsTokenSet = providers.includes("azure_devops");

  const formAction = async (formData: FormData) => {
    const disconnectButtonClicked =
@@ -63,33 +57,14 @@ function GitSettingsScreen() {

    const githubToken = formData.get("github-token-input")?.toString() || "";
    const gitlabToken = formData.get("gitlab-token-input")?.toString() || "";
-    const azureDevOpsToken =
-      formData.get("azure-devops-token-input")?.toString() || "";
    const githubHost = formData.get("github-host-input")?.toString() || "";
    const gitlabHost = formData.get("gitlab-host-input")?.toString() || "";
-    const azureDevOpsHost =
-      formData.get("azure-devops-host-input")?.toString() || "";
-
-    // Validate Azure DevOps token and host dependency
-    const hasAzureDevOpsToken = azureDevOpsToken.trim() !== "";
-    const hasAzureDevOpsHost = azureDevOpsHost.trim() !== "";
-
-    if (hasAzureDevOpsToken && !hasAzureDevOpsHost) {
-      displayErrorToast(t(I18nKey.AZURE_DEVOPS$HOST_REQUIRED_ERROR));
-      return;
-    }
-
-    if (hasAzureDevOpsHost && !hasAzureDevOpsToken) {
-      displayErrorToast(t(I18nKey.AZURE_DEVOPS$TOKEN_REQUIRED_ERROR));
-      return;
-    }

    saveGitProviders(
      {
        providers: {
          github: { token: githubToken, host: githubHost },
          gitlab: { token: gitlabToken, host: gitlabHost },
-          azure_devops: { token: azureDevOpsToken, host: azureDevOpsHost },
        },
      },
      {
@@ -103,10 +78,8 @@ function GitSettingsScreen() {
        onSettled: () => {
          setGithubTokenInputHasValue(false);
          setGitlabTokenInputHasValue(false);
-          setAzureDevOpsTokenInputHasValue(false);
          setGithubHostInputHasValue(false);
          setGitlabHostInputHasValue(false);
-          setAzureDevOpsHostInputHasValue(false);
        },
      },
    );
@@ -115,10 +88,8 @@ function GitSettingsScreen() {
  const formIsClean =
    !githubTokenInputHasValue &&
    !gitlabTokenInputHasValue &&
-    !azureDevOpsTokenInputHasValue &&
    !githubHostInputHasValue &&
-    !gitlabHostInputHasValue &&
-    !azureDevOpsHostInputHasValue;
+    !gitlabHostInputHasValue;
  const shouldRenderExternalConfigureButtons = isSaas && config.APP_SLUG;

  return (
@@ -133,6 +104,10 @@ function GitSettingsScreen() {
            <ConfigureGitHubRepositoriesAnchor slug={config.APP_SLUG!} />
          )}

+          {shouldRenderExternalConfigureButtons && !isLoading && (
+            <InstallSlackAppAnchor />
+          )}
+
          {!isSaas && (
            <GitHubTokenInput
              name="github-token-input"
@@ -141,7 +116,7 @@ function GitSettingsScreen() {
                setGithubTokenInputHasValue(!!value);
              }}
              onGitHubHostChange={(value) => {
-                setGithubHostInputHasValue(!!value);
+                setGitlabHostInputHasValue(!!value);
              }}
              githubHostSet={existingGithubHost}
            />
@@ -160,20 +135,6 @@ function GitSettingsScreen() {
              gitlabHostSet={existingGitlabHost}
            />
          )}
-
-          {!isSaas && (
-            <AzureDevOpsTokenInput
-              name="azure-devops-token-input"
-              isAzureDevOpsTokenSet={isAzureDevOpsTokenSet}
-              onChange={(value) => {
-                setAzureDevOpsTokenInputHasValue(!!value);
-              }}
-              onAzureDevOpsHostChange={(value) => {
-                setAzureDevOpsHostInputHasValue(!!value);
-              }}
-              azureDevOpsHostSet={existingAzureDevOpsHost}
-            />
-          )}
        </div>
      )}

@@ -187,9 +148,7 @@ function GitSettingsScreen() {
              name="disconnect-tokens-button"
              type="submit"
              variant="secondary"
-              isDisabled={
-                !isGitHubTokenSet && !isGitLabTokenSet && !isAzureDevOpsTokenSet
-              }
+              isDisabled={!isGitHubTokenSet && !isGitLabTokenSet}
            >
              Disconnect Tokens
            </BrandButton>
--- a/frontend/src/routes/secrets-settings.tsx
+++ b/frontend/src/routes/secrets-settings.tsx
@@ -84,7 +84,11 @@ function SecretsSettingsScreen() {
      )}

      {shouldRenderConnectToGitButton && (
-        <Link to="/settings/git" data-testid="connect-git-button" type="button">
+        <Link
+          to="/settings/integrations"
+          data-testid="connect-git-button"
+          type="button"
+        >
          <BrandButton type="button" variant="secondary">
            Connect a Git provider to manage secrets
          </BrandButton>
--- a/frontend/src/routes/settings.tsx
+++ b/frontend/src/routes/settings.tsx
@@ -16,7 +16,7 @@ function SettingsScreen() {

  const saasNavItems = [
    { to: "/settings/user", text: t("SETTINGS$NAV_USER") },
-    { to: "/settings/git", text: t("SETTINGS$NAV_GIT") },
+    { to: "/settings/integrations", text: t("SETTINGS$NAV_INTEGRATIONS") },
    { to: "/settings/app", text: t("SETTINGS$NAV_APPLICATION") },
    { to: "/settings/billing", text: t("SETTINGS$NAV_CREDITS") },
    { to: "/settings/secrets", text: t("SETTINGS$NAV_SECRETS") },
@@ -26,7 +26,7 @@ function SettingsScreen() {
  const ossNavItems = [
    { to: "/settings", text: t("SETTINGS$NAV_LLM") },
    { to: "/settings/mcp", text: t("SETTINGS$NAV_MCP") },
-    { to: "/settings/git", text: t("SETTINGS$NAV_GIT") },
+    { to: "/settings/integrations", text: t("SETTINGS$NAV_INTEGRATIONS") },
    { to: "/settings/app", text: t("SETTINGS$NAV_APPLICATION") },
    { to: "/settings/secrets", text: t("SETTINGS$NAV_SECRETS") },
  ];
--- a/frontend/src/types/settings.ts
+++ b/frontend/src/types/settings.ts
@@ -1,7 +1,6 @@
 export const ProviderOptions = {
  github: "github",
  gitlab: "gitlab",
-  azure_devops: "azure_devops",
 } as const;

 export type Provider = keyof typeof ProviderOptions;
--- a/frontend/src/utils/generate-auth-url.ts
+++ b/frontend/src/utils/generate-auth-url.ts
@@ -1,6 +1,6 @@
 /**
 * Generates a URL to redirect to for OAuth authentication
- * @param identityProvider The identity provider to use (e.g., "github", "gitlab", "azure_devops")
+ * @param identityProvider The identity provider to use (e.g., "github", "gitlab")
 * @param requestUrl The URL of the request
 * @returns The URL to redirect to for OAuth
 */
--- a/frontend/src/utils/local-storage.ts
+++ b/frontend/src/utils/local-storage.ts
@@ -7,12 +7,11 @@ export const LOCAL_STORAGE_KEYS = {
 export enum LoginMethod {
  GITHUB = "github",
  GITLAB = "gitlab",
-  AZURE_DEVOPS = "azure_devops",
 }

 /**
 * Set the login method in local storage
- * @param method The login method (github, gitlab, or azure_devops)
+ * @param method The login method (github or gitlab)
 */
 export const setLoginMethod = (method: LoginMethod): void => {
  localStorage.setItem(LOCAL_STORAGE_KEYS.LOGIN_METHOD, method);
--- a/microagents/azure_devops.md
+++ b/microagents/azure_devops.md
@@ -1,188 +0,0 @@
---
-name: azure_devops
-type: knowledge
-version: 1.0.0
-agent: CodeActAgent
-triggers:
- azure devops
- azure
- devops
---
-
-<ROLE>
-You are an Azure DevOps expert who can help users interact with Azure DevOps repositories, work items, and pull requests.
-</ROLE>
-
-<AZURE_DEVOPS_INTEGRATION>
-OpenHands supports Azure DevOps integration similar to GitHub and GitLab. You can use the `AZURE_DEVOPS_TOKEN` environment variable to authenticate with Azure DevOps.
-
-## Authentication
-To use Azure DevOps with OpenHands, you need a Personal Access Token (PAT) with appropriate permissions:
-1. Go to your Azure DevOps organization settings
-2. Select "Personal access tokens"
-3. Create a new token with the following scopes:
-   - Code (Read & Write)
-   - Work Items (Read & Write)
-   - Pull Request Threads (Read & Write)
-
-## Repository Format
-When working with Azure DevOps repositories in OpenHands, use the following format:
- Repository name: `project/repo`
- Organization: Your Azure DevOps organization name
-
-## Environment Variables
- `AZURE_DEVOPS_TOKEN`: Your Azure DevOps Personal Access Token
-
-## Common Operations
- Clone a repository: `git clone https://dev.azure.com/organization/project/_git/repo`
- Create a pull request: Use the Azure DevOps API or web interface
- Work with issues: Azure DevOps uses work items instead of issues
-
-## Azure DevOps API
-OpenHands uses the official Azure DevOps Python API to interact with Azure DevOps. The API is available at https://github.com/microsoft/azure-devops-python-api.
-
-```python
-from azure.devops.connection import Connection
-from msrest.authentication import BasicAuthentication
-import os
-
-# Authentication
-personal_access_token = os.environ.get('AZURE_DEVOPS_TOKEN')
-organization_url = 'https://dev.azure.com/your-organization'
-
-# Create a connection
-credentials = BasicAuthentication('', personal_access_token)
-connection = Connection(base_url=organization_url, creds=credentials)
-
-# Get clients
-git_client = connection.clients.get_git_client()
-work_item_client = connection.clients.get_work_item_tracking_client()
-
-# Example: Get repositories
-repositories = git_client.get_repositories()
-for repo in repositories:
-    print(f"{repo.name} - {repo.url}")
-
-# Example: Get work items
-work_items = work_item_client.get_work_items(ids=[1, 2, 3])
-for work_item in work_items:
-    print(f"{work_item.id} - {work_item.fields['System.Title']}")
-```
-</AZURE_DEVOPS_INTEGRATION>
-
-<TROUBLESHOOTING>
-## Common Issues and Solutions
-
-### Authentication Errors
- **Error**: "TF401019: The Git repository with name or identifier X does not exist or you do not have permissions for the operation you are attempting."
- **Solution**: Check that your PAT has the correct permissions and that you're using the correct organization, project, and repository names.
-
-### Repository Format
- **Error**: "Invalid repository name format: X. Expected format: project/repo"
- **Solution**: Make sure you're using the correct format for repository names: `project/repo`.
-
-### API Limitations
- Azure DevOps API has rate limits. If you encounter rate limit errors, add delays between API calls.
- Some operations may require additional permissions beyond what's listed above.
-
-### Work Item Types
- Azure DevOps uses different work item types (Bug, Task, User Story, etc.) instead of the Issue concept in GitHub/GitLab.
- When working with work items, make sure to specify the correct work item type.
-</TROUBLESHOOTING>
-
-<BEST_PRACTICES>
-## Best Practices for Azure DevOps
-
-### Repository Structure
- Use a clear branching strategy (e.g., GitFlow, trunk-based development)
- Protect your main branch with branch policies
-
-### Pull Requests
- Use descriptive titles and descriptions
- Link work items to pull requests
- Use the "Squash merge" option to keep history clean
-
-### Work Items
- Use the appropriate work item type for each task
- Maintain a clear hierarchy of work items
- Use tags for better organization
-
-### CI/CD Pipelines
- Store pipeline definitions as YAML in your repository
- Use templates for common tasks
- Leverage variable groups for secrets management
-</BEST_PRACTICES>
-
-<EXAMPLES>
-## Example Commands
-
-### Clone a Repository
-```bash
-git clone https://dev.azure.com/organization/project/_git/repo
-```
-
-### Create a Branch
-```bash
-git checkout -b feature/new-feature
-```
-
-### Push Changes
-```bash
-git add .
-git commit -m "Add new feature"
-git push -u origin feature/new-feature
-```
-
-### Create a Pull Request (using API)
-```python
-from azure.devops.connection import Connection
-from msrest.authentication import BasicAuthentication
-import os
-
-# Authentication
-personal_access_token = os.environ.get('AZURE_DEVOPS_TOKEN')
-organization_url = 'https://dev.azure.com/your-organization'
-
-# Create a connection
-credentials = BasicAuthentication('', personal_access_token)
-connection = Connection(base_url=organization_url, creds=credentials)
-
-# Get Git client
-git_client = connection.clients.get_git_client()
-
-# Create pull request
-pr = git_client.create_pull_request(
-    git_pull_request={
-        'source_ref_name': 'refs/heads/feature/new-feature',
-        'target_ref_name': 'refs/heads/main',
-        'title': 'Add new feature',
-        'description': 'This PR adds a new feature'
-    },
-    repository_id='repository-id',
-    project='project-name'
-)
-```
-
-### Get Work Items
-```python
-from azure.devops.connection import Connection
-from msrest.authentication import BasicAuthentication
-import os
-
-# Authentication
-personal_access_token = os.environ.get('AZURE_DEVOPS_TOKEN')
-organization_url = 'https://dev.azure.com/your-organization'
-
-# Create a connection
-credentials = BasicAuthentication('', personal_access_token)
-connection = Connection(base_url=organization_url, creds=credentials)
-
-# Get Work Item Tracking client
-wit_client = connection.clients.get_work_item_tracking_client()
-
-# Get work items
-work_items = wit_client.get_work_items(ids=[1, 2, 3])
-for work_item in work_items:
-    print(f"{work_item.id} - {work_item.fields['System.Title']}")
-```
-</EXAMPLES>
--- a/openhands/agenthub/browsing_agent/browsing_agent.py
+++ b/openhands/agenthub/browsing_agent/browsing_agent.py
@@ -125,9 +125,9 @@ class BrowsingAgent(Agent):
        self.reset()

    def reset(self) -> None:
-        """Resets the Browsing Agent."""
+        """Resets the Browsing Agent's internal state."""
        super().reset()
-        self.cost_accumulator = 0
+        # Reset agent-specific counters but not LLM metrics
        self.error_accumulator = 0

    def step(self, state: State) -> Action:
--- a/openhands/agenthub/codeact_agent/codeact_agent.py
+++ b/openhands/agenthub/codeact_agent/codeact_agent.py
@@ -136,8 +136,9 @@ class CodeActAgent(Agent):
        return tools

    def reset(self) -> None:
-        """Resets the CodeAct Agent."""
+        """Resets the CodeAct Agent's internal state."""
        super().reset()
+        # Only clear pending actions, not LLM metrics
        self.pending_actions.clear()

    def step(self, state: State) -> 'Action':
--- a/openhands/agenthub/dummy_agent/agent.py
+++ b/openhands/agenthub/dummy_agent/agent.py
@@ -119,14 +119,14 @@ class DummyAgent(Agent):
        ]

    def step(self, state: State) -> Action:
-        if state.iteration >= len(self.steps):
+        if state.iteration_flag.current_value >= len(self.steps):
            return AgentFinishAction()

-        current_step = self.steps[state.iteration]
+        current_step = self.steps[state.iteration_flag.current_value]
        action = current_step['action']

-        if state.iteration > 0:
-            prev_step = self.steps[state.iteration - 1]
+        if state.iteration_flag.current_value > 0:
+            prev_step = self.steps[state.iteration_flag.current_value - 1]

            if 'observations' in prev_step and prev_step['observations']:
                expected_observations = prev_step['observations']
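Reviewer note: the hunk above replaces direct reads of `state.iteration` with `state.iteration_flag.current_value`. A minimal sketch of that access pattern follows, assuming the flag is a small object carrying a counter. `SketchIterationFlag`, `SketchState`, and `next_step` are hypothetical stand-ins, not the real OpenHands classes; only the attribute path `iteration_flag.current_value` comes from the diff.

```python
from dataclasses import dataclass, field


@dataclass
class SketchIterationFlag:
    # Hypothetical stand-in: only `current_value` is taken from the diff.
    current_value: int = 0


@dataclass
class SketchState:
    # Hypothetical stand-in for State; it holds the counter behind a flag object.
    iteration_flag: SketchIterationFlag = field(default_factory=SketchIterationFlag)


def next_step(steps: list, state: SketchState):
    """Return the scripted step for the current iteration, or None once the script is exhausted."""
    index = state.iteration_flag.current_value
    if index >= len(steps):
        return None  # the real agent returns AgentFinishAction() at this point
    return steps[index]


scripted = ['first', 'second']
at_start = next_step(scripted, SketchState())
past_end = next_step(scripted, SketchState(SketchIterationFlag(current_value=2)))
```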
--- a/openhands/agenthub/visualbrowsing_agent/visualbrowsing_agent.py
+++ b/openhands/agenthub/visualbrowsing_agent/visualbrowsing_agent.py
@@ -176,9 +176,9 @@ Note:
        self.reset()

    def reset(self) -> None:
-        """Resets the VisualBrowsingAgent."""
+        """Resets the VisualBrowsingAgent's internal state."""
        super().reset()
-        self.cost_accumulator = 0
+        # Reset agent-specific counters but not LLM metrics
        self.error_accumulator = 0

    def step(self, state: State) -> Action:
--- a/openhands/cli/main.py
+++ b/openhands/cli/main.py
@@ -8,6 +8,7 @@ from prompt_toolkit.formatted_text import HTML
 from prompt_toolkit.shortcuts import clear

 import openhands.agenthub  # noqa F401 (we import this to get the agents registered)
+import openhands.cli.suppress_warnings  # noqa: F401
 from openhands.cli.commands import (
    check_folder_security_agreement,
    handle_commands,
@@ -273,9 +274,9 @@ async def run_session(
            )
        )

-        config.mcp.stdio_servers.extend(openhands_mcp_stdio_servers)
+        runtime.config.mcp.stdio_servers.extend(openhands_mcp_stdio_servers)

-        await add_mcp_tools_to_agent(agent, runtime, memory, config)
+        await add_mcp_tools_to_agent(agent, runtime, memory)

    # Clear loading animation
    is_loaded.set()
--- a/openhands/cli/settings.py
+++ b/openhands/cli/settings.py
@@ -215,10 +215,18 @@ async def modify_llm_settings_basic(
            ]
            provider_models = VERIFIED_ANTHROPIC_MODELS + provider_models

-        # Set default model to the first model in the list (which will be a verified model if available)
-        default_model = (
-            provider_models[0] if provider_models else 'claude-sonnet-4-20250514'
-        )
+        # Set default model to the best verified model for the provider
+        if provider == 'anthropic' and VERIFIED_ANTHROPIC_MODELS:
+            # Use the first model in the VERIFIED_ANTHROPIC_MODELS list as it's the best/newest
+            default_model = VERIFIED_ANTHROPIC_MODELS[0]
+        elif provider == 'openai' and VERIFIED_OPENAI_MODELS:
+            # Use the first model in the VERIFIED_OPENAI_MODELS list as it's the best/newest
+            default_model = VERIFIED_OPENAI_MODELS[0]
+        else:
+            # For other providers, use the first model in the list
+            default_model = (
+                provider_models[0] if provider_models else 'claude-sonnet-4-20250514'
+            )

        # Show the default model but allow changing it
        print_formatted_text(
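Reviewer note: the selection logic in the hunk above can be read as a small pure function. The sketch below restates it under that assumption. `pick_default_model` and its parameters are hypothetical names introduced for illustration, and the OpenAI model name is a placeholder rather than an entry from `VERIFIED_OPENAI_MODELS`.

```python
FALLBACK_MODEL = 'claude-sonnet-4-20250514'  # same literal fallback as in the diff


def pick_default_model(
    provider: str,
    provider_models: list[str],
    verified_anthropic: list[str],
    verified_openai: list[str],
) -> str:
    """Prefer the first verified model for known providers, else the first listed model."""
    if provider == 'anthropic' and verified_anthropic:
        return verified_anthropic[0]
    if provider == 'openai' and verified_openai:
        return verified_openai[0]
    # Other providers: first available model, or the hard-coded fallback.
    return provider_models[0] if provider_models else FALLBACK_MODEL


anthropic_verified = ['claude-sonnet-4-20250514', 'claude-opus-4-20250514']
openai_verified = ['placeholder-openai-model']  # hypothetical entry

anthropic_default = pick_default_model('anthropic', ['claude-2'], anthropic_verified, openai_verified)
other_default = pick_default_model('other', ['some-model'], anthropic_verified, openai_verified)
empty_default = pick_default_model('other', [], anthropic_verified, openai_verified)
```

This depends on the ordering of the verified lists, which is presumably why the same change set moves the newest Anthropic models to the front of `VERIFIED_ANTHROPIC_MODELS`.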
--- a/openhands/cli/suppress_warnings.py
+++ b/openhands/cli/suppress_warnings.py
@@ -0,0 +1,10 @@
+"""Module to suppress common warnings."""
+
+import warnings
+
+# Suppress pydub warning about ffmpeg/avconv
+warnings.filterwarnings(
+    'ignore',
+    message="Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work",
+    category=RuntimeWarning,
+)
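Reviewer note: `warnings.filterwarnings` treats `message` as a regular expression matched, case-insensitively, against the start of the warning text, and a newly added filter takes precedence over earlier ones. The sketch below checks that the filter in the new module hides only the pydub message. It assumes the warning is raised with exactly this text, and `emit_and_capture` is a hypothetical helper.

```python
import warnings

PYDUB_MESSAGE = "Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work"


def emit_and_capture() -> list[str]:
    """Emit one matching and one unrelated RuntimeWarning; return the messages that are not suppressed."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')  # start from a known filter state
        # Same call as in suppress_warnings.py; inserted ahead of the 'always' filter.
        warnings.filterwarnings('ignore', message=PYDUB_MESSAGE, category=RuntimeWarning)
        warnings.warn(PYDUB_MESSAGE, RuntimeWarning)
        warnings.warn('some other runtime warning', RuntimeWarning)
    return [str(w.message) for w in caught]


surviving = emit_and_capture()
```

Because the filter is installed at import time, the `import openhands.cli.suppress_warnings` line in `main.py` has to run before the module that emits the warning is imported.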
--- a/openhands/cli/utils.py
+++ b/openhands/cli/utils.py
@@ -158,17 +158,17 @@ VERIFIED_OPENAI_MODELS = [
 ]

 VERIFIED_ANTHROPIC_MODELS = [
-    'claude-2',
-    'claude-2.1',
-    'claude-3-5-sonnet-20240620',
-    'claude-3-5-sonnet-20241022',
-    'claude-3-5-haiku-20241022',
-    'claude-3-haiku-20240307',
-    'claude-3-opus-20240229',
-    'claude-3-sonnet-20240229',
-    'claude-3-7-sonnet-20250219',
    'claude-sonnet-4-20250514',
    'claude-opus-4-20250514',
+    'claude-3-7-sonnet-20250219',
+    'claude-3-sonnet-20240229',
+    'claude-3-opus-20240229',
+    'claude-3-haiku-20240307',
+    'claude-3-5-haiku-20241022',
+    'claude-3-5-sonnet-20241022',
+    'claude-3-5-sonnet-20240620',
+    'claude-2.1',
+    'claude-2',
 ]


--- a/openhands/controller/agent.py
+++ b/openhands/controller/agent.py
@@ -103,16 +103,10 @@ class Agent(ABC):
        pass

    def reset(self) -> None:
-        """Resets the agent's execution status and clears the history. This method can be used
-        to prepare the agent for restarting the instruction or cleaning up before destruction.
-
-        """
-        # TODO clear history
+        """Resets the agent's execution status."""
+        # Only reset the completion status, not the LLM metrics
        self._complete = False

-        if self.llm:
-            self.llm.reset()
-
    @property
    def name(self) -> str:
        return self.__class__.__name__
--- a/openhands/controller/agent_controller.py
+++ b/openhands/controller/agent_controller.py
@@ -7,7 +7,6 @@ import time
 import traceback
 from typing import Callable

-import litellm  # noqa
 from litellm.exceptions import (  # noqa
    APIConnectionError,
    APIError,
@@ -25,7 +24,8 @@ from litellm.exceptions import (  # noqa

 from openhands.controller.agent import Agent
 from openhands.controller.replay import ReplayManager
-from openhands.controller.state.state import State, TrafficControlState
+from openhands.controller.state.state import State
+from openhands.controller.state.state_tracker import StateTracker
 from openhands.controller.stuck import StuckDetector
 from openhands.core.config import AgentConfig, LLMConfig
 from openhands.core.exceptions import (
@@ -61,7 +61,6 @@ from openhands.events.action import (
 )
 from openhands.events.action.agent import CondensationAction, RecallAction
 from openhands.events.event import Event
-from openhands.events.event_filter import EventFilter
 from openhands.events.observation import (
    AgentDelegateObservation,
    AgentStateChangedObservation,
@@ -69,10 +68,11 @@ from openhands.events.observation import (
    NullObservation,
    Observation,
 )
-from openhands.events.serialization.event import event_to_trajectory, truncate_content
+from openhands.events.serialization.event import truncate_content
 from openhands.llm.llm import LLM
 from openhands.llm.metrics import Metrics, TokenUsage
 from openhands.memory.view import View
+from openhands.storage.files import FileStore

 # note: RESUME is only available on web GUI
 TRAFFIC_CONTROL_REMINDER = (
@@ -101,11 +101,13 @@ class AgentController:
        self,
        agent: Agent,
        event_stream: EventStream,
-        max_iterations: int,
-        max_budget_per_task: float | None = None,
+        iteration_delta: int,
+        budget_per_task_delta: float | None = None,
        agent_to_llm_config: dict[str, LLMConfig] | None = None,
        agent_configs: dict[str, AgentConfig] | None = None,
        sid: str | None = None,
+        file_store: FileStore | None = None,
+        user_id: str | None = None,
        confirmation_mode: bool = False,
        initial_state: State | None = None,
        is_delegate: bool = False,
@@ -132,7 +134,10 @@ class AgentController:
            status_callback: Optional callback function to handle status updates.
            replay_events: A list of logs to replay.
        """
+
        self.id = sid or event_stream.sid
+        self.user_id = user_id
+        self.file_store = file_store
        self.agent = agent
        self.headless_mode = headless_mode
        self.is_delegate = is_delegate
@@ -146,29 +151,22 @@ class AgentController:
                EventStreamSubscriber.AGENT_CONTROLLER, self.on_event, self.id
            )

-        # filter out events that are not relevant to the agent
-        # so they will not be included in the agent history
-        self.agent_history_filter = EventFilter(
-            exclude_types=(
-                NullAction,
-                NullObservation,
-                ChangeAgentStateAction,
-                AgentStateChangedObservation,
-            ),
-            exclude_hidden=True,
-        )
+        self.state_tracker = StateTracker(sid, file_store, user_id)

        # state from the previous session, state from a parent agent, or a fresh state
        self.set_initial_state(
            state=initial_state,
-            max_iterations=max_iterations,
+            max_iterations=iteration_delta,
+            max_budget_per_task=budget_per_task_delta,
            confirmation_mode=confirmation_mode,
        )
-        self.max_budget_per_task = max_budget_per_task
+
+        self.state = self.state_tracker.state  # TODO: shared between the state tracker and the controller for backward compatibility; ideally all state-related logic should move to the state tracker
+
        self.agent_to_llm_config = agent_to_llm_config if agent_to_llm_config else {}
        self.agent_configs = agent_configs if agent_configs else {}
-        self._initial_max_iterations = max_iterations
-        self._initial_max_budget_per_task = max_budget_per_task
+        self._initial_max_iterations = iteration_delta
+        self._initial_max_budget_per_task = budget_per_task_delta

        # stuck helper
        self._stuck_detector = StuckDetector(self.state)
@@ -181,7 +179,7 @@ class AgentController:
        self._add_system_message()

    def _add_system_message(self):
-        for event in self.event_stream.get_events(start_id=self.state.start_id):
+        for event in self.event_stream.search_events(start_id=self.state.start_id):
            if isinstance(event, MessageAction) and event.source == EventSource.USER:
                # FIXME: Remove this after 6/1/2025
                # Do not try to add a system message if we first run into
@@ -214,26 +212,7 @@ class AgentController:
        if set_stop_state:
            await self.set_agent_state_to(AgentState.STOPPED)

-        # we made history, now is the time to rewrite it!
-        # the final state.history will be used by external scripts like evals, tests, etc.
-        # history will need to be complete WITH delegates events
-        # like the regular agent history, it does not include:
-        # - 'hidden' events, events with hidden=True
-        # - backend events (the default 'filtered out' types, types in self.filter_out)
-        start_id = self.state.start_id if self.state.start_id >= 0 else 0
-        end_id = (
-            self.state.end_id
-            if self.state.end_id >= 0
-            else self.event_stream.get_latest_event_id()
-        )
-        self.state.history = list(
-            self.event_stream.search_events(
-                start_id=start_id,
-                end_id=end_id,
-                reverse=False,
-                filter=self.agent_history_filter,
-            )
-        )
+        self.state_tracker.close(self.event_stream)

        # unsubscribe from the event stream
        # only the root parent controller subscribes to the event stream
@@ -257,14 +236,6 @@ class AgentController:
        extra_merged = {'session_id': self.id, **extra}
        getattr(logger, level)(message, extra=extra_merged, stacklevel=2)

-    def update_state_before_step(self) -> None:
-        self.state.iteration += 1
-        self.state.local_iteration += 1
-
-    async def update_state_after_step(self) -> None:
-        # update metrics especially for cost. Use deepcopy to avoid it being modified by agent._reset()
-        self.state.local_metrics = copy.deepcopy(self.agent.llm.metrics)
-
    async def _react_to_exception(
        self,
        e: Exception,
@@ -390,10 +361,17 @@ class AgentController:
        # If we have a delegate that is not finished or errored, forward events to it
        if self.delegate is not None:
            delegate_state = self.delegate.get_agent_state()
-            if delegate_state not in (
-                AgentState.FINISHED,
-                AgentState.ERROR,
-                AgentState.REJECTED,
+            if (
+                delegate_state
+                not in (
+                    AgentState.FINISHED,
+                    AgentState.ERROR,
+                    AgentState.REJECTED,
+                )
+                or 'RuntimeError: Agent reached maximum iteration.'
+                in self.delegate.state.last_error
+                or 'RuntimeError: Agent reached maximum budget for conversation'
+                in self.delegate.state.last_error
            ):
                # Forward the event to delegate and skip parent processing
                asyncio.get_event_loop().run_until_complete(
@@ -412,9 +390,7 @@ class AgentController:
        if hasattr(event, 'hidden') and event.hidden:
            return

-        # if the event is not filtered out, add it to the history
-        if self.agent_history_filter.include(event):
-            self.state.history.append(event)
+        self.state_tracker.add_history(event)

        if isinstance(event, Action):
            await self._handle_action(event)
@@ -457,11 +433,9 @@ class AgentController:

        elif isinstance(action, AgentFinishAction):
            self.state.outputs = action.outputs
-            self.state.metrics.merge(self.state.local_metrics)
            await self.set_agent_state_to(AgentState.FINISHED)
        elif isinstance(action, AgentRejectAction):
            self.state.outputs = action.outputs
-            self.state.metrics.merge(self.state.local_metrics)
            await self.set_agent_state_to(AgentState.REJECTED)

    async def _handle_observation(self, observation: Observation) -> None:
@@ -481,8 +455,10 @@ class AgentController:
            log_level, str(observation_to_print), extra={'msg_type': 'OBSERVATION'}
        )

+        # TODO: these metrics come from the draft editor; they are accumulated into both the controller's state metrics and the agent's LLM metrics.
+        # In the future, we should have a more principled way of sharing metrics across all LLM instances for a given conversation.
        if observation.llm_metrics is not None:
-            self.agent.llm.metrics.merge(observation.llm_metrics)
+            self.state_tracker.merge_metrics(observation.llm_metrics)

        # this happens for runnable actions and microagent actions
        if self._pending_action and self._pending_action.id == observation.cause:
@@ -496,9 +472,6 @@ class AgentController:
            if self.state.agent_state == AgentState.USER_REJECTED:
                await self.set_agent_state_to(AgentState.AWAITING_USER_INPUT)
            return
-        elif isinstance(observation, ErrorObservation):
-            if self.state.agent_state == AgentState.ERROR:
-                self.state.metrics.merge(self.state.local_metrics)

    async def _handle_message_action(self, action: MessageAction) -> None:
        """Handles message actions from the event stream.
@@ -516,22 +489,6 @@ class AgentController:
                str(action),
                extra={'msg_type': 'ACTION', 'event_source': EventSource.USER},
            )
-            # Extend max iterations when the user sends a message (only in non-headless mode)
-            if self._initial_max_iterations is not None and not self.headless_mode:
-                self.state.max_iterations = (
-                    self.state.iteration + self._initial_max_iterations
-                )
-                if (
-                    self.state.traffic_control_state == TrafficControlState.THROTTLING
-                    or self.state.traffic_control_state == TrafficControlState.PAUSED
-                ):
-                    self.state.traffic_control_state = TrafficControlState.NORMAL
-                self.log(
-                    'debug',
-                    f'Extended max iterations to {self.state.max_iterations} after user message',
-                )
-            # try to retrieve microagents relevant to the user message
-            # set pending_action while we search for information

            # if this is the first user message for this agent, matters for the microagent info type
            first_user_message = self._first_user_message()
@@ -605,36 +562,16 @@ class AgentController:
            return

        if new_state in (AgentState.STOPPED, AgentState.ERROR):
-            # sync existing metrics BEFORE resetting the agent
-            await self.update_state_after_step()
-            self.state.metrics.merge(self.state.local_metrics)
            self._reset()
-        elif (
-            new_state == AgentState.RUNNING
-            and self.state.agent_state == AgentState.PAUSED
-            # TODO: do we really need both THROTTLING and PAUSED states, or can we clean up one of them completely?
-            and self.state.traffic_control_state == TrafficControlState.THROTTLING
-        ):
-            # user intends to interrupt traffic control and let the task resume temporarily
-            self.state.traffic_control_state = TrafficControlState.PAUSED
-            # User has chosen to deliberately continue - lets double the max iterations
-            if (
-                self.state.iteration is not None
-                and self.state.max_iterations is not None
-                and self._initial_max_iterations is not None
-                and not self.headless_mode
-            ):
-                if self.state.iteration >= self.state.max_iterations:
-                    self.state.max_iterations += self._initial_max_iterations

-            if (
-                self.state.metrics.accumulated_cost is not None
-                and self.max_budget_per_task is not None
-                and self._initial_max_budget_per_task is not None
-            ):
-                if self.state.metrics.accumulated_cost >= self.max_budget_per_task:
-                    self.max_budget_per_task += self._initial_max_budget_per_task
-        elif self._pending_action is not None and (
+        # The user is resuming after an error: check the control limits and expand them if applicable
+        if (
+            self.state.agent_state == AgentState.ERROR
+            and new_state == AgentState.RUNNING
+        ):
+            self.state_tracker.maybe_increase_control_flags_limits(self.headless_mode)
+
+        if self._pending_action is not None and (
            new_state in (AgentState.USER_CONFIRMED, AgentState.USER_REJECTED)
        ):
            if hasattr(self._pending_action, 'thought'):
@@ -659,6 +596,10 @@ class AgentController:
            EventSource.ENVIRONMENT,
        )

+        # Save state whenever agent state changes to ensure we don't lose state
+        # in case of crashes or unexpected circumstances
+        self.save_state()
+
    def get_agent_state(self) -> AgentState:
        """Returns the current state of the agent.

@@ -686,19 +627,27 @@ class AgentController:
        agent_cls: type[Agent] = Agent.get_cls(action.agent)
        agent_config = self.agent_configs.get(action.agent, self.agent.config)
        llm_config = self.agent_to_llm_config.get(action.agent, self.agent.llm.config)
-        llm = LLM(config=llm_config, retry_listener=self._notify_on_llm_retry)
+        # Make sure metrics are shared between parent and child for global accumulation
+        llm = LLM(
+            config=llm_config,
+            retry_listener=self.agent.llm.retry_listener,
+            metrics=self.state.metrics,
+        )
        delegate_agent = agent_cls(llm=llm, config=agent_config)
+
+        # Take a snapshot of the current metrics before starting the delegate
        state = State(
            session_id=self.id.removesuffix('-delegate'),
            inputs=action.inputs or {},
-            local_iteration=0,
-            iteration=self.state.iteration,
-            max_iterations=self.state.max_iterations,
+            iteration_flag=self.state.iteration_flag,
+            budget_flag=self.state.budget_flag,
            delegate_level=self.state.delegate_level + 1,
            # global metrics should be shared between parent and child
            metrics=self.state.metrics,
            # start on top of the stream
            start_id=self.event_stream.get_latest_event_id() + 1,
+            parent_metrics_snapshot=self.state_tracker.get_metrics_snapshot(),
+            parent_iteration=self.state.iteration_flag.current_value,
        )
        self.log(
            'debug',
@@ -708,10 +657,12 @@ class AgentController:
        # Create the delegate with is_delegate=True so it does NOT subscribe directly
        self.delegate = AgentController(
            sid=self.id + '-delegate',
+            file_store=self.file_store,
+            user_id=self.user_id,
            agent=delegate_agent,
            event_stream=self.event_stream,
-            max_iterations=self.state.max_iterations,
-            max_budget_per_task=self.max_budget_per_task,
+            iteration_delta=self._initial_max_iterations,
+            budget_per_task_delta=self._initial_max_budget_per_task,
            agent_to_llm_config=self.agent_to_llm_config,
            agent_configs=self.agent_configs,
            initial_state=state,
@@ -730,7 +681,13 @@ class AgentController:
        delegate_state = self.delegate.get_agent_state()

        # update iteration that is shared across agents
-        self.state.iteration = self.delegate.state.iteration
+        self.state.iteration_flag.current_value = (
+            self.delegate.state.iteration_flag.current_value
+        )
+
+        # Calculate delegate-specific metrics before closing the delegate
+        delegate_metrics = self.state.get_local_metrics()
+        logger.info(f'Local metrics for delegate: {delegate_metrics}')

        # close the delegate controller before adding new events
        asyncio.get_event_loop().run_until_complete(self.delegate.close())
@@ -743,8 +700,12 @@ class AgentController:

            # prepare delegate result observation
            # TODO: replace this with AI-generated summary (#2395)
+            # Filter out metrics from the formatted output to avoid clutter
+            display_outputs = {
+                k: v for k, v in delegate_outputs.items() if k != 'metrics'
+            }
            formatted_output = ', '.join(
-                f'{key}: {value}' for key, value in delegate_outputs.items()
+                f'{key}: {value}' for key, value in display_outputs.items()
            )
            content = (
                f'{self.delegate.agent.name} finishes task with {formatted_output}'
@@ -798,24 +759,16 @@ class AgentController:

        self.log(
            'debug',
-            f'LEVEL {self.state.delegate_level} LOCAL STEP {self.state.local_iteration} GLOBAL STEP {self.state.iteration}',
+            f'LEVEL {self.state.delegate_level} LOCAL STEP {self.state.get_local_step()} GLOBAL STEP {self.state.iteration_flag.current_value}',
            extra={'msg_type': 'STEP'},
        )

-        stop_step = False
-        if self.state.iteration >= self.state.max_iterations:
-            stop_step = await self._handle_traffic_control(
-                'iteration', self.state.iteration, self.state.max_iterations
-            )
-        if self.max_budget_per_task is not None:
-            current_cost = self.state.metrics.accumulated_cost
-            if current_cost > self.max_budget_per_task:
-                stop_step = await self._handle_traffic_control(
-                    'budget', current_cost, self.max_budget_per_task
-                )
-        if stop_step:
-            logger.warning('Stopping agent due to traffic control')
-            return
+        # Ensure the budget control flag is synchronized with the latest metrics.
+        # In the future, we should centralize the use of one LLM object per conversation.
+        # This will help us unify the cost of auto-generating titles, running the condenser, etc.
+        # Until then, several components may touch the same LLM cost field, so we sync it with the controller's budget flag
+        # and check that we haven't exceeded the budget BEFORE executing an agent step.
+        self.state_tracker.sync_budget_flag_with_metrics()

        if self._is_stuck():
            await self._react_to_exception(
@@ -823,7 +776,13 @@ class AgentController:
            )
            return

-        self.update_state_before_step()
+        try:
+            self.state_tracker.run_control_flags()
+        except Exception as e:
+            logger.warning('Control flag limits hit')
+            await self._react_to_exception(e)
+            return
+
        action: Action = NullAction()

        if self._replay_manager.should_replay():
@@ -894,60 +853,9 @@ class AgentController:

            self.event_stream.add_event(action, action._source)  # type: ignore [attr-defined]

-        await self.update_state_after_step()
-
        log_level = 'info' if LOG_ALL_EVENTS else 'debug'
        self.log(log_level, str(action), extra={'msg_type': 'ACTION'})

-    def _notify_on_llm_retry(self, retries: int, max: int) -> None:
-        if self.status_callback is not None:
-            msg_id = 'STATUS$LLM_RETRY'
-            self.status_callback(
-                'info', msg_id, f'Retrying LLM request, {retries} / {max}'
-            )
-
-    async def _handle_traffic_control(
-        self, limit_type: str, current_value: float, max_value: float
-    ) -> bool:
-        """Handles agent state after hitting the traffic control limit.
-
-        Args:
-            limit_type (str): The type of limit that was hit.
-            current_value (float): The current value of the limit.
-            max_value (float): The maximum value of the limit.
-        """
-        stop_step = False
-        if self.state.traffic_control_state == TrafficControlState.PAUSED:
-            self.log(
-                'debug', 'Hitting traffic control, temporarily resume upon user request'
-            )
-            self.state.traffic_control_state = TrafficControlState.NORMAL
-        else:
-            self.state.traffic_control_state = TrafficControlState.THROTTLING
-            # Format values as integers for iterations, keep decimals for budget
-            if limit_type == 'iteration':
-                current_str = str(int(current_value))
-                max_str = str(int(max_value))
-            else:
-                current_str = f'{current_value:.2f}'
-                max_str = f'{max_value:.2f}'
-
-            if self.headless_mode:
-                e = RuntimeError(
-                    f'Agent reached maximum {limit_type} in headless mode. '
-                    f'Current {limit_type}: {current_str}, max {limit_type}: {max_str}'
-                )
-                await self._react_to_exception(e)
-            else:
-                e = RuntimeError(
-                    f'Agent reached maximum {limit_type}. '
-                    f'Current {limit_type}: {current_str}, max {limit_type}: {max_str}. '
-                )
-                # FIXME: this isn't really an exception--we should have a different path
-                await self._react_to_exception(e)
-            stop_step = True
-        return stop_step
-
    @property
    def _pending_action(self) -> Action | None:
        """Get the current pending action with time tracking.
@@ -1015,150 +923,26 @@ class AgentController:
        self,
        state: State | None,
        max_iterations: int,
+        max_budget_per_task: float | None,
        confirmation_mode: bool = False,
-    ) -> None:
-        """Sets the initial state for the agent, either from the previous session, or from a parent agent, or by creating a new one.
-
-        Args:
-            state: The state to initialize with, or None to create a new state.
-            max_iterations: The maximum number of iterations allowed for the task.
-            confirmation_mode: Whether to enable confirmation mode.
-        """
-        # state can come from:
-        # - the previous session, in which case it has history
-        # - from a parent agent, in which case it has no history
-        # - None / a new state
-
-        # If state is None, we create a brand new state and still load the event stream so we can restore the history
-        if state is None:
-            self.state = State(
-                session_id=self.id.removesuffix('-delegate'),
-                inputs={},
-                max_iterations=max_iterations,
-                confirmation_mode=confirmation_mode,
-            )
-            self.state.start_id = 0
-
-            self.log(
-                'info',
-                f'AgentController {self.id} - created new state. start_id: {self.state.start_id}',
-            )
-        else:
-            self.state = state
-
-            if self.state.start_id <= -1:
-                self.state.start_id = 0
-
-            self.log(
-                'info',
-                f'AgentController {self.id} initializing history from event {self.state.start_id}',
-            )
-
+    ) -> None:
+        self.state_tracker.set_initial_state(
+            self.id,
+            self.agent,
+            state,
+            max_iterations,
+            max_budget_per_task,
+            confirmation_mode,
+        )
        # Always load from the event stream to avoid losing history
-        self._init_history()
+        self.state_tracker._init_history(
+            self.event_stream,
+        )

    def get_trajectory(self, include_screenshots: bool = False) -> list[dict]:
        # state history could be partially hidden/truncated before controller is closed
        assert self._closed
-        return [
-            event_to_trajectory(event, include_screenshots)
-            for event in self.state.history
-        ]
-
-    def _init_history(self) -> None:
-        """Initializes the agent's history from the event stream.
-
-        The history is a list of events that:
-        - Excludes events of types listed in self.filter_out
-        - Excludes events with hidden=True attribute
-        - For delegate events (between AgentDelegateAction and AgentDelegateObservation):
-            - Excludes all events between the action and observation
-            - Includes the delegate action and observation themselves
-        """
-        # define range of events to fetch
-        # delegates start with a start_id and initially won't find any events
-        # otherwise we're restoring a previous session
-        start_id = self.state.start_id if self.state.start_id >= 0 else 0
-        end_id = (
-            self.state.end_id
-            if self.state.end_id >= 0
-            else self.event_stream.get_latest_event_id()
-        )
-
-        # sanity check
-        if start_id > end_id + 1:
-            self.log(
-                'warning',
-                f'start_id {start_id} is greater than end_id + 1 ({end_id + 1}). History will be empty.',
-            )
-            self.state.history = []
-            return
-
-        events: list[Event] = []
-
-        # Get rest of history
-        events_to_add = list(
-            self.event_stream.search_events(
-                start_id=start_id,
-                end_id=end_id,
-                reverse=False,
-                filter=self.agent_history_filter,
-            )
-        )
-        events.extend(events_to_add)
-
-        # Find all delegate action/observation pairs
-        delegate_ranges: list[tuple[int, int]] = []
-        delegate_action_ids: list[int] = []  # stack of unmatched delegate action IDs
-
-        for event in events:
-            if isinstance(event, AgentDelegateAction):
-                delegate_action_ids.append(event.id)
-                # Note: we can get agent=event.agent and task=event.inputs.get('task','')
-                # if we need to track these in the future
-
-            elif isinstance(event, AgentDelegateObservation):
-                # Match with most recent unmatched delegate action
-                if not delegate_action_ids:
-                    self.log(
-                        'warning',
-                        f'Found AgentDelegateObservation without matching action at id={event.id}',
-                    )
-                    continue
-
-                action_id = delegate_action_ids.pop()
-                delegate_ranges.append((action_id, event.id))
-
-        # Filter out events between delegate action/observation pairs
-        if delegate_ranges:
-            filtered_events: list[Event] = []
-            current_idx = 0
-
-            for start_id, end_id in sorted(delegate_ranges):
-                # Add events before delegate range
-                filtered_events.extend(
-                    event for event in events[current_idx:] if event.id < start_id
-                )
-
-                # Add delegate action and observation
-                filtered_events.extend(
-                    event for event in events if event.id in (start_id, end_id)
-                )
-
-                # Update index to after delegate range
-                current_idx = next(
-                    (i for i, e in enumerate(events) if e.id > end_id), len(events)
-                )
-
-            # Add any remaining events after last delegate range
-            filtered_events.extend(events[current_idx:])
-
-            self.state.history = filtered_events
-        else:
-            self.state.history = events
-
-        # make sure history is in sync
-        self.state.start_id = start_id
+        return self.state_tracker.get_trajectory(include_screenshots)

    def _handle_long_context_error(self) -> None:
        # When context window is exceeded, keep roughly half of agent interactions
@@ -1359,7 +1143,7 @@ class AgentController:
            action: The action to attach metrics to
        """
        # Get metrics from agent LLM
-        agent_metrics = self.agent.llm.metrics
+        agent_metrics = self.state.metrics

        # Get metrics from condenser LLM if it exists
        condenser_metrics: TokenUsage | None = None
@@ -1390,10 +1174,10 @@ class AgentController:
        # Log the metrics information for debugging
        # Get the latest usage directly from the agent's metrics
        latest_usage = None
-        if self.agent.llm.metrics.token_usages:
-            latest_usage = self.agent.llm.metrics.token_usages[-1]
+        if self.state.metrics.token_usages:
+            latest_usage = self.state.metrics.token_usages[-1]

-        accumulated_usage = self.agent.llm.metrics.accumulated_token_usage
+        accumulated_usage = self.state.metrics.accumulated_token_usage
        self.log(
            'debug',
            f'Action metrics - accumulated_cost: {metrics.accumulated_cost}, '
@@ -1432,7 +1216,7 @@ class AgentController:
        )

    def _is_awaiting_observation(self) -> bool:
-        events = self.event_stream.get_events(reverse=True)
+        events = self.event_stream.search_events(reverse=True)
        for event in events:
            if isinstance(event, AgentStateChangedObservation):
                result = event.agent_state == AgentState.RUNNING
@@ -1473,7 +1257,7 @@ class AgentController:
        self._cached_first_user_message = next(
            (
                e
-                for e in self.event_stream.get_events(
+                for e in self.event_stream.search_events(
                    start_id=self.state.start_id,
                )
                if isinstance(e, MessageAction) and e.source == EventSource.USER
@@ -1481,3 +1265,6 @@ class AgentController:
            None,
        )
        return self._cached_first_user_message
+
+    def save_state(self) -> None:
+        self.state_tracker.save_state()
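The controller changes above replace the inline traffic-control checks with flags that raise when a limit is hit and can be extended when the user resumes. A minimal standalone sketch of that pattern follows. `IterationLimit` and `run_until_limit` are hypothetical, simplified names used only for illustration; they are not the classes introduced by this change.

```python
from dataclasses import dataclass


@dataclass
class IterationLimit:
    """Hypothetical, simplified stand-in for an iteration control flag."""

    delta: int  # amount added to the limit on each extension
    maximum: int
    current: int = 0
    hit: bool = False

    def step(self) -> None:
        """Raise if the limit is reached, otherwise count one iteration."""
        self.hit = self.current >= self.maximum
        if self.hit:
            raise RuntimeError(
                'Agent reached maximum iteration. '
                f'Current iteration: {self.current}, max iteration: {self.maximum}'
            )
        self.current += 1

    def maybe_increase(self, headless_mode: bool) -> None:
        """Extend the limit only in interactive mode and only after a hit."""
        if not headless_mode and self.hit:
            self.maximum += self.delta
            self.hit = False


def run_until_limit(limit: IterationLimit) -> int:
    """Step until the limit raises; return the number of completed steps."""
    steps = 0
    try:
        while True:
            limit.step()
            steps += 1
    except RuntimeError:
        return steps


if __name__ == '__main__':
    limit = IterationLimit(delta=3, maximum=3)
    print(run_until_limit(limit))  # steps taken before the first stop
    limit.maybe_increase(headless_mode=False)  # user chooses to continue
    print(run_until_limit(limit))  # steps taken after the extension
```

Raising from `step` lets the caller route a limit hit through the same exception path as other errors, at the cost of treating an expected condition as an exception.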
--- a/openhands/controller/state/control_flags.py
+++ b/openhands/controller/state/control_flags.py
@@ -0,0 +1,95 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Generic, TypeVar
+
+T = TypeVar(
+    'T', int, float
+)  # Type for the value (int for iterations, float for budget)
+
+
+@dataclass
+class ControlFlag(Generic[T]):
+    """Base class for control flags that manage limits and state transitions."""
+
+    limit_increase_amount: T
+    current_value: T
+    max_value: T
+    headless_mode: bool = False
+    _hit_limit: bool = False
+
+    def reached_limit(self) -> bool:
+        """Check if the limit has been reached.
+
+        Returns:
+            bool: True if the limit has been reached, False otherwise.
+        """
+        raise NotImplementedError
+
+    def increase_limit(self, headless_mode: bool) -> None:
+        """Expand the limit when needed."""
+        raise NotImplementedError
+
+    def step(self):
+        """Advance the flag by one step, enforcing the limit.
+
+        Raises:
+            RuntimeError: In subclasses, if the limit has been reached.
+        """
+        raise NotImplementedError
+
+
+@dataclass
+class IterationControlFlag(ControlFlag[int]):
+    """Control flag for managing iteration limits."""
+
+    def reached_limit(self) -> bool:
+        """Check if the iteration limit has been reached."""
+        self._hit_limit = self.current_value >= self.max_value
+        return self._hit_limit
+
+    def increase_limit(self, headless_mode: bool) -> None:
+        """Expand the iteration limit by adding the initial value."""
+        if not headless_mode and self._hit_limit:
+            self.max_value += self.limit_increase_amount
+            self._hit_limit = False
+
+    def step(self):
+        if self.reached_limit():
+            raise RuntimeError(
+                f'Agent reached maximum iteration. '
+                f'Current iteration: {self.current_value}, max iteration: {self.max_value}'
+            )
+
+        # Increment the current value
+        self.current_value += 1
+
+
+@dataclass
+class BudgetControlFlag(ControlFlag[float]):
+    """Control flag for managing budget limits."""
+
+    def reached_limit(self) -> bool:
+        """Check if the budget limit has been reached."""
+        self._hit_limit = self.current_value >= self.max_value
+        return self._hit_limit
+
+    def increase_limit(self, headless_mode: bool) -> None:
+        """Expand the budget limit by adding the initial value to the current value."""
+        if self._hit_limit:
+            self.max_value = self.current_value + self.limit_increase_amount
+            self._hit_limit = False
+
+    def step(self):
+        """Check if we've reached the limit and update state accordingly.
+
+        Note: Unlike IterationControlFlag, this doesn't increment the value
+        as the budget is updated externally.
+        """
+        if self.reached_limit():
+            current_str = f'{self.current_value:.2f}'
+            max_str = f'{self.max_value:.2f}'
+            raise RuntimeError(
+                f'Agent reached maximum budget for conversation. '
+                f'Current budget: {current_str}, max budget: {max_str}'
+            )
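The interplay between `step()`, `reached_limit()` and `increase_limit()` in the new control flags can be easier to follow in isolation. Below is a minimal, standalone sketch that mirrors the logic of `IterationControlFlag` from this diff; the names `IterationFlagSketch` and `run_until_limit` are hypothetical and exist only for illustration, and nothing here imports from `openhands`.

```python
from dataclasses import dataclass


@dataclass
class IterationFlagSketch:
    """Simplified stand-in for IterationControlFlag (illustrative only)."""

    limit_increase_amount: int
    current_value: int
    max_value: int
    _hit_limit: bool = False

    def reached_limit(self) -> bool:
        # Remember whether the limit was hit so increase_limit() can react to it.
        self._hit_limit = self.current_value >= self.max_value
        return self._hit_limit

    def increase_limit(self, headless_mode: bool) -> None:
        # Only extend when the limit was actually hit and we are not headless.
        if not headless_mode and self._hit_limit:
            self.max_value += self.limit_increase_amount
            self._hit_limit = False

    def step(self) -> None:
        if self.reached_limit():
            raise RuntimeError(
                f'Agent reached maximum iteration. '
                f'Current iteration: {self.current_value}, '
                f'max iteration: {self.max_value}'
            )
        self.current_value += 1


def run_until_limit(flag: IterationFlagSketch) -> int:
    """Step until the flag raises; return how many steps succeeded."""
    steps = 0
    try:
        while True:
            flag.step()
            steps += 1
    except RuntimeError:
        return steps
```

In this sketch, calling `increase_limit(headless_mode=True)` after the limit is hit leaves `max_value` unchanged, while `increase_limit(headless_mode=False)` extends it by `limit_increase_amount` so that stepping can resume.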
--- a/openhands/controller/state/state.py
+++ b/openhands/controller/state/state.py
@@ -8,6 +8,10 @@ from enum import Enum
 from typing import Any

 import openhands
+from openhands.controller.state.control_flags import (
+    BudgetControlFlag,
+    IterationControlFlag,
+)
 from openhands.core.logger import openhands_logger as logger
 from openhands.core.schema import AgentState
 from openhands.events.action import (
@@ -20,7 +24,15 @@ from openhands.memory.view import View
 from openhands.storage.files import FileStore
 from openhands.storage.locations import get_conversation_agent_state_filename

+RESUMABLE_STATES = [
+    AgentState.RUNNING,
+    AgentState.PAUSED,
+    AgentState.AWAITING_USER_INPUT,
+    AgentState.FINISHED,
+]

+
+# NOTE: this is deprecated
 class TrafficControlState(str, Enum):
    # default state, no rate limiting
    NORMAL = 'normal'
@@ -32,14 +44,6 @@ class TrafficControlState(str, Enum):
    PAUSED = 'paused'


-RESUMABLE_STATES = [
-    AgentState.RUNNING,
-    AgentState.PAUSED,
-    AgentState.AWAITING_USER_INPUT,
-    AgentState.FINISHED,
-]
-
-
 @dataclass
 class State:
    """
@@ -75,35 +79,43 @@ class State:
    """

    session_id: str = ''
-    # global iteration for the current task
-    iteration: int = 0
-    # local iteration for the current subtask
-    local_iteration: int = 0
-    # max number of iterations for the current task
-    max_iterations: int = 100
+    iteration_flag: IterationControlFlag = field(
+        default_factory=lambda: IterationControlFlag(
+            limit_increase_amount=100, current_value=0, max_value=100
+        )
+    )
+    budget_flag: BudgetControlFlag | None = None
    confirmation_mode: bool = False
    history: list[Event] = field(default_factory=list)
    inputs: dict = field(default_factory=dict)
    outputs: dict = field(default_factory=dict)
    agent_state: AgentState = AgentState.LOADING
    resume_state: AgentState | None = None
-    traffic_control_state: TrafficControlState = TrafficControlState.NORMAL
    # global metrics for the current task
    metrics: Metrics = field(default_factory=Metrics)
-    # local metrics for the current subtask
-    local_metrics: Metrics = field(default_factory=Metrics)
    # root agent has level 0, and every delegate increases the level by one
    delegate_level: int = 0
    # start_id and end_id track the range of events in history
    start_id: int = -1
    end_id: int = -1

-    delegates: dict[tuple[int, int], tuple[str, str]] = field(default_factory=dict)
-    # NOTE: This will never be used by the controller, but it can be used by different
+    # NOTE: this is used by the controller to track parent's metrics snapshot before delegation
+    parent_metrics_snapshot: Metrics | None = None
+    parent_iteration: int = 100
+    # NOTE: extra_data will never be used by the controller, but it can be used by different
    # evaluation tasks to store extra data needed to track the progress/state of the task.
    extra_data: dict[str, Any] = field(default_factory=dict)
    last_error: str = ''

+    # NOTE: deprecated args, kept here temporarily for backwards compatibility
+    # Will be removed in 30 days
+    iteration: int | None = None
+    local_iteration: int | None = None
+    max_iterations: int | None = None
+    traffic_control_state: TrafficControlState | None = None
+    local_metrics: Metrics | None = None
+    delegates: dict[tuple[int, int], tuple[str, str]] | None = None
+
    def save_to_session(
        self, sid: str, file_store: FileStore, user_id: str | None
    ) -> None:
@@ -165,6 +177,10 @@ class State:

        # first state after restore
        state.agent_state = AgentState.LOADING
+
+        # We don't need to clean up deprecated fields here
+        # They will be handled by __getstate__ when the state is saved again
+
        return state

    def __getstate__(self) -> dict:
@@ -177,15 +193,52 @@ class State:
        state.pop('_history_checksum', None)
        state.pop('_view', None)

+        # Remove deprecated fields before pickling
+        state.pop('iteration', None)
+        state.pop('local_iteration', None)
+        state.pop('max_iterations', None)
+        state.pop('traffic_control_state', None)
+        state.pop('local_metrics', None)
+        state.pop('delegates', None)
+
        return state

    def __setstate__(self, state: dict) -> None:
+        # Check if we're restoring from an older version (before control flags)
+        is_old_version = 'iteration' in state
+
+        # Convert old iteration tracking to new iteration_flag if needed
+        if is_old_version:
+            # Create iteration_flag from old values
+            max_iterations = state.get('max_iterations', 100)
+            current_iteration = state.get('iteration', 0)
+
+            # Add the iteration_flag to the state
+            state['iteration_flag'] = IterationControlFlag(
+                limit_increase_amount=max_iterations,
+                current_value=current_iteration,
+                max_value=max_iterations,
+            )
+
+        # Update the state
        self.__dict__.update(state)

+        # We keep the deprecated fields for backward compatibility
+        # They will be removed by __getstate__ when the state is saved again
+
        # make sure we always have the attribute history
        if not hasattr(self, 'history'):
            self.history = []

+        # Ensure we have default values for new fields if they're missing
+        if not hasattr(self, 'iteration_flag'):
+            self.iteration_flag = IterationControlFlag(
+                limit_increase_amount=100, current_value=0, max_value=100
+            )
+
+        if not hasattr(self, 'budget_flag'):
+            self.budget_flag = None
+
    def get_current_user_intent(self) -> tuple[str | None, list[str] | None]:
        """Returns the latest user message and image(if provided) that appears after a FinishAction, or the first (the task) if nothing was finished yet."""
        last_user_message = None
@@ -223,6 +276,17 @@ class State:
            ],
        }

+    def get_local_step(self):
+        if not self.parent_iteration:
+            return self.iteration_flag.current_value
+
+        return self.iteration_flag.current_value - self.parent_iteration
+
+    def get_local_metrics(self):
+        if not self.parent_metrics_snapshot:
+            return self.metrics
+        return self.metrics.diff(self.parent_metrics_snapshot)
+
    @property
    def view(self) -> View:
        # Compute a simple checksum from the history to see if we can re-use any
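The `__getstate__`/`__setstate__` changes above follow a common migration pattern for pickled objects: upgrade legacy fields when loading, and drop them when saving. The following is a minimal sketch of that pattern under simplifying assumptions; `LegacyAwareState` and `roundtrip` are hypothetical names, a plain dict stands in for `IterationControlFlag`, and the hooks are called directly rather than through `pickle`, which invokes the same two methods.

```python
class LegacyAwareState:
    """Illustrative stand-in for State; a dict replaces IterationControlFlag."""

    def __init__(self):
        self.iteration_flag = {'current_value': 0, 'max_value': 100}

    def __getstate__(self) -> dict:
        state = self.__dict__.copy()
        # Drop deprecated fields so they are never written out again.
        state.pop('iteration', None)
        state.pop('max_iterations', None)
        return state

    def __setstate__(self, state: dict) -> None:
        # An old payload is recognized by the presence of the legacy key.
        if 'iteration' in state:
            state['iteration_flag'] = {
                'current_value': state.get('iteration', 0),
                'max_value': state.get('max_iterations', 100),
            }
        self.__dict__.update(state)
        # Fall back to defaults when the new field is missing entirely.
        if not hasattr(self, 'iteration_flag'):
            self.iteration_flag = {'current_value': 0, 'max_value': 100}


def roundtrip(obj: LegacyAwareState) -> LegacyAwareState:
    """Save and restore through the same hooks that pickle would call."""
    restored = LegacyAwareState.__new__(LegacyAwareState)
    restored.__setstate__(obj.__getstate__())
    return restored
```

After one load and one save, the legacy keys are gone from the stored payload, which is why the diff does not need an explicit cleanup step in `restore_from_session`.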
--- a/openhands/controller/state/state_tracker.py
+++ b/openhands/controller/state/state_tracker.py
@@ -0,0 +1,290 @@
+from openhands.controller.agent import Agent
+from openhands.controller.state.control_flags import (
+    BudgetControlFlag,
+    IterationControlFlag,
+)
+from openhands.controller.state.state import State
+from openhands.core.logger import openhands_logger as logger
+from openhands.events.action.agent import AgentDelegateAction, ChangeAgentStateAction
+from openhands.events.action.empty import NullAction
+from openhands.events.event import Event
+from openhands.events.event_filter import EventFilter
+from openhands.events.observation.agent import AgentStateChangedObservation
+from openhands.events.observation.delegate import AgentDelegateObservation
+from openhands.events.observation.empty import NullObservation
+from openhands.events.serialization.event import event_to_trajectory
+from openhands.events.stream import EventStream
+from openhands.llm.metrics import Metrics
+from openhands.storage.files import FileStore
+
+
+class StateTracker:
+    """Manages and synchronizes the state of an agent throughout its lifecycle.
+
+    It is responsible for:
+    1. Maintaining agent state persistence across sessions
+    2. Managing agent history by filtering and tracking relevant events (previously done in the agent controller)
+    3. Synchronizing metrics between the controller and LLM components
+    4. Updating control flags for budget and iteration limits
+
+    """
+
+    def __init__(
+        self, sid: str | None, file_store: FileStore | None, user_id: str | None
+    ):
+        self.sid = sid
+        self.file_store = file_store
+        self.user_id = user_id
+
+        # filter out events that are not relevant to the agent
+        # so they will not be included in the agent history
+        self.agent_history_filter = EventFilter(
+            exclude_types=(
+                NullAction,
+                NullObservation,
+                ChangeAgentStateAction,
+                AgentStateChangedObservation,
+            ),
+            exclude_hidden=True,
+        )
+
+    def set_initial_state(
+        self,
+        id: str,
+        agent: Agent,
+        state: State | None,
+        max_iterations: int,
+        max_budget_per_task: float | None,
+        confirmation_mode: bool = False,
+    ) -> None:
+        """Sets the initial state for the agent, either from the previous session, or from a parent agent, or by creating a new one.
+
+        Args:
+            state: The state to initialize with, or None to create a new state.
+            max_iterations: The maximum number of iterations allowed for the task.
+            confirmation_mode: Whether to enable confirmation mode.
+        """
+        # state can come from:
+        # - the previous session, in which case it has history
+        # - from a parent agent, in which case it has no history
+        # - None / a new state
+
+        # If state is None, we create a brand new state and still load the event stream so we can restore the history
+        if state is None:
+            self.state = State(
+                session_id=id.removesuffix('-delegate'),
+                inputs={},
+                iteration_flag=IterationControlFlag(
+                    limit_increase_amount=max_iterations,
+                    current_value=0,
+                    max_value=max_iterations,
+                ),
+                budget_flag=None
+                if not max_budget_per_task
+                else BudgetControlFlag(
+                    limit_increase_amount=max_budget_per_task,
+                    current_value=0,
+                    max_value=max_budget_per_task,
+                ),
+                confirmation_mode=confirmation_mode,
+            )
+            self.state.start_id = 0
+
+            logger.info(
+                f'AgentController {id} - created new state. start_id: {self.state.start_id}'
+            )
+        else:
+            self.state = state
+            if self.state.start_id <= -1:
+                self.state.start_id = 0
+
+            logger.info(
+                f'AgentController {id} initializing history from event {self.state.start_id}',
+            )
+
+        # Share the state metrics with the agent's LLM metrics
+        # This ensures that all accumulated metrics are always in sync between controller and llm
+        agent.llm.metrics = self.state.metrics
+
+    def _init_history(self, event_stream: EventStream) -> None:
+        """Initializes the agent's history from the event stream.
+
+        The history is a list of events that:
+        - Excludes events of the types excluded by self.agent_history_filter
+        - Excludes events with hidden=True attribute
+        - For delegate events (between AgentDelegateAction and AgentDelegateObservation):
+            - Excludes all events between the action and observation
+            - Includes the delegate action and observation themselves
+        """
+        # define range of events to fetch
+        # delegates start with a start_id and initially won't find any events
+        # otherwise we're restoring a previous session
+        start_id = self.state.start_id if self.state.start_id >= 0 else 0
+        end_id = (
+            self.state.end_id
+            if self.state.end_id >= 0
+            else event_stream.get_latest_event_id()
+        )
+
+        # sanity check
+        if start_id > end_id + 1:
+            logger.warning(
+                f'start_id {start_id} is greater than end_id + 1 ({end_id + 1}). History will be empty.',
+            )
+            self.state.history = []
+            return
+
+        events: list[Event] = []
+
+        # Get rest of history
+        events_to_add = list(
+            event_stream.search_events(
+                start_id=start_id,
+                end_id=end_id,
+                reverse=False,
+                filter=self.agent_history_filter,
+            )
+        )
+        events.extend(events_to_add)
+
+        # Find all delegate action/observation pairs
+        delegate_ranges: list[tuple[int, int]] = []
+        delegate_action_ids: list[int] = []  # stack of unmatched delegate action IDs
+
+        for event in events:
+            if isinstance(event, AgentDelegateAction):
+                delegate_action_ids.append(event.id)
+                # Note: we can get agent=event.agent and task=event.inputs.get('task','')
+                # if we need to track these in the future
+
+            elif isinstance(event, AgentDelegateObservation):
+                # Match with most recent unmatched delegate action
+                if not delegate_action_ids:
+                    logger.warning(
+                        f'Found AgentDelegateObservation without matching action at id={event.id}',
+                    )
+                    continue
+
+                action_id = delegate_action_ids.pop()
+                delegate_ranges.append((action_id, event.id))
+
+        # Filter out events between delegate action/observation pairs
+        if delegate_ranges:
+            filtered_events: list[Event] = []
+            current_idx = 0
+
+            for range_start, range_end in sorted(delegate_ranges):
+                # Add events before delegate range
+                filtered_events.extend(
+                    event for event in events[current_idx:] if event.id < range_start
+                )
+
+                # Add delegate action and observation
+                filtered_events.extend(
+                    event for event in events if event.id in (range_start, range_end)
+                )
+
+                # Update index to after delegate range
+                current_idx = next(
+                    (i for i, e in enumerate(events) if e.id > range_end), len(events)
+                )
+
+            # Add any remaining events after last delegate range
+            filtered_events.extend(events[current_idx:])
+
+            self.state.history = filtered_events
+        else:
+            self.state.history = events
+
+        # make sure history is in sync
+        self.state.start_id = start_id
+
+    def close(self, event_stream: EventStream):
+        # we made history, now is the time to rewrite it!
+        # the final state.history will be used by external scripts like evals, tests, etc.
+        # history will need to be complete WITH delegates events
+        # like the regular agent history, it does not include:
+        # - 'hidden' events, events with hidden=True
+        # - backend events (the default 'filtered out' types, excluded by self.agent_history_filter)
+        start_id = self.state.start_id if self.state.start_id >= 0 else 0
+        end_id = (
+            self.state.end_id
+            if self.state.end_id >= 0
+            else event_stream.get_latest_event_id()
+        )
+
+        self.state.history = list(
+            event_stream.search_events(
+                start_id=start_id,
+                end_id=end_id,
+                reverse=False,
+                filter=self.agent_history_filter,
+            )
+        )
+
+    def add_history(self, event: Event):
+        # if the event is not filtered out, add it to the history
+        if self.agent_history_filter.include(event):
+            self.state.history.append(event)
+
+    def get_trajectory(self, include_screenshots: bool = False) -> list[dict]:
+        return [
+            event_to_trajectory(event, include_screenshots)
+            for event in self.state.history
+        ]
+
+    def maybe_increase_control_flags_limits(self, headless_mode: bool):
+        # Iteration and budget extensions are independent of each other
+        # An error will be thrown if any one of the control flags has reached or exceeded its limit
+        self.state.iteration_flag.increase_limit(headless_mode)
+        if self.state.budget_flag:
+            self.state.budget_flag.increase_limit(headless_mode)
+
+    def get_metrics_snapshot(self):
+        """
+        Deep copy of metrics
+        This serves as a snapshot for the parent's metrics at the time a delegate is created
+        It will be stored and used to compute local metrics for the delegate
+        (since delegates now accumulate metrics from where their parent left off)
+        """
+
+        return self.state.metrics.copy()
+
+    def save_state(self):
+        """
+        Saves the current state to the persistent store
+        """
+        if self.sid and self.file_store:
+            self.state.save_to_session(self.sid, self.file_store, self.user_id)
+
+    def run_control_flags(self):
+        """
+        Performs one step of the control flags
+        """
+        self.state.iteration_flag.step()
+        if self.state.budget_flag:
+            self.state.budget_flag.step()
+
+    def sync_budget_flag_with_metrics(self):
+        """
+        Ensures that budget flag is up to date with accumulated costs from llm completions
+        Budget flag will monitor for when budget is exceeded
+        """
+        if self.state.budget_flag:
+            self.state.budget_flag.current_value = self.state.metrics.accumulated_cost
+
+    def merge_metrics(self, metrics: Metrics):
+        """
+        Merges metrics with the state metrics
+
+        NOTE: this should be refactored in the future. We should have services (draft llm, title autocomplete, condenser, etc)
+        use their own LLMs, but the metrics object should be shared. This way we have one source of truth for accumulated costs from
+        all services
+
+        This would prevent having fragmented stores for metrics, and we don't have the burden of deciding where and how to store them
+        if we decide to introduce more specialized services that require llm completions
+
+        """
+        self.state.metrics.merge(metrics)
+        if self.state.budget_flag:
+            self.state.budget_flag.current_value = self.state.metrics.accumulated_cost
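The delegate handling in `_init_history` above pairs each `AgentDelegateObservation` with the most recent unmatched `AgentDelegateAction` using a stack, then drops every event strictly between the pair while keeping the pair itself. A compact sketch of that idea follows. It is a simplified formulation and not the code from this diff: `Ev`, `find_delegate_ranges` and `drop_delegate_internals` are hypothetical names, and events are reduced to an id plus a kind string.

```python
from dataclasses import dataclass


@dataclass
class Ev:
    """Hypothetical minimal event: an id and a coarse kind label."""

    id: int
    kind: str  # 'delegate_action', 'delegate_observation', or 'other'


def find_delegate_ranges(events: list) -> list:
    """Pair each delegate observation with the latest unmatched delegate action."""
    ranges = []
    open_action_ids = []  # stack of unmatched delegate action ids
    for ev in events:
        if ev.kind == 'delegate_action':
            open_action_ids.append(ev.id)
        elif ev.kind == 'delegate_observation' and open_action_ids:
            ranges.append((open_action_ids.pop(), ev.id))
    return ranges


def drop_delegate_internals(events: list) -> list:
    """Keep the delegate action/observation, drop events strictly between them."""
    ranges = find_delegate_ranges(events)
    return [
        ev for ev in events
        if not any(start < ev.id < end for start, end in ranges)
    ]
```

An observation with no matching action is simply skipped in this sketch, whereas the code in the diff also logs a warning for that case.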
--- a/openhands/core/main.py
+++ b/openhands/core/main.py
@@ -5,6 +5,7 @@ from pathlib import Path
 from typing import Callable, Protocol

 import openhands.agenthub  # noqa F401 (we import this to get the agents registered)
+import openhands.cli.suppress_warnings  # noqa: F401
 from openhands.controller.agent import Agent
 from openhands.controller.replay import ReplayManager
 from openhands.controller.state.state import State
@@ -139,9 +140,9 @@ async def run_controller(
                config.mcp_host, config, None
            )
        )
-        config.mcp.stdio_servers.extend(openhands_mcp_stdio_servers)
+        runtime.config.mcp.stdio_servers.extend(openhands_mcp_stdio_servers)

-        await add_mcp_tools_to_agent(agent, runtime, memory, config)
+        await add_mcp_tools_to_agent(agent, runtime, memory)

    replay_events: list[Event] | None = None
    if config.replay_trajectory_path:
--- a/openhands/core/setup.py
+++ b/openhands/core/setup.py
@@ -107,13 +107,6 @@ def initialize_repository_for_runtime(
        gitlab_token = SecretStr(os.environ['GITLAB_TOKEN'])
        provider_tokens[ProviderType.GITLAB] = ProviderToken(token=gitlab_token)

-    if 'AZURE_DEVOPS_TOKEN' in os.environ:
-        azure_devops_token = SecretStr(os.environ['AZURE_DEVOPS_TOKEN'])
-        azure_devops_host = os.environ.get('AZURE_DEVOPS_HOST')
-        provider_tokens[ProviderType.AZURE_DEVOPS] = ProviderToken(
-            token=azure_devops_token, host=azure_devops_host
-        )
-
    secret_store = (
        UserSecrets(provider_tokens=provider_tokens) if provider_tokens else None
    )
@@ -213,8 +206,8 @@ def create_controller(

    controller = AgentController(
        agent=agent,
-        max_iterations=config.max_iterations,
-        max_budget_per_task=config.max_budget_per_task,
+        iteration_delta=config.max_iterations,
+        budget_per_task_delta=config.max_budget_per_task,
        agent_to_llm_config=config.get_agent_to_llm_config_map(),
        event_stream=event_stream,
        initial_state=initial_state,
--- a/openhands/events/async_event_store_wrapper.py
+++ b/openhands/events/async_event_store_wrapper.py
@@ -15,8 +15,8 @@ class AsyncEventStoreWrapper:
        loop = asyncio.get_running_loop()

        # Create an async generator that yields events
-        for event in self.event_store.get_events(*self.args, **self.kwargs):
-            # Run the blocking get_events() in a thread pool
+        for event in self.event_store.search_events(*self.args, **self.kwargs):
+            # Run the blocking search_events() in a thread pool
            def get_event(e: Event = event) -> Event:
                return e

--- a/openhands/events/event_store.py
+++ b/openhands/events/event_store.py
@@ -140,7 +140,7 @@ class EventStore(EventStoreABC):
        return self.cur_id - 1

    def filtered_events_by_source(self, source: EventSource) -> Iterable[Event]:
-        for event in self.get_events():
+        for event in self.search_events():
            if event.source == source:
                yield event

--- a/openhands/integrations/azure_devops/init.py
+++ b/openhands/integrations/azure_devops/init.py
@@ -1,3 +0,0 @@
-"""
-Azure DevOps integration package.
-"""
--- a/openhands/integrations/azure_devops/azure_devops_service.py
+++ b/openhands/integrations/azure_devops/azure_devops_service.py
@@ -1,801 +0,0 @@
-"""Azure DevOps service implementation using standard HTTP API calls."""
-
-from __future__ import annotations
-
-import base64
-from typing import Any
-
-import httpx
-from pydantic import SecretStr
-
-from openhands.core.logger import openhands_logger as logger
-from openhands.integrations.service_types import (
-    AuthenticationError,
-    BaseGitService,
-    Branch,
-    ProviderType,
-    Repository,
-    RequestMethod,
-    SuggestedTask,
-    TaskType,
-    UnknownException,
-    User,
-)
-from openhands.server.types import AppMode
-
-
-class AzureDevOpsServiceImpl(BaseGitService):
-    """Azure DevOps service implementation using standard HTTP API calls."""
-
-    def __init__(
-        self,
-        user_id: str | None = None,
-        token: SecretStr | None = None,
-        external_auth_id: str | None = None,
-        external_auth_token: SecretStr | None = None,
-        external_token_manager: bool = False,
-        base_domain: str | None = None,
-    ):
-        """Initialize the Azure DevOps service.
-
-        Args:
-            user_id: The user ID
-            token: The Azure DevOps personal access token
-            external_auth_id: External auth ID (not used for Azure DevOps)
-            external_auth_token: External auth token (not used for Azure DevOps)
-            external_token_manager: Whether to use external token manager (not used for Azure DevOps)
-            base_domain: The Azure DevOps organization URL (e.g., https://dev.azure.com/organization)
-        """
-        self.user_id = user_id
-        self.token = token
-        self.external_auth_id = external_auth_id
-        self.external_auth_token = external_auth_token
-        self.external_token_manager = external_token_manager
-        self.organization_url = base_domain or 'https://dev.azure.com'
-
-        # Extract organization name from URL for API calls
-        if self.organization_url.startswith('https://dev.azure.com/'):
-            self.organization = self.organization_url.replace(
-                'https://dev.azure.com/', ''
-            ).rstrip('/')
-        else:
-            # Handle custom Azure DevOps Server URLs
-            self.organization = (
-                self.organization_url.split('/')[-1]
-                if '/' in self.organization_url
-                else self.organization_url
-            )
-
-        self.base_url = f'https://dev.azure.com/{self.organization}/_apis'
-
-    @property
-    def provider(self) -> str:
-        return ProviderType.AZURE_DEVOPS.value
-
-    async def _get_azure_devops_headers(self) -> dict[str, str]:
-        """Get headers for Azure DevOps API requests."""
-        if not self.token:
-            self.token = await self.get_latest_token()
-
-        if not self.token:
-            raise AuthenticationError('No Azure DevOps token provided')
-
-        # Azure DevOps uses Basic authentication with PAT
-        # Username can be empty, password is the PAT
-        credentials = base64.b64encode(
-            f':{self.token.get_secret_value()}'.encode()
-        ).decode()
-
-        return {
-            'Authorization': f'Basic {credentials}',
-            'Content-Type': 'application/json',
-            'Accept': 'application/json',
-        }
-
-    def _has_token_expired(self, status_code: int) -> bool:
-        """Check if the token has expired."""
-        return status_code == 401
-
-    async def execute_request(
-        self,
-        client: httpx.AsyncClient,
-        url: str,
-        headers: dict,
-        params: dict | None,
-        method: RequestMethod = RequestMethod.GET,
-    ) -> httpx.Response:
-        """Execute an HTTP request."""
-        if method == RequestMethod.GET:
-            response = await client.get(url, headers=headers, params=params)
-        elif method == RequestMethod.POST:
-            # For Azure DevOps, we need to handle the case where params contains both
-            # query parameters and JSON data. We'll use a special key to separate them.
-            json_data = params.pop('_json_data', None) if params else None
-            response = await client.post(
-                url, headers=headers, params=params, json=json_data
-            )
-        else:
-            raise ValueError(f'Unsupported HTTP method: {method}')
-
-        return response
-
-    async def _make_request(
-        self,
-        url: str,
-        params: dict | None = None,
-        method: RequestMethod = RequestMethod.GET,
-        json_data: dict | None = None,
-    ) -> tuple[Any, dict]:
-        """Make a request to the Azure DevOps API."""
-        try:
-            async with httpx.AsyncClient() as client:
-                azure_devops_headers = await self._get_azure_devops_headers()
-
-                # Make initial request
-                # For POST requests, embed json_data in params using special key
-                if method == RequestMethod.POST and json_data is not None:
-                    if params is None:
-                        params = {}
-                    params['_json_data'] = json_data
-
-                response = await self.execute_request(
-                    client=client,
-                    url=url,
-                    headers=azure_devops_headers,
-                    params=params,
-                    method=method,
-                )
-
-                # Handle token refresh if needed
-                if self._has_token_expired(response.status_code):
-                    logger.warning('Azure DevOps token expired, attempting refresh')
-                    # For Azure DevOps, we don't have automatic token refresh
-                    # The user needs to provide a new PAT
-                    raise AuthenticationError(
-                        'Azure DevOps token expired. Please provide a new Personal Access Token.'
-                    )
-
-                if response.status_code >= 400:
-                    logger.error(
-                        f'Azure DevOps API error: {response.status_code} - {response.text}'
-                    )
-                    if response.status_code == 401:
-                        raise AuthenticationError(
-                            'Authentication failed with Azure DevOps'
-                        )
-                    elif response.status_code == 403:
-                        raise AuthenticationError(
-                            'Access forbidden. Check your Azure DevOps permissions.'
-                        )
-                    elif response.status_code == 404:
-                        raise ValueError('Resource not found')
-                    else:
-                        raise UnknownException(
-                            f'Azure DevOps API error: {response.status_code}'
-                        )
-
-                try:
-                    response_data = response.json()
-                except Exception:
-                    response_data = response.text
-
-                return response_data, {}
-
-        except httpx.RequestError as e:
-            logger.error(f'Request error: {e}')
-            raise UnknownException(f'Request failed: {e}')
-        except Exception as e:
-            logger.error(f'Unexpected error: {e}')
-            raise UnknownException(f'Unexpected error: {e}')
-
-    async def get_latest_token(self) -> SecretStr | None:
-        """Get the latest token.
-
-        Returns:
-            The latest token
-        """
-        return self.token
-
-    async def get_user(self) -> User:
-        """Get the authenticated user.
-
-        Returns:
-            The authenticated user
-        """
-        try:
-            # Try to get user profile from Azure DevOps
-            # For organization-scoped tokens, we'll use the projects API to verify authentication
-            # since the global profile API requires "All accessible organizations" scope
-
-            # Fallback: Try to get projects to verify authentication
-            projects_url = f'{self.base_url}/projects'
-            projects_params = {'api-version': '7.1-preview.4'}
-
-            projects_data, _ = await self._make_request(
-                projects_url, params=projects_params
-            )
-
-            # If we can get projects, authentication is working
-            if projects_data:
-                # Try to get connection data for more user info
-                try:
-                    connection_url = f'{self.base_url}/connectionData'
-                    connection_params = {'api-version': '7.1-preview.1'}
-                    connection_data, _ = await self._make_request(
-                        connection_url, params=connection_params
-                    )
-
-                    if connection_data and isinstance(connection_data, dict):
-                        auth_user = connection_data.get('authenticatedUser', {})
-                        return User(
-                            id=auth_user.get('id', 0),
-                            login=auth_user.get(
-                                'uniqueName', self.user_id or 'azure_devops_user'
-                            ),
-                            avatar_url=auth_user.get('imageUrl', ''),
-                            name=auth_user.get(
-                                'displayName', self.user_id or 'Azure DevOps User'
-                            ),
-                            email=auth_user.get('uniqueName'),
-                            company=None,
-                        )
-                except Exception as connection_error:
-                    logger.debug(f'Could not get connection data: {connection_error}')
-
-                # Basic fallback if connection data fails
-                return User(
-                    id=0,  # Placeholder ID
-                    login=self.user_id or 'azure_devops_user',
-                    avatar_url='',
-                    name=self.user_id or 'Azure DevOps User',
-                    email=None,
-                    company=None,
-                )
-
-            # If projects API also fails, try the old profile approach as last resort
-            profile_url = f'{self.base_url}/profile/profiles/me'
-            profile_params = {'api-version': '7.1-preview.3'}
-
-            try:
-                profile_data, _ = await self._make_request(
-                    profile_url, params=profile_params
-                )
-
-                if profile_data and isinstance(profile_data, dict):
-                    return User(
-                        id=profile_data.get('id', 0),
-                        login=profile_data.get(
-                            'emailAddress', self.user_id or 'azure_devops_user'
-                        ),
-                        avatar_url=profile_data.get('avatar', {}).get('value', ''),
-                        name=profile_data.get(
-                            'displayName', self.user_id or 'Azure DevOps User'
-                        ),
-                        email=profile_data.get('emailAddress'),
-                        company=None,
-                    )
-            except Exception as profile_error:
-                logger.warning(f'Could not get user profile: {profile_error}')
-                raise AuthenticationError('Failed to authenticate with Azure DevOps')
-
-        except AuthenticationError:
-            raise
-        except Exception as e:
-            logger.error(f'Error getting Azure DevOps user: {e}')
-            raise AuthenticationError(f'Failed to authenticate with Azure DevOps: {e}')
-
-        # This should never be reached, but added for mypy
-        raise AuthenticationError('Failed to authenticate with Azure DevOps')
-
-    async def get_repositories(self, sort: str, app_mode: AppMode) -> list[Repository]:
-        """Get repositories for the authenticated user.
-
-        Args:
-            sort: The sort order
-            app_mode: The app mode
-
-        Returns:
-            A list of repositories
-        """
-        try:
-            # Get all repositories across all projects
-            repos_url = f'{self.base_url}/git/repositories'
-            repos_params = {'api-version': '7.1-preview.1'}
-
-            repos_data, _ = await self._make_request(repos_url, params=repos_params)
-
-            if not repos_data or not isinstance(repos_data, dict):
-                return []
-
-            repositories = repos_data.get('value', [])
-
-            # Convert to Repository objects
-            result = []
-            for repo in repositories:
-                project_name = repo.get('project', {}).get('name', 'Unknown')
-                repo_name = repo.get('name', 'Unknown')
-
-                result.append(
-                    Repository(
-                        id=repo.get('id', ''),
-                        full_name=f'{project_name}/{repo_name}',
-                        git_provider=ProviderType.AZURE_DEVOPS,
-                        is_public=False,  # Azure DevOps repos are private by default
-                        stargazers_count=None,
-                        link_header=None,
-                        pushed_at=None,
-                    )
-                )
-
-            return result
-        except Exception as e:
-            logger.error(f'Error getting Azure DevOps repositories: {e}')
-            return []
-
-    async def search_repositories(
-        self,
-        query: str,
-        per_page: int,
-        sort: str,
-        order: str,
-    ) -> list[Repository]:
-        """Search for repositories.
-
-        Args:
-            query: The search query
-            per_page: The number of results per page
-            sort: The sort order
-            order: The sort direction
-
-        Returns:
-            A list of repositories
-        """
-        try:
-            # Get all repositories (Azure DevOps doesn't have a search API for repos)
-            repos_url = f'{self.base_url}/git/repositories'
-            repos_params = {'api-version': '7.1-preview.1'}
-
-            repos_data, _ = await self._make_request(repos_url, params=repos_params)
-
-            if not repos_data or not isinstance(repos_data, dict):
-                return []
-
-            repositories = repos_data.get('value', [])
-
-            # Filter repositories by name (simple client-side filtering)
-            filtered_repos = [
-                repo
-                for repo in repositories
-                if query.lower() in repo.get('name', '').lower()
-                or query.lower() in repo.get('project', {}).get('name', '').lower()
-            ]
-
-            # Convert to Repository objects
-            result = []
-            for repo in filtered_repos[:per_page]:
-                project_name = repo.get('project', {}).get('name', 'Unknown')
-                repo_name = repo.get('name', 'Unknown')
-
-                result.append(
-                    Repository(
-                        id=repo.get('id', ''),
-                        full_name=f'{project_name}/{repo_name}',
-                        git_provider=ProviderType.AZURE_DEVOPS,
-                        is_public=False,  # Azure DevOps repos are private by default
-                        stargazers_count=None,
-                        link_header=None,
-                        pushed_at=None,
-                    )
-                )
-
-            return result
-        except Exception as e:
-            logger.error(f'Error searching Azure DevOps repositories: {e}')
-            return []
-
-    async def get_suggested_tasks(self) -> list[SuggestedTask]:
-        """Get suggested tasks for the authenticated user.
-
-        Returns:
-            A list of suggested tasks including:
-            - Open issues assigned to the user
-            - Pull requests authored by the user with:
-              - Merge conflicts
-              - Failing checks
-              - Unresolved comments
-        """
-        tasks: list[SuggestedTask] = []
-
-        try:
-            # Get open work items (bugs/issues)
-            await self._get_work_item_tasks(tasks)
-
-            # Get pull request tasks
-            await self._get_pull_request_tasks(tasks)
-
-            return tasks
-        except Exception as e:
-            logger.error(f'Error getting Azure DevOps suggested tasks: {e}')
-            return []
-
-    async def _get_work_item_tasks(self, tasks: list[SuggestedTask]) -> None:
-        """Get work item tasks using WIQL query."""
-        try:
-            # Use WIQL to query for open bugs
-            wiql_url = f'{self.base_url}/wit/wiql'
-            wiql_params = {'api-version': '7.1-preview.2'}
-
-            wiql_query = {
-                'query': """
-                    select [System.Id],
-                        [System.WorkItemType],
-                        [System.Title],
-                        [System.State],
-                        [System.TeamProject]
-                    from WorkItems
-                    where [System.WorkItemType] in ('Bug', 'Issue', 'Task')
-                    and [System.State] <> 'Closed'
-                    and [System.State] <> 'Resolved'
-                    and [System.State] <> 'Done'
-                    order by [System.ChangedDate] desc
-                """
-            }
-
-            wiql_data, _ = await self._make_request(
-                wiql_url,
-                params=wiql_params,
-                method=RequestMethod.POST,
-                json_data=wiql_query,
-            )
-
-            if not wiql_data or not isinstance(wiql_data, dict):
-                return
-
-            work_items = wiql_data.get('workItems', [])[:10]  # Limit to 10
-
-            # Get full work item details
-            for work_item in work_items:
-                work_item_id = work_item.get('id')
-                if not work_item_id:
-                    continue
-
-                # Get work item details
-                work_item_url = f'{self.base_url}/wit/workitems/{work_item_id}'
-                work_item_params = {'api-version': '7.1-preview.3'}
-
-                work_item_data, _ = await self._make_request(
-                    work_item_url, params=work_item_params
-                )
-
-                if work_item_data and isinstance(work_item_data, dict):
-                    fields = work_item_data.get('fields', {})
-                    project_name = fields.get('System.TeamProject', '')
-
-                    tasks.append(
-                        SuggestedTask(
-                            git_provider=ProviderType.AZURE_DEVOPS,
-                            task_type=TaskType.OPEN_ISSUE,
-                            repo=project_name,
-                            issue_number=work_item_id,
-                            title=fields.get('System.Title', ''),
-                        )
-                    )
-
-        except Exception as e:
-            logger.warning(f'Error getting work item tasks: {e}')
-
-    async def _get_pull_request_tasks(self, tasks: list[SuggestedTask]) -> None:
-        """Get pull request tasks."""
-        try:
-            # Get all repositories
-            repos_url = f'{self.base_url}/git/repositories'
-            repos_params = {'api-version': '7.1-preview.1'}
-
-            repos_data, _ = await self._make_request(repos_url, params=repos_params)
-
-            if not repos_data or not isinstance(repos_data, dict):
-                return
-
-            repositories = repos_data.get('value', [])
-
-            # For each repository, get pull requests
-            for repo in repositories:
-                project_name = repo.get('project', {}).get('name', '')
-                repo_name = repo.get('name', '')
-                repo_id = repo.get('id', '')
-                full_repo_name = f'{project_name}/{repo_name}'
-
-                if not project_name or not repo_id:
-                    continue
-
-                # Get active pull requests
-                prs_url = f'{self.base_url}/git/repositories/{repo_id}/pullrequests'
-                prs_params = {
-                    'api-version': '7.1-preview.1',
-                    'searchCriteria.status': 'active',
-                }
-
-                prs_data, _ = await self._make_request(prs_url, params=prs_params)
-
-                if not prs_data or not isinstance(prs_data, dict):
-                    continue
-
-                pull_requests = prs_data.get('value', [])
-
-                for pr in pull_requests:
-                    pr_id = pr.get('pullRequestId')
-                    if not pr_id:
-                        continue
-
-                    task_type = None
-
-                    # Check for merge conflicts
-                    if pr.get('mergeStatus') == 'conflicts':
-                        task_type = TaskType.MERGE_CONFLICTS
-                    else:
-                        # Check for failing policy evaluations
-                        try:
-                            policy_url = f'{self.base_url}/policy/evaluations'
-                            policy_params = {
-                                'api-version': '7.1-preview.1',
-                                'artifactId': f'vstfs:///CodeReview/CodeReviewId/{project_name}/{pr_id}',
-                            }
-
-                            policy_data, _ = await self._make_request(
-                                policy_url, params=policy_params
-                            )
-
-                            if policy_data and isinstance(policy_data, dict):
-                                evaluations = policy_data.get('value', [])
-                                has_failing_checks = any(
-                                    eval.get('status') == 'rejected'
-                                    for eval in evaluations
-                                )
-
-                                if has_failing_checks:
-                                    task_type = TaskType.FAILING_CHECKS
-                        except Exception:
-                            # Policy evaluations might not be accessible, continue
-                            pass
-
-                        # Check for unresolved comments if no other issues found
-                        if not task_type:
-                            try:
-                                threads_url = f'{self.base_url}/git/repositories/{repo_id}/pullRequests/{pr_id}/threads'
-                                threads_params = {'api-version': '7.1-preview.1'}
-
-                                threads_data, _ = await self._make_request(
-                                    threads_url, params=threads_params
-                                )
-
-                                if threads_data and isinstance(threads_data, dict):
-                                    threads = threads_data.get('value', [])
-                                    has_unresolved_comments = any(
-                                        thread.get('status') == 'active'
-                                        and not thread.get('isDeleted', False)
-                                        for thread in threads
-                                    )
-
-                                    if has_unresolved_comments:
-                                        task_type = TaskType.UNRESOLVED_COMMENTS
-                            except Exception:
-                                # Threads might not be accessible, continue
-                                pass
-
-                    # Add the task if we identified a specific issue
-                    if task_type:
-                        tasks.append(
-                            SuggestedTask(
-                                git_provider=ProviderType.AZURE_DEVOPS,
-                                task_type=task_type,
-                                repo=full_repo_name,
-                                issue_number=pr_id,
-                                title=pr.get('title', ''),
-                            )
-                        )
-
-        except Exception as e:
-            logger.warning(f'Error getting pull request tasks: {e}')
-
-    async def get_repository_details_from_repo_name(
-        self, repository: str
-    ) -> Repository:
-        """Get repository details from repository name.
-
-        Args:
-            repository: The repository name (format: project/repo)
-
-        Returns:
-            The repository details
-        """
-        try:
-            # Parse the repository name (expected format: project/repo)
-            parts = repository.split('/')
-            if len(parts) != 2:
-                raise ValueError(
-                    f'Invalid repository name format: {repository}. Expected format: project/repo'
-                )
-
-            project_name, repo_name = parts
-
-            # Get repositories for the specific project
-            repos_url = f'{self.base_url}/git/repositories'
-            repos_params = {'api-version': '7.1-preview.1', 'project': project_name}
-
-            repos_data, _ = await self._make_request(repos_url, params=repos_params)
-
-            if not repos_data or not isinstance(repos_data, dict):
-                raise ValueError(f'Repository not found: {repository}')
-
-            repositories = repos_data.get('value', [])
-            repo = next(
-                (
-                    r
-                    for r in repositories
-                    if r.get('name', '').lower() == repo_name.lower()
-                ),
-                None,
-            )
-
-            if not repo:
-                raise ValueError(f'Repository not found: {repository}')
-
-            return Repository(
-                id=repo.get('id', ''),
-                full_name=f'{project_name}/{repo_name}',
-                git_provider=ProviderType.AZURE_DEVOPS,
-                is_public=False,  # Azure DevOps repos are private by default
-                stargazers_count=None,
-                link_header=None,
-                pushed_at=None,
-            )
-        except Exception as e:
-            logger.error(f'Error getting Azure DevOps repository details: {e}')
-            raise AuthenticationError(f'Failed to get repository details: {e}')
-
-    async def get_branches(self, repository: str) -> list[Branch]:
-        """Get branches for a repository.
-
-        Args:
-            repository: The repository name (format: project/repo)
-
-        Returns:
-            A list of branches
-        """
-        try:
-            # Parse the repository name (expected format: project/repo)
-            parts = repository.split('/')
-            if len(parts) != 2:
-                raise ValueError(
-                    f'Invalid repository name format: {repository}. Expected format: project/repo'
-                )
-
-            project_name, repo_name = parts
-
-            # First, get the repository ID
-            repo_details = await self.get_repository_details_from_repo_name(repository)
-            repo_id = repo_details.id
-
-            # Get the branches (refs) for the repository
-            refs_url = f'{self.base_url}/git/repositories/{repo_id}/refs'
-            refs_params = {
-                'api-version': '7.1-preview.1',
-                'filter': 'heads/',  # Only get branch refs, not tags
-            }
-
-            refs_data, _ = await self._make_request(refs_url, params=refs_params)
-
-            if not refs_data or not isinstance(refs_data, dict):
-                return []
-
-            refs = refs_data.get('value', [])
-
-            # Convert to Branch objects
-            result = []
-            for ref in refs:
-                # Extract branch name from ref name (remove 'refs/heads/' prefix)
-                ref_name = ref.get('name', '')
-                if ref_name.startswith('refs/heads/'):
-                    branch_name = ref_name[len('refs/heads/') :]
-
-                    result.append(
-                        Branch(
-                            name=branch_name,
-                            commit_sha=ref.get('objectId', ''),
-                            protected=False,  # Azure DevOps doesn't expose this information directly
-                            last_push_date=None,  # Azure DevOps doesn't expose this information directly
-                        )
-                    )
-
-            return result
-        except Exception as e:
-            logger.error(f'Error getting Azure DevOps branches: {e}')
-            return []
-
-    async def create_pr(
-        self,
-        repo_name: str,
-        source_branch: str,
-        target_branch: str,
-        title: str,
-        body: str | None = None,
-        draft: bool = False,
-    ) -> str:
-        """Create a pull request in Azure DevOps.
-
-        Args:
-            repo_name: The repository name (format: project/repo)
-            source_branch: The source branch name
-            target_branch: The target branch name
-            title: The pull request title
-            body: The pull request description (optional)
-            draft: Whether the pull request should be a draft (optional)
-
-        Returns:
-            The URL of the created pull request
-
-        Raises:
-            ValueError: If the repository name format is invalid
-            AuthenticationError: If authentication fails
-            UnknownException: If the API request fails
-        """
-        try:
-            # Parse the repository name (expected format: project/repo)
-            parts = repo_name.split('/')
-            if len(parts) != 2:
-                raise ValueError(
-                    f'Invalid repository name format: {repo_name}. Expected format: project/repo'
-                )
-
-            project_name, repo_name_only = parts
-
-            # Get the repository details to get the repository ID
-            repo_details = await self.get_repository_details_from_repo_name(repo_name)
-            repo_id = repo_details.id
-
-            # Prepare the pull request data
-            pr_data = {
-                'sourceRefName': f'refs/heads/{source_branch}',
-                'targetRefName': f'refs/heads/{target_branch}',
-                'title': title,
-                'description': body
-                or f'Pull request from {source_branch} to {target_branch}',
-                'isDraft': draft,
-            }
-
-            # Create the pull request
-            pr_url = f'{self.base_url}/git/repositories/{repo_id}/pullrequests'
-            pr_params = {'api-version': '7.1-preview.1'}
-
-            response_data, _ = await self._make_request(
-                url=pr_url,
-                params=pr_params,
-                method=RequestMethod.POST,
-                json_data=pr_data,
-            )
-
-            if not response_data or not isinstance(response_data, dict):
-                raise UnknownException(
-                    'Failed to create pull request: Invalid response'
-                )
-
-            # Extract the pull request URL
-            pr_id = response_data.get('pullRequestId')
-            if not pr_id:
-                raise UnknownException(
-                    'Failed to create pull request: No PR ID returned'
-                )
-
-            # Construct the web URL for the pull request
-            web_url = f'{self.organization_url}/{project_name}/_git/{repo_name_only}/pullrequest/{pr_id}'
-
-            logger.info(f'Successfully created Azure DevOps pull request: {web_url}')
-            return web_url
-
-        except ValueError:
-            raise
-        except AuthenticationError:
-            raise
-        except Exception as e:
-            logger.error(f'Error creating Azure DevOps pull request: {e}')
-            raise UnknownException(f'Failed to create pull request: {e}')
--- a/openhands/integrations/provider.py
+++ b/openhands/integrations/provider.py
@@ -14,9 +14,6 @@ from openhands.core.logger import openhands_logger as logger
 from openhands.events.action.action import Action
 from openhands.events.action.commands import CmdRunAction
 from openhands.events.stream import EventStream
-from openhands.integrations.azure_devops.azure_devops_service import (
-    AzureDevOpsServiceImpl,
-)
 from openhands.integrations.github.github_service import GithubServiceImpl
 from openhands.integrations.gitlab.gitlab_service import GitLabServiceImpl
 from openhands.integrations.service_types import (
@@ -30,8 +27,6 @@ from openhands.integrations.service_types import (
 )
 from openhands.server.types import AppMode

-AZURE_DEVOPS_AVAILABLE = True
-

 class ProviderToken(BaseModel):
    token: SecretStr | None = Field(default=None)
@@ -113,7 +108,6 @@ class ProviderHandler:
        self.service_class_map: dict[ProviderType, type[GitService]] = {
            ProviderType.GITHUB: GithubServiceImpl,
            ProviderType.GITLAB: GitLabServiceImpl,
-            ProviderType.AZURE_DEVOPS: AzureDevOpsServiceImpl,
        }

        self.external_auth_id = external_auth_id
@@ -130,8 +124,6 @@ class ProviderHandler:
        """Helper method to instantiate a service for a given provider"""
        token = self.provider_tokens[provider]
        service_class = self.service_class_map[provider]
-
-        # All services now use base_domain consistently
        return service_class(
            user_id=token.user_id,
            external_auth_id=self.external_auth_id,
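
Review note on the `provider.py` hunks above: once the `ProviderType.AZURE_DEVOPS` entry is removed from `service_class_map`, the plain dictionary lookup in the service helper fails with a `KeyError` for any provider that is not registered. The sketch below illustrates that registry pattern in isolation. The class names `FakeGitHub` and `FakeGitLab`, the `describe` method, and the `get_service` helper are hypothetical stand-ins rather than OpenHands code, and the `GitService` protocol is reduced to a single method for brevity.

```python
from enum import Enum
from typing import Protocol


class ProviderType(Enum):
    GITHUB = 'github'
    GITLAB = 'gitlab'


class GitService(Protocol):
    """Reduced, hypothetical version of the service protocol."""

    def __init__(self, token: str) -> None: ...

    def describe(self) -> str: ...


class FakeGitHub:
    def __init__(self, token: str) -> None:
        self.token = token

    def describe(self) -> str:
        return 'github service'


class FakeGitLab:
    def __init__(self, token: str) -> None:
        self.token = token

    def describe(self) -> str:
        return 'gitlab service'


# Registry keyed by provider, mirroring the shape of service_class_map.
SERVICE_CLASS_MAP: dict[ProviderType, type[GitService]] = {
    ProviderType.GITHUB: FakeGitHub,
    ProviderType.GITLAB: FakeGitLab,
}


def get_service(provider: ProviderType, token: str) -> GitService:
    """Look up the class for a provider and instantiate it."""
    # A provider missing from the map raises KeyError at this lookup.
    service_class = SERVICE_CLASS_MAP[provider]
    return service_class(token)
```

Because the classes satisfy the protocol structurally, neither has to inherit from `GitService`; removing a provider only requires deleting its enum member and its map entry, which is what the hunks above do.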
--- a/openhands/integrations/service_types.py
+++ b/openhands/integrations/service_types.py
@@ -13,7 +13,6 @@ from openhands.server.types import AppMode
 class ProviderType(Enum):
    GITHUB = 'github'
    GITLAB = 'gitlab'
-    AZURE_DEVOPS = 'azure_devops'


 class TaskType(str, Enum):
@@ -52,19 +51,6 @@ class SuggestedTask(BaseModel):
                'ciProvider': 'GitHub',
                'requestVerb': 'pull request',
            }
-        elif self.git_provider == ProviderType.AZURE_DEVOPS:
-            return {
-                'requestType': 'Pull Request',
-                'requestTypeShort': 'PR',
-                'apiName': 'Azure DevOps API',
-                'tokenEnvVar': 'AZURE_DEVOPS_TOKEN',
-                'ciSystem': 'Azure Pipelines',
-                'ciProvider': 'Azure DevOps',
-                'requestVerb': 'pull request',
-                'work item': 'work item',
-                'repository': 'repository',
-                'pull request': 'pull request',
-            }

        raise ValueError(f'Provider {self.git_provider} for suggested task prompts')

@@ -97,9 +83,7 @@ class SuggestedTask(BaseModel):


 class User(BaseModel):
-    id: (
-        int | str
-    )  # Support both integer IDs (GitHub/GitLab) and string UUIDs (Azure DevOps)
+    id: int
    login: str
    avatar_url: str
    company: str | None = None
@@ -115,9 +99,7 @@ class Branch(BaseModel):


 class Repository(BaseModel):
-    id: (
-        int | str
-    )  # Support both integer IDs (GitHub/GitLab) and string UUIDs (Azure DevOps)
+    id: int
    full_name: str
    git_provider: ProviderType
    is_public: bool
@@ -193,7 +175,7 @@ class BaseGitService(ABC):


 class GitService(Protocol):
-    """Protocol defining the interface for Git service providers."""
+    """Protocol defining the interface for Git service providers"""

    def __init__(
        self,
@@ -204,15 +186,15 @@ class GitService(Protocol):
        external_token_manager: bool = False,
        base_domain: str | None = None,
    ) -> None:
-        """Initialize the service with authentication details."""
+        """Initialize the service with authentication details"""
        ...

    async def get_latest_token(self) -> SecretStr | None:
-        """Get latest working token of the user."""
+        """Get latest working token of the user"""
        ...

    async def get_user(self) -> User:
-        """Get the authenticated user's information."""
+        """Get the authenticated user's information"""
        ...

    async def search_repositories(
@@ -222,21 +204,21 @@ class GitService(Protocol):
        sort: str,
        order: str,
    ) -> list[Repository]:
-        """Search for repositories."""
+        """Search for repositories"""
        ...

    async def get_repositories(self, sort: str, app_mode: AppMode) -> list[Repository]:
-        """Get repositories for the authenticated user."""
+        """Get repositories for the authenticated user"""
        ...

    async def get_suggested_tasks(self) -> list[SuggestedTask]:
-        """Get suggested tasks for the authenticated user across all repositories."""
+        """Get suggested tasks for the authenticated user across all repositories"""
        ...

    async def get_repository_details_from_repo_name(
        self, repository: str
    ) -> Repository:
-        """Gets all repository details from repository name."""
+        """Gets all repository details from repository name"""

    async def get_branches(self, repository: str) -> list[Branch]:
-        """Get branches for a repository."""
+        """Get branches for a repository"""
--- a/openhands/integrations/utils.py
+++ b/openhands/integrations/utils.py
@@ -1,9 +1,8 @@
+import traceback
+
 from pydantic import SecretStr

 from openhands.core.logger import openhands_logger as logger
-from openhands.integrations.azure_devops.azure_devops_service import (
-    AzureDevOpsServiceImpl,
-)
 from openhands.integrations.github.github_service import GitHubService
 from openhands.integrations.gitlab.gitlab_service import GitLabService
 from openhands.integrations.provider import ProviderType
@@ -13,53 +12,35 @@ async def validate_provider_token(
    token: SecretStr, base_domain: str | None = None
 ) -> ProviderType | None:
    """
-    Determine whether a token is for GitHub, GitLab, or Azure DevOps by attempting to get user info
-    from the services.
+    Determine whether a token is for GitHub or GitLab by attempting to get user info
+    from both services.

    Args:
        token: The token to check
-        base_domain: Optional base domain for the service

    Returns:
        'github' if it's a GitHub token
        'gitlab' if it's a GitLab token
-        'azure_devops' if it's an Azure DevOps token
-        None if the token is invalid for all services
+        None if the token is invalid for both services
    """
-    # Skip validation for empty tokens
-    if token is None or not token.get_secret_value().strip():
-        return None
    # Try GitHub first
-    github_error = None
    try:
        github_service = GitHubService(token=token, base_domain=base_domain)
        await github_service.verify_access()
        return ProviderType.GITHUB
    except Exception as e:
-        github_error = e
+        logger.debug(
+            f'Failed to validate GitHub token: {e} \n {traceback.format_exc()}'
+        )

    # Try GitLab next
-    gitlab_error = None
    try:
        gitlab_service = GitLabService(token=token, base_domain=base_domain)
        await gitlab_service.get_user()
        return ProviderType.GITLAB
    except Exception as e:
-        gitlab_error = e
-
-    # Try Azure DevOps last
-    azure_devops_error = None
-    try:
-        azure_devops_service = AzureDevOpsServiceImpl(
-            token=token, base_domain=base_domain
+        logger.debug(
+            f'Failed to validate GitLab token: {e} \n {traceback.format_exc()}'
        )
-        await azure_devops_service.get_user()
-        return ProviderType.AZURE_DEVOPS
-    except Exception as e:
-        azure_devops_error = e
-
-    logger.debug(
-        f'Failed to validate token: {github_error} \n {gitlab_error} \n {azure_devops_error}'
-    )

    return None
--- a/openhands/llm/llm.py
+++ b/openhands/llm/llm.py
@@ -773,9 +773,6 @@ class LLM(RetryMixin, DebugMixin):
    def __repr__(self) -> str:
        return str(self)

-    def reset(self) -> None:
-        self.metrics.reset()
-
    def format_messages_for_llm(self, messages: Message | list[Message]) -> list[dict]:
        if isinstance(messages, Message):
            messages = [messages]
--- a/openhands/llm/metrics.py
+++ b/openhands/llm/metrics.py
@@ -193,22 +193,6 @@ class Metrics:
            'token_usages': [usage.model_dump() for usage in self._token_usages],
        }

-    def reset(self) -> None:
-        self._accumulated_cost = 0.0
-        self._costs = []
-        self._response_latencies = []
-        self._token_usages = []
-        # Reset accumulated token usage with a new instance
-        self._accumulated_token_usage = TokenUsage(
-            model=self.model_name,
-            prompt_tokens=0,
-            completion_tokens=0,
-            cache_read_tokens=0,
-            cache_write_tokens=0,
-            context_window=0,
-            response_id='',
-        )
-
    def log(self) -> str:
        """Log the metrics."""
        metrics = self.get()
@@ -221,5 +205,58 @@ class Metrics:
        """Create a deep copy of the Metrics object."""
        return copy.deepcopy(self)

+    def diff(self, baseline: 'Metrics') -> 'Metrics':
+        """Calculate the difference between current metrics and a baseline.
+
+        This is useful for tracking metrics for specific operations like delegates.
+
+        Args:
+            baseline: A metrics object representing the baseline state
+
+        Returns:
+            A new Metrics object containing only the differences since the baseline
+        """
+        result = Metrics(self.model_name)
+
+        # Calculate cost difference
+        result._accumulated_cost = self._accumulated_cost - baseline._accumulated_cost
+
+        # Include only costs that were added after the baseline
+        if baseline._costs:
+            last_baseline_timestamp = baseline._costs[-1].timestamp
+            result._costs = [
+                cost for cost in self._costs if cost.timestamp > last_baseline_timestamp
+            ]
+        else:
+            result._costs = self._costs.copy()
+
+        # Include only response latencies that were added after the baseline
+        result._response_latencies = self._response_latencies[
+            len(baseline._response_latencies) :
+        ]
+
+        # Include only token usages that were added after the baseline
+        result._token_usages = self._token_usages[len(baseline._token_usages) :]
+
+        # Calculate accumulated token usage difference
+        base_usage = baseline.accumulated_token_usage
+        current_usage = self.accumulated_token_usage
+
+        result._accumulated_token_usage = TokenUsage(
+            model=self.model_name,
+            prompt_tokens=current_usage.prompt_tokens - base_usage.prompt_tokens,
+            completion_tokens=current_usage.completion_tokens
+            - base_usage.completion_tokens,
+            cache_read_tokens=current_usage.cache_read_tokens
+            - base_usage.cache_read_tokens,
+            cache_write_tokens=current_usage.cache_write_tokens
+            - base_usage.cache_write_tokens,
+            context_window=current_usage.context_window,
+            per_turn_token=0,
+            response_id='',
+        )
+
+        return result
+
    def __repr__(self) -> str:
        return f'Metrics({self.get()}'
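Note on `Metrics.diff`: the snapshot-then-diff pattern it enables can be sketched as below. This is a minimal illustration, not the project's implementation. `SimpleMetrics` and `Cost` are hypothetical stand-ins that model only the accumulated cost and the timestamp-filtered cost list; the real `Metrics` class also tracks latencies and token usage.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Cost:
    # Hypothetical stand-in for a single cost record
    amount: float
    timestamp: float


@dataclass
class SimpleMetrics:
    # Hypothetical, reduced model of a metrics container
    accumulated_cost: float = 0.0
    costs: list[Cost] = field(default_factory=list)

    def add_cost(self, amount: float, timestamp: float) -> None:
        self.accumulated_cost += amount
        self.costs.append(Cost(amount, timestamp))

    def diff(self, baseline: 'SimpleMetrics') -> 'SimpleMetrics':
        """Return only what was recorded after the baseline snapshot."""
        result = SimpleMetrics()
        result.accumulated_cost = self.accumulated_cost - baseline.accumulated_cost
        if baseline.costs:
            # Keep cost records strictly newer than the last baseline record
            last_baseline_timestamp = baseline.costs[-1].timestamp
            result.costs = [
                c for c in self.costs if c.timestamp > last_baseline_timestamp
            ]
        else:
            result.costs = self.costs.copy()
        return result


# Snapshot before a sub-operation, then diff afterwards
metrics = SimpleMetrics()
metrics.add_cost(0.5, timestamp=1.0)
baseline = copy.deepcopy(metrics)
metrics.add_cost(0.25, timestamp=2.0)
delta = metrics.diff(baseline)
```

In this sketch `delta` holds only the cost recorded after the snapshot, which is the kind of per-delegate accounting the docstring describes. Because the cost list is filtered by timestamp rather than by index, two records sharing the baseline's last timestamp would both be excluded.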
--- a/openhands/mcp/utils.py
+++ b/openhands/mcp/utils.py
@@ -10,7 +10,6 @@ from openhands.core.config.mcp_config import (
    MCPSHTTPServerConfig,
    MCPSSEServerConfig,
 )
-from openhands.core.config.openhands_config import OpenHandsConfig
 from openhands.core.logger import openhands_logger as logger
 from openhands.events.action.mcp import MCPAction
 from openhands.events.observation.mcp import MCPObservation
@@ -187,9 +186,7 @@ async def call_tool_mcp(mcp_clients: list[MCPClient], action: MCPAction) -> Obse
    )


-async def add_mcp_tools_to_agent(
-    agent: 'Agent', runtime: Runtime, memory: 'Memory', app_config: OpenHandsConfig
-):
+async def add_mcp_tools_to_agent(agent: 'Agent', runtime: Runtime, memory: 'Memory'):
    """
    Add MCP tools to an agent.
    """
@@ -208,7 +205,6 @@ async def add_mcp_tools_to_agent(
    extra_stdio_servers = []

    # Add microagent MCP tools if available
-    mcp_config: MCPConfig = app_config.mcp
    microagent_mcp_configs = memory.get_microagent_mcp_tools()
    for mcp_config in microagent_mcp_configs:
        if mcp_config.sse_servers:
--- a/openhands/resolver/README.md
+++ b/openhands/resolver/README.md
@@ -1,9 +1,9 @@
-# OpenHands Github, Gitlab & Azure DevOps Issue Resolver 🙌
+# OpenHands GitHub & GitLab Issue Resolver 🙌

-Need help resolving issues in GitHub, GitLab, or Azure DevOps but don't have the time to do it yourself? Let an AI agent help you out!
+Need help resolving a GitHub issue but don't have the time to do it yourself? Let an AI agent help you out!

 This tool allows you to use open-source AI agents based on [OpenHands](https://github.com/all-hands-ai/openhands)
-to attempt to resolve issues automatically. While it can handle multiple issues, it's primarily designed
+to attempt to resolve GitHub issues automatically. While it can handle multiple issues, it's primarily designed
 to help you resolve one issue at a time with high quality.

 Getting started is simple - just follow the instructions below.
@@ -74,8 +74,8 @@ If you prefer to run the resolver programmatically instead of using GitHub Actio
 pip install openhands-ai
 ```

-2. Create an access token for your platform:
-   - Create a GitHub access token
+2. Create a GitHub or GitLab access token:
+   - Create a GitHub access token
      - Visit [GitHub's token settings](https://github.com/settings/personal-access-tokens/new)
      - Create a fine-grained token with these scopes:
      - "Content"
@@ -84,7 +84,7 @@ pip install openhands-ai
      - "Workflows"
      - If you don't have push access to the target repo, you can fork it first

-   - Create a GitLab access token
+   - Create a GitLab access token
      - Visit [GitLab's token settings](https://gitlab.com/-/user_settings/personal_access_tokens)
      - Create a fine-grained token with these scopes:
      - 'api'
@@ -93,30 +93,20 @@ pip install openhands-ai
      - 'read_repository'
      - 'write_repository'

-   - Create an Azure DevOps access token
-      - Visit [Azure DevOps Personal Access Tokens](https://dev.azure.com/your-organization/_usersSettings/tokens)
-      - Create a token with these scopes:
-      - "Code (Read & Write)"
-      - "Work Items (Read & Write)"
-      - "Pull Request Threads (Read & Write)"
-      - "Pull Request Contribute"
-
 3. Set up environment variables:

 ```bash

 # GitHub credentials
+
 export GITHUB_TOKEN="your-github-token"
 export GIT_USERNAME="your-github-username"  # Optional, defaults to token owner

 # GitLab credentials if you're using GitLab repo
+
 export GITLAB_TOKEN="your-gitlab-token"
 export GIT_USERNAME="your-gitlab-username"  # Optional, defaults to token owner

-# Azure DevOps credentials if you're using Azure DevOps repo
-export AZURE_DEVOPS_TOKEN="your-azure-devops-token"
-export GIT_USERNAME="your-azure-devops-username"  # Optional, defaults to token owner
-
 # LLM configuration

 export LLM_MODEL="anthropic/claude-sonnet-4-20250514"  # Recommended
--- a/openhands/resolver/interfaces/azure_devops.py
+++ b/openhands/resolver/interfaces/azure_devops.py
@@ -1,915 +0,0 @@
-import asyncio
-import base64
-from typing import Any
-
-import httpx
-
-from openhands.core.logger import openhands_logger as logger
-from openhands.integrations.service_types import RequestMethod
-from openhands.resolver.interfaces.issue import (
-    Issue,
-    IssueHandlerInterface,
-    ReviewThread,
-)
-
-
-class AzureDevOpsIssueHandler(IssueHandlerInterface):
-    def __init__(
-        self,
-        owner: str,
-        repo: str,
-        token: str,
-        username: str | None = None,
-        base_domain: str = 'dev.azure.com',
-    ):
-        """Initialize an Azure DevOps issue handler.
-
-        Args:
-            owner: The owner (organization) of the repository
-            repo: The name of the repository (format: project/repo)
-            token: The Azure DevOps personal access token
-            username: Optional Azure DevOps username
-            base_domain: The domain for Azure DevOps (default: "dev.azure.com")
-        """
-        self.owner = owner
-        self.repo = repo
-        self.token = token
-        self.username = username
-        self.base_domain = base_domain
-
-        # Parse the repository name (expected format: project/repo)
-        parts = repo.split('/')
-        if len(parts) != 2:
-            raise ValueError(
-                f'Invalid repository name format: {repo}. Expected format: project/repo'
-            )
-
-        self.project_name, self.repo_name = parts
-
-        self.base_url = self.get_base_url()
-        self.download_url = self.get_download_url()
-        self.clone_url = self.get_clone_url()
-        self.headers = self.get_headers()
-
-        # Set up API base URL
-        self.api_base_url = f'https://{self.base_domain}/{self.owner}/_apis'
-
-    def set_owner(self, owner: str) -> None:
-        self.owner = owner
-
-    def get_headers(self) -> dict[str, str]:
-        # Azure DevOps uses Basic authentication with PAT
-        # Username can be empty, password is the PAT
-        credentials = base64.b64encode(f':{self.token}'.encode()).decode()
-        return {
-            'Authorization': f'Basic {credentials}',
-            'Accept': 'application/json',
-            'Content-Type': 'application/json',
-        }
-
-    async def _make_api_request(
-        self,
-        url: str,
-        method: RequestMethod = RequestMethod.GET,
-        params: dict | None = None,
-        json_data: dict | None = None,
-    ) -> dict | list | None:
-        """Make an HTTP request to the Azure DevOps API."""
-        try:
-            async with httpx.AsyncClient() as client:
-                if method == RequestMethod.GET:
-                    response = await client.get(
-                        url, headers=self.headers, params=params
-                    )
-                elif method == RequestMethod.POST:
-                    response = await client.post(
-                        url, headers=self.headers, params=params, json=json_data
-                    )
-                else:
-                    raise ValueError(f'Unsupported HTTP method: {method}')
-
-                if response.status_code >= 400:
-                    logger.error(
-                        f'Azure DevOps API error: {response.status_code} - {response.text}'
-                    )
-                    return None
-
-                try:
-                    return response.json()
-                except Exception:
-                    return response.text
-
-        except httpx.RequestError as e:
-            logger.error(f'Request error: {e}')
-            return None
-        except Exception as e:
-            logger.error(f'Unexpected error: {e}')
-            return None
-
-    def get_base_url(self) -> str:
-        return f'https://{self.base_domain}/{self.owner}/{self.project_name}/_apis/git/repositories/{self.repo_name}'
-
-    def get_authorize_url(self) -> str:
-        return f'https://{self.username}:{self.token}@{self.base_domain}/'
-
-    def get_branch_url(self, branch_name: str) -> str:
-        return self.get_base_url() + f'/refs?filter=heads/{branch_name}'
-
-    def get_download_url(self) -> str:
-        return f'https://{self.base_domain}/{self.owner}/{self.project_name}/_apis/wit/workitems'
-
-    def get_clone_url(self) -> str:
-        return f'https://{self.username}:{self.token}@{self.base_domain}/{self.owner}/{self.project_name}/_git/{self.repo_name}'
-
-    def get_graphql_url(self) -> str:
-        return f'https://{self.base_domain}/{self.owner}/_apis/graphql'
-
-    def get_compare_url(self, branch_name: str) -> str:
-        return f'https://{self.base_domain}/{self.owner}/{self.project_name}/_git/{self.repo_name}/branchCompare?baseVersion=GC{self.get_default_branch_name()}&targetVersion=GC{branch_name}'
-
-    def get_converted_issues(
-        self, issue_numbers: list[int] | None = None, comment_id: int | None = None
-    ) -> list[Issue]:
-        """Download issues from Azure DevOps.
-
-        Args:
-            issue_numbers: The numbers of the issues to download
-            comment_id: The ID of a single comment, if provided, otherwise all comments
-
-        Returns:
-            List of Azure DevOps issues.
-        """
-        if not issue_numbers:
-            raise ValueError('Unspecified issue number')
-
-        all_issues = self.download_issues()
-        logger.info(f'Limiting resolving to issues {issue_numbers}.')
-        all_issues = [issue for issue in all_issues if issue['id'] in issue_numbers]
-
-        if len(issue_numbers) == 1 and not all_issues:
-            raise ValueError(f'Issue {issue_numbers[0]} not found')
-
-        converted_issues = []
-        for issue in all_issues:
-            # Check for required fields (id and title)
-            if any(
-                [
-                    issue.get('fields', {}).get(key) is None
-                    for key in ['System.Id', 'System.Title']
-                ]
-            ):
-                logger.warning(f'Skipping issue {issue} as it is missing id or title.')
-                continue
-
-            # Handle empty body by using empty string
-            description = issue.get('fields', {}).get('System.Description', '')
-            if description is None:
-                description = ''
-
-            # Get issue thread comments
-            thread_comments = self.get_issue_comments(
-                issue['id'], comment_id=comment_id
-            )
-
-            # Convert empty lists to None for optional fields
-            issue_details = Issue(
-                owner=self.owner,
-                repo=self.repo,
-                number=issue['id'],
-                title=issue['fields']['System.Title'],
-                body=description,
-                thread_comments=thread_comments,
-                review_comments=None,  # Initialize review comments as None for regular issues
-            )
-
-            converted_issues.append(issue_details)
-
-        return converted_issues
-
-    def download_issues(self) -> list[Any]:
-        """Download issues from Azure DevOps using HTTP API calls."""
-        return asyncio.run(self._download_issues_async())
-
-    async def _download_issues_async(self) -> list[Any]:
-        """Download issues from Azure DevOps asynchronously."""
-        # Use WIQL to query for open bugs
-        wiql_url = f'{self.api_base_url}/wit/wiql'
-        wiql_params = {'api-version': '7.1-preview.2'}
-
-        wiql_query = {
-            'query': f"""
-                select [System.Id],
-                    [System.WorkItemType],
-                    [System.Title],
-                    [System.State],
-                    [System.Description]
-                from WorkItems
-                where [System.TeamProject] = '{self.project_name}'
-                and [System.WorkItemType] in ('Bug', 'Issue', 'Task')
-                and [System.State] <> 'Closed'
-                and [System.State] <> 'Resolved'
-                and [System.State] <> 'Done'
-                order by [System.ChangedDate] desc
-            """
-        }
-
-        wiql_data = await self._make_api_request(
-            wiql_url,
-            method=RequestMethod.POST,
-            params=wiql_params,
-            json_data=wiql_query,
-        )
-
-        if not wiql_data or not isinstance(wiql_data, dict):
-            return []
-
-        work_items = wiql_data.get('workItems', [])
-
-        # Get full work item details
-        all_issues = []
-        for work_item in work_items:
-            work_item_id = work_item.get('id')
-            if not work_item_id:
-                continue
-
-            # Get work item details
-            work_item_url = f'{self.api_base_url}/wit/workitems/{work_item_id}'
-            work_item_params = {'api-version': '7.1-preview.3'}
-
-            work_item_data = await self._make_api_request(
-                work_item_url, params=work_item_params
-            )
-
-            if work_item_data and isinstance(work_item_data, dict):
-                # Convert the work item to a dictionary format similar to GitHub/GitLab
-                issue = {
-                    'id': work_item_data.get('id'),
-                    'fields': work_item_data.get('fields', {}),
-                }
-                all_issues.append(issue)
-
-        return all_issues
-
-    def get_issue_comments(
-        self, issue_number: int, comment_id: int | None = None
-    ) -> list[str] | None:
-        """Download comments for a specific issue from Azure DevOps."""
-        return asyncio.run(self._get_issue_comments_async(issue_number, comment_id))
-
-    async def _get_issue_comments_async(
-        self, issue_number: int, comment_id: int | None = None
-    ) -> list[str] | None:
-        """Download comments for a specific issue from Azure DevOps asynchronously."""
-        # Get the comments for the work item
-        comments_url = f'{self.api_base_url}/wit/workItems/{issue_number}/comments'
-        comments_params = {'api-version': '7.1-preview.3'}
-
-        comments_data = await self._make_api_request(
-            comments_url, params=comments_params
-        )
-
-        if not comments_data or not isinstance(comments_data, dict):
-            return None
-
-        comments = comments_data.get('comments', [])
-
-        all_comments = []
-        if comments:
-            if comment_id:
-                matching_comment = next(
-                    (
-                        comment.get('text', '')
-                        for comment in comments
-                        if comment.get('id') == comment_id
-                    ),
-                    None,
-                )
-                if matching_comment:
-                    return [matching_comment]
-            else:
-                all_comments = [
-                    comment.get('text', '')
-                    for comment in comments
-                    if comment.get('text')
-                ]
-
-        return all_comments if all_comments else None
-
-    def branch_exists(self, branch_name: str) -> bool:
-        """Check if a branch exists."""
-        return asyncio.run(self._branch_exists_async(branch_name))
-
-    async def _branch_exists_async(self, branch_name: str) -> bool:
-        """Check if a branch exists asynchronously."""
-        logger.info(f'Checking if branch {branch_name} exists...')
-
-        try:
-            # First, get the repository ID
-            repos_url = f'{self.api_base_url}/git/repositories'
-            repos_params = {
-                'api-version': '7.1-preview.1',
-                'project': self.project_name,
-            }
-
-            repos_data = await self._make_api_request(repos_url, params=repos_params)
-
-            if not repos_data or not isinstance(repos_data, dict):
-                logger.warning(f'Repository not found: {self.repo_name}')
-                return False
-
-            repositories = repos_data.get('value', [])
-            repo = next(
-                (
-                    r
-                    for r in repositories
-                    if r.get('name', '').lower() == self.repo_name.lower()
-                ),
-                None,
-            )
-
-            if not repo:
-                logger.warning(f'Repository not found: {self.repo_name}')
-                return False
-
-            repo_id = repo.get('id')
-
-            # Get the branches (refs) for the repository
-            refs_url = f'{self.api_base_url}/git/repositories/{repo_id}/refs'
-            refs_params = {
-                'api-version': '7.1-preview.1',
-                'filter': f'heads/{branch_name}',
-            }
-
-            refs_data = await self._make_api_request(refs_url, params=refs_params)
-
-            if not refs_data or not isinstance(refs_data, dict):
-                return False
-
-            refs = refs_data.get('value', [])
-            exists = len(refs) > 0
-
-            logger.info(f'Branch {branch_name} exists: {exists}')
-            return exists
-        except Exception as e:
-            logger.warning(f'Error checking if branch exists: {e}')
-            return False
-
-    def get_branch_name(self, base_branch_name: str) -> str:
-        branch_name = base_branch_name
-        attempt = 1
-        while self.branch_exists(branch_name):
-            attempt += 1
-            branch_name = f'{base_branch_name}-try{attempt}'
-        return branch_name
-
-    def reply_to_comment(self, pr_number: int, comment_id: str, reply: str) -> None:
-        """Reply to a comment on a pull request."""
-        asyncio.run(self._reply_to_comment_async(pr_number, comment_id, reply))
-
-    async def _reply_to_comment_async(
-        self, pr_number: int, comment_id: str, reply: str
-    ) -> None:
-        """Reply to a comment on a pull request asynchronously."""
-        try:
-            # First, get the repository ID
-            repos_url = f'{self.api_base_url}/git/repositories'
-            repos_params = {
-                'api-version': '7.1-preview.1',
-                'project': self.project_name,
-            }
-
-            repos_data = await self._make_api_request(repos_url, params=repos_params)
-
-            if not repos_data or not isinstance(repos_data, dict):
-                logger.warning(f'Repository not found: {self.repo_name}')
-                return
-
-            repositories = repos_data.get('value', [])
-            repo = next(
-                (
-                    r
-                    for r in repositories
-                    if r.get('name', '').lower() == self.repo_name.lower()
-                ),
-                None,
-            )
-
-            if not repo:
-                logger.warning(f'Repository not found: {self.repo_name}')
-                return
-
-            repo_id = repo.get('id')
-
-            # Create a comment reply
-            comment_reply = f'Openhands fix success summary\n\n\n{reply}'
-
-            # Add the comment to the thread
-            comment_url = f'{self.api_base_url}/git/repositories/{repo_id}/pullRequests/{pr_number}/threads/{comment_id}/comments'
-            comment_params = {'api-version': '7.1-preview.1'}
-            comment_data = {'content': comment_reply}
-
-            await self._make_api_request(
-                comment_url,
-                method=RequestMethod.POST,
-                params=comment_params,
-                json_data=comment_data,
-            )
-        except Exception as e:
-            logger.warning(f'Error replying to comment: {e}')
-
-    def get_pull_url(self, pr_number: int) -> str:
-        return f'https://{self.base_domain}/{self.owner}/{self.project_name}/_git/{self.repo_name}/pullrequest/{pr_number}'
-
-    def get_default_branch_name(self) -> str:
-        """Get the default branch name."""
-        return asyncio.run(self._get_default_branch_name_async())
-
-    async def _get_default_branch_name_async(self) -> str:
-        """Get the default branch name asynchronously."""
-        try:
-            # First, get the repository
-            repos_url = f'{self.api_base_url}/git/repositories'
-            repos_params = {
-                'api-version': '7.1-preview.1',
-                'project': self.project_name,
-            }
-
-            repos_data = await self._make_api_request(repos_url, params=repos_params)
-
-            if not repos_data or not isinstance(repos_data, dict):
-                logger.warning(f'Repository not found: {self.repo_name}')
-                return 'main'  # Default to 'main' if repository not found
-
-            repositories = repos_data.get('value', [])
-            repo = next(
-                (
-                    r
-                    for r in repositories
-                    if r.get('name', '').lower() == self.repo_name.lower()
-                ),
-                None,
-            )
-
-            if not repo:
-                logger.warning(f'Repository not found: {self.repo_name}')
-                return 'main'  # Default to 'main' if repository not found
-
-            # Get the default branch
-            default_branch = repo.get('defaultBranch', 'refs/heads/main')
-            return default_branch.replace('refs/heads/', '')
-        except Exception as e:
-            logger.warning(f'Error getting default branch: {e}')
-            return 'main'  # Default to 'main' if an error occurs
-
-    def create_pull_request(self, data: dict[str, Any] | None = None) -> dict[str, Any]:
-        """Create a pull request."""
-        return asyncio.run(self._create_pull_request_async(data))
-
-    async def _create_pull_request_async(
-        self, data: dict[str, Any] | None = None
-    ) -> dict[str, Any]:
-        """Create a pull request asynchronously."""
-        if data is None:
-            data = {}
-
-        try:
-            # First, get the repository ID
-            repos_url = f'{self.api_base_url}/git/repositories'
-            repos_params = {
-                'api-version': '7.1-preview.1',
-                'project': self.project_name,
-            }
-
-            repos_data = await self._make_api_request(repos_url, params=repos_params)
-
-            if not repos_data or not isinstance(repos_data, dict):
-                raise RuntimeError(f'Repository not found: {self.repo_name}')
-
-            repositories = repos_data.get('value', [])
-            repo = next(
-                (
-                    r
-                    for r in repositories
-                    if r.get('name', '').lower() == self.repo_name.lower()
-                ),
-                None,
-            )
-
-            if not repo:
-                raise RuntimeError(f'Repository not found: {self.repo_name}')
-
-            repo_id = repo.get('id')
-
-            # Create the pull request
-            pr_data = {
-                'sourceRefName': f'refs/heads/{data.get("head", "")}',
-                'targetRefName': f'refs/heads/{data.get("base", "")}',
-                'title': data.get('title', ''),
-                'description': data.get('body', ''),
-            }
-
-            pr_url = f'{self.api_base_url}/git/repositories/{repo_id}/pullrequests'
-            pr_params = {'api-version': '7.1-preview.1'}
-
-            created_pr = await self._make_api_request(
-                pr_url, method=RequestMethod.POST, params=pr_params, json_data=pr_data
-            )
-
-            if not created_pr or not isinstance(created_pr, dict):
-                raise RuntimeError('Failed to create pull request')
-
-            # Convert to a format similar to GitHub/GitLab
-            pr_id = created_pr.get('pullRequestId')
-            if pr_id is None:
-                raise RuntimeError('Pull request ID not found in response')
-
-            pr_result = {
-                'id': pr_id,
-                'number': pr_id,
-                'html_url': self.get_pull_url(pr_id),
-            }
-
-            return pr_result
-        except Exception as e:
-            if '403' in str(e):
-                raise RuntimeError(
-                    'Failed to create pull request due to missing permissions. '
-                    'Make sure that the provided token has push permissions for the repository.'
-                )
-            raise RuntimeError(f'Failed to create pull request: {e}')
-
-    def request_reviewers(self, reviewer: str, pr_number: int) -> None:
-        """Request reviewers for a pull request."""
-        asyncio.run(self._request_reviewers_async(reviewer, pr_number))
-
-    async def _request_reviewers_async(self, reviewer: str, pr_number: int) -> None:
-        """Request reviewers for a pull request asynchronously."""
-        # Azure DevOps doesn't have a direct API for requesting reviewers
-        # Instead, we'll add a comment mentioning the reviewer
-        try:
-            # First, get the repository ID
-            repos_url = f'{self.api_base_url}/git/repositories'
-            repos_params = {
-                'api-version': '7.1-preview.1',
-                'project': self.project_name,
-            }
-
-            repos_data = await self._make_api_request(repos_url, params=repos_params)
-
-            if not repos_data or not isinstance(repos_data, dict):
-                logger.warning(f'Repository not found: {self.repo_name}')
-                return
-
-            repositories = repos_data.get('value', [])
-            repo = next(
-                (
-                    r
-                    for r in repositories
-                    if r.get('name', '').lower() == self.repo_name.lower()
-                ),
-                None,
-            )
-
-            if not repo:
-                logger.warning(f'Repository not found: {self.repo_name}')
-                return
-
-            repo_id = repo.get('id')
-
-            # Create a comment mentioning the reviewer
-            comment = f'@{reviewer} Please review this pull request.'
-
-            # Add the comment to the pull request
-            thread_data = {
-                'comments': [{'content': comment}],
-                'status': 'active',
-            }
-
-            thread_url = f'{self.api_base_url}/git/repositories/{repo_id}/pullRequests/{pr_number}/threads'
-            thread_params = {'api-version': '7.1-preview.1'}
-
-            await self._make_api_request(
-                thread_url,
-                method=RequestMethod.POST,
-                params=thread_params,
-                json_data=thread_data,
-            )
-        except Exception as e:
-            logger.warning(f'Failed to request review from {reviewer}: {e}')
-
-    def send_comment_msg(self, issue_number: int, msg: str) -> None:
-        """Send a comment message to an Azure DevOps issue or pull request."""
-        asyncio.run(self._send_comment_msg_async(issue_number, msg))
-
-    async def _send_comment_msg_async(self, issue_number: int, msg: str) -> None:
-        """Send a comment message to an Azure DevOps issue or pull request asynchronously."""
-        try:
-            # Add the comment to the work item
-            comment_url = f'{self.api_base_url}/wit/workItems/{issue_number}/comments'
-            comment_params = {'api-version': '7.1-preview.3'}
-            comment_data = {'text': msg}
-
-            await self._make_api_request(
-                comment_url,
-                method=RequestMethod.POST,
-                params=comment_params,
-                json_data=comment_data,
-            )
-            logger.info(f'Comment added to the issue: {msg}')
-        except Exception as e:
-            logger.error(f'Failed to post comment: {e}')
-
-    def get_context_from_external_issues_references(
-        self,
-        closing_issues: list[str],
-        closing_issue_numbers: list[int],
-        issue_body: str,
-        review_comments: list[str] | None,
-        review_threads: list[ReviewThread],
-        thread_comments: list[str] | None,
-    ) -> list[str]:
-        """Get context from external issue references."""
-        # This method can remain largely the same as it doesn't use Azure DevOps SDK
-        context_items = []
-        if closing_issues:
-            context_items.append(f'Closing issues: {", ".join(closing_issues)}')
-        if closing_issue_numbers:
-            context_items.append(
-                f'Closing issue numbers: {", ".join(map(str, closing_issue_numbers))}'
-            )
-        if issue_body:
-            context_items.append(f'Issue body: {issue_body}')
-        if review_comments:
-            context_items.extend(review_comments)
-        if review_threads:
-            for thread in review_threads:
-                context_items.append(f'Review thread: {thread.comment}')
-        if thread_comments:
-            context_items.extend(thread_comments)
-        return context_items
-
-
-class AzureDevOpsPRHandler(AzureDevOpsIssueHandler):
-    """Azure DevOps Pull Request handler that extends the issue handler."""
-
-    def __init__(
-        self,
-        owner: str,
-        repo: str,
-        token: str,
-        username: str | None = None,
-        base_domain: str = 'dev.azure.com',
-    ):
-        """Initialize an Azure DevOps PR handler.
-
-        Args:
-            owner: The owner (organization) of the repository
-            repo: The name of the repository (format: project/repo)
-            token: The Azure DevOps personal access token
-            username: Optional Azure DevOps username
-            base_domain: The domain for Azure DevOps (default: "dev.azure.com")
-        """
-        super().__init__(owner, repo, token, username, base_domain)
-
-    def download_issues(self) -> list[Any]:
-        """Download pull requests from Azure DevOps."""
-        return asyncio.run(self._download_pull_requests_async())
-
-    async def _download_pull_requests_async(self) -> list[Any]:
-        """Download pull requests from Azure DevOps asynchronously."""
-        try:
-            # First, get the repository ID
-            repos_url = f'{self.api_base_url}/git/repositories'
-            repos_params = {
-                'api-version': '7.1-preview.1',
-                'project': self.project_name,
-            }
-
-            repos_data = await self._make_api_request(repos_url, params=repos_params)
-
-            if not repos_data or not isinstance(repos_data, dict):
-                logger.warning(f'Repository not found: {self.repo_name}')
-                return []
-
-            repositories = repos_data.get('value', [])
-            repo = next(
-                (
-                    r
-                    for r in repositories
-                    if r.get('name', '').lower() == self.repo_name.lower()
-                ),
-                None,
-            )
-
-            if not repo:
-                logger.warning(f'Repository not found: {self.repo_name}')
-                return []
-
-            repo_id = repo.get('id')
-
-            # Get all active pull requests for the repository
-            prs_url = f'{self.api_base_url}/git/repositories/{repo_id}/pullrequests'
-            prs_params = {
-                'api-version': '7.1-preview.1',
-                'searchCriteria.status': 'active',
-            }
-
-            prs_data = await self._make_api_request(prs_url, params=prs_params)
-
-            if not prs_data or not isinstance(prs_data, dict):
-                return []
-
-            pull_requests = prs_data.get('value', [])
-
-            # Convert pull requests to the issue format
-            all_issues = []
-            for pr in pull_requests:
-                # Convert the PR to a dictionary format similar to issues
-                issue = {
-                    'id': pr.get('pullRequestId'),
-                    'fields': {
-                        'System.Id': pr.get('pullRequestId'),
-                        'System.Title': pr.get('title', ''),
-                        'System.Description': pr.get('description', ''),
-                    },
-                    'source_branch': pr.get('sourceRefName', ''),
-                    'repository': repo,
-                }
-                all_issues.append(issue)
-
-            return all_issues
-
-        except Exception as e:
-            logger.warning(f'Error downloading pull requests: {e}')
-            return []
-
-    def get_converted_issues(
-        self, issue_numbers: list[int] | None = None, comment_id: int | None = None
-    ) -> list[Issue]:
-        """Download pull requests from Azure DevOps.
-
-        Args:
-            issue_numbers: The numbers of the pull requests to download
-            comment_id: The ID of a single comment, if provided, otherwise all comments
-
-        Returns:
-            List of Azure DevOps pull requests as Issue objects.
-        """
-        if not issue_numbers:
-            raise ValueError('Unspecified issue number')
-
-        all_issues = self.download_issues()
-        logger.info(f'Limiting resolving to issues {issue_numbers}.')
-        all_issues = [issue for issue in all_issues if issue['id'] in issue_numbers]
-
-        if len(issue_numbers) == 1 and not all_issues:
-            raise ValueError(f'Issue {issue_numbers[0]} not found')
-
-        converted_issues = []
-        for issue in all_issues:
-            # Get PR metadata
-            (
-                closing_issues,
-                closing_issue_numbers,
-                review_bodies,
-                review_threads,
-                thread_ids,
-            ) = self.download_pr_metadata(issue['id'], comment_id)
-
-            # Create the Issue object
-            converted_issue = Issue(
-                number=issue['id'],
-                title=issue['fields']['System.Title'],
-                body=issue['fields']['System.Description'],
-                owner=self.owner,
-                repo=f'{self.project_name}/{self.repo_name}',
-                head_branch=issue['source_branch'].replace('refs/heads/', ''),
-                closing_issues=closing_issues,
-                closing_issue_numbers=closing_issue_numbers,
-                review_bodies=review_bodies,
-                review_threads=review_threads,
-                thread_ids=thread_ids,
-            )
-            converted_issues.append(converted_issue)
-
-        return converted_issues
-
-    def download_pr_metadata(
-        self, pull_number: int, comment_id: int | None = None
-    ) -> tuple[list[str], list[int], list[str] | None, list[ReviewThread], list[str]]:
-        """Get metadata for a pull request."""
-        return asyncio.run(self._download_pr_metadata_async(pull_number, comment_id))
-
-    async def _download_pr_metadata_async(
-        self, pull_number: int, comment_id: int | None = None
-    ) -> tuple[list[str], list[int], list[str] | None, list[ReviewThread], list[str]]:
-        """Get metadata for a pull request asynchronously.
-
-        Args:
-            pull_number: The number of the pull request to query.
-            comment_id: Optional ID of a specific comment to focus on.
-
-        Returns:
-            Tuple containing:
-            1. List of closing issue bodies
-            2. List of closing issue numbers
-            3. List of review bodies
-            4. List of review threads
-            5. List of thread IDs
-        """
-        try:
-            # First, get the repository ID
-            repos_url = f'{self.api_base_url}/git/repositories'
-            repos_params = {
-                'api-version': '7.1-preview.1',
-                'project': self.project_name,
-            }
-
-            repos_data = await self._make_api_request(repos_url, params=repos_params)
-
-            if not repos_data or not isinstance(repos_data, dict):
-                logger.warning(f'Repository not found: {self.repo_name}')
-                return [], [], None, [], []
-
-            repositories = repos_data.get('value', [])
-            repo = next(
-                (
-                    r
-                    for r in repositories
-                    if r.get('name', '').lower() == self.repo_name.lower()
-                ),
-                None,
-            )
-
-            if not repo:
-                logger.warning(f'Repository not found: {self.repo_name}')
-                return [], [], None, [], []
-
-            repo_id = repo.get('id')
-
-            # Get the pull request details
-            pr_url = f'{self.api_base_url}/git/repositories/{repo_id}/pullRequests/{pull_number}'
-            pr_params = {'api-version': '7.1-preview.1'}
-
-            pr_data = await self._make_api_request(pr_url, params=pr_params)
-
-            if not pr_data:
-                logger.warning(f'Pull request {pull_number} not found')
-                return [], [], None, [], []
-
-            # Get threads (comments) for the pull request
-            threads_url = f'{self.api_base_url}/git/repositories/{repo_id}/pullRequests/{pull_number}/threads'
-            threads_params = {'api-version': '7.1-preview.1'}
-
-            threads_data = await self._make_api_request(
-                threads_url, params=threads_params
-            )
-
-            review_threads = []
-            thread_ids = []
-            review_bodies = []
-
-            if threads_data and isinstance(threads_data, dict):
-                threads = threads_data.get('value', [])
-
-                for thread in threads:
-                    thread_id = str(thread.get('id', ''))
-                    thread_ids.append(thread_id)
-
-                    comments = thread.get('comments', [])
-                    if comments:
-                        # Get the first comment as the main review body
-                        first_comment = comments[0]
-                        content = first_comment.get('content', '')
-                        if content:
-                            review_bodies.append(content)
-
-                        # Create review thread
-                        review_thread = ReviewThread(
-                            id=thread_id,
-                            body=content,
-                            line=None,  # Azure DevOps doesn't provide line numbers in the same way
-                            start_line=None,
-                            original_line=None,
-                            original_start_line=None,
-                            diff_hunk='',  # Would need additional API call to get diff
-                            path='',  # Would need additional API call to get file path
-                        )
-                        review_threads.append(review_thread)
-
-            # For now, we don't extract closing issues from PR description
-            # This would require parsing the description text
-            closing_issues: list[str] = []
-            closing_issue_numbers: list[int] = []
-
-            return (
-                closing_issues,
-                closing_issue_numbers,
-                review_bodies if review_bodies else None,
-                review_threads,
-                thread_ids,
-            )
-
-        except Exception as e:
-            logger.warning(f'Error downloading PR metadata: {e}')
-            return [], [], None, [], []
--- a/openhands/resolver/interfaces/issue.py
+++ b/openhands/resolver/interfaces/issue.py
@@ -121,5 +121,5 @@ class IssueHandlerInterface(ABC):
    def get_converted_issues(
        self, issue_numbers: list[int] | None = None, comment_id: int | None = None
    ) -> list[Issue]:
-        """Download issues from the git provider (GitHub, GitLab, or Azure DevOps)."""
+        """Download issues from the git provider (GitHub or GitLab)."""
        pass
--- a/openhands/resolver/issue_handler_factory.py
+++ b/openhands/resolver/issue_handler_factory.py
@@ -1,9 +1,5 @@
 from openhands.core.config import LLMConfig
 from openhands.integrations.provider import ProviderType
-from openhands.resolver.interfaces.azure_devops import (
-    AzureDevOpsIssueHandler,
-    AzureDevOpsPRHandler,
-)
 from openhands.resolver.interfaces.github import GithubIssueHandler, GithubPRHandler
 from openhands.resolver.interfaces.gitlab import GitlabIssueHandler, GitlabPRHandler
 from openhands.resolver.interfaces.issue_definitions import (
@@ -46,7 +42,7 @@ class IssueHandlerFactory:
                    ),
                    self.llm_config,
                )
-            elif self.platform == ProviderType.GITLAB:
+            else:  # platform == ProviderType.GITLAB
                return ServiceContextIssue(
                    GitlabIssueHandler(
                        self.owner,
@@ -57,19 +53,6 @@ class IssueHandlerFactory:
                    ),
                    self.llm_config,
                )
-            elif self.platform == ProviderType.AZURE_DEVOPS:
-                return ServiceContextIssue(
-                    AzureDevOpsIssueHandler(
-                        self.owner,
-                        self.repo,
-                        self.token,
-                        self.username,
-                        self.base_domain,
-                    ),
-                    self.llm_config,
-                )
-            else:
-                raise ValueError(f'Unsupported platform: {self.platform}')
        elif self.issue_type == 'pr':
            if self.platform == ProviderType.GITHUB:
                return ServiceContextPR(
@@ -82,7 +65,7 @@ class IssueHandlerFactory:
                    ),
                    self.llm_config,
                )
-            elif self.platform == ProviderType.GITLAB:
+            else:  # platform == ProviderType.GITLAB
                return ServiceContextPR(
                    GitlabPRHandler(
                        self.owner,
@@ -93,18 +76,5 @@ class IssueHandlerFactory:
                    ),
                    self.llm_config,
                )
-            elif self.platform == ProviderType.AZURE_DEVOPS:
-                return ServiceContextPR(
-                    AzureDevOpsPRHandler(
-                        self.owner,
-                        self.repo,
-                        self.token,
-                        self.username,
-                        self.base_domain,
-                    ),
-                    self.llm_config,
-                )
-            else:
-                raise ValueError(f'Unsupported platform: {self.platform}')
        else:
            raise ValueError(f'Invalid issue type: {self.issue_type}')
--- a/openhands/resolver/issue_resolver.py
+++ b/openhands/resolver/issue_resolver.py
@@ -50,7 +50,6 @@ AGENT_CLASS = 'CodeActAgent'

 class IssueResolver:
    GITLAB_CI = os.getenv('GITLAB_CI') == 'true'
-    AZURE_DEVOPS_CI = os.getenv('TF_BUILD') == 'True'

    def __init__(self, args: Namespace) -> None:
        """Initialize the IssueResolver with the given parameters.
@@ -77,12 +76,7 @@ class IssueResolver:
            raise ValueError('Invalid repository format. Expected owner/repo')
        owner, repo = parts

-        token = (
-            args.token
-            or os.getenv('GITHUB_TOKEN')
-            or os.getenv('GITLAB_TOKEN')
-            or os.getenv('AZURE_DEVOPS_TOKEN')
-        )
+        token = args.token or os.getenv('GITHUB_TOKEN') or os.getenv('GITLAB_TOKEN')
        username = args.username if args.username else os.getenv('GIT_USERNAME')
        if not username:
            raise ValueError('Username is required.')
@@ -126,11 +120,7 @@ class IssueResolver:
        base_domain = args.base_domain
        if base_domain is None:
            base_domain = (
-                'github.com'
-                if platform == ProviderType.GITHUB
-                else 'gitlab.com'
-                if platform == ProviderType.GITLAB
-                else 'dev.azure.com'
+                'github.com' if platform == ProviderType.GITHUB else 'gitlab.com'
            )

        self.output_dir = args.output_dir
@@ -250,14 +240,6 @@ class IssueResolver:
            if user_id == 0:
                sandbox_config.user_id = get_unique_uid()

-        # Configure sandbox for Azure DevOps CI environment
-        if cls.AZURE_DEVOPS_CI:
-            sandbox_config.use_host_network = False
-            sandbox_config.enable_auto_lint = True
-            sandbox_config.runtime_startup_env_vars = {
-                'TF_BUILD': 'True',
-            }
-
        openhands_config.sandbox.base_container_image = (
            sandbox_config.base_container_image
        )
@@ -291,9 +273,7 @@ class IssueResolver:
        if not isinstance(obs, CmdOutputObservation) or obs.exit_code != 0:
            raise RuntimeError(f'Failed to change directory to /workspace.\n{obs}')

-        if (self.platform == ProviderType.GITLAB and self.GITLAB_CI) or (
-            self.platform == ProviderType.AZURE_DEVOPS and self.AZURE_DEVOPS_CI
-        ):
+        if self.platform == ProviderType.GITLAB and self.GITLAB_CI:
            action = CmdRunAction(command='sudo chown -R 1001:0 /workspace/*')
            logger.info(action, extra={'msg_type': 'ACTION'})
            obs = runtime.run_action(action)
@@ -355,9 +335,7 @@ class IssueResolver:
        if not isinstance(obs, CmdOutputObservation) or obs.exit_code != 0:
            raise RuntimeError(f'Failed to set git config. Observation: {obs}')

-        if (self.platform == ProviderType.GITLAB and self.GITLAB_CI) or (
-            self.platform == ProviderType.AZURE_DEVOPS and self.AZURE_DEVOPS_CI
-        ):
+        if self.platform == ProviderType.GITLAB and self.GITLAB_CI:
            action = CmdRunAction(command='sudo git add -A')
        else:
            action = CmdRunAction(command='git add -A')
--- a/openhands/resolver/resolve_issue.py
+++ b/openhands/resolver/resolve_issue.py
@@ -116,7 +116,7 @@ def main() -> None:
        '--base-domain',
        type=str,
        default=None,
-        help='Base domain for the git server (defaults to "github.com" for GitHub, "gitlab.com" for GitLab, and "dev.azure.com" for Azure DevOps)',
+        help='Base domain for the git server (defaults to "github.com" for GitHub and "gitlab.com" for GitLab)',
    )

    my_args = parser.parse_args()
--- a/openhands/resolver/send_pull_request.py
+++ b/openhands/resolver/send_pull_request.py
@@ -11,7 +11,6 @@ from openhands.core.config import LLMConfig
 from openhands.core.logger import openhands_logger as logger
 from openhands.integrations.service_types import ProviderType
 from openhands.llm.llm import LLM
-from openhands.resolver.interfaces.azure_devops import AzureDevOpsIssueHandler
 from openhands.resolver.interfaces.github import GithubIssueHandler
 from openhands.resolver.interfaces.gitlab import GitlabIssueHandler
 from openhands.resolver.interfaces.issue import Issue
@@ -236,55 +235,40 @@ def send_pull_request(
    pr_title: str | None = None,
    base_domain: str | None = None,
 ) -> str:
-    """Send a pull request to a GitHub, GitLab, or Azure DevOps repository.
+    """Send a pull request to a GitHub or GitLab repository.

    Args:
        issue: The issue to send the pull request for
-        token: The token to use for authentication
-        username: The username, if provided
+        token: The GitHub or GitLab token to use for authentication
+        username: The GitHub or GitLab username, if provided
        platform: The platform of the repository.
        patch_dir: The directory containing the patches to apply
        pr_type: The type: branch (no PR created), draft or ready (regular PR created)
        fork_owner: The owner of the fork to push changes to (if different from the original repo owner)
        additional_message: The additional messages to post as a comment on the PR in json list format
        target_branch: The target branch to create the pull request against (defaults to repository default branch)
-        reviewer: The username of the reviewer to assign
+        reviewer: The GitHub or GitLab username of the reviewer to assign
        pr_title: Custom title for the pull request (optional)
-        base_domain: The base domain for the git server (defaults to "github.com" for GitHub, "gitlab.com" for GitLab, and "dev.azure.com" for Azure DevOps)
+        base_domain: The base domain for the git server (defaults to "github.com" for GitHub and "gitlab.com" for GitLab)
    """
    if pr_type not in ['branch', 'draft', 'ready']:
        raise ValueError(f'Invalid pr_type: {pr_type}')

    # Determine default base_domain based on platform
    if base_domain is None:
-        if platform == ProviderType.GITHUB:
-            base_domain = 'github.com'
-        elif platform == ProviderType.GITLAB:
-            base_domain = 'gitlab.com'
-        else:  # platform == ProviderType.AZURE_DEVOPS
-            base_domain = 'dev.azure.com'
+        base_domain = 'github.com' if platform == ProviderType.GITHUB else 'gitlab.com'

-    # Create the appropriate handler based on platform
    handler = None
    if platform == ProviderType.GITHUB:
        handler = ServiceContextIssue(
            GithubIssueHandler(issue.owner, issue.repo, token, username, base_domain),
            None,
        )
-    elif platform == ProviderType.GITLAB:
+    else:  # platform == ProviderType.GITLAB
        handler = ServiceContextIssue(
            GitlabIssueHandler(issue.owner, issue.repo, token, username, base_domain),
            None,
        )
-    elif platform == ProviderType.AZURE_DEVOPS:
-        handler = ServiceContextIssue(
-            AzureDevOpsIssueHandler(
-                issue.owner, issue.repo, token, username, base_domain
-            ),
-            None,
-        )
-    else:
-        raise ValueError(f'Unsupported platform: {platform}')

    # Create a new branch with a unique name
    base_branch_name = f'openhands-fix-issue-{issue.number}'
--- a/openhands/resolver/utils.py
+++ b/openhands/resolver/utils.py
@@ -17,7 +17,7 @@ from openhands.integrations.utils import validate_provider_token

 async def identify_token(token: str, base_domain: str | None) -> ProviderType:
    """
-    Identifies whether a token belongs to GitHub, GitLab, or Azure DevOps.
+    Identifies whether a token belongs to GitHub or GitLab.
    Parameters:
        token (str): The personal access token to check.
        base_domain (str): Custom base domain for provider (e.g GitHub Enterprise)
--- a/openhands/runtime/action_execution_server.py
+++ b/openhands/runtime/action_execution_server.py
@@ -18,6 +18,7 @@ import time
 import traceback
 from contextlib import asynccontextmanager
 from pathlib import Path
+from typing import Any
 from zipfile import ZipFile

 from binaryornot.check import is_binary
@@ -213,6 +214,94 @@ class ActionExecutor:
    def initial_cwd(self):
        return self._initial_cwd

+    def _extract_action_metadata(self, action: Action) -> dict[str, Any]:
+        """Extract relevant metadata from an action for logging, excluding large content."""
+        metadata: dict[str, Any] = {}
+
+        # Common metadata for all actions
+        if hasattr(action, 'timeout'):
+            metadata['timeout'] = action.timeout
+
+        # Action-specific metadata
+        if isinstance(action, (FileReadAction, FileWriteAction, FileEditAction)):
+            metadata['path'] = getattr(action, 'path', None)
+            if isinstance(action, FileReadAction):
+                metadata['start'] = getattr(action, 'start', None)
+                metadata['end'] = getattr(action, 'end', None)
+                metadata['view_range'] = getattr(action, 'view_range', None)
+            elif isinstance(action, FileWriteAction):
+                metadata['start'] = getattr(action, 'start', None)
+                metadata['end'] = getattr(action, 'end', None)
+                # Don't log content, just its length
+                content = getattr(action, 'content', '')
+                metadata['content_length'] = len(content) if content else 0
+            elif isinstance(action, FileEditAction):
+                metadata['command'] = getattr(action, 'command', None)
+                metadata['insert_line'] = getattr(action, 'insert_line', None)
+                # Don't log old_str/new_str content, just their lengths
+                old_str = getattr(action, 'old_str', '')
+                new_str = getattr(action, 'new_str', '')
+                metadata['old_str_length'] = len(old_str) if old_str else 0
+                metadata['new_str_length'] = len(new_str) if new_str else 0
+        elif isinstance(action, CmdRunAction):
+            # Log command but truncate if very long
+            command = getattr(action, 'command', '')
+            metadata['command'] = (
+                command[:200] + '...' if len(command) > 200 else command
+            )
+            metadata['blocking'] = getattr(action, 'blocking', None)
+            metadata['keep_prompt'] = getattr(action, 'keep_prompt', None)
+        elif isinstance(action, IPythonRunCellAction):
+            # Log the code length plus a preview truncated to 100 characters
+            code = getattr(action, 'code', '')
+            metadata['code_length'] = len(code) if code else 0
+            metadata['code_preview'] = code[:100] + '...' if len(code) > 100 else code
+        elif isinstance(action, (BrowseURLAction, BrowseInteractiveAction)):
+            metadata['url'] = getattr(action, 'url', None)
+            if isinstance(action, BrowseInteractiveAction):
+                metadata['browser_actions'] = len(
+                    getattr(action, 'browser_actions', [])
+                )
+
+        return metadata
+
+    def _extract_observation_metadata(self, observation) -> dict[str, Any]:
+        """Extract relevant metadata from an observation for logging, excluding large content."""
+        metadata: dict[str, Any] = {}
+
+        # Common metadata
+        metadata['observation_type'] = type(observation).__name__
+
+        # Check for error conditions
+        if hasattr(observation, 'error') and observation.error:
+            metadata['has_error'] = True
+            metadata['error'] = str(observation.error)[:200]  # Truncate long errors
+        else:
+            metadata['has_error'] = False
+
+        # Observation-specific metadata
+        if hasattr(observation, 'path'):
+            metadata['path'] = observation.path
+
+        if hasattr(observation, 'exit_code'):
+            metadata['exit_code'] = observation.exit_code
+
+        if hasattr(observation, 'content'):
+            content = observation.content
+            metadata['content_length'] = len(content) if content else 0
+            # Include a short preview (first 100 characters) of non-empty content
+            if metadata['content_length'] > 0:
+                metadata['content_preview'] = (
+                    content[:100] + '...' if len(content) > 100 else content
+                )
+
+        # For file edit observations, include diff info
+        if hasattr(observation, 'diff') and observation.diff:
+            metadata['has_diff'] = True
+            metadata['diff_length'] = len(observation.diff)
+
+        return metadata
+
    async def _init_browser_async(self):
        """Initialize the browser asynchronously."""
        if sys.platform == 'win32':
@@ -377,28 +466,131 @@ class ActionExecutor:
            assert obs.exit_code == 0
        logger.debug('Bash init commands completed')

-    async def run_action(self, action) -> Observation:
+    async def run_action(self, action: Action) -> Observation:
        async with self.lock:
-            action_type = action.action
-            observation = await getattr(self, action_type)(action)
-            return observation
+            action_type = action.action  # type: ignore[attr-defined]
+            start_time = time.time()
+
+            # Log action execution start with metadata
+            action_metadata = self._extract_action_metadata(action)
+            logger.info(
+                f'Executing action: {action_type}',
+                extra={
+                    'action_type': action_type,
+                    'action_id': getattr(action, 'id', None),
+                    'action_metadata': action_metadata,
+                    'timestamp': start_time,
+                },
+            )
+
+            try:
+                observation = await getattr(self, action_type)(action)
+                execution_time = time.time() - start_time
+
+                # Log successful action completion with observation metadata
+                obs_metadata = self._extract_observation_metadata(observation)
+                logger.info(
+                    f'Action completed successfully: {action_type}',
+                    extra={
+                        'action_type': action_type,
+                        'action_id': getattr(action, 'id', None),
+                        'observation_type': type(observation).__name__,
+                        'execution_time_ms': round(execution_time * 1000, 2),
+                        'observation_metadata': obs_metadata,
+                        'success': True,
+                    },
+                )
+
+                return observation
+            except Exception as e:
+                execution_time = time.time() - start_time
+
+                # Log action execution failure
+                logger.error(
+                    f'Action failed: {action_type}',
+                    extra={
+                        'action_type': action_type,
+                        'action_id': getattr(action, 'id', None),
+                        'execution_time_ms': round(execution_time * 1000, 2),
+                        'error': str(e),
+                        'error_type': type(e).__name__,
+                        'success': False,
+                    },
+                    exc_info=True,
+                )
+
+                raise

    async def run(
        self, action: CmdRunAction
    ) -> CmdOutputObservation | ErrorObservation:
+        # Log command execution attempt
+        command_preview = (
+            action.command[:100] + '...'
+            if len(action.command) > 100
+            else action.command
+        )
+        logger.debug(
+            f'Executing command: {command_preview}',
+            extra={
+                'operation': 'cmd_run',
+                'command_length': len(action.command),
+                'blocking': action.blocking,
+                'is_static': action.is_static,
+                'cwd': action.cwd if action.is_static else None,
+            },
+        )
+
        try:
            bash_session = self.bash_session
            if action.is_static:
                bash_session = self._create_bash_session(action.cwd)
            assert bash_session is not None
            obs = await call_sync_from_async(bash_session.execute, action)
+
+            # Log command execution result
+            logger.debug(
+                f'Command completed: {command_preview}',
+                extra={
+                    'operation': 'cmd_run',
+                    'exit_code': obs.exit_code if hasattr(obs, 'exit_code') else None,
+                    'output_length': len(obs.content)
+                    if hasattr(obs, 'content') and obs.content
+                    else 0,
+                    'success': obs.exit_code == 0
+                    if hasattr(obs, 'exit_code')
+                    else True,
+                },
+            )
+
            return obs
        except Exception as e:
-            logger.error(f'Error running command: {e}')
+            logger.error(
+                f'Error running command: {command_preview}',
+                extra={
+                    'operation': 'cmd_run',
+                    'error': str(e),
+                    'error_type': type(e).__name__,
+                },
+            )
            return ErrorObservation(str(e))

    async def run_ipython(self, action: IPythonRunCellAction) -> Observation:
        assert self.bash_session is not None
+
+        # Log IPython execution attempt
+        code_preview = (
+            action.code[:100] + '...' if len(action.code) > 100 else action.code
+        )
+        logger.debug(
+            f'Executing IPython code: {code_preview}',
+            extra={
+                'operation': 'ipython_run',
+                'code_length': len(action.code),
+                'include_extra': action.include_extra,
+            },
+        )
+
        if 'jupyter' in self.plugins:
            _jupyter_plugin: JupyterPlugin = self.plugins['jupyter']  # type: ignore
            # This is used to make AgentSkills in Jupyter aware of the
@@ -428,6 +620,17 @@ class ActionExecutor:
                    f'\n[Jupyter current working directory: {self.bash_session.cwd}]'
                )
                obs.content += f'\n[Jupyter Python interpreter: {_jupyter_plugin.python_interpreter_path}]'
+
+            # Log IPython execution result
+            logger.debug(
+                f'IPython code completed: {code_preview}',
+                extra={
+                    'operation': 'ipython_run',
+                    'output_length': len(obs.content) if obs.content else 0,
+                    'success': not hasattr(obs, 'error') or not obs.error,
+                },
+            )
+
            return obs
        else:
            raise RuntimeError(
@@ -443,8 +646,26 @@ class ActionExecutor:
    async def read(self, action: FileReadAction) -> Observation:
        assert self.bash_session is not None

+        # Log file read attempt
+        logger.debug(
+            f'Attempting to read file: {action.path}',
+            extra={
+                'operation': 'file_read',
+                'path': action.path,
+                'working_dir': self.bash_session.cwd,
+            },
+        )
+
        # Cannot read binary files
        if is_binary(action.path):
+            logger.warning(
+                f'Attempted to read binary file: {action.path}',
+                extra={
+                    'operation': 'file_read',
+                    'path': action.path,
+                    'error': 'binary_file',
+                },
+            )
            return ErrorObservation('ERROR_BINARY_FILE')

        if action.impl_source == FileReadSource.OH_ACI:
@@ -467,6 +688,14 @@ class ActionExecutor:
        filepath = self._resolve_path(action.path, working_dir)
        try:
            if filepath.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif')):
+                logger.debug(
+                    f'Reading image file: {filepath}',
+                    extra={
+                        'operation': 'file_read',
+                        'path': filepath,
+                        'file_type': 'image',
+                    },
+                )
                with open(filepath, 'rb') as file:
                    image_data = file.read()
                    encoded_image = base64.b64encode(image_data).decode('utf-8')
@@ -475,6 +704,15 @@ class ActionExecutor:
                        mime_type = 'image/png'  # default to PNG if mime type cannot be determined
                    encoded_image = f'data:{mime_type};base64,{encoded_image}'

+                logger.debug(
+                    f'Successfully read image file: {filepath}',
+                    extra={
+                        'operation': 'file_read',
+                        'path': filepath,
+                        'file_type': 'image',
+                        'size_bytes': len(image_data),
+                    },
+                )
                return FileReadObservation(path=filepath, content=encoded_image)
            elif filepath.lower().endswith('.pdf'):
                with open(filepath, 'rb') as file:
@@ -495,13 +733,50 @@ class ActionExecutor:

            with open(filepath, 'r', encoding='utf-8') as file:
                lines = read_lines(file.readlines(), action.start, action.end)
+
+            logger.debug(
+                f'Successfully read text file: {filepath}',
+                extra={
+                    'operation': 'file_read',
+                    'path': filepath,
+                    'file_type': 'text',
+                    'lines_read': len(lines),
+                    'start_line': action.start,
+                    'end_line': action.end,
+                },
+            )
        except FileNotFoundError:
+            logger.warning(
+                f'File not found during read: {filepath}',
+                extra={
+                    'operation': 'file_read',
+                    'path': filepath,
+                    'working_dir': working_dir,
+                    'error': 'file_not_found',
+                },
+            )
            return ErrorObservation(
                f'File not found: {filepath}. Your current working directory is {working_dir}.'
            )
        except UnicodeDecodeError:
+            logger.warning(
+                f'Unicode decode error reading file: {filepath}',
+                extra={
+                    'operation': 'file_read',
+                    'path': filepath,
+                    'error': 'unicode_decode_error',
+                },
+            )
            return ErrorObservation(f'File could not be decoded as utf-8: {filepath}.')
        except IsADirectoryError:
+            logger.warning(
+                f'Attempted to read directory as file: {filepath}',
+                extra={
+                    'operation': 'file_read',
+                    'path': filepath,
+                    'error': 'is_directory',
+                },
+            )
            return ErrorObservation(
                f'Path is a directory: {filepath}. You can only read files'
            )
@@ -514,15 +789,53 @@ class ActionExecutor:
        working_dir = self.bash_session.cwd
        filepath = self._resolve_path(action.path, working_dir)

+        # Log file write attempt
+        logger.debug(
+            f'Attempting to write file: {filepath}',
+            extra={
+                'operation': 'file_write',
+                'path': filepath,
+                'working_dir': working_dir,
+                'content_length': len(action.content) if action.content else 0,
+                'start_line': action.start,
+                'end_line': action.end,
+            },
+        )
+
        insert = action.content.split('\n')
        if not os.path.exists(os.path.dirname(filepath)):
+            logger.debug(
+                f'Creating directory for file: {os.path.dirname(filepath)}',
+                extra={
+                    'operation': 'file_write',
+                    'path': filepath,
+                    'directory_created': os.path.dirname(filepath),
+                },
+            )
            os.makedirs(os.path.dirname(filepath))

        file_exists = os.path.exists(filepath)
        if file_exists:
            file_stat = os.stat(filepath)
+            logger.debug(
+                f'File exists, will modify: {filepath}',
+                extra={
+                    'operation': 'file_write',
+                    'path': filepath,
+                    'file_exists': True,
+                    'file_size': file_stat.st_size,
+                },
+            )
        else:
            file_stat = None
+            logger.debug(
+                f'Creating new file: {filepath}',
+                extra={
+                    'operation': 'file_write',
+                    'path': filepath,
+                    'file_exists': False,
+                },
+            )

        mode = 'w' if not file_exists else 'r+'
        try:
@@ -538,12 +851,36 @@ class ActionExecutor:
                file.truncate()

        except FileNotFoundError:
+            logger.warning(
+                f'File not found during write: {filepath}',
+                extra={
+                    'operation': 'file_write',
+                    'path': filepath,
+                    'error': 'file_not_found',
+                },
+            )
            return ErrorObservation(f'File not found: {filepath}')
        except IsADirectoryError:
+            logger.warning(
+                f'Attempted to write to directory: {filepath}',
+                extra={
+                    'operation': 'file_write',
+                    'path': filepath,
+                    'error': 'is_directory',
+                },
+            )
            return ErrorObservation(
                f'Path is a directory: {filepath}. You can only write to files'
            )
        except UnicodeDecodeError:
+            logger.warning(
+                f'Unicode decode error writing file: {filepath}',
+                extra={
+                    'operation': 'file_write',
+                    'path': filepath,
+                    'error': 'unicode_decode_error',
+                },
+            )
            return ErrorObservation(f'File could not be decoded as utf-8: {filepath}')

        # Attempt to handle file permissions
@@ -558,13 +895,48 @@ class ActionExecutor:
                os.chmod(filepath, 0o664)
                os.chown(filepath, self.user_id, self.user_id)
        except PermissionError as e:
+            logger.warning(
+                f'Permission error setting file permissions: {filepath}',
+                extra={
+                    'operation': 'file_write',
+                    'path': filepath,
+                    'error': 'permission_error',
+                    'error_details': str(e),
+                },
+            )
            return ErrorObservation(
                f'File {filepath} written, but failed to change ownership and permissions: {e}'
            )
+
+        logger.debug(
+            f'Successfully wrote file: {filepath}',
+            extra={
+                'operation': 'file_write',
+                'path': filepath,
+                'lines_written': len(insert),
+                'final_size': os.path.getsize(filepath)
+                if os.path.exists(filepath)
+                else 0,
+            },
+        )
        return FileWriteObservation(content='', path=filepath)

    async def edit(self, action: FileEditAction) -> Observation:
        assert action.impl_source == FileEditSource.OH_ACI
+
+        # Log file edit attempt
+        logger.debug(
+            f'Attempting to edit file: {action.path}',
+            extra={
+                'operation': 'file_edit',
+                'path': action.path,
+                'command': action.command,
+                'insert_line': action.insert_line,
+                'old_str_length': len(action.old_str) if action.old_str else 0,
+                'new_str_length': len(action.new_str) if action.new_str else 0,
+            },
+        )
+
        result_str, (old_content, new_content) = _execute_file_editor(
            self.file_editor,
            command=action.command,
@@ -576,6 +948,30 @@ class ActionExecutor:
            enable_linting=False,
        )

+        # Log edit result
+        if result_str.startswith('ERROR:'):
+            logger.warning(
+                f'File edit failed: {action.path}',
+                extra={
+                    'operation': 'file_edit',
+                    'path': action.path,
+                    'command': action.command,
+                    'error': result_str[:200],  # Truncate long errors
+                },
+            )
+        else:
+            logger.debug(
+                f'Successfully edited file: {action.path}',
+                extra={
+                    'operation': 'file_edit',
+                    'path': action.path,
+                    'command': action.command,
+                    'has_diff': bool(old_content and new_content),
+                    'old_content_length': len(old_content) if old_content else 0,
+                    'new_content_length': len(new_content) if new_content else 0,
+                },
+            )
+
        return FileEditObservation(
            content=result_str,
            path=action.path,
@@ -771,15 +1167,60 @@ if __name__ == '__main__':
    @app.post('/execute_action')
    async def execute_action(action_request: ActionRequest):
        assert client is not None
+        request_start_time = time.time()
+
        try:
            action = event_from_dict(action_request.action)
            if not isinstance(action, Action):
+                logger.error(
+                    'Invalid action type received in /execute_action',
+                    extra={
+                        'action_dict': action_request.action,
+                        'error': 'Invalid action type',
+                    },
+                )
                raise HTTPException(status_code=400, detail='Invalid action type')
+
+            # Log the HTTP request
+            logger.debug(
+                f'Received action request: {action.action}',  # type: ignore[attr-defined]
+                extra={
+                    'action_type': action.action,  # type: ignore[attr-defined]
+                    'action_id': getattr(action, 'id', None),
+                    'endpoint': '/execute_action',
+                },
+            )
+
            client.last_execution_time = time.time()
            observation = await client.run_action(action)
+
+            request_time = time.time() - request_start_time
+            logger.debug(
+                f'Action request completed: {action.action}',  # type: ignore[attr-defined]
+                extra={
+                    'action_type': action.action,  # type: ignore[attr-defined]
+                    'action_id': getattr(action, 'id', None),
+                    'endpoint': '/execute_action',
+                    'total_request_time_ms': round(request_time * 1000, 2),
+                    'observation_type': type(observation).__name__,
+                },
+            )
+
            return event_to_dict(observation)
+        except HTTPException:
+            # Re-raise HTTP exceptions without additional logging
+            raise
        except Exception as e:
-            logger.error(f'Error while running /execute_action: {str(e)}')
+            request_time = time.time() - request_start_time
+            logger.error(
+                f'Error while running /execute_action: {str(e)}',
+                extra={
+                    'endpoint': '/execute_action',
+                    'total_request_time_ms': round(request_time * 1000, 2),
+                    'error': str(e),
+                    'error_type': type(e).__name__,
+                },
+            )
            raise HTTPException(
                status_code=500,
                detail=traceback.format_exc(),
@@ -844,6 +1285,17 @@ if __name__ == '__main__':
    ):
        assert client is not None

+        logger.debug(
+            f'File upload request: {file.filename}',
+            extra={
+                'operation': 'upload_file',
+                'upload_filename': file.filename,  # 'filename' is reserved by LogRecord
+                'destination': destination,
+                'recursive': recursive,
+                'file_size': file.size if hasattr(file, 'size') else None,
+            },
+        )
+
        try:
            # Ensure the destination directory exists
            if not os.path.isabs(destination):
@@ -870,15 +1322,30 @@ if __name__ == '__main__':
                shutil.unpack_archive(zip_path, full_dest_path)
                os.remove(zip_path)  # Remove the zip file after extraction

-                logger.debug(
-                    f'Uploaded file {file.filename} and extracted to {destination}'
+                logger.info(
+                    f'Uploaded and extracted zip file: {file.filename}',
+                    extra={
+                        'operation': 'upload_file',
+                        'upload_filename': file.filename,  # 'filename' is reserved by LogRecord
+                        'destination': destination,
+                        'type': 'zip_extraction',
+                    },
                )
            else:
                # For single file uploads
                file_path = os.path.join(full_dest_path, file.filename)
                with open(file_path, 'wb') as buffer:
                    shutil.copyfileobj(file.file, buffer)
-                logger.debug(f'Uploaded file {file.filename} to {destination}')
+                logger.info(
+                    f'Uploaded single file: {file.filename}',
+                    extra={
+                        'operation': 'upload_file',
+                        'upload_filename': file.filename,  # 'filename' is reserved by LogRecord
+                        'destination': destination,
+                        'file_path': file_path,
+                        'type': 'single_file',
+                    },
+                )

            return JSONResponse(
                content={
@@ -890,20 +1357,50 @@ if __name__ == '__main__':
            )

        except Exception as e:
+            logger.error(
+                f'File upload failed: {file.filename}',
+                extra={
+                    'operation': 'upload_file',
+                    'upload_filename': file.filename,  # 'filename' is reserved by LogRecord
+                    'destination': destination,
+                    'error': str(e),
+                    'error_type': type(e).__name__,
+                },
+            )
            raise HTTPException(status_code=500, detail=str(e))

    @app.get('/download_files')
    def download_file(path: str):
-        logger.debug('Downloading files')
+        logger.debug(
+            f'File download request: {path}',
+            extra={'operation': 'download_files', 'path': path},
+        )
        try:
            if not os.path.isabs(path):
+                logger.warning(
+                    f'Download request with relative path: {path}',
+                    extra={
+                        'operation': 'download_files',
+                        'path': path,
+                        'error': 'relative_path',
+                    },
+                )
                raise HTTPException(
                    status_code=400, detail='Path must be an absolute path'
                )

            if not os.path.exists(path):
+                logger.warning(
+                    f'Download request for non-existent path: {path}',
+                    extra={
+                        'operation': 'download_files',
+                        'path': path,
+                        'error': 'file_not_found',
+                    },
+                )
                raise HTTPException(status_code=404, detail='File not found')

+            file_count = 0
            with tempfile.NamedTemporaryFile(suffix='.zip', delete=False) as temp_zip:
                with ZipFile(temp_zip, 'w') as zipf:
                    for root, _, files in os.walk(path):
@@ -912,6 +1409,18 @@ if __name__ == '__main__':
                            zipf.write(
                                file_path, arcname=os.path.relpath(file_path, path)
                            )
+                            file_count += 1
+
+                logger.info(
+                    f'Successfully created download zip: {path}',
+                    extra={
+                        'operation': 'download_files',
+                        'path': path,
+                        'files_included': file_count,
+                        'zip_path': temp_zip.name,
+                    },
+                )
+
                return FileResponse(
                    path=temp_zip.name,
                    media_type='application/zip',
@@ -920,6 +1429,15 @@ if __name__ == '__main__':
                )

        except Exception as e:
+            logger.error(
+                f'File download failed: {path}',
+                extra={
+                    'operation': 'download_files',
+                    'path': path,
+                    'error': str(e),
+                    'error_type': type(e).__name__,
+                },
+            )
            raise HTTPException(status_code=500, detail=str(e))

    @app.get('/alive')
@@ -983,6 +1501,15 @@ if __name__ == '__main__':

        if not os.path.exists(full_path):
            # if user just removed a folder, prevent server error 500 in UI
+            logger.debug(
+                f'Directory does not exist for listing: {full_path}',
+                extra={
+                    'operation': 'list_files',
+                    'path': path,
+                    'full_path': full_path,
+                    'exists': False,
+                },
+            )
            return JSONResponse(content=[])

        try:
@@ -1017,10 +1544,32 @@ if __name__ == '__main__':

            # Combine sorted directories and files
            sorted_entries = directories + files
+
+            logger.debug(
+                f'Successfully listed files in: {full_path}',
+                extra={
+                    'operation': 'list_files',
+                    'path': path,
+                    'full_path': full_path,
+                    'total_entries': len(sorted_entries),
+                    'directories': len(directories),
+                    'files': len(files),
+                },
+            )
+
            return JSONResponse(content=sorted_entries)

        except Exception as e:
-            logger.error(f'Error listing files: {e}')
+            logger.error(
+                f'Error listing files in {full_path}: {e}',
+                extra={
+                    'operation': 'list_files',
+                    'path': path,
+                    'full_path': full_path,
+                    'error': str(e),
+                    'error_type': type(e).__name__,
+                },
+            )
            return JSONResponse(content=[])

    logger.debug(f'Starting action execution API on port {args.port}')
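
The logging calls added in this file attach structured fields through the standard `extra` argument. The following is a minimal sketch of how such fields end up on a log record, assuming the project logger behaves like a plain `logging.Logger`. The `JsonFormatter` class, the `build_logger` helper, and the `demo` logger name are hypothetical and not part of OpenHands. The last part shows that the standard library rejects `extra` keys that collide with built-in `LogRecord` attributes such as `filename`.

```python
import io
import json
import logging

# Attributes every LogRecord carries; anything else on a record came from `extra`.
_STANDARD_ATTRS = set(logging.makeLogRecord({}).__dict__) | {'message', 'asctime'}


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: emits the message plus any `extra` fields as JSON."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {'level': record.levelname, 'message': record.getMessage()}
        for key, value in record.__dict__.items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        return json.dumps(payload, default=str)


def build_logger(stream: io.StringIO) -> logging.Logger:
    """Create an isolated logger that writes JSON lines to `stream`."""
    logger = logging.getLogger('demo')
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


stream = io.StringIO()
logger = build_logger(stream)

# Fields passed via `extra` become attributes on the record.
logger.debug('listed files', extra={'operation': 'list_files', 'total_entries': 3})
entry = json.loads(stream.getvalue())

# Keys that clash with LogRecord attributes are refused by the logging module.
try:
    logger.info('upload', extra={'filename': 'a.txt'})
    reserved_key_error = None
except KeyError as exc:
    reserved_key_error = str(exc)
```

Because the collision check only runs when the log level is enabled, a clashing key can go unnoticed in configurations where `DEBUG` is off and surface later as a `KeyError` once it is turned on.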
--- a/openhands/runtime/base.py
+++ b/openhands/runtime/base.py
@@ -411,7 +411,6 @@ class Runtime(FileEditRuntimeMixin):
        provider_domains = {
            ProviderType.GITHUB: 'github.com',
            ProviderType.GITLAB: 'gitlab.com',
-            ProviderType.AZURE_DEVOPS: 'dev.azure.com',
        }

        domain = provider_domains[provider]
@@ -426,45 +425,10 @@ class Runtime(FileEditRuntimeMixin):
            if git_token:
                if provider == ProviderType.GITLAB:
                    remote_repo_url = f'https://oauth2:{git_token.get_secret_value()}@{domain}/{selected_repository}.git'
-                elif provider == ProviderType.AZURE_DEVOPS:
-                    # Azure DevOps URL format: https://token@dev.azure.com/organization/project/_git/repository
-                    # Extract organization from domain if it's a full URL
-                    if domain.startswith('https://dev.azure.com/'):
-                        org_name = domain.replace('https://dev.azure.com/', '').rstrip(
-                            '/'
-                        )
-                        base_domain = 'dev.azure.com'
-                    else:
-                        # If domain is just the host, we need to get organization from the token host
-                        token_host = git_provider_tokens[provider].host
-                        if token_host and token_host.startswith(
-                            'https://dev.azure.com/'
-                        ):
-                            org_name = token_host.replace(
-                                'https://dev.azure.com/', ''
-                            ).rstrip('/')
-                            base_domain = 'dev.azure.com'
-                        else:
-                            # Fallback: assume domain contains the organization
-                            org_name = domain.replace('dev.azure.com', '').strip('/')
-                            base_domain = 'dev.azure.com'
-
-                    # Parse project/repo from selected_repository
-                    repo_parts = selected_repository.split('/')
-                    if len(repo_parts) == 2:
-                        project_name, repo_name = repo_parts
-                        remote_repo_url = f'https://{git_token.get_secret_value()}@{base_domain}/{org_name}/{project_name}/_git/{repo_name}'
-                    else:
-                        # Fallback to original format if parsing fails
-                        remote_repo_url = f'https://{git_token.get_secret_value()}@{domain}/{selected_repository}.git'
                else:
                    remote_repo_url = f'https://{git_token.get_secret_value()}@{domain}/{selected_repository}.git'
            else:
-                if provider == ProviderType.AZURE_DEVOPS:
-                    # Public Azure DevOps repos (rare, but handle gracefully)
-                    remote_repo_url = f'https://{domain}/{selected_repository}.git'
-                else:
-                    remote_repo_url = f'https://{domain}/{selected_repository}.git'
+                remote_repo_url = f'https://{domain}/{selected_repository}.git'
        else:
            remote_repo_url = f'https://{domain}/{selected_repository}.git'

@@ -683,8 +647,6 @@ fi
            provider = ProviderType.GITHUB
        elif 'gitlab.com' in repo_path:
            provider = ProviderType.GITLAB
-        elif 'dev.azure.com' in repo_path:
-            provider = ProviderType.AZURE_DEVOPS

        # Add authentication if available
        if (
@@ -696,8 +658,6 @@ fi
            if git_token:
                if provider == ProviderType.GITLAB:
                    remote_url = f'https://oauth2:{git_token.get_secret_value()}@{repo_path.replace("gitlab.com/", "")}.git'
-                elif provider == ProviderType.AZURE_DEVOPS:
-                    remote_url = f'https://{git_token.get_secret_value()}@{repo_path.replace("dev.azure.com/", "")}.git'
                else:
                    remote_url = f'https://{git_token.get_secret_value()}@{repo_path.replace("github.com/", "")}.git'

@@ -713,7 +673,7 @@ fi
        the microagents from the ./microagents/ folder.

        Args:
-            selected_repository: The repository path (e.g., "github.com/acme-co/api" or "acme-co/api")
+            selected_repository: The repository path (e.g., "github.com/acme-co/api")

        Returns:
            A list of loaded microagents from the org/user level repository
@@ -724,35 +684,14 @@ fi
        if len(repo_parts) < 2:
            return loaded_microagents

-        # Determine the provider and domain
-        provider_domains = {
-            ProviderType.GITHUB: 'github.com',
-            ProviderType.GITLAB: 'gitlab.com',
-            ProviderType.AZURE_DEVOPS: 'dev.azure.com',
-        }
-
-        # First, try to extract domain from repository name if it includes one
-        if len(repo_parts) > 2:
-            domain = repo_parts[0]
-        else:
-            # Repository name doesn't include domain (e.g., "org/repo")
-            # Try to determine provider from available tokens
-            domain = 'github.com'  # Default fallback
-
-            if self.git_provider_tokens:
-                # If we only have one provider token, use that
-                if len(self.git_provider_tokens) == 1:
-                    provider = next(iter(self.git_provider_tokens))
-                    domain = provider_domains.get(provider, 'github.com')
-                else:
-                    # Multiple providers - would need additional logic to determine which one
-                    # For now, default to GitHub
-                    pass
-
+        # Extract the domain and org/user name
+        domain = repo_parts[0] if len(repo_parts) > 2 else 'github.com'
        org_name = repo_parts[-2]

        # Construct the org-level .openhands repo path
        org_openhands_repo = f'{domain}/{org_name}/.openhands'
+        if domain not in org_openhands_repo:
+            org_openhands_repo = f'github.com/{org_openhands_repo}'

        self.log(
            'info',
@@ -767,7 +706,9 @@ fi
            # Get authenticated URL and do a shallow clone (--depth 1) for efficiency
            remote_url = self._get_authenticated_git_url(org_openhands_repo)

-            clone_cmd = f'git clone --depth 1 {remote_url} {org_repo_dir}'
+            clone_cmd = (
+                f'GIT_TERMINAL_PROMPT=0 git clone --depth 1 {remote_url} {org_repo_dir}'
+            )

            action = CmdRunAction(command=clone_cmd)
            obs = self.run_action(action)
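
The simplified org-level lookup above can be sketched in isolation as follows. This is an illustration only: it assumes `repo_parts` comes from splitting the repository path on `/`, and the helper names `org_openhands_repo` and `clone_command` are hypothetical rather than functions in `base.py`.

```python
from __future__ import annotations


def org_openhands_repo(selected_repository: str) -> str | None:
    """Derive '<domain>/<org>/.openhands' from a repository path."""
    repo_parts = selected_repository.split('/')
    if len(repo_parts) < 2:
        return None  # not enough parts to identify an org or user
    # A leading domain is present only when there are more than two parts;
    # otherwise fall back to github.com, as the diff does.
    domain = repo_parts[0] if len(repo_parts) > 2 else 'github.com'
    org_name = repo_parts[-2]
    return f'{domain}/{org_name}/.openhands'


def clone_command(remote_url: str, target_dir: str) -> str:
    """Build the shallow clone command used for the org-level repository."""
    # GIT_TERMINAL_PROMPT=0 tells git to fail instead of prompting for
    # credentials, so a missing or private repository cannot hang the session.
    return f'GIT_TERMINAL_PROMPT=0 git clone --depth 1 {remote_url} {target_dir}'
```

With this shape, `acme-co/api` and `github.com/acme-co/api` both resolve to `github.com/acme-co/.openhands`, while a bare `api` yields no org-level repository at all.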
--- a/openhands/runtime/impl/daytona/daytona_runtime.py
+++ b/openhands/runtime/impl/daytona/daytona_runtime.py
@@ -13,6 +13,7 @@ from daytona_sdk import (

 from openhands.core.config.openhands_config import OpenHandsConfig
 from openhands.events.stream import EventStream
+from openhands.integrations.provider import PROVIDER_TOKEN_TYPE
 from openhands.runtime.impl.action_execution.action_execution_client import (
    ActionExecutionClient,
 )
@@ -42,6 +43,8 @@ class DaytonaRuntime(ActionExecutionClient):
        status_callback: Callable | None = None,
        attach_to_existing: bool = False,
        headless_mode: bool = True,
+        user_id: str | None = None,
+        git_provider_tokens: PROVIDER_TOKEN_TYPE | None = None,
    ):
        assert config.daytona_api_key, 'Daytona API key is required'

@@ -74,6 +77,8 @@ class DaytonaRuntime(ActionExecutionClient):
            status_callback,
            attach_to_existing,
            headless_mode,
+            user_id,
+            git_provider_tokens,
        )

    def _get_workspace(self) -> Workspace | None:
--- a/openhands/runtime/impl/docker/docker_runtime.py
+++ b/openhands/runtime/impl/docker/docker_runtime.py
@@ -17,6 +17,7 @@ from openhands.core.exceptions import (
 from openhands.core.logger import DEBUG, DEBUG_RUNTIME
 from openhands.core.logger import openhands_logger as logger
 from openhands.events import EventStream
+from openhands.integrations.provider import PROVIDER_TOKEN_TYPE
 from openhands.runtime.builder import DockerRuntimeBuilder
 from openhands.runtime.impl.action_execution.action_execution_client import (
    ActionExecutionClient,
@@ -86,6 +87,8 @@ class DockerRuntime(ActionExecutionClient):
        status_callback: Callable | None = None,
        attach_to_existing: bool = False,
        headless_mode: bool = True,
+        user_id: str | None = None,
+        git_provider_tokens: PROVIDER_TOKEN_TYPE | None = None,
        main_module: str = DEFAULT_MAIN_MODULE,
    ):
        if not DockerRuntime._shutdown_listener_id:
@@ -132,6 +135,8 @@ class DockerRuntime(ActionExecutionClient):
            status_callback,
            attach_to_existing,
            headless_mode,
+            user_id,
+            git_provider_tokens,
        )

        # Log runtime_extra_deps after base class initialization so self.sid is available
--- a/openhands/runtime/impl/e2b/e2b_runtime.py
+++ b/openhands/runtime/impl/e2b/e2b_runtime.py
@@ -12,29 +12,42 @@ from openhands.events.observation import (
    Observation,
 )
 from openhands.events.stream import EventStream
-from openhands.runtime.base import Runtime
+from openhands.integrations.provider import PROVIDER_TOKEN_TYPE
+from openhands.runtime.impl.action_execution.action_execution_client import (
+    ActionExecutionClient,
+)
 from openhands.runtime.impl.e2b.filestore import E2BFileStore
 from openhands.runtime.impl.e2b.sandbox import E2BSandbox
 from openhands.runtime.plugins import PluginRequirement
 from openhands.runtime.utils.files import insert_lines, read_lines


-class E2BRuntime(Runtime):
+class E2BRuntime(ActionExecutionClient):
    def __init__(
        self,
        config: OpenHandsConfig,
        event_stream: EventStream,
        sid: str = 'default',
        plugins: list[PluginRequirement] | None = None,
-        sandbox: E2BSandbox | None = None,
+        env_vars: dict[str, str] | None = None,
        status_callback: Callable | None = None,
+        attach_to_existing: bool = False,
+        headless_mode: bool = True,
+        user_id: str | None = None,
+        git_provider_tokens: PROVIDER_TOKEN_TYPE | None = None,
+        sandbox: E2BSandbox | None = None,
    ):
        super().__init__(
            config,
            event_stream,
            sid,
            plugins,
-            status_callback=status_callback,
+            env_vars,
+            status_callback,
+            attach_to_existing,
+            headless_mode,
+            user_id,
+            git_provider_tokens,
        )
        if sandbox is None:
            self.sandbox = E2BSandbox()
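
The runtime constructors in this change all accept `user_id` and `git_provider_tokens` and forward them to the shared base class. A minimal sketch of that pattern follows, using hypothetical stand-in classes (`BaseClient`, `SandboxRuntime`) and a plain `dict` in place of `PROVIDER_TOKEN_TYPE`. Unlike the diff, which forwards positionally, this sketch forwards by keyword; that is a design choice of the example so that a reordered base signature cannot silently shift arguments.

```python
from __future__ import annotations


class BaseClient:
    """Hypothetical stand-in for the shared runtime base class."""

    def __init__(
        self,
        sid: str = 'default',
        headless_mode: bool = True,
        user_id: str | None = None,
        git_provider_tokens: dict[str, str] | None = None,
    ) -> None:
        self.sid = sid
        self.headless_mode = headless_mode
        self.user_id = user_id
        self.git_provider_tokens = git_provider_tokens


class SandboxRuntime(BaseClient):
    """Hypothetical subclass that accepts and forwards the new parameters."""

    def __init__(
        self,
        sid: str = 'default',
        headless_mode: bool = True,
        user_id: str | None = None,
        git_provider_tokens: dict[str, str] | None = None,
        sandbox: object | None = None,
    ) -> None:
        # Forward by keyword so each value lands on the intended parameter.
        super().__init__(
            sid=sid,
            headless_mode=headless_mode,
            user_id=user_id,
            git_provider_tokens=git_provider_tokens,
        )
        # Subclass-specific state is set after the base class is initialized.
        self.sandbox = sandbox if sandbox is not None else object()


runtime = SandboxRuntime(sid='abc', user_id='u-1', git_provider_tokens={'github': 't'})
```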
--- a/openhands/runtime/impl/local/local_runtime.py
+++ b/openhands/runtime/impl/local/local_runtime.py
@@ -25,6 +25,7 @@ from openhands.events.observation import (
    Observation,
 )
 from openhands.events.serialization import event_to_dict, observation_from_dict
+from openhands.integrations.provider import PROVIDER_TOKEN_TYPE
 from openhands.runtime.impl.action_execution.action_execution_client import (
    ActionExecutionClient,
 )
@@ -145,6 +146,8 @@ class LocalRuntime(ActionExecutionClient):
        status_callback: Callable[[str, str, str], None] | None = None,
        attach_to_existing: bool = False,
        headless_mode: bool = True,
+        user_id: str | None = None,
+        git_provider_tokens: PROVIDER_TOKEN_TYPE | None = None,
    ) -> None:
        self.is_windows = sys.platform == 'win32'
        if self.is_windows:
@@ -194,6 +197,8 @@ class LocalRuntime(ActionExecutionClient):
            status_callback,
            attach_to_existing,
            headless_mode,
+            user_id,
+            git_provider_tokens,
        )

        # If there is an API key in the environment we use this in requests to the runtime
--- a/openhands/runtime/impl/modal/modal_runtime.py
+++ b/openhands/runtime/impl/modal/modal_runtime.py
@@ -9,6 +9,7 @@ import tenacity

 from openhands.core.config import OpenHandsConfig
 from openhands.events import EventStream
+from openhands.integrations.provider import PROVIDER_TOKEN_TYPE
 from openhands.runtime.impl.action_execution.action_execution_client import (
    ActionExecutionClient,
 )
@@ -53,6 +54,8 @@ class ModalRuntime(ActionExecutionClient):
        status_callback: Callable | None = None,
        attach_to_existing: bool = False,
        headless_mode: bool = True,
+        user_id: str | None = None,
+        git_provider_tokens: PROVIDER_TOKEN_TYPE | None = None,
    ):
        assert config.modal_api_token_id, 'Modal API token id is required'
        assert config.modal_api_token_secret, 'Modal API token secret is required'
@@ -100,6 +103,8 @@ class ModalRuntime(ActionExecutionClient):
            status_callback,
            attach_to_existing,
            headless_mode,
+            user_id,
+            git_provider_tokens,
        )

    async def connect(self):
--- a/openhands/runtime/impl/remote/remote_runtime.py
+++ b/openhands/runtime/impl/remote/remote_runtime.py
@@ -140,7 +140,6 @@ class RemoteRuntime(ActionExecutionClient):
            )
        else:
            self.log('info', 'No existing runtime found, starting a new one')
-            self.set_runtime_status(RuntimeStatus.BUILDING_RUNTIME)
            if self.config.sandbox.runtime_container_image is None:
                self.log(
                    'info',
@@ -160,7 +159,6 @@ class RemoteRuntime(ActionExecutionClient):
        assert self.runtime_url is not None, (
            'Runtime URL is not set. This should never happen.'
        )
-        self.set_runtime_status(RuntimeStatus.STARTING_RUNTIME)
        if not self.attach_to_existing:
            self.log('info', 'Waiting for runtime to be alive...')
        self._wait_until_alive()
@@ -221,6 +219,7 @@ class RemoteRuntime(ActionExecutionClient):

    def _build_runtime(self) -> None:
        self.log('debug', f'Building RemoteRuntime config:\n{self.config}')
+        self.set_runtime_status(RuntimeStatus.BUILDING_RUNTIME)
        response = self._send_runtime_api_request(
            'GET',
            f'{self.config.sandbox.remote_runtime_api_url}/registry_prefix',
@@ -265,6 +264,7 @@ class RemoteRuntime(ActionExecutionClient):

    def _start_runtime(self) -> None:
        # Prepare the request body for the /start endpoint
+        self.set_runtime_status(RuntimeStatus.STARTING_RUNTIME)
        command = self.get_action_execution_server_startup_command()
        environment: dict[str, str] = {}
        if self.config.debug or os.environ.get('DEBUG', 'false').lower() == 'true':
--- a/openhands/runtime/impl/runloop/runloop_runtime.py
+++ b/openhands/runtime/impl/runloop/runloop_runtime.py
@@ -9,6 +9,7 @@ from runloop_api_client.types.shared_params import LaunchParameters
 from openhands.core.config import OpenHandsConfig
 from openhands.core.logger import openhands_logger as logger
 from openhands.events import EventStream
+from openhands.integrations.provider import PROVIDER_TOKEN_TYPE
 from openhands.runtime.impl.action_execution.action_execution_client import (
    ActionExecutionClient,
 )
@@ -36,6 +37,8 @@ class RunloopRuntime(ActionExecutionClient):
        status_callback: Callable | None = None,
        attach_to_existing: bool = False,
        headless_mode: bool = True,
+        user_id: str | None = None,
+        git_provider_tokens: PROVIDER_TOKEN_TYPE | None = None,
    ):
        assert config.runloop_api_key is not None, 'Runloop API key is required'
        self.devbox: DevboxView | None = None
@@ -53,6 +56,8 @@ class RunloopRuntime(ActionExecutionClient):
            status_callback,
            attach_to_existing,
            headless_mode,
+            user_id,
+            git_provider_tokens,
        )
        # Buffer for container logs
        self._vscode_url: str | None = None
--- a/openhands/runtime/utils/edit.py
+++ b/openhands/runtime/utils/edit.py
@@ -305,7 +305,6 @@ class FileEditRuntimeMixin(FileEditRuntimeInterface):
            return ErrorObservation(error_msg)

        content_to_edit = '\n'.join(old_file_lines[start_idx:end_idx])
-        self.draft_editor_llm.reset()
        _edited_content = get_new_file_contents(
            self.draft_editor_llm, content_to_edit, action.content
        )
--- a/openhands/runtime/utils/runtime_build.py
+++ b/openhands/runtime/utils/runtime_build.py
@@ -303,18 +303,21 @@ def truncate_hash(hash: str) -> str:

 def get_hash_for_lock_files(base_image: str) -> str:
    openhands_source_dir = Path(openhands.__file__).parent
+    logger.info(f'Calculating hash for lock files with base image: {base_image}')
    md5 = hashlib.md5()
    md5.update(base_image.encode())
    for file in ['pyproject.toml', 'poetry.lock']:
        src = Path(openhands_source_dir, file)
        if not src.exists():
            src = Path(openhands_source_dir.parent, file)
+        logger.info(f'Reading lock file: {src}')
        with open(src, 'rb') as f:
            for chunk in iter(lambda: f.read(4096), b''):
                md5.update(chunk)
    # We get away with truncation because we want something that is unique
    # rather than something that is cryptographically secure
    result = truncate_hash(md5.hexdigest())
+    logger.info(f'Hash for docker build directory (lock files): {result}')
    return result


@@ -324,6 +327,7 @@ def get_tag_for_versioned_image(base_image: str) -> str:

 def get_hash_for_source_files() -> str:
    openhands_source_dir = Path(openhands.__file__).parent
+    logger.info(f'Calculating hash for source directory: {openhands_source_dir}')
    dir_hash = dirhash(
        openhands_source_dir,
        'md5',
@@ -336,6 +340,7 @@ def get_hash_for_source_files() -> str:
    # We get away with truncation because we want something that is unique
    # rather than something that is cryptographically secure
    result = truncate_hash(dir_hash)
+    logger.info(f'Hash for docker build directory (source files): {result}')
    return result


--- a/openhands/server/routes/mcp.py
+++ b/openhands/server/routes/mcp.py
@@ -8,9 +8,6 @@ from fastmcp.server.dependencies import get_http_request
 from pydantic import Field

 from openhands.core.logger import openhands_logger as logger
-from openhands.integrations.azure_devops.azure_devops_service import (
-    AzureDevOpsServiceImpl,
-)
 from openhands.integrations.github.github_service import GithubServiceImpl
 from openhands.integrations.gitlab.gitlab_service import GitLabServiceImpl
 from openhands.integrations.provider import ProviderToken
@@ -30,7 +27,7 @@ mcp_server = FastMCP(
 )

 HOST = f'https://{os.getenv("WEB_HOST", "app.all-hands.dev").strip()}'
-CONVO_URL = HOST + '/{}'
+CONVO_URL = HOST + '/conversations/{}'


 async def get_convo_link(service: GitService, conversation_id: str, body: str) -> str:
@@ -209,65 +206,3 @@ async def create_mr(
        raise ToolError(str(error))

    return response
-
-
-@mcp_server.tool()
-async def create_azure_devops_pr(
-    repo_name: Annotated[
-        str, Field(description='Azure DevOps repository ({{project}}/{{repo}})')
-    ],
-    source_branch: Annotated[str, Field(description='Source branch on repo')],
-    target_branch: Annotated[str, Field(description='Target branch on repo')],
-    title: Annotated[str, Field(description='PR Title')],
-    body: Annotated[str | None, Field(description='PR body')],
-    draft: Annotated[bool, Field(description='Whether PR opened is a draft')] = True,
-) -> str:
-    """Open a PR in Azure DevOps"""
-
-    logger.info('Calling OpenHands MCP create_azure_devops_pr')
-
-    request = get_http_request()
-    headers = request.headers
-    conversation_id = headers.get('X-OpenHands-ServerConversation-ID', None)
-
-    provider_tokens = await get_provider_tokens(request)
-    access_token = await get_access_token(request)
-    user_id = await get_user_id(request)
-
-    azure_devops_token = (
-        provider_tokens.get(ProviderType.AZURE_DEVOPS, ProviderToken())
-        if provider_tokens
-        else ProviderToken()
-    )
-
-    azure_devops_service = AzureDevOpsServiceImpl(
-        user_id=azure_devops_token.user_id,
-        external_auth_id=user_id,
-        external_auth_token=access_token,
-        token=azure_devops_token.token,
-        base_domain=azure_devops_token.host,
-    )
-
-    try:
-        body = await get_convo_link(azure_devops_service, conversation_id, body or '')
-    except Exception as e:
-        logger.warning(f'Failed to append convo link: {e}')
-
-    try:
-        response = await azure_devops_service.create_pr(
-            repo_name=repo_name,
-            source_branch=source_branch,
-            target_branch=target_branch,
-            title=title,
-            body=body,
-            draft=draft,
-        )
-
-        if conversation_id:
-            await save_pr_metadata(user_id, conversation_id, response)
-
-    except Exception as e:
-        error = f'Error creating Azure DevOps pull request: {e}'
-        raise ToolError(str(error))
-
-    return response
--- a/openhands/server/routes/secrets.py
+++ b/openhands/server/routes/secrets.py
@@ -75,8 +75,7 @@ async def check_provider_tokens(
    if incoming_provider_tokens.provider_tokens:
        # Determine whether tokens are valid
        for token_type, token_value in incoming_provider_tokens.provider_tokens.items():
-            # Only validate if token is not empty
-            if token_value.token and token_value.token.get_secret_value():
+            if token_value.token:
                confirmed_token_type = await validate_provider_token(
                    token_value.token, token_value.host
                )  # FE always sends latest host
@@ -91,7 +90,6 @@ async def check_provider_tokens(
                existing_token
                and (existing_token.host != token_value.host)
                and existing_token.token
-                and existing_token.token.get_secret_value()
            ):
                confirmed_token_type = await validate_provider_token(
                    existing_token.token, token_value.host
@@ -131,23 +129,10 @@ async def store_provider_tokens(

            # Merge incoming settings store with the existing one
            for provider, token_value in list(provider_info.provider_tokens.items()):
-                # If token is empty, keep the existing token if available
-                if provider in existing_providers and (
-                    not token_value.token or not token_value.token.get_secret_value()
-                ):
+                if provider in existing_providers and not token_value.token:
                    existing_token = user_secrets.provider_tokens.get(provider)
-                    if (
-                        existing_token
-                        and existing_token.token
-                        and existing_token.token.get_secret_value()
-                    ):
+                    if existing_token and existing_token.token:
                        provider_info.provider_tokens[provider] = existing_token
-                    # If both new and existing tokens are empty, skip this provider
-                    elif (
-                        not token_value.token
-                        or not token_value.token.get_secret_value()
-                    ):
-                        continue

                provider_info.provider_tokens[provider] = provider_info.provider_tokens[
                    provider
--- a/openhands/server/session/agent_session.py
+++ b/openhands/server/session/agent_session.py
@@ -158,7 +158,7 @@ class AgentSession:
            # NOTE: this needs to happen before controller is created
            # so MCP tools can be included into the SystemMessageAction
            if self.runtime and runtime_connected and agent.config.enable_mcp:
-                await add_mcp_tools_to_agent(agent, self.runtime, self.memory, config)
+                await add_mcp_tools_to_agent(agent, self.runtime, self.memory)

            if replay_json:
                initial_message = self._run_replay(
@@ -232,8 +232,7 @@ class AgentSession:
        if self.event_stream is not None:
            self.event_stream.close()
        if self.controller is not None:
-            end_state = self.controller.get_state()
-            end_state.save_to_session(self.sid, self.file_store, self.user_id)
+            self.controller.save_state()
            await self.controller.close()
        if self.runtime is not None:
            EXECUTOR.submit(self.runtime.close)
@@ -366,6 +365,7 @@ class AgentSession:
                headless_mode=False,
                attach_to_existing=False,
                env_vars=env_vars,
+                git_provider_tokens=git_provider_tokens,
            )

        # FIXME: this sleep is a terrible hack.
@@ -438,10 +438,12 @@ class AgentSession:
        initial_state = self._maybe_restore_state()
        controller = AgentController(
            sid=self.sid,
+            user_id=self.user_id,
+            file_store=self.file_store,
            event_stream=self.event_stream,
            agent=agent,
-            max_iterations=int(max_iterations),
-            max_budget_per_task=max_budget_per_task,
+            iteration_delta=int(max_iterations),
+            budget_per_task_delta=max_budget_per_task,
            agent_to_llm_config=agent_to_llm_config,
            agent_configs=agent_configs,
            confirmation_mode=confirmation_mode,
--- a/openhands/utils/conversation_summary.py
+++ b/openhands/utils/conversation_summary.py
@@ -95,7 +95,7 @@ async def auto_generate_title(

        # Find the first user message
        first_user_message = None
-        for event in event_stream.get_events():
+        for event in event_stream.search_events():
            if (
                event.source == EventSource.USER
                and isinstance(event, MessageAction)
Author	SHA1	Message	Date
openhands	9e2f70e063	Add debug logging to runtime_build.py to help diagnose hash calculation issues	2025-07-09 15:27:43 +00:00
openhands	ec3864b641	Fix JSON logging tests to use environment-configured level key The tests were failing because they expected 'level' as the JSON log level key, but the environment was configured to use 'severity' via LOG_JSON_LEVEL_KEY. Updated all TestJsonOutput tests to use LOG_JSON_LEVEL_KEY instead of hardcoded 'level' to make them environment-agnostic and work correctly regardless of the LOG_JSON_LEVEL_KEY environment variable setting.	2025-06-26 23:28:49 +00:00
openhands	9d9f2bd8f2	Add comprehensive logging to Action Execution Server - Add structured logging to action execution flow with timing and metadata - Add metadata extraction functions for actions and observations - Add detailed logging to file operations (read, write, edit) - Add logging to command execution and IPython operations - Add logging to HTTP endpoints (execute_action, upload_file, download_files, list_files) - Exclude large content from logs while preserving useful metadata - Add error categorization and detailed failure logging - Include execution timing for performance monitoring This will help debug issues like files disappearing by providing comprehensive visibility into all file system operations and action executions while maintaining reasonable log sizes.	2025-06-17 18:59:22 +00:00
Robert Brennan	147ffb7e42	Suppress pydub warning about ffmpeg/avconv not found (#8940) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-17 14:44:32 -04:00
Tim O'Farrell	237037cee9	Fix remote runtime status (#9190)	2025-06-18 02:34:41 +08:00
Xingyao Wang	567af43a71	Fix deprecation warning: Replace get_events with search_events (#9188) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-18 00:54:29 +08:00
Rohit Malhotra	65071550b6	Fix grammar issues in Slack documentation (#9180) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-17 23:53:55 +08:00
Alexander	d81d2f62cb	docs: local serving with ollama documented (#8807) Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2025-06-17 07:18:18 -04:00
Ryan H. Tran	ddaa186971	[GAIA] Add prompt improvement to alleviate solution parsing issue & support Tavily search tools (#9057)	2025-06-17 13:16:50 +07:00
Graham Neubig	e6e0f4673f	docs: Add "Running OpenHands with OpenHands" section for recursive development (#9146) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-16 20:57:52 -04:00
Graham Neubig	7d78b65a1a	docs: Add Python version requirement to CLI documentation (#9164) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-16 20:14:10 +00:00
Rohit Malhotra	1f90086030	(Hotfix): Slack app installation flow (#9162) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-16 19:33:43 +00:00
Xingyao Wang	2c4ecd02f7	feat(frontend): add user feedback Likert scale for agent performance rating (only on OH Cloud) (#8992) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>	2025-06-16 19:26:24 +00:00
Rohit Malhotra	2fd1fdcd7e	[Refactor, Fix]: Agent controller state/metrics management (#9012) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-16 11:24:13 -04:00
Graham Neubig	cbe32a1a12	Fix bash timeout issue caused by interactive git clone prompts (#9148) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-16 08:39:28 -04:00
better629	432d8829dc	disable mcp in run_localize and install oh-aci[llama] for issue 9150 (#9151)	2025-06-16 11:03:17 +00:00
Graham Neubig	24f891687d	Fix CLI displaying claude-2 as default model for anthropic provider (#9101) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-15 21:21:33 -04:00
Graham Neubig	2d2ccf1329	Fix conversation URL format in pull request links (#9143)	2025-06-15 15:41:08 -04:00
FT	e5bff91e8e	Fix Typo: Change "accurancy" to "accuracy" in Evaluation Benchmark Comments (#9139)	2025-06-15 12:48:26 +00:00
Linghao Zhang	a93b0457c6	feat(eval): Support evaluation on SWE-bench-Live (#9137)	2025-06-15 12:30:47 +00:00
Graham Neubig	98e0f5509c	Update CLI mode docs to accurately reflect settings workflow (#9134)	2025-06-14 19:21:18 +00:00
kilavvy	4e99aabcb2	Minor Code Comment Corrections and Clarifications (#9129)	2025-06-14 18:57:14 +00:00
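
Note on commit 9e2f70e063: the lock-file hashing that the new log lines in `runtime_build.py` instrument can be reproduced outside the repo with a small standalone sketch. It mirrors the loop shown in the diff (seed MD5 with the base image name, then stream `pyproject.toml` and `poetry.lock` in 4096-byte chunks). The body of `truncate_hash` is not part of this diff, so the version here, including its `length` parameter, is a hypothetical stand-in; `hash_lock_files` and `demo` are illustrative names too, not functions in the repository.

```python
import hashlib
import tempfile
from pathlib import Path


def truncate_hash(hex_digest: str, length: int = 16) -> str:
    # Hypothetical stand-in: the real truncate_hash body is not shown in this diff.
    return hex_digest[:length]


def hash_lock_files(base_image: str, source_dir: Path) -> str:
    """Hash the base image name plus both lock files, as in the diff above."""
    md5 = hashlib.md5()
    md5.update(base_image.encode())
    for name in ['pyproject.toml', 'poetry.lock']:
        src = source_dir / name
        if not src.exists():
            # Same fallback as the diff: look one directory up.
            src = source_dir.parent / name
        with open(src, 'rb') as f:
            # Stream the file in 4096-byte chunks instead of reading it whole.
            for chunk in iter(lambda: f.read(4096), b''):
                md5.update(chunk)
    # Uniqueness is the goal here, not cryptographic strength.
    return truncate_hash(md5.hexdigest())


def demo() -> tuple[str, str, str]:
    """Hash a throwaway directory twice with one image and once with another."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / 'pyproject.toml').write_text('[tool.poetry]\nname = "demo"\n')
        (root / 'poetry.lock').write_text('# lock\n')
        first = hash_lock_files('ubuntu:22.04', root)
        second = hash_lock_files('ubuntu:22.04', root)
        other_image = hash_lock_files('ubuntu:24.04', root)
        return first, second, other_image


if __name__ == '__main__':
    first, second, other_image = demo()
    print('stable across runs:', first == second)
    print('changes with base image:', first != other_image)
```

If two builds disagree on the hash, comparing the logged base image and the logged lock-file paths is probably the quickest way to see which input differed, since those are the only inputs to this function.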