Add comprehensive unit tests for Ctrl+C behavior

- test_single_ctrl_c_stops_agent: Verifies first Ctrl+C stops agent gracefully with helpful message
- test_double_ctrl_c_raises_keyboard_interrupt: Verifies second Ctrl+C within 2 seconds raises KeyboardInterrupt for CLI cleanup
- test_ctrl_p_pauses_agent: Verifies Ctrl+P still pauses agent as expected

Tests use proper mocking of prompt_toolkit's create_input, raw_mode, and attach context managers.
All tests pass and validate the improved Ctrl+C behavior implementation.

Co-authored-by: OpenHands-Claude <openhands-claude@all-hands.dev>
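
For readers who want the behavior under test in one place, here is a minimal sketch of the single/double Ctrl+C rule described above. It is not the code from this PR: the `CtrlCTracker` class, its `on_ctrl_c` method, and the `"stop_agent"` return value are hypothetical names chosen for illustration. Only the 2-second window and the use of `KeyboardInterrupt` come from the commit message.

```python
import time
from typing import Callable, Optional

# Window taken from the commit message ("second Ctrl+C within 2 seconds").
DOUBLE_PRESS_WINDOW_SECONDS = 2.0


class CtrlCTracker:
    """Hypothetical helper: the first Ctrl+C asks the agent to stop,
    a second press inside the window raises KeyboardInterrupt."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # The clock is injectable so tests can control time without sleeping.
        self._clock = clock
        self._last_press: Optional[float] = None

    def on_ctrl_c(self) -> str:
        now = self._clock()
        pressed_recently = (
            self._last_press is not None
            and now - self._last_press < DOUBLE_PRESS_WINDOW_SECONDS
        )
        if pressed_recently:
            # Second press in quick succession: let the CLI run its cleanup.
            raise KeyboardInterrupt
        self._last_press = now
        return "stop_agent"
```

Injecting the clock keeps a test deterministic, which is the same motivation as mocking the prompt_toolkit input objects in the tests listed above.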
Fix Ctrl+C behavior: use KeyboardInterrupt instead of signals
2026-04-29 03:00:45 -04:00 · 2025-06-28 16:17:48 +02:00 · 2025-06-28 16:02:22 +02:00 · 2025-06-28 15:41:02 +02:00 · 2025-06-28 15:24:03 +02:00 · 2025-06-28 14:46:54 +02:00
428 changed files with 18458 additions and 6115 deletions
--- a/.devcontainer/setup.sh
+++ b/.devcontainer/setup.sh
@@ -1,5 +1,9 @@
 #!/bin/bash

+# Mark the current repository as safe for Git to prevent "dubious ownership" errors,
+# which can occur in containerized environments when directory ownership doesn't match the current user.
+git config --global --add safe.directory "$(realpath .)"
+
 # Install `nc`
 sudo apt update && sudo apt install netcat -y

--- a/.github/ISSUE_TEMPLATE/bug_template.yml
+++ b/.github/ISSUE_TEMPLATE/bug_template.yml
@@ -45,6 +45,13 @@ body:
      description: What version of OpenHands are you using?
      placeholder: ex. 0.9.8, main, etc.

+  - type: input
+    id: model-name
+    attributes:
+      label: Model Name
+      description: What model are you using?
+      placeholder: ex. gpt-4o, claude-3-5-sonnet, openrouter/deepseek-r1, etc.
+
  - type: dropdown
    id: os
    attributes:
--- a/.github/workflows/ghcr-build.yml
+++ b/.github/workflows/ghcr-build.yml
@@ -40,9 +40,7 @@ jobs:
          # Only build nikolaik on PRs, otherwise build both nikolaik and ubuntu.
          if [[ "$GITHUB_EVENT_NAME" == "pull_request" ]]; then
            json=$(jq -n -c '[
-                { image: "nikolaik/python-nodejs:python3.12-nodejs22", tag: "nikolaik" },
-                { image: "ubuntu:24.04", tag: "ubuntu" }
-
+                { image: "nikolaik/python-nodejs:python3.12-nodejs22", tag: "nikolaik" }
              ]')
          else
            json=$(jq -n -c '[
--- a/.github/workflows/lint-fix.yml
+++ b/.github/workflows/lint-fix.yml
@@ -74,7 +74,7 @@ jobs:
      - name: Fix python lint issues
        run: |
          # Run all pre-commit hooks and continue even if they modify files (exit code 1)
-          pre-commit run --config ./dev_config/python/.pre-commit-config.yaml --files openhands/**/* evaluation/**/* tests/**/* || true
+          pre-commit run --config ./dev_config/python/.pre-commit-config.yaml --all-files || true

      # Commit and push changes if any
      - name: Check for changes
--- a/.github/workflows/lint.yml
+++ b/.github/workflows/lint.yml
@@ -53,7 +53,7 @@ jobs:
      - name: Install pre-commit
        run: pip install pre-commit==3.7.0
      - name: Run pre-commit hooks
-        run: pre-commit run --files openhands/**/* evaluation/**/* tests/**/* --show-diff-on-failure --config ./dev_config/python/.pre-commit-config.yaml
+        run: pre-commit run --all-files --show-diff-on-failure --config ./dev_config/python/.pre-commit-config.yaml

  # Check version consistency across documentation
  check-version-consistency:
--- a/.github/workflows/py-unit-tests.yml
+++ b/.github/workflows/py-unit-tests.yml
@@ -81,4 +81,3 @@ jobs:
        env:
          TEST_RUNTIME: local
          DEBUG: "1"
-
--- a/.openhands/microagents/glossary.md
+++ b/.openhands/microagents/glossary.md
@@ -121,7 +121,7 @@ A specialized prompt that enhances OpenHands with domain-specific knowledge, rep
 A central repository of available microagents and their configurations.

 #### Public Microagent
-A general-purpose microagent available to all OpenHands users, triggered by specific keywords.
+A general-purpose microagent available to all OpenHands users, triggered by specific keywords. Located in `microagents/`.

 #### Repository Microagent
 A type of microagent that provides repository-specific context and guidelines, stored in the `.openhands/microagents/` directory.
--- a/.openhands/microagents/repo.md
+++ b/.openhands/microagents/repo.md
@@ -5,6 +5,14 @@ This repository contains the code for OpenHands, an automated AI software engine
 To set up the entire repo, including frontend and backend, run `make build`.
 You don't need to do this unless the user asks you to, or if you're trying to run the entire application.

+## Running OpenHands with OpenHands:
+To run the full application to debug issues:
+```bash
+export INSTALL_DOCKER=0
+export RUNTIME=local
+make build && make run FRONTEND_PORT=12000 FRONTEND_HOST=0.0.0.0 BACKEND_HOST=0.0.0.0 &> /tmp/openhands-log.txt &
+```
+
 IMPORTANT: Before making any changes to the codebase, ALWAYS run `make install-pre-commit-hooks` to ensure pre-commit hooks are properly installed.

 Before pushing any changes, you MUST ensure that any lint errors or simple test errors have been fixed.
@@ -60,6 +68,29 @@ If you are starting a pull request (PR), please follow the template in `.github/

 These details may or may not be useful for your current task.

+### Microagents
+
+Microagents are specialized prompts that enhance OpenHands with domain-specific knowledge and task-specific workflows. They are Markdown files that can include frontmatter for configuration.
+
+#### Types:
+- **Public Microagents**: Located in `microagents/`, available to all users
+- **Repository Microagents**: Located in `.openhands/microagents/`, specific to this repository
+
+#### Loading Behavior:
+- **Without frontmatter**: Always loaded into LLM context
+- **With triggers in frontmatter**: Only loaded when user's message matches the specified trigger keywords
+
+#### Structure:
+```yaml
+---
+triggers:
+- keyword1
+- keyword2
+---
+# Microagent Content
+Your specialized knowledge and instructions here...
+```
+
 ### Frontend

 #### Action Handling:
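
Review note on the microagent loading rules added to `repo.md` above: the sketch below illustrates the documented behavior (no frontmatter means always loaded, triggers mean loaded only on a keyword match). It is an illustration only, not the OpenHands loader. The function names `parse_triggers` and `should_load` are hypothetical, the frontmatter parsing is deliberately minimal, and two points are assumptions that the text does not specify: matching is a case-insensitive substring test, and frontmatter without triggers is treated as always loaded.

```python
from typing import List, Optional


def parse_triggers(text: str) -> Optional[List[str]]:
    """Return the trigger keywords from the frontmatter, or None if the
    file has no frontmatter. Only the simple `triggers:` list is handled."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    triggers: List[str] = []
    in_triggers = False
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing frontmatter delimiter
            break
        if stripped == "triggers:":
            in_triggers = True
        elif in_triggers and stripped.startswith("- "):
            triggers.append(stripped[2:].strip())
        else:
            in_triggers = False
    return triggers


def should_load(text: str, user_message: str) -> bool:
    """Apply the loading rule: always load without triggers, otherwise
    load only when the message contains one of the keywords."""
    triggers = parse_triggers(text)
    if not triggers:
        # Assumption: frontmatter with no triggers behaves like no frontmatter.
        return True
    message = user_message.lower()
    return any(trigger.lower() in message for trigger in triggers)
```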
--- a/Development.md
+++ b/Development.md
@@ -103,6 +103,29 @@ components or interface enhancements.
  make start-frontend
  ```

+### 5. Running OpenHands with OpenHands
+
+You can use OpenHands to develop and improve OpenHands itself! This is a powerful way to leverage AI assistance for contributing to the project.
+
+#### Quick Start
+
+1. **Build and run OpenHands:**
+   ```bash
+   export INSTALL_DOCKER=0
+   export RUNTIME=local
+   make build && make run
+   ```
+
+2. **Access the interface:**
+   - Local development: http://localhost:3001
+   - Remote/cloud environments: Use the appropriate external URL
+
+3. **Configure for external access (if needed):**
+   ```bash
+   # For external access (e.g., cloud environments)
+   make run FRONTEND_PORT=12000 FRONTEND_HOST=0.0.0.0 BACKEND_HOST=0.0.0.0
+   ```
+
 ### 6. LLM Debugging

 If you encounter any issues with the Language Model (LM) or you're simply curious, export DEBUG=1 in the environment and restart the backend.
@@ -136,7 +159,7 @@ poetry run pytest ./tests/unit/test_*.py
 To reduce build time (e.g., if no changes were made to the client-runtime component), you can use an existing Docker
 container image by setting the SANDBOX_RUNTIME_CONTAINER_IMAGE environment variable to the desired Docker image.

-Example: `export SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.43-nikolaik`
+Example: `export SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.47-nikolaik`

 ## Develop inside Docker container

--- a/Makefile
+++ b/Makefile
@@ -12,6 +12,7 @@ DEFAULT_MODEL = "gpt-4o"
 CONFIG_FILE = config.toml
 PRE_COMMIT_CONFIG_PATH = "./dev_config/python/.pre-commit-config.yaml"
 PYTHON_VERSION = 3.12
+KIND_CLUSTER_NAME = "local-hands"

 # ANSI color codes
 GREEN=$(shell tput -Txterm setaf 2)
@@ -189,7 +190,7 @@ install-pre-commit-hooks:

 lint-backend:
 	@echo "$(YELLOW)Running linters...$(RESET)"
-	@poetry run pre-commit run --files openhands/**/* evaluation/**/* tests/**/* --show-diff-on-failure --config $(PRE_COMMIT_CONFIG_PATH)
+	@poetry run pre-commit run --all-files --show-diff-on-failure --config $(PRE_COMMIT_CONFIG_PATH)

 lint-frontend:
 	@echo "$(YELLOW)Running linters for frontend...$(RESET)"
@@ -199,6 +200,40 @@ lint:
 	@$(MAKE) -s lint-frontend
 	@$(MAKE) -s lint-backend

+kind:
+	@echo "$(YELLOW)Checking if kind is installed...$(RESET)"
+	@if ! command -v kind > /dev/null; then \
+		echo "$(RED)kind is not installed. Please install kind with 'brew install kind' to continue$(RESET)"; \
+		exit 1; \
+	else \
+		echo "$(BLUE)kind $(shell kind version) is already installed.$(RESET)"; \
+	fi
+	@echo "$(YELLOW)Checking if kind cluster '$(KIND_CLUSTER_NAME)' already exists...$(RESET)"
+	@if kind get clusters | grep -q "^$(KIND_CLUSTER_NAME)$$"; then \
+		echo "$(BLUE)Kind cluster '$(KIND_CLUSTER_NAME)' already exists.$(RESET)"; \
+		kubectl config use-context kind-$(KIND_CLUSTER_NAME); \
+	else \
+		echo "$(YELLOW)Creating kind cluster '$(KIND_CLUSTER_NAME)'...$(RESET)"; \
+		kind create cluster --name $(KIND_CLUSTER_NAME) --config kind/cluster.yaml; \
+	fi
+	@echo "$(YELLOW)Checking if mirrord is installed...$(RESET)"
+	@if ! command -v mirrord > /dev/null; then \
+		echo "$(RED)mirrord is not installed. Please install mirrord with 'brew install metalbear-co/mirrord/mirrord' to continue$(RESET)"; \
+		exit 1; \
+	else \
+		echo "$(BLUE)mirrord $(shell mirrord --version) is already installed.$(RESET)"; \
+	fi
+	@echo "$(YELLOW)Installing k8s mirrord resources...$(RESET)"
+	@kubectl apply -f kind/manifests
+	@echo "$(GREEN)Mirrord resources installed successfully.$(RESET)"
+	@echo "$(YELLOW)Waiting for Mirrord pod to be ready.$(RESET)"
+	@sleep 5
+	@kubectl wait --for=condition=Available deployment/ubuntu-dev
+	@echo "$(YELLOW)Waiting for Nginx to be ready.$(RESET)"
+	@kubectl -n ingress-nginx wait --for=condition=Available deployment/ingress-nginx-controller
+	@echo "$(YELLOW)Running make run inside of mirrord.$(RESET)"
+	@mirrord exec --target deployment/ubuntu-dev -- make run
+
 test-frontend:
 	@echo "$(YELLOW)Running tests for frontend...$(RESET)"
 	@cd frontend && npm run test
@@ -333,3 +368,4 @@ help:

 # Phony targets
 .PHONY: build check-dependencies check-system check-python check-npm check-nodejs check-docker check-poetry install-python-dependencies install-frontend-dependencies install-pre-commit-hooks lint-backend lint-frontend lint test-frontend test build-frontend start-backend start-frontend _run_setup run run-wsl setup-config setup-config-prompts setup-config-basic openhands-cloud-run docker-dev docker-run clean help
+.PHONY: kind
--- a/README.md
+++ b/README.md
@@ -11,7 +11,7 @@
  <a href="https://github.com/All-Hands-AI/OpenHands/stargazers"><img src="https://img.shields.io/github/stars/All-Hands-AI/OpenHands?style=for-the-badge&color=blue" alt="Stargazers"></a>
  <a href="https://github.com/All-Hands-AI/OpenHands/blob/main/LICENSE"><img src="https://img.shields.io/github/license/All-Hands-AI/OpenHands?style=for-the-badge&color=blue" alt="MIT License"></a>
  <br/>
-  <a href="https://join.slack.com/t/openhands-ai/shared_invite/zt-34zm4j0gj-Qz5kRHoca8DFCbqXPS~f_A"><img src="https://img.shields.io/badge/Slack-Join%20Us-red?logo=slack&logoColor=white&style=for-the-badge" alt="Join our Slack community"></a>
+  <a href="https://join.slack.com/t/openhands-ai/shared_invite/zt-3847of6xi-xuYJIPa6YIPg4ElbDWbtSA"><img src="https://img.shields.io/badge/Slack-Join%20Us-red?logo=slack&logoColor=white&style=for-the-badge" alt="Join our Slack community"></a>
  <a href="https://discord.gg/ESHStjSjD4"><img src="https://img.shields.io/badge/Discord-Join%20Us-purple?logo=discord&logoColor=white&style=for-the-badge" alt="Join our Discord community"></a>
  <a href="https://github.com/All-Hands-AI/OpenHands/blob/main/CREDITS.md"><img src="https://img.shields.io/badge/Project-Credits-blue?style=for-the-badge&color=FFE165&logo=github&logoColor=white" alt="Credits"></a>
  <br/>
@@ -48,7 +48,7 @@ Learn more at [docs.all-hands.dev](https://docs.all-hands.dev), or [sign up for

 ## ☁️ OpenHands Cloud
 The easiest way to get started with OpenHands is on [OpenHands Cloud](https://app.all-hands.dev),
-which comes with $50 in free credits for new users.
+which comes with $20 in free credits for new users.

 ## 💻 Running OpenHands Locally

@@ -62,19 +62,21 @@ system requirements and more information.


 ```bash
-docker pull docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.47-nikolaik

 docker run -it --rm --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.47-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
-    -v ~/.openhands-state:/.openhands-state \
+    -v ~/.openhands:/.openhands \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
-    docker.all-hands.dev/all-hands-ai/openhands:0.43
+    docker.all-hands.dev/all-hands-ai/openhands:0.47
 ```

+> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
+
 You'll find OpenHands running at [http://localhost:3000](http://localhost:3000)!

 When you open the application, you'll be asked to choose an LLM provider and add an API key.
@@ -83,15 +85,14 @@ works best, but you have [many options](https://docs.all-hands.dev/usage/llms).

 ## 💡 Other ways to run OpenHands

-> [!CAUTION]
+> [!WARNING]
 > OpenHands is meant to be run by a single user on their local workstation.
 > It is not appropriate for multi-tenant deployments where multiple users share the same instance. There is no built-in authentication, isolation, or scalability.
 >
-> If you're interested in running OpenHands in a multi-tenant environment, please
-> [get in touch with us](https://docs.google.com/forms/d/e/1FAIpQLSet3VbGaz8z32gW9Wm-Grl4jpt5WgMXPgJ4EDPVmCETCBpJtQ/viewform)
-> for advanced deployment options.
+> If you're interested in running OpenHands in a multi-tenant environment, check out the source-available, commercially-licensed
+> [OpenHands Cloud Helm Chart](https://github.com/all-Hands-AI/OpenHands-cloud)

-You can also [connect OpenHands to your local filesystem](https://docs.all-hands.dev/usage/runtimes/docker#connecting-to-your-filesystem),
+You can [connect OpenHands to your local filesystem](https://docs.all-hands.dev/usage/runtimes/docker#connecting-to-your-filesystem),
 run OpenHands in a scriptable [headless mode](https://docs.all-hands.dev/usage/how-to/headless-mode),
 interact with it via a [friendly CLI](https://docs.all-hands.dev/usage/how-to/cli-mode),
 or run it on tagged issues with [a github action](https://docs.all-hands.dev/usage/how-to/github-action).
@@ -116,7 +117,7 @@ troubleshooting resources, and advanced configuration options.
 OpenHands is a community-driven project, and we welcome contributions from everyone. We do most of our communication
 through Slack, so this is the best place to start, but we also are happy to have you contact us on Discord or Github:

- [Join our Slack workspace](https://join.slack.com/t/openhands-ai/shared_invite/zt-34zm4j0gj-Qz5kRHoca8DFCbqXPS~f_A) - Here we talk about research, architecture, and future development.
+- [Join our Slack workspace](https://join.slack.com/t/openhands-ai/shared_invite/zt-3847of6xi-xuYJIPa6YIPg4ElbDWbtSA) - Here we talk about research, architecture, and future development.
 - [Join our Discord server](https://discord.gg/ESHStjSjD4) - This is a community-run server for general discussion, questions, and feedback.
 - [Read or post Github Issues](https://github.com/All-Hands-AI/OpenHands/issues) - Check out the issues we're working on, or add your own ideas.

@@ -145,13 +146,12 @@ For a list of open source projects and licenses used in OpenHands, please see ou
 ## 📚 Cite

 ```
-@misc{openhands,
-      title={{OpenHands: An Open Platform for AI Software Developers as Generalist Agents}},
-      author={Xingyao Wang and Boxuan Li and Yufan Song and Frank F. Xu and Xiangru Tang and Mingchen Zhuge and Jiayi Pan and Yueqi Song and Bowen Li and Jaskirat Singh and Hoang H. Tran and Fuqiang Li and Ren Ma and Mingzhang Zheng and Bill Qian and Yanjun Shao and Niklas Muennighoff and Yizhe Zhang and Binyuan Hui and Junyang Lin and Robert Brennan and Hao Peng and Heng Ji and Graham Neubig},
-      year={2024},
-      eprint={2407.16741},
-      archivePrefix={arXiv},
-      primaryClass={cs.SE},
-      url={https://arxiv.org/abs/2407.16741},
+@inproceedings{
+  wang2025openhands,
+  title={OpenHands: An Open Platform for {AI} Software Developers as Generalist Agents},
+  author={Xingyao Wang and Boxuan Li and Yufan Song and Frank F. Xu and Xiangru Tang and Mingchen Zhuge and Jiayi Pan and Yueqi Song and Bowen Li and Jaskirat Singh and Hoang H. Tran and Fuqiang Li and Ren Ma and Mingzhang Zheng and Bill Qian and Yanjun Shao and Niklas Muennighoff and Yizhe Zhang and Binyuan Hui and Junyang Lin and Robert Brennan and Hao Peng and Heng Ji and Graham Neubig},
+  booktitle={The Thirteenth International Conference on Learning Representations},
+  year={2025},
+  url={https://openreview.net/forum?id=OJd3ayDDoF}
 }
 ```
--- a/README_CN.md
+++ b/README_CN.md
@@ -12,7 +12,7 @@
  <a href="https://github.com/All-Hands-AI/OpenHands/stargazers"><img src="https://img.shields.io/github/stars/All-Hands-AI/OpenHands?style=for-the-badge&color=blue" alt="Stargazers"></a>
  <a href="https://github.com/All-Hands-AI/OpenHands/blob/main/LICENSE"><img src="https://img.shields.io/github/license/All-Hands-AI/OpenHands?style=for-the-badge&color=blue" alt="MIT License"></a>
  <br/>
-  <a href="https://join.slack.com/t/openhands-ai/shared_invite/zt-34zm4j0gj-Qz5kRHoca8DFCbqXPS~f_A"><img src="https://img.shields.io/badge/Slack-Join%20Us-red?logo=slack&logoColor=white&style=for-the-badge" alt="加入我们的Slack社区"></a>
+  <a href="https://join.slack.com/t/openhands-ai/shared_invite/zt-3847of6xi-xuYJIPa6YIPg4ElbDWbtSA"><img src="https://img.shields.io/badge/Slack-Join%20Us-red?logo=slack&logoColor=white&style=for-the-badge" alt="加入我们的Slack社区"></a>
  <a href="https://discord.gg/ESHStjSjD4"><img src="https://img.shields.io/badge/Discord-Join%20Us-purple?logo=discord&logoColor=white&style=for-the-badge" alt="加入我们的Discord社区"></a>
  <a href="https://github.com/All-Hands-AI/OpenHands/blob/main/CREDITS.md"><img src="https://img.shields.io/badge/Project-Credits-blue?style=for-the-badge&color=FFE165&logo=github&logoColor=white" alt="致谢"></a>
  <br/>
@@ -51,19 +51,21 @@ OpenHands也可以使用Docker在本地系统上运行。


 ```bash
-docker pull docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.47-nikolaik

 docker run -it --rm --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.47-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
-    -v ~/.openhands-state:/.openhands-state \
+    -v ~/.openhands:/.openhands \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
-    docker.all-hands.dev/all-hands-ai/openhands:0.43
+    docker.all-hands.dev/all-hands-ai/openhands:0.47
 ```

+> **注意**: 如果您在0.44版本之前使用过OpenHands，您可能需要运行 `mv ~/.openhands-state ~/.openhands` 来将对话历史迁移到新位置。
+
 您将在[http://localhost:3000](http://localhost:3000)找到运行中的OpenHands！

 打开应用程序时，您将被要求选择一个LLM提供商并添加API密钥。
@@ -105,7 +107,7 @@ docker run -it --rm --pull=always \
 OpenHands是一个社区驱动的项目，我们欢迎每个人的贡献。我们大部分沟通
 通过Slack进行，因此这是开始的最佳场所，但我们也很乐意您通过Discord或Github与我们联系：

- [加入我们的Slack工作空间](https://join.slack.com/t/openhands-ai/shared_invite/zt-34zm4j0gj-Qz5kRHoca8DFCbqXPS~f_A) - 这里我们讨论研究、架构和未来发展。
+- [加入我们的Slack工作空间](https://join.slack.com/t/openhands-ai/shared_invite/zt-3847of6xi-xuYJIPa6YIPg4ElbDWbtSA) - 这里我们讨论研究、架构和未来发展。
 - [加入我们的Discord服务器](https://discord.gg/ESHStjSjD4) - 这是一个社区运营的服务器，用于一般讨论、问题和反馈。
 - [阅读或发布Github问题](https://github.com/All-Hands-AI/OpenHands/issues) - 查看我们正在处理的问题，或添加您自己的想法。

--- a/README_JA.md
+++ b/README_JA.md
@@ -0,0 +1,60 @@
+<a name="readme-top"></a>
+
+<div align="center">
+  <img src="./docs/static/img/logo.png" alt="Logo" width="200">
+  <h1 align="center">OpenHands: コードを減らして、もっと作ろう</h1>
+</div>
+
+<div align="center">
+  <a href="https://github.com/All-Hands-AI/OpenHands/graphs/contributors"><img src="https://img.shields.io/github/contributors/All-Hands-AI/OpenHands?style=for-the-badge&color=blue" alt="Contributors"></a>
+  <a href="https://github.com/All-Hands-AI/OpenHands/stargazers"><img src="https://img.shields.io/github/stars/All-Hands-AI/OpenHands?style=for-the-badge&color=blue" alt="Stargazers"></a>
+  <a href="https://github.com/All-Hands-AI/OpenHands/blob/main/LICENSE"><img src="https://img.shields.io/github/license/All-Hands-AI/OpenHands?style=for-the-badge&color=blue" alt="MIT License"></a>
+  <br/>
+  <a href="https://join.slack.com/t/openhands-ai/shared_invite/zt-3847of6xi-xuYJIPa6YIPg4ElbDWbtSA"><img src="https://img.shields.io/badge/Slack-Join%20Us-red?logo=slack&logoColor=white&style=for-the-badge" alt="Slackコミュニティに参加"></a>
+  <a href="https://discord.gg/ESHStjSjD4"><img src="https://img.shields.io/badge/Discord-Join%20Us-purple?logo=discord&logoColor=white&style=for-the-badge" alt="Discordコミュニティに参加"></a>
+  <a href="https://github.com/All-Hands-AI/OpenHands/blob/main/CREDITS.md"><img src="https://img.shields.io/badge/Project-Credits-blue?style=for-the-badge&color=FFE165&logo=github&logoColor=white" alt="クレジット"></a>
+  <br/>
+  <a href="https://docs.all-hands.dev/usage/getting-started"><img src="https://img.shields.io/badge/Documentation-000?logo=googledocs&logoColor=FFE165&style=for-the-badge" alt="ドキュメントを見る"></a>
+  <a href="https://arxiv.org/abs/2407.16741"><img src="https://img.shields.io/badge/Paper%20on%20Arxiv-000?logoColor=FFE165&logo=arxiv&style=for-the-badge" alt="Arxiv論文"></a>
+  <a href="https://docs.google.com/spreadsheets/d/1wOUdFCMyY6Nt0AIqF705KN4JKOWgeI4wUGUP60krXXs/edit?gid=0#gid=0"><img src="https://img.shields.io/badge/Benchmark%20score-000?logoColor=FFE165&logo=huggingface&style=for-the-badge" alt="評価ベンチマークスコア"></a>
+  <hr>
+</div>
+
+OpenHands（旧OpenDevin）へようこそ。これはAIが駆動するソフトウェア開発エージェントのプラットフォームです。
+
+OpenHandsのエージェントは人間の開発者ができることは何でもこなします。コードを修正し、コマンドを実行し、ウェブを閲覧し、APIを呼び出し、StackOverflowからコードスニペットをコピーすることさえできます。
+
+詳細は[docs.all-hands.dev](https://docs.all-hands.dev)をご覧いただくか、[OpenHands Cloud](https://app.all-hands.dev)に登録して始めましょう。
+
+> [!IMPORTANT]
+> 仕事でOpenHandsを使っていますか？ぜひお話を聞かせてください。[こちらの短いフォーム](https://docs.google.com/forms/d/e/1FAIpQLSet3VbGaz8z32gW9Wm-Grl4jpt5WgMXPgJ4EDPVmCETCBpJtQ/viewform)にご記入いただき、Design Partnerプログラムにご参加ください。商用機能の早期アクセスや製品ロードマップへのフィードバックの機会を提供します。
+
+![アプリのスクリーンショット](./docs/static/img/screenshot.png)
+
+## ☁️ OpenHands Cloud
+OpenHandsを始める最も簡単な方法は[OpenHands Cloud](https://app.all-hands.dev)を利用することです。新規ユーザーには20ドル分の無料クレジットが付与されます。
+
+## 💻 OpenHandsをローカルで実行する
+
+OpenHandsはDockerを利用してローカル環境でも実行できます。システム要件や詳細については[Running OpenHands](https://docs.all-hands.dev/usage/installation)ガイドをご覧ください。
+
+> [!WARNING]
+> 公共ネットワークで実行していますか？[Hardened Docker Installation Guide](https://docs.all-hands.dev/usage/runtimes/docker#hardened-docker-installation)を参照して、ネットワークバインディングの制限や追加のセキュリティ対策を実施してください。
+
+```bash
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.47-nikolaik
+
+docker run -it --rm --pull=always \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.47-nikolaik \
+    -e LOG_ALL_EVENTS=true \
+    -v /var/run/docker.sock:/var/run/docker.sock \
+    -v ~/.openhands:/.openhands \
+    -p 3000:3000 \
+    --add-host host.docker.internal:host-gateway \
+    --name openhands-app \
+    docker.all-hands.dev/all-hands-ai/openhands:0.47
+```
+
+**注**: バージョン0.44より前のOpenHandsを使用していた場合は、会話履歴を移行するために `mv ~/.openhands-state ~/.openhands` を実行してください。
+
+OpenHandsは[http://localhost:3000](http://localhost:3000)で起動します！
--- a/config.template.toml
+++ b/config.template.toml
@@ -10,18 +10,7 @@
 # General core configurations
 ##############################################################################
 [core]
-# API key for E2B
-#e2b_api_key = ""
-
-# API key for Modal
-#modal_api_token_id = ""
-#modal_api_token_secret = ""
-
-# API key for Daytona
-#daytona_api_key = ""
-
-# Daytona Target
-#daytona_target = ""
+# API keys and configuration for core services

 # Base path for the workspace
 #workspace_base = "./workspace"
@@ -64,7 +53,7 @@
 #max_budget_per_task = 0.0

 # Maximum number of iterations
-#max_iterations = 250
+#max_iterations = 500

 # Path to mount the workspace in the sandbox
 #workspace_mount_path_in_sandbox = "/workspace"
@@ -201,6 +190,27 @@ model = "gpt-4o"
 #native_tool_calling = None


+# Safety settings for models that support them (e.g., Mistral AI, Gemini)
+# Example for Mistral AI:
+# safety_settings = [
+#   { "category" = "hate", "threshold" = "low" },
+#   { "category" = "harassment", "threshold" = "low" },
+#   { "category" = "sexual", "threshold" = "low" },
+#   { "category" = "dangerous", "threshold" = "low" }
+# ]
+#
+# Example for Gemini:
+# safety_settings = [
+#   { "category" = "HARM_CATEGORY_HARASSMENT", "threshold" = "BLOCK_NONE" },
+#   { "category" = "HARM_CATEGORY_HATE_SPEECH", "threshold" = "BLOCK_NONE" },
+#   { "category" = "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold" = "BLOCK_NONE" },
+#   { "category" = "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold" = "BLOCK_NONE" }
+# ]
+#safety_settings = []
+
+[llm.draft_editor]
+# The number of times llm_editor tries to fix an error when editing.
+correct_num = 5

 [llm.gpt4o-mini]
 api_key = ""
@@ -250,6 +260,9 @@ enable_finish = true
 # length limit
 enable_history_truncation = true

+# Whether the condensation request tool is enabled
+enable_condensation_request = false
+
 [agent.RepoExplorerAgent]
 # Example: use a cheaper model for RepoExplorerAgent to reduce cost, especially
 # useful when an agent doesn't demand high quality but uses a lot of tokens
@@ -318,6 +331,9 @@ classpath = "my_package.my_module.MyCustomAgent"
 # Enable GPU support in the runtime
 #enable_gpu = false

+# When there are multiple cards, you can specify the GPU by ID
+#cuda_visible_devices = ''
+
 # Additional Docker runtime kwargs
 #docker_runtime_kwargs = {}

@@ -415,3 +431,47 @@ type = "noop"
 # Configuration for the evaluation, please refer to the specific evaluation
 # plugin for the available options
 ##############################################################################
+
+
+########################### Kubernetes #######################################
+# Kubernetes configuration when using the Kubernetes runtime
+##############################################################################
+[kubernetes]
+# The Kubernetes namespace to use for OpenHands resources
+#namespace = "default"
+
+# Domain for ingress resources
+#ingress_domain = "localhost"
+
+# Size of the persistent volume claim
+#pvc_storage_size = "2Gi"
+
+# Storage class for persistent volume claims
+#pvc_storage_class = "standard"
+
+# CPU request for runtime pods
+#resource_cpu_request = "1"
+
+# Memory request for runtime pods
+#resource_memory_request = "1Gi"
+
+# Memory limit for runtime pods
+#resource_memory_limit = "2Gi"
+
+# Optional name of image pull secret for private registries
+#image_pull_secret = ""
+
+# Optional name of TLS secret for ingress
+#ingress_tls_secret = ""
+
+# Optional node selector key for pod scheduling
+#node_selector_key = ""
+
+# Optional node selector value for pod scheduling
+#node_selector_val = ""
+
+# Optional YAML string defining pod tolerations
+#tolerations_yaml = ""
+
+# Run the runtime sandbox container in privileged mode for use with docker-in-docker
+#privileged = false
--- a/containers/app/Dockerfile
+++ b/containers/app/Dockerfile
@@ -44,7 +44,7 @@ ENV WORKSPACE_BASE=/opt/workspace_base
 ENV OPENHANDS_BUILD_VERSION=$OPENHANDS_BUILD_VERSION
 ENV SANDBOX_USER_ID=0
 ENV FILE_STORE=local
-ENV FILE_STORE_PATH=/.openhands-state
+ENV FILE_STORE_PATH=/.openhands
 RUN mkdir -p $FILE_STORE_PATH
 RUN mkdir -p $WORKSPACE_BASE

--- a/containers/dev/compose.yml
+++ b/containers/dev/compose.yml
@@ -12,7 +12,7 @@ services:
      - SANDBOX_API_HOSTNAME=host.docker.internal
      - DOCKER_HOST_ADDR=host.docker.internal
      #
-      - SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-ghcr.io/all-hands-ai/runtime:0.43-nikolaik}
+      - SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-ghcr.io/all-hands-ai/runtime:0.47-nikolaik}
      - SANDBOX_USER_ID=${SANDBOX_USER_ID:-1234}
      - WORKSPACE_MOUNT_PATH=${WORKSPACE_BASE:-$PWD/workspace}
    ports:
--- a/dev_config/python/.pre-commit-config.yaml
+++ b/dev_config/python/.pre-commit-config.yaml
@@ -3,10 +3,11 @@ repos:
    rev: v5.0.0
    hooks:
      - id: trailing-whitespace
-        exclude: docs/modules/python
+        exclude: ^(docs/|modules/|python/|openhands-ui/|third_party/)
      - id: end-of-file-fixer
-        exclude: docs/modules/python
+        exclude: ^(docs/|modules/|python/|openhands-ui/|third_party/)
      - id: check-yaml
+        args: ["--allow-multiple-documents"]
      - id: debug-statements

  - repo: https://github.com/tox-dev/pyproject-fmt
@@ -27,17 +28,19 @@ repos:
        entry: ruff check --config dev_config/python/ruff.toml
        types_or: [python, pyi, jupyter]
        args: [--fix, --unsafe-fixes]
+        exclude: third_party/
      # Run the formatter.
      - id: ruff-format
        entry: ruff format --config dev_config/python/ruff.toml
        types_or: [python, pyi, jupyter]
+        exclude: third_party/

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.15.0
    hooks:
      - id: mypy
        additional_dependencies:
-          [types-requests, types-setuptools, types-pyyaml, types-toml, types-docker, lxml]
+          [types-requests, types-setuptools, types-pyyaml, types-toml, types-docker, pydantic, lxml]
        # To see gaps add `--html-report mypy-report/`
        entry: mypy --config-file dev_config/python/mypy.ini openhands/
        always_run: true
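
Review note on the `exclude` change above: pre-commit treats `exclude` as a Python regular expression and applies it with `re.search` to each file path, so the old and new patterns select quite different sets of files. The sketch below compares them; the helper name `is_excluded` and the sample paths are hypothetical.

```python
import re

# Patterns copied from the removed and added lines of the hook config.
OLD_EXCLUDE = re.compile(r"docs/modules/python")
NEW_EXCLUDE = re.compile(r"^(docs/|modules/|python/|openhands-ui/|third_party/)")


def is_excluded(pattern: "re.Pattern[str]", path: str) -> bool:
    """Mirror pre-commit's matching: a path is excluded if the pattern
    is found anywhere in it (re.search), subject to any anchors."""
    return pattern.search(path) is not None
```

The old pattern only skipped paths containing `docs/modules/python`. The new one is anchored with `^`, so it skips everything under the listed top-level directories, including all of `docs/`, but leaves a nested path such as `openhands/docs/...` subject to the hooks.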
--- a/dev_config/python/mypy.ini
+++ b/dev_config/python/mypy.ini
@@ -7,3 +7,9 @@ warn_unreachable = True
 warn_redundant_casts = True
 no_implicit_optional = True
 strict_optional = True
+
+# Exclude third-party runtime directory from type checking
+exclude = third_party/
+
+[mypy-openhands.memory.condenser.impl.*]
+disable_error_code = override
--- a/dev_config/python/ruff.toml
+++ b/dev_config/python/ruff.toml
@@ -1,3 +1,6 @@
+# Exclude third-party runtime directory from linting
+exclude = ["third_party/"]
+
 [lint]
 select = [
    "E",
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -7,8 +7,8 @@ services:
    image: openhands:latest
    container_name: openhands-app-${DATE:-}
    environment:
-      - SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik}
-      #- SANDBOX_USER_ID=${SANDBOX_USER_ID:-1234} # enable this only if you want a specific non-root sandbox user but you will have to manually adjust permissions of openhands-state for this user
+      - SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-docker.all-hands.dev/all-hands-ai/runtime:0.47-nikolaik}
+      #- SANDBOX_USER_ID=${SANDBOX_USER_ID:-1234} # enable this only if you want a specific non-root sandbox user but you will have to manually adjust permissions of ~/.openhands for this user
      - WORKSPACE_MOUNT_PATH=${WORKSPACE_BASE:-$PWD/workspace}
    ports:
      - "3000:3000"
@@ -16,7 +16,7 @@ services:
      - "host.docker.internal:host-gateway"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
-      - ~/.openhands-state:/.openhands-state
+      - ~/.openhands:/.openhands
      - ${WORKSPACE_BASE:-$PWD/workspace}:/opt/workspace_base
    pull_policy: build
    stdin_open: true
--- a/docs/README.md
+++ b/docs/README.md
@@ -4,7 +4,7 @@
 npm install -g mint
 ```

-or 
+or

 ```
 yarn global add mint
@@ -14,4 +14,4 @@ yarn global add mint

 ```
 mint dev
-```
+```
--- a/docs/README_JA.md
+++ b/docs/README_JA.md
@@ -0,0 +1,17 @@
+# セットアップ
+
+```
+npm install -g mint
+```
+
+または
+
+```
+yarn global add mint
+```
+
+# プレビュー
+
+```
+mint dev
+```
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -26,6 +26,7 @@
          "usage/installation",
          "usage/getting-started",
          "usage/key-features",
+          "usage/faqs",
          {
            "group": "OpenHands Cloud",
            "pages": [
@@ -43,7 +44,7 @@
            ]
          },
          {
-            "group": "Running OpenHands on Your Own",
+            "group": "Run OpenHands on Your Own",
            "pages": [
              "usage/local-setup",
              "usage/how-to/gui-mode",
@@ -103,8 +104,9 @@
            ]
          },
          {
-            "group": "Customization",
+            "group": "Customizations & Settings",
            "pages": [
+              "usage/common-settings",
              "usage/prompting/repository",
              {
                "group": "Microagents",
@@ -149,6 +151,12 @@
          }
        ]
      },
+      {
+        "tab": "Success Stories",
+        "pages": [
+          "success-stories/index"
+        ]
+      },
      {
          "tab": "API Reference",
          "openapi": "/openapi.json"
@@ -188,7 +196,7 @@
  },
  "footer": {
    "socials": {
-      "slack": "https://join.slack.com/t/openhands-ai/shared_invite/zt-34zm4j0gj-Qz5kRHoca8DFCbqXPS~f_A",
+      "slack": "https://join.slack.com/t/openhands-ai/shared_invite/zt-3847of6xi-xuYJIPa6YIPg4ElbDWbtSA",
      "github": "https://github.com/All-Hands-AI/OpenHands",
      "discord": "https://discord.gg/ESHStjSjD4"
    }
--- a/docs/success-stories/index.mdx
+++ b/docs/success-stories/index.mdx
@@ -0,0 +1,217 @@
+---
+title: "Success Stories"
+description: "Real-world examples of what you can achieve with OpenHands"
+---
+
+Discover how developers and teams are using OpenHands to automate their software development workflows. From quick fixes to complex projects, see what's possible with AI-powered development assistance.
+
+Check out the [#success-stories](https://www.linen.dev/s/openhands/c/success-stories) channel on our Slack for more!
+
+<Update label="2025-06-13 OpenHands helps frontline support" description="@Joe Pelletier">
+
+## One of the cool things about OpenHands, and especially the Slack Integration, is the ability to empower folks who are on the ‘front lines’ with customers.
+
+For example, often times Support and Customer Success teams will field bug reports, doc questions, and other ‘nits’ from customers. They tend to have few options to deal with this, other than file a feedback ticket with product teams and hope it gets prioritized in an upcoming sprint.
+
+Instead, with tools like OpenHands and the Slack integration, they can request OpenHands to make fixes proactively and then have someone on the engineering team (like a lead engineer, a merge engineer, or even technical product manager) review the PR and approve it — thus reducing the cycle time for ‘quick wins’ from weeks to just a few hours.
+
+Here's how we do that with the OpenHands project:
+
+<iframe
+  width="560"
+  height="560"
+  src="https://www.linen.dev/s/openhands/t/29118545/seems-mcp-config-from-config-toml-is-being-overwritten-hence#629f8e2b-cde8-427e-920c-390557a06cc9"
+  frameborder="0"
+  allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
+  allowfullscreen
+></iframe>
+
+[Original Slack thread](https://www.linen.dev/s/openhands/t/29124350/one-of-the-cool-things-about-openhands-and-especially-the-sl#25029f37-7b0d-4535-9187-83b3e06a4011)
+
+</Update>
+
+
+<Update label="2025-06-13 Ask OpenHands to show me some love" description="@Graham Neubig">
+
+## Asked openhands to “show me some love” and...
+
+Asked openhands to “show me some love” and it coded up this app for me, actually kinda genuinely feel loved
+
+<video
+  controls
+  autoplay
+  className="w-full aspect-video"
+  src="/success-stories/stories/2025-06-13-show-love/v1.mp4"
+></video>
+
+[Original Slack thread](https://www.linen.dev/s/openhands/t/29100731/asked-openhands-to-show-me-some-love-and-it-coded-up-this-ap#1e08af6b-b7d5-4167-8a53-17e6806555e0)
+
+</Update>
+
+<Update label="2025-06-11 OpenHands does 100% of my infra IAM research for me" description="@Xingyao Wang">
+
+## Now, OpenHands does 100% of my infra IAM research for me
+
+Got an IAM error on GCP? Send a screenshot to OH... and it just works!!!
+Can't imagine going back to the early days without OH: I'd spend an entire afternoon figuring how to get IAM right
+
+[Original Slack thread](https://www.linen.dev/s/openhands/t/29100732/now-openhands-does-100-of-my-infra-iam-research-for-me-sweat#20482a73-4e2e-4edd-b6d1-c9e8442fccd1)
+
+![](/success-stories/stories/2025-06-11-infra-iam/s1.png)
+![](/success-stories/stories/2025-06-11-infra-iam/s2.png)
+
+</Update>
+
+<Update label="2025-06-08 OpenHands builds an interactive map for me" description="@Rodrigo Argenton Freire (ODLab)">
+
+## Very simple example, but baby steps....
+
+I am a professor of architecture and urban design. We built, me and some students, an interactive map prototype to help visitors and new students to find important places in the campus. Considering that we lack a lot of knowledge in programming, that was really nice to build and a smooth process.
+We first created the main components with all-hands and then adjusted some details locally. Definitely, saved us a lot of time and money.
+That's a prototype but we will have all the info by tuesday.
+https://buriti-emau.github.io/Mapa-UFU/
+
+[Original Slack thread](https://www.linen.dev/s/openhands/t/29100736/very-simple-example-but-baby-steps-i-am-a-professor-of-archi#8f2e3f3f-44e6-44ea-b9a8-d53487470179)
+
+![](/success-stories/stories/2025-06-08-map/s1.png)
+
+</Update>
+
+
+<Update label="2025-06-06 Web Search Saves the Day" description="@Ian Walker">
+
+## Tavily adapter helps solve persistent debugging issue
+
+Big congratulations to the new [Tavily adapter](https://www.all-hands.dev/blog/building-a-provably-versatile-agent)... OpenHands and I have been beavering away at a Lightstreamer client library for most of this week but were getting a persistent (and unhelpful) "unexpected error" from the server.
+
+Coming back to the problem today, after trying several unsuccessful fixes prompted by me, OH decided all by itself to search the web, and found the cause of the problem (of course it was simply CRLF line endings...). I was on the verge of giving up - good thing OH has more stamina than me!
+
+This demonstrates how OpenHands' web search capabilities can help solve debugging issues that would otherwise require extensive manual research.
+
+<iframe
+  width="560"
+  height="560"
+  src="https://www.linen.dev/s/openhands/t/29100737/big-congratulations-to-the-new-tavily-adapter-openhands-and-#87b027e5-188b-425e-8aa9-719dcb4929f4"
+  title="Original Slack thread"
+  frameborder="0"
+  allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
+  allowfullscreen
+></iframe>
+
+
+[Original Slack thread](https://www.linen.dev/s/openhands/t/29100737/big-congratulations-to-the-new-tavily-adapter-openhands-and-#76f1fb26-6ef7-4709-b9ea-fb99105e47e4)
+
+</Update>
+
+<Update label="2025-06-05 OpenHands updates my personal website for a new paper" description="@Xingyao Wang">
+
+## I asked OpenHands to update my personal website for the "OpenHands Versa" paper.
+
+It is an extremely trivial task: You just need to browse to arxiv, copy the author names, format them for BibTeX, and then modify the papers.bib file. But now I'm getting way too lazy to even open my IDE and actually do this one-file change!
+
+[Original Tweet/X thread](https://x.com/xingyaow_/status/1930796287919542410)
+
+[Original Slack thread](https://www.linen.dev/s/openhands/t/29100738/i-asked-openhands-to-update-my-personal-website-for-the-open#f0324022-b12b-4d34-b12b-bdbc43823f69)
+
+</Update>
+
+<Update label="2025-06-02 OpenHands makes an animated gif of swe-bench verified scores over time" description="@Graham Neubig">
+
+## I asked OpenHands to make an animated gif of swe-bench verified scores over time.
+
+It took a bit of prompting but ended up looking pretty nice I think
+
+<video width="560" height="315" autoPlay loop muted src="/success-stories/stories/2025-06-02-swebench-score/s1.mp4"></video>
+
+[Original Slack thread](https://www.linen.dev/s/openhands/t/29100744/i-asked-openhands-to-make-an-animated-gif-of-swe-bench-verif#fb3b82c9-6222-4311-b97b-b2ac1cfe6dff)
+
+</Update>
+
+<Update label="2025-05-30 AWS Troubleshooting" description="@Graham Neubig">
+
+## Quick AWS security group fix
+
+I really don't like trying to fix issues with AWS, especially security groups and other finicky things like this. But I started up an instance and wasn't able to ssh in. So I asked OpenHands:
+
+> Currently, the following ssh command is timing out:
+>
+> $ ssh -i gneubig.pem ubuntu@XXX.us-east-2.compute.amazonaws.com
+> ssh: connect to host XXX.us-east-2.compute.amazonaws.com port 22: Operation timed out
+>
+> Use the provided AWS credentials to take a look at i-XXX and examine why
+
+And 2 minutes later I was able to SSH in!
+
+This shows how OpenHands can quickly diagnose and fix AWS infrastructure issues that would normally require manual investigation.
+
+[Original Slack thread](https://www.linen.dev/s/openhands/t/29100747/i-really-don-t-like-trying-to-fix-issues-with-aws-especially#d92a66d2-3bc1-4467-9d09-dc983004d083)
+
+</Update>
+
+
+<Update label="2025-05-04 Chrome Extension Development" description="@Xingyao Wang">
+
+## OpenHands builds Chrome extension for GitHub integration
+
+I asked OpenHands to write a Chrome extension based on our [OpenHands Cloud API](https://docs.all-hands.dev/modules/usage/cloud/cloud-api). Once installed, you can now easily launch an OpenHands cloud session from your GitHub webpage/PR!
+
+This demonstrates OpenHands' ability to create browser extensions and integrate with external APIs, enabling seamless workflows between GitHub and OpenHands Cloud.
+
+![Chrome extension](/success-stories/stories/2025-05-04-chrome-extension/s1.png)
+![Chrome extension](/success-stories/stories/2025-05-04-chrome-extension/s2.png)
+
+[GitHub Repository](https://github.com/xingyaoww/openhands-chrome-extension)
+
+[Original Slack thread](https://www.linen.dev/s/openhands/t/29100755/i-asked-openhands-to-write-a-chrome-extension-based-on-our-h#88f14b7f-f8ff-40a6-83c2-bd64e95924c5)
+
+</Update>
+
+
+<Update label="2025-04-11 Visual UI Testing" description="@Xingyao Wang">
+
+## OpenHands tests UI automatically with visual browsing
+
+Thanks to visual browsing -- OpenHands can actually test some simple UI by serving the website, clicking the button in the browser and looking at screenshots now!
+
+Prompt is just:
+```
+I want to create a Hello World app in Javascript that:
+* Displays Hello World in the middle.
+* Has a button that when clicked, changes the greeting with a bouncing animation to fun versions of Hello.
+* Has a counter for how many times the button has been clicked.
+* Has another button that changes the app's background color.
+```
+
+Eager-to-work Sonnet 3.7 will test stuff for you without you asking!
+
+This showcases OpenHands' visual browsing capabilities, enabling it to create, serve, and automatically test web applications through actual browser interactions and screenshot analysis.
+
+![Visual UI testing](/success-stories/stories/2025-04-11-visual-ui/s1.png)
+
+[Original Slack thread](https://www.linen.dev/s/openhands/t/29100764/thanks-to-u07k0p3bdb9-s-visual-browsing-openhands-can-actual#21beb9bc-1a04-4272-87e9-4d3e3b9925e7)
+
+</Update>
+
+<Update label="2025-03-07 Proactive Error Handling" description="@Graham Neubig">
+
+## OpenHands fixes crashes before you notice them
+
+Interesting story, I asked OpenHands to start an app on port 12000, it showed up on the app pane. I started using the app, and then it crashed... But because it crashed in OpenHands, OpenHands immediately saw the error message and started fixing the problem without me having to do anything. It was already fixing the problem before I even realized what was going wrong.
+
+This demonstrates OpenHands' proactive monitoring capabilities - it doesn't just execute commands, but actively watches for errors and begins remediation automatically, often faster than human reaction time.
+
+</Update>
+
+<Update label="2024-12-03 Creative Design Acceleration" description="@Rohit Malhotra">
+
+## Pair programming for interactive design projects
+
+Used OpenHands as a pair programmer to do heavy lifting for a creative/interactive design project in p5js.
+
+I usually take around 2 days for high fidelity interactions (planning strategy + writing code + circling back with designer), did this in around 5hrs instead with the designer watching curiously the entire time.
+
+This showcases how OpenHands can accelerate creative and interactive design workflows, cutting a roughly two-day task to about five hours while maintaining high-quality output.
+
+[Original Tweet](https://x.com/rohit_malh5/status/1863995531657425225)
+
+</Update>
--- a/docs/success-stories/stories/2025-04-11-visual-ui/s1.png
+++ b/docs/success-stories/stories/2025-04-11-visual-ui/s1.png
--- a/docs/success-stories/stories/2025-05-04-chrome-extension/s1.png
+++ b/docs/success-stories/stories/2025-05-04-chrome-extension/s1.png
--- a/docs/success-stories/stories/2025-05-04-chrome-extension/s2.png
+++ b/docs/success-stories/stories/2025-05-04-chrome-extension/s2.png
--- a/docs/success-stories/stories/2025-06-02-swebench-score/s1.mp4
+++ b/docs/success-stories/stories/2025-06-02-swebench-score/s1.mp4
--- a/docs/success-stories/stories/2025-06-08-map/s1.png
+++ b/docs/success-stories/stories/2025-06-08-map/s1.png
--- a/docs/success-stories/stories/2025-06-11-infra-iam/s1.png
+++ b/docs/success-stories/stories/2025-06-11-infra-iam/s1.png
--- a/docs/success-stories/stories/2025-06-11-infra-iam/s2.png
+++ b/docs/success-stories/stories/2025-06-11-infra-iam/s2.png
--- a/docs/success-stories/stories/2025-06-13-show-love/v1.mp4
+++ b/docs/success-stories/stories/2025-06-13-show-love/v1.mp4
--- a/docs/usage/cloud/cloud-ui.mdx
+++ b/docs/usage/cloud/cloud-ui.mdx
@@ -1,7 +1,7 @@
 ---
 title: Cloud UI
-description: The Cloud UI provides a web interface for interacting with OpenHands. This page explains how to use the
- OpenHands Cloud UI.
+description: The Cloud UI provides a web interface for interacting with OpenHands. This page provides references on
+ how to use the OpenHands Cloud UI.
 ---

 ## Landing Page
@@ -19,10 +19,12 @@ The landing page is where you can:
 The Settings page allows you to:

 - [Configure GitHub repository access](/usage/cloud/github-installation#modifying-repository-access) for OpenHands.
+- [Install the OpenHands Slack app](/usage/cloud/slack-installation).
 - Set application settings like your preferred language, notifications and other preferences.
 - Add credits to your account.
- Generate custom secrets.
- Create API keys to work with OpenHands programmatically.
+- [Generate custom secrets](/usage/common-settings#secrets-management).
+- [Create API keys to work with OpenHands programmatically](/usage/cloud/cloud-api).
+- Change your email address.

 ## Key Features

--- a/docs/usage/cloud/github-installation.mdx
+++ b/docs/usage/cloud/github-installation.mdx
@@ -35,7 +35,7 @@ You can grant OpenHands access to specific GitHub repositories:

 You can modify GitHub repository access at any time by:
 - Selecting `Add GitHub repos` on the landing page or
- Visiting the Settings page and selecting `Configure GitHub Repositories` under the `Git` tab
+- Visiting the Settings page and selecting `Configure GitHub Repositories` under the `Integrations` tab

 ## Working With GitHub Repos in Openhands Cloud

--- a/docs/usage/cloud/slack-installation.mdx
+++ b/docs/usage/cloud/slack-installation.mdx
@@ -1,21 +1,51 @@
 ---
-title: Slack Integration - Coming soon...
+title: Slack Integration (Beta)
 description: This guide walks you through installing the OpenHands Slack app.
 ---

-<Warning>This integration is not live yet, but will be available soon.</Warning>
+<iframe
+  className="w-full aspect-video"
+  src="https://www.youtube.com/embed/hbloGmfZsJ4"
+  title="OpenHands Slack Integration Tutorial"
+  frameBorder="0"
+  allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
+  allowFullScreen>
+</iframe>

 ## Prerequisites

- You are a slack workspace admin
- Access to OpenHands Cloud
+- Access to OpenHands Cloud.

 ## Installation Steps

-1. Log in to [OpenHands Cloud](https://app.all-hands.dev)
-2. Click the button below to OpenHands Slack App <a target="_blank" href="https://slack.com/oauth/v2/authorize?client_id=7477886716822.8729519890534&scope=app_mentions:read,chat:write,users:read,channels:history,groups:history,mpim:history,im:history&user_scope=channels:history,groups:history,im:history,mpim:history"><img alt="Add to Slack" height="40" width="139" src="https://platform.slack-edge.com/img/add_to_slack.png" srcSet="https://platform.slack-edge.com/img/add_to_slack.png 1x, https://platform.slack-edge.com/img/add_to_slack@2x.png 2x" /></a>
-3. In the top right corner, select the workspace to install the OpenHands Slack app.
-4. Review permissions and click allow
+<AccordionGroup>
+<Accordion title="Install Slack App (only for Slack admins/owners)">
+
+  **This step is for Slack admins/owners**
+
+  1. Make sure you have permission to install apps in your workspace.
+  2. Click the button below to install OpenHands Slack App <a target="_blank" href="https://slack.com/oauth/v2/authorize?client_id=7477886716822.8729519890534&scope=app_mentions:read,chat:write,users:read,channels:history,groups:history,mpim:history,im:history&user_scope=channels:history,groups:history,im:history,mpim:history"><img alt="Add to Slack" height="40" width="139" src="https://platform.slack-edge.com/img/add_to_slack.png" srcSet="https://platform.slack-edge.com/img/add_to_slack.png 1x, https://platform.slack-edge.com/img/add_to_slack@2x.png 2x" /></a>
+  3. In the top right corner, select the workspace to install the OpenHands Slack app.
+  4. Review permissions and click allow.
+
+</Accordion>
+
+<Accordion title="Authorize Slack App (for all Slack workspace members)">
+
+  **Make sure your Slack workspace admin/owner has installed OpenHands Slack App first.**
+
+  Every user in the Slack workspace (including admins/owners) must link their OpenHands Cloud account to the OpenHands Slack App. To do this:
+  1. Visit [integrations settings](https://app.all-hands.dev/settings/integrations) in OpenHands Cloud.
+  2. Click `Install OpenHands Slack App`.
+  3. In the top right corner, select the workspace to install the OpenHands Slack app.
+  4. Review permissions and click allow.
+
+  Depending on the workspace settings, you may need approval from your Slack admin to authorize the Slack App.
+
+</Accordion>
+
+</AccordionGroup>
+

 ## Working With the Slack App

@@ -47,6 +77,6 @@ You can mention a repo name when starting a new conversation in the following fo
 2. "All-Hands-AI/OpenHands" (e.g `@openhands in All-Hands-AI/OpenHands ...`)

 The repo match is case insensitive. If a repo name match is made, it will kick off the conversation.
-If the repo name partially matches against, multiple repos, you'll be asked to select a repo from the filtered list.
+If the repo name partially matches against multiple repos, you'll be asked to select a repo from the filtered list.

 ![slack-pro-tip.png](/static/img/slack-pro-tip.png)
--- a/docs/usage/common-settings.mdx
+++ b/docs/usage/common-settings.mdx
@@ -0,0 +1,52 @@
+---
+title: OpenHands Settings
+description: Overview of some of the settings available in OpenHands.
+---
+
+## OpenHands Cloud vs Running on Your Own
+
+There are some differences between the settings available in OpenHands Cloud and those available when running OpenHands
+on your own:
+* [OpenHands Cloud settings](/usage/cloud/cloud-ui#settings)
+* [Settings available when running on your own](/usage/how-to/gui-mode#settings)
+
+Refer to these pages for more detailed information.
+
+## Secrets Management
+
+OpenHands provides a secrets manager that allows you to securely store and manage sensitive information that can be
+accessed by the agent during runtime, such as API keys. These secrets are automatically exported as environment
+variables in the agent's runtime environment.
+
+### Accessing the Secrets Manager
+
+In the Settings page, navigate to the `Secrets` tab. Here, you'll see a list of all your existing custom secrets.
+
+### Adding a New Secret
+1. Click `Add a new secret`.
+2. Fill in the following fields:
+   - **Name**: A unique identifier for your secret (e.g., `AWS_ACCESS_KEY`). This will be the environment variable name.
+   - **Value**: The sensitive information you want to store.
+   - **Description** (optional): A brief description of what the secret is used for, which is also provided to the agent.
+3. Click `Add secret` to save.
+
+### Editing a Secret
+
+1. Click the `Edit` button next to the secret you want to modify.
+2. You can update the name and description of the secret.
+<Note>
+  For security reasons, you cannot view or edit the value of an existing secret. If you need to change the
+  value, delete the secret and create a new one.
+</Note>
+
+### Deleting a Secret
+
+1. Click the `Delete` button next to the secret you want to remove.
+2. Select `Confirm` to delete the secret.
+
+### Using Secrets in the Agent
+ - All custom secrets are automatically exported as environment variables in the agent's runtime environment.
+ - You can access them in your code using standard environment variable access methods
+   (e.g., `os.environ['SECRET_NAME']` in Python).
+ - Example: If you create a secret named `OPENAI_API_KEY`, you can access it in your code as
+   `process.env.OPENAI_API_KEY` in JavaScript or `os.environ['OPENAI_API_KEY']` in Python.
--- a/docs/usage/configuration-options.mdx
+++ b/docs/usage/configuration-options.mdx
@@ -1,28 +1,17 @@
 ---
 title: Configuration Options
-description: This page outlines all available configuration options for OpenHands, allowing you to customize its behavior and integrate it with other services. In GUI Mode, any settings applied through the Settings UI will take precedence.
+description: This page outlines all available configuration options for OpenHands, allowing you to customize its
+  behavior and integrate it with other services.
 ---

+<Note>
+   In GUI Mode, any settings applied through the Settings UI will take precedence.
+</Note>
+
 ## Core Configuration

 The core configuration options are defined in the `[core]` section of the `config.toml` file.

-### API Keys
- `e2b_api_key`
-  - Type: `str`
-  - Default: `""`
-  - Description: API key for E2B
-
- `modal_api_token_id`
-  - Type: `str`
-  - Default: `""`
-  - Description: API token ID for Modal
-
- `modal_api_token_secret`
-  - Type: `str`
-  - Default: `""`
-  - Description: API token secret for Modal
-
 ### Workspace
 - `workspace_base` **(Deprecated)**
  - Type: `str`
--- a/docs/usage/faqs.mdx
+++ b/docs/usage/faqs.mdx
@@ -0,0 +1,96 @@
+---
+title: FAQs
+description: Frequently asked questions about OpenHands
+icon: question
+---
+
+## Getting Started
+
+### I'm new to OpenHands. Where should I start?
+
+1. **Quick start**: Use [OpenHands Cloud](/usage/cloud/openhands-cloud) to get started quickly with
+  [GitHub](/usage/cloud/github-installation), [GitLab](/usage/cloud/gitlab-installation),
+  and [Slack](/usage/cloud/slack-installation) integrations.
+2. **Run on your own**: If you prefer to run it on your own hardware, follow our [Getting Started guide](/usage/local-setup).
+3. **First steps**: Complete the [start building tutorial](/usage/getting-started) to learn the basics.
+
+### Can I use OpenHands for production workloads?
+
+OpenHands is meant to be run by a single user on their local workstation. It is not appropriate for multi-tenant
+deployments where multiple users share the same instance. There is no built-in authentication, isolation, or scalability.
+
+If you're interested in running OpenHands in a multi-tenant environment, check out the source-available,
+commercially-licensed [OpenHands Cloud Helm Chart](https://github.com/all-Hands-AI/OpenHands-cloud).
+
+<Info>
+Using OpenHands for work? We'd love to chat! Fill out
+[this short form](https://docs.google.com/forms/d/e/1FAIpQLSet3VbGaz8z32gW9Wm-Grl4jpt5WgMXPgJ4EDPVmCETCBpJtQ/viewform)
+to join our Design Partner program, where you'll get early access to commercial features and the opportunity to provide
+input on our product roadmap.
+</Info>
+
+## Safety and Security
+
+### It's doing stuff without asking, is that safe?
+
+**Generally yes, but with important considerations.** OpenHands runs all code in a secure, isolated Docker container
+(called a "sandbox") that is separate from your host system. However, the safety depends on your configuration:
+
+**What's protected:**
+- Your host system files and programs (unless you mount them using [this feature](/usage/runtimes/docker#connecting-to-your-filesystem))
+- Host system resources
+- Other containers and processes
+
+**Potential risks to consider:**
+- The agent can access the internet from within the container.
+- If you provide credentials (API keys, tokens), the agent can use them.
+- Mounted files and directories can be modified or deleted.
+- Network requests can be made to external services.
+
+For detailed security information, see our [Runtime Architecture](/usage/architecture/runtime),
+[Security Configuration](/usage/configuration-options#security-configuration),
+and [Hardened Docker Installation](/usage/runtimes/docker#hardened-docker-installation) documentation.
+
+## File Storage and Access
+
+### Where are my files stored?
+
+Your files are stored in different locations depending on how you've configured OpenHands:
+
+**Default behavior (no file mounting):**
+- Files created by the agent are stored inside the runtime Docker container.
+- These files are temporary and will be lost when the container is removed.
+- The agent works in the `/workspace` directory inside the runtime container.
+
+**When you mount your local filesystem (following [this](/usage/runtimes/docker#connecting-to-your-filesystem)):**
+- Your local files are mounted into the container's `/workspace` directory.
+- Changes made by the agent are reflected in your local filesystem.
+- Files persist after the container is stopped.
+
+<Warning>
+Be careful when mounting your filesystem - the agent can modify or delete any files in the mounted directory.
+</Warning>
+
+## Development Tools and Environment
+
+### How do I get the dev tools I need?
+
+OpenHands comes with a basic runtime environment that includes Python and Node.js.
+It also has the ability to install any tools it needs, so usually it's sufficient to ask it to set up its environment.
+
+If you would like to set things up more systematically, you can:
+- **Use setup.sh**: Add a [setup.sh file](/usage/prompting/repository#setup-script) to
+  your repository, which will be run every time the agent starts.
+- **Use a custom sandbox**: Use a [custom docker image](/usage/how-to/custom-sandbox-guide) to initialize the sandbox.
+
+### Something's not working. Where can I get help?
+
+1. **Search existing issues**: Check our [GitHub issues](https://github.com/All-Hands-AI/OpenHands/issues) to see if
+  others have encountered the same problem.
+2. **Join our community**: Get help from other users and developers:
+   - [Slack community](https://join.slack.com/t/openhands-ai/shared_invite/zt-3847of6xi-xuYJIPa6YIPg4ElbDWbtSA)
+   - [Discord server](https://discord.gg/ESHStjSjD4)
+3. **Check our troubleshooting guide**: Common issues and solutions are documented in
+  [Troubleshooting](/usage/troubleshooting/troubleshooting).
+4. **Report bugs**: If you've found a bug, please [create an issue](https://github.com/All-Hands-AI/OpenHands/issues/new)
+  and fill in as much detail as possible.
--- a/docs/usage/getting-started.mdx
+++ b/docs/usage/getting-started.mdx
@@ -1,6 +1,6 @@
 ---
 title: Start Building
-description: So you've [run OpenHands](./installation) and have [set up your LLM](./installation#setup). Now what?
+description: So you've [run OpenHands](/usage/installation). Now what?
 icon: code
 ---

--- a/docs/usage/how-to/cli-mode.mdx
+++ b/docs/usage/how-to/cli-mode.mdx
@@ -7,32 +7,50 @@ description: The Command-Line Interface (CLI) provides a powerful interface that
 This mode is different from the [headless mode](/usage/how-to/headless-mode), which is non-interactive and better
 for scripting.

+<iframe
+  className="w-full aspect-video"
+  src="https://www.youtube.com/embed/PfvIx4y8h7w"
+  title="OpenHands CLI Tutorial"
+  frameBorder="0"
+  allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
+  allowFullScreen>
+</iframe>
+
 ## Getting Started

 ### Running with Python

-1. Install OpenHands using pip:
+**Note** - OpenHands requires Python version 3.12 or higher (Python 3.14 is not currently supported).

+1. Install OpenHands using pip:
 ```bash
 pip install openhands-ai
 ```

-2. Set your model, API key, and other preferences using environment variables or with the [`config.toml`](https://github.com/All-Hands-AI/OpenHands/blob/main/config.template.toml) file.
-3. Launch an interactive OpenHands conversation from the command line:
+  Or if you prefer not to manage your own Python environment, you can use `uvx`:

+```bash
+uvx --python 3.12 --from openhands-ai openhands
+```
+
+2. Launch an interactive OpenHands conversation from the command line:
 ```bash
 openhands
 ```

+<Note>
+  If you have cloned the repository, you can also run the CLI directly using Poetry:
+
+  poetry run python -m openhands.cli.main
+</Note>
+
+3. Set your model, API key, and other preferences using the UI (or alternatively environment variables, below).
+
 This command opens an interactive prompt where you can type tasks or commands and get responses from OpenHands.
+The first time you run the CLI, it will take you through configuring the required LLM
+settings. These will be saved for future sessions.

-#### For Developers
-
-If you have cloned the repository, you can run the CLI directly using Poetry:
-
-```bash
-poetry run python -m openhands.cli.main
-```
+The conversation history will be saved in `~/.openhands/sessions`.

 ### Running with Docker

@@ -46,23 +64,30 @@ poetry run python -m openhands.cli.main
 ```bash
 docker run -it \
    --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.47-nikolaik \
    -e SANDBOX_USER_ID=$(id -u) \
    -e SANDBOX_VOLUMES=$SANDBOX_VOLUMES \
    -e LLM_API_KEY=$LLM_API_KEY \
    -e LLM_MODEL=$LLM_MODEL \
    -v /var/run/docker.sock:/var/run/docker.sock \
-    -v ~/.openhands-state:/.openhands-state \
+    -v ~/.openhands:/.openhands \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app-$(date +%Y%m%d%H%M%S) \
-    docker.all-hands.dev/all-hands-ai/openhands:0.43 \
+    docker.all-hands.dev/all-hands-ai/openhands:0.47 \
    python -m openhands.cli.main --override-cli-mode true
 ```

-This launches the CLI in Docker, allowing you to interact with OpenHands as described above.
+<Note>
+  If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your
+  conversation history to the new location.
+</Note>
+
+This launches the CLI in Docker, allowing you to interact with OpenHands.

 The `-e SANDBOX_USER_ID=$(id -u)` ensures files created by the agent in your workspace have the correct permissions.

+The conversation history will be saved in `~/.openhands/sessions`.
+
 ## Interactive CLI Overview

 ### What is CLI Mode?
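
Review note on the `~/.openhands-state` to `~/.openhands` migration mentioned above: a bare `mv` nests the old directory inside the new one if `~/.openhands` already exists. A minimal sketch of a more defensive migration follows. The function name `migrate_openhands_state` is hypothetical and not part of OpenHands; only the two directory paths come from the docs.

```bash
# Hypothetical helper (not shipped with OpenHands): move the pre-0.44 state
# directory to the new location without clobbering existing data.
migrate_openhands_state() {
  local old_dir="${1:-$HOME/.openhands-state}"
  local new_dir="${2:-$HOME/.openhands}"

  # Nothing to do if there is no old state directory.
  if [ ! -d "$old_dir" ]; then
    echo "nothing to migrate: $old_dir does not exist"
    return 0
  fi

  # Refuse to touch an existing target so no history is overwritten or nested.
  if [ -e "$new_dir" ]; then
    echo "refusing to overwrite existing $new_dir" >&2
    return 1
  fi

  mv "$old_dir" "$new_dir"
  echo "moved $old_dir to $new_dir"
}
```

Calling `migrate_openhands_state` with no arguments uses the two default paths from the note; passing explicit paths makes it easy to try on a scratch directory first.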
--- a/docs/usage/how-to/custom-sandbox-guide.mdx
+++ b/docs/usage/how-to/custom-sandbox-guide.mdx
@@ -1,6 +1,7 @@
 ---
 title: Custom Sandbox
-description: This guide is for users that would like to use their own custom Docker image for the runtime. For example, with certain tools or programming languages pre-installed.
+description: This guide is for users that would like to use their own custom Docker image for the runtime.
+  For example, with certain tools or programming languages pre-installed.
 ---

 The sandbox is where the agent performs its tasks. Instead of running commands directly on your computer
--- a/docs/usage/how-to/gui-mode.mdx
+++ b/docs/usage/how-to/gui-mode.mdx
@@ -25,9 +25,9 @@ You can use the Settings page at any time to:
 - Setup the LLM provider and model for OpenHands.
 - [Setup the search engine](/usage/search-engine-setup).
 - [Configure MCP servers](/usage/mcp).
- [Connect to GitHub](/usage/how-to/gui-mode#github-setup) and [connect to GitLab](/usage/how-to/gui-mode#gitlab-setup)
+- [Connect to GitHub](/usage/how-to/gui-mode#github-setup) and [connect to GitLab](/usage/how-to/gui-mode#gitlab-setup).
 - Set application settings like your preferred language, notifications and other preferences.
- Generate custom secrets.
+- [Manage custom secrets](/usage/common-settings#secrets-management).

 #### GitHub Setup

@@ -45,7 +45,7 @@ OpenHands automatically exports a `GITHUB_TOKEN` to the shell environment if pro
     - All Repositories (You can select specific repositories, but this will impact what returns in repo search)
     - Minimal Permissions (Select `Meta Data = Read-only` read for search, `Pull Requests = Read and Write` and `Content = Read and Write` for branch creation)
  2. **Enter Token in OpenHands**:
-   - In the Settings page, navigate to the `Git` tab.
+   - In the Settings page, navigate to the `Integrations` tab.
   - Paste your token in the `GitHub Token` field.
   - Click `Save Changes` to apply the changes.

@@ -97,7 +97,7 @@ OpenHands automatically exports a `GITLAB_TOKEN` to the shell environment if pro
     - `write_repository` (Write repository)
   - Set an expiration date or leave it blank for a non-expiring token.
  2. **Enter Token in OpenHands**:
-   - In the Settings page, navigate to the `Git` tab.
+   - In the Settings page, navigate to the `Integrations` tab.
   - Paste your token in the `GitLab Token` field.
   - Click `Save Changes` to apply the changes.

@@ -122,6 +122,41 @@ OpenHands automatically exports a `GITLAB_TOKEN` to the shell environment if pro
 </Accordion>
 </AccordionGroup>

+#### BitBucket Setup (Coming soon ...)
+<AccordionGroup>
+<Accordion title="Setting Up a BitBucket Password">
+1. **Generate an App Password**:
+   - On BitBucket, go to Personal Settings > App Password.
+   - Create a new password with the following scopes:
+     - `repository: read`
+     - `repository: write`
+     - `pull requests: read`
+     - `pull requests: write`
+     - `issues: read`
+     - `issues: write`
+   - App passwords are non-expiring tokens. OpenHands will migrate to using API tokens in the future.
+  2. **Enter Token in OpenHands**:
+   - In the Settings page, navigate to the `Integrations` tab.
+   - Paste your token in the `BitBucket Token` field.
+   - Click `Save Changes` to apply the changes.
+</Accordion>
+
+<Accordion title="Troubleshooting">
+  Common issues and solutions:
+
+  - **Token Not Recognized**:
+     - Ensure the token is properly saved in settings.
+     - Check that the token hasn't expired.
+     - Verify the token has the required scopes.
+
+  - **Verifying Token Works**:
+     - The app will show a green checkmark if the token is valid.
+     - Try accessing a repository to confirm permissions.
+     - Check the browser console for any error messages.
+</Accordion>
+
+</AccordionGroup>
+
 #### Advanced Settings

 The `Advanced` settings allows configuration of additional LLM settings. Inside the Settings page, under the `LLM` tab,
@@ -142,11 +177,11 @@ section of the documentation.
 The status indicator located in the bottom left of the screen will cycle through a number of states as a new conversation
 is loaded. Typically these include:

-* `Disconnected` : The frontend is not connected to any conversation
+* `Disconnected` : The frontend is not connected to any conversation.
 * `Connecting` : The frontend is connecting a websocket to a conversation.
 * `Building Runtime...` : The server is building a runtime. This is typically in development mode only while building a docker image.
 * `Starting Runtime...` : The server is starting a new runtime instance - probably a new docker container or remote runtime.
-* `Initializing Agent...` : The server is starting the agent loop. (This step does not appear at present with Nested runtimes)
+* `Initializing Agent...` : The server is starting the agent loop (this step does not appear at present with Nested runtimes).
 * `Setting up workspace...` : Usually this means a `git clone ...` operation.
 * `Setting up git hooks` : Setting up the git pre commit hooks for the workspace.
 * `Agent is awaiting user input...` : Ready to go!
@@ -154,7 +189,7 @@ is loaded. Typically these include:
 ## Tips for Effective Use

 - Be specific in your requests to get the most accurate and helpful responses, as described in the [prompting best practices](../prompting/prompting-best-practices).
- Use one of the recommended models, as described in the [LLMs section](usage/llms/llms.md).
+- Use one of the recommended models, as described in the [LLMs section](/usage/llms/llms).

 ## Other Ways to Run Openhands
 - [Run OpenHands in a scriptable headless mode.](/usage/how-to/headless-mode)
--- a/docs/usage/how-to/headless-mode.mdx
+++ b/docs/usage/how-to/headless-mode.mdx
@@ -32,19 +32,20 @@ To run OpenHands in Headless mode with Docker:
 ```bash
 docker run -it \
    --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.47-nikolaik \
    -e SANDBOX_USER_ID=$(id -u) \
    -e SANDBOX_VOLUMES=$SANDBOX_VOLUMES \
    -e LLM_API_KEY=$LLM_API_KEY \
    -e LLM_MODEL=$LLM_MODEL \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
-    -v ~/.openhands-state:/.openhands-state \
+    -v ~/.openhands:/.openhands \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app-$(date +%Y%m%d%H%M%S) \
-    docker.all-hands.dev/all-hands-ai/openhands:0.43 \
+    docker.all-hands.dev/all-hands-ai/openhands:0.47 \
    python -m openhands.core.main -t "write a bash script that prints hi"
 ```
+> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.

 The `-e SANDBOX_USER_ID=$(id -u)` is passed to the Docker command to ensure the sandbox user matches the host user’s
 permissions. This prevents the agent from creating root-owned files in the mounted workspace.
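
Review note: the headless `docker run` command above expands `$SANDBOX_VOLUMES`, `$LLM_API_KEY`, and `$LLM_MODEL` from the calling shell, so an unset variable is silently passed to the container as an empty string. A small preflight check, sketched below, can catch that before the container starts. The helper name `require_env` is an assumption made for illustration, not an OpenHands command.

```bash
# Hypothetical preflight helper: fail if any named shell variable is unset or empty.
require_env() {
  local missing=0 name value
  for name in "$@"; do
    # Look up the variable whose name is stored in $name (POSIX-compatible indirection).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A typical use would be `require_env LLM_API_KEY LLM_MODEL SANDBOX_VOLUMES && docker run ...`, so the container is only launched when all three values are present.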
--- a/docs/usage/installation.mdx
+++ b/docs/usage/installation.mdx
@@ -1,12 +1,12 @@
 ---
 title: Quick Start
-description: Running OpenHands Cloud or running on your local system.
+description: Running OpenHands Cloud or running on your own.
 icon: rocket
 ---

 ## OpenHands Cloud

-The easiest way to get started with OpenHands is on OpenHands Cloud, which comes with $50 in free credits for new users.
+The easiest way to get started with OpenHands is on OpenHands Cloud, which comes with $20 in free credits for new users.

 To get started with OpenHands Cloud, visit [app.all-hands.dev](https://app.all-hands.dev).

--- a/docs/usage/llms/google-llms.mdx
+++ b/docs/usage/llms/google-llms.mdx
@@ -8,7 +8,7 @@ description: OpenHands uses LiteLLM to make calls to Google's chat models. You c
 When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab:
 - `LLM Provider` to `Gemini`
 - `LLM Model` to the model you will be using.
-If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model` 
+If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model`
 (e.g. gemini/&lt;model-name&gt; like `gemini/gemini-2.0-flash`).
 - `API Key` to your Gemini API key

@@ -26,5 +26,5 @@ VERTEXAI_LOCATION="<your-gcp-location>"
 Then set the following in the OpenHands UI through the Settings under the `LLM` tab:
 - `LLM Provider` to `VertexAI`
 - `LLM Model` to the model you will be using.
-If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model` 
+If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model`
 (e.g. vertex_ai/&lt;model-name&gt;).
--- a/docs/usage/llms/groq.mdx
+++ b/docs/usage/llms/groq.mdx
@@ -8,7 +8,7 @@ description: OpenHands uses LiteLLM to make calls to chat models on Groq. You ca
 When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab:
 - `LLM Provider` to `Groq`
 - `LLM Model` to the model you will be using. [Visit here to see the list of
-models that Groq hosts](https://console.groq.com/docs/models). If the model is not in the list, 
+models that Groq hosts](https://console.groq.com/docs/models). If the model is not in the list,
 enable `Advanced` options, and enter it in `Custom Model` (e.g. groq/&lt;model-name&gt; like `groq/llama3-70b-8192`).
 - `API key` to your Groq API key. To find or create your Groq API Key, [see here](https://console.groq.com/keys).

--- a/docs/usage/llms/litellm-proxy.mdx
+++ b/docs/usage/llms/litellm-proxy.mdx
@@ -16,7 +16,7 @@ To use LiteLLM proxy with OpenHands, you need to:

 ## Supported Models

-The supported models depend on your LiteLLM proxy configuration. OpenHands supports any model that your LiteLLM proxy 
+The supported models depend on your LiteLLM proxy configuration. OpenHands supports any model that your LiteLLM proxy
 is configured to handle.

 Refer to your LiteLLM proxy configuration for the list of available models and their names.
--- a/docs/usage/llms/llms.mdx
+++ b/docs/usage/llms/llms.mdx
@@ -73,6 +73,15 @@ We have a few guides for running OpenHands with specific model providers:
 - [OpenAI](/usage/llms/openai-llms)
 - [OpenRouter](/usage/llms/openrouter)

+## Model Customization
+
+LLM providers have specific settings that can be customized to optimize their performance with OpenHands, such as:
+
+- **Custom Tokenizers**: For specialized models, you can add a suitable tokenizer
+- **Native Tool Calling**: Toggle native function/tool calling capabilities
+
+For detailed information about model customization, see [LLM Configuration Options](configuration-options#llm-customization).
+
 ### API retries and rate limits

 LLM providers typically have rate limits, sometimes very low, and may require retries. OpenHands will automatically
--- a/docs/usage/llms/local-llms.mdx
+++ b/docs/usage/llms/local-llms.mdx
@@ -6,73 +6,85 @@ description: When using a Local LLM, OpenHands may have limited functionality. I
 ## News

 - 2025/05/21: We collaborated with Mistral AI and released [Devstral Small](https://mistral.ai/news/devstral) that achieves [46.8% on SWE-Bench Verified](https://github.com/SWE-bench/experiments/pull/228)!
- 2025/03/31: We released an open model OpenHands LM v0.1 32B that achieves 37.1% on SWE-Bench Verified
+- 2025/03/31: We released an open model OpenHands LM 32B v0.1 that achieves 37.1% on SWE-Bench Verified
 ([blog](https://www.all-hands.dev/blog/introducing-openhands-lm-32b----a-strong-open-coding-agent-model), [model](https://huggingface.co/all-hands/openhands-lm-32b-v0.1)).

+## Quickstart: Running OpenHands with a Local LLM using LM Studio

-## Quickstart: Running OpenHands on Your Macbook
+This guide explains how to serve a local Devstral LLM using [LM Studio](https://lmstudio.ai/) and have OpenHands connect to it.

-### Serve the model on your Macbook
+We recommend:
+- **LM Studio** as the local model server, which handles metadata downloads automatically and offers a simple, user-friendly interface for configuration.
+- **Devstral Small 2505** as the LLM for software development, trained on real GitHub issues and optimized for agent-style workflows like OpenHands.

-We recommend using [LMStudio](https://lmstudio.ai/) for serving these models locally.
+### Hardware Requirements

-1. Download [LM Studio](https://lmstudio.ai/) and install it
+Running Devstral requires a recent GPU with at least 16GB of VRAM, or a Mac with Apple Silicon (M1, M2, etc.) with at least 32GB of RAM.

-2. Download the model:
-   - Option 1: Directly download the LLM from [this link](https://lmstudio.ai/model/devstral-small-2505-mlx) or by searching for the name `Devstral-Small-2505` in LM Studio
-   - Option 2: Download a LLM in GGUF format. For example, to download [Devstral Small 2505 GGUF](https://huggingface.co/mistralai/Devstral-Small-2505_gguf), using `huggingface-cli download mistralai/Devstral-Small-2505_gguf --local-dir mistralai/Devstral-Small-2505_gguf`. Then in bash terminal, run `lms import {model_name}` in the directory where you've downloaded the model checkpoint (e.g. run `lms import devstralQ4_K_M.gguf` in `mistralai/Devstral-Small-2505_gguf`)
+### 1. Install LM Studio

-3. Open LM Studio application, you should first switch to `power user` mode, and then open the developer tab:
+Download and install the LM Studio desktop app from [lmstudio.ai](https://lmstudio.ai/).

-![image](./screenshots/1_select_power_user.png)
+### 2. Download Devstral Small

-4. Then click `Select a model to load` on top of the application:
+1. Make sure to set the User Interface Complexity Level to "Power User", by clicking on the appropriate label at the bottom of the window.
+2. Click the "Discover" button (Magnifying Glass icon) on the left navigation bar to open the Models download page.

-![image](./screenshots/2_select_model.png)
+![image](./screenshots/01_lm_studio_open_model_hub.png)

-5. And choose the model you want to use, holding `option` on mac to enable advanced loading options:
+3. Search for the "Devstral Small 2505" model, confirm it's the official Mistral AI (mistralai) model, then proceed to download.

-![image](./screenshots/3_select_devstral.png)
+![image](./screenshots/02_lm_studio_download_devstral.png)

-6. You should then pick an appropriate context window for OpenHands based on your hardware configuration (larger than 32768 is recommended for using OpenHands, but too large may cause you to run out of memory); Flash attention is also recommended if it works on your machine.
+4. Wait for the download to finish.

-![image](./screenshots/4_set_context_window.png)
+### 3. Load the Model

-7. And you should start the server (if it is not already in `Running` status), un-toggle `Serve on Local Network` and remember the port number of the LMStudio URL (`1234` is the port number for `http://127.0.0.1:1234` in this example):
+1. Click the "Developer" button (Console icon) on the left navigation bar to open the Developer Console.
+2. Click the "Select a model to load" dropdown at the top of the application window.

-![image](./screenshots/5_copy_url.png)
+![image](./screenshots/03_lm_studio_open_load_model.png)

-8. Finally, you can click the `copy` button near model name to copy the model name (`imported-models/uncategorized/devstralq4_k_m.gguf` in this example):
+3. Enable the "Manually choose model load parameters" switch.
+4. Select 'Devstral Small 2505' from the model list.

-![image](./screenshots/6_copy_to_get_model_name.png)
+![image](./screenshots/04_lm_studio_setup_devstral_part_1.png)

-### Start OpenHands with locally served model
+5. Enable the "Show advanced settings" switch at the bottom of the Model settings flyout to show all the available settings.
+6. Set "Context Length" to at least 32768 and enable Flash Attention.
+7. Click "Load Model" to start loading the model.

-Check [the installation guide](/usage/local-setup) to make sure you have all the prerequisites for running OpenHands.
+![image](./screenshots/05_lm_studio_setup_devstral_part_2.png)
+
+### 4. Start the LLM server
+
+1. Enable the switch next to "Status" at the top-left of the window.
+2. Take note of the Model API Identifier shown on the sidebar on the right.
+
+![image](./screenshots/06_lm_studio_start_server.png)
+
+### 5. Start OpenHands
+
+1. Check [the installation guide](/usage/local-setup) and ensure all prerequisites are met before running OpenHands, then run:

 ```bash
-export LMSTUDIO_MODEL_NAME="imported-models/uncategorized/devstralq4_k_m.gguf" # <- Replace this with the model name you copied from LMStudio
-export LMSTUDIO_URL="http://host.docker.internal:1234"  # <- Replace this with the port from LMStudio
-
-docker pull docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik
-
-mkdir -p ~/.openhands-state && echo '{"language":"en","agent":"CodeActAgent","max_iterations":null,"security_analyzer":null,"confirmation_mode":false,"llm_model":"lm_studio/'$LMSTUDIO_MODEL_NAME'","llm_api_key":"dummy","llm_base_url":"'$LMSTUDIO_URL/v1'","remote_runtime_resource_factor":null,"github_token":null,"enable_default_condenser":true,"user_consents_to_analytics":true}' > ~/.openhands-state/settings.json
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.47-nikolaik

 docker run -it --rm --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.47-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
-    -v ~/.openhands-state:/.openhands-state \
+    -v ~/.openhands:/.openhands \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
-    docker.all-hands.dev/all-hands-ai/openhands:0.43
+    docker.all-hands.dev/all-hands-ai/openhands:0.47
 ```

-Once your server is running -- you can visit `http://localhost:3000` in your browser to use OpenHands with local Devstral model:
+2. Wait until the server is running (see log below):
 ```
 Digest: sha256:e72f9baecb458aedb9afc2cd5bc935118d1868719e55d50da73190d3a85c674f
-Status: Image is up to date for docker.all-hands.dev/all-hands-ai/openhands:0.43
+Status: Image is up to date for docker.all-hands.dev/all-hands-ai/openhands:0.47
 Starting OpenHands...
 Running OpenHands as root
 14:22:13 - openhands:INFO: server_config.py:50 - Using config class None
@@ -82,53 +94,88 @@ INFO:     Application startup complete.
 INFO:     Uvicorn running on http://0.0.0.0:3000 (Press CTRL+C to quit)
 ```

+3. Visit `http://localhost:3000` in your browser.

-## Advanced: Serving LLM on GPUs
+### 6. Configure OpenHands to use the LLM server

-### Download model checkpoints
+Once you open OpenHands in your browser, you'll need to configure it to use the local LLM server you just started.

-<Note>
-The model checkpoints downloaded here should NOT be in GGUF format.
-</Note>
+When started for the first time, OpenHands will prompt you to set up the LLM provider.

-For example, to download [OpenHands LM 32B v0.1](https://huggingface.co/all-hands/openhands-lm-32b-v0.1):
+1. Click "see advanced settings" to open the LLM Settings page.
+
+![image](./screenshots/07_openhands_open_advanced_settings.png)
+
+2. Enable the "Advanced" switch at the top of the page to show all the available settings.
+
+3. Set the following values:
+    - **Custom Model**: `openai/mistralai/devstral-small-2505` (the Model API identifier from LM Studio, prefixed with "openai/")
+    - **Base URL**: `http://host.docker.internal:1234/v1`
+    - **API Key**: `local-llm`
+
+4. Click "Save Settings" to save the configuration.
+
+![image](./screenshots/08_openhands_configure_local_llm_parameters.png)
+
+That's it! You can now start using OpenHands with the local LLM server.
+
+If you encounter any issues, let us know on [Slack](https://join.slack.com/t/openhands-ai/shared_invite/zt-3847of6xi-xuYJIPa6YIPg4ElbDWbtSA) or [Discord](https://discord.gg/ESHStjSjD4).
+
+## Advanced: Alternative LLM Backends
+
+This section describes how to run local LLMs with OpenHands using alternative backends like Ollama, SGLang, or vLLM — without relying on LM Studio.
+
+### Create an OpenAI-Compatible Endpoint with Ollama
+
+- Install Ollama following [the official documentation](https://ollama.com/download).
+- Example launch command for Devstral Small 2505:

 ```bash
-huggingface-cli download all-hands/openhands-lm-32b-v0.1 --local-dir all-hands/openhands-lm-32b-v0.1
+# ⚠️ WARNING: OpenHands requires a large context size to work properly.
+# When using Ollama, set OLLAMA_CONTEXT_LENGTH to at least 32768.
+# The default (4096) is way too small — not even the system prompt will fit, and the agent will not behave correctly.
+OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_KEEP_ALIVE=-1 nohup ollama serve &
+ollama pull devstral:latest
 ```

-### Create an OpenAI-Compatible Endpoint With SGLang
+### Create an OpenAI-Compatible Endpoint with vLLM or SGLang
+
+First, download the model checkpoints. For [Devstral Small 2505](https://huggingface.co/mistralai/Devstral-Small-2505):
+
+```bash
+huggingface-cli download mistralai/Devstral-Small-2505 --local-dir mistralai/Devstral-Small-2505
+```
+
+#### Serving the model using SGLang

 - Install SGLang following [the official documentation](https://docs.sglang.ai/start/install.html).
- Example launch command for OpenHands LM 32B (with at least 2 GPUs):
+- Example launch command for Devstral Small 2505 (with at least 2 GPUs):

 ```bash
 SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python3 -m sglang.launch_server \
-    --model all-hands/openhands-lm-32b-v0.1 \
-    --served-model-name openhands-lm-32b-v0.1 \
+    --model mistralai/Devstral-Small-2505 \
+    --served-model-name Devstral-Small-2505 \
    --port 8000 \
    --tp 2 --dp 1 \
    --host 0.0.0.0 \
    --api-key mykey --context-length 131072
 ```

-### Create an OpenAI-Compatible Endpoint with vLLM
+#### Serving the model using vLLM

 - Install vLLM following [the official documentation](https://docs.vllm.ai/en/latest/getting_started/installation.html).
- Example launch command for OpenHands LM 32B (with at least 2 GPUs):
+- Example launch command for Devstral Small 2505 (with at least 2 GPUs):

 ```bash
-vllm serve all-hands/openhands-lm-32b-v0.1 \
+vllm serve mistralai/Devstral-Small-2505 \
    --host 0.0.0.0 --port 8000 \
    --api-key mykey \
    --tensor-parallel-size 2 \
-    --served-model-name openhands-lm-32b-v0.1
+    --served-model-name Devstral-Small-2505 \
    --enable-prefix-caching
 ```

-## Advanced: Run and Configure OpenHands
-
-### Run OpenHands
+### Run OpenHands (Alternative Backends)

 #### Using Docker

@@ -137,24 +184,20 @@ Run OpenHands using [the official docker run command](../installation#start-the-
 #### Using Development Mode

 Use the instructions in [Development.md](https://github.com/All-Hands-AI/OpenHands/blob/main/Development.md) to build OpenHands.
-Ensure `config.toml` exists by running `make setup-config` which will create one for you. In the `config.toml`, enter the following:
-
-```
-[core]
-workspace_base="/path/to/your/workspace"
-
-[llm]
-model="openhands-lm-32b-v0.1"
-ollama_base_url="http://localhost:8000"
-```

 Start OpenHands using `make run`.

-### Configure OpenHands
+### Configure OpenHands (Alternative Backends)

-Once OpenHands is running, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab:
-1. Enable `Advanced` options.
-2. Set the following:
- `Custom Model` to `openai/<served-model-name>` (e.g. `openai/openhands-lm-32b-v0.1`)
- `Base URL` to `http://host.docker.internal:8000`
- `API key` to the same string you set when serving the model (e.g. `mykey`)
+Once OpenHands is running, open the Settings page in the UI and go to the `LLM` tab.
+
+1. Click **"see advanced settings"** to access the full configuration panel.
+2. Enable the **Advanced** toggle at the top of the page.
+3. Set the following parameters, if you followed the examples above:
+   - **Custom Model**: `openai/<served-model-name>`
+     e.g. `openai/devstral` if you're using Ollama, or `openai/Devstral-Small-2505` for SGLang or vLLM.
+   - **Base URL**: `http://host.docker.internal:<port>/v1`
+     Use port `11434` for Ollama, or `8000` for SGLang and vLLM.
+   - **API Key**:
+     - For **Ollama**: any placeholder value (e.g. `dummy`, `local-llm`)
+     - For **SGLang** or **vLLM**: use the same key provided when starting the server (e.g. `mykey`)
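
Review note: the backend-specific values in the configuration list above (model name and port per backend) can be summarized as a small lookup. The sketch below only restates the example values from this page; the function name `openhands_llm_settings` is hypothetical, and the model names assume the launch commands shown earlier were used unchanged.

```bash
# Hypothetical helper: print the OpenHands LLM settings for a given backend,
# using the example values from the configuration steps above.
openhands_llm_settings() {
  local backend="$1"
  local model port
  case "$backend" in
    ollama)      model="devstral";            port=11434 ;;
    sglang|vllm) model="Devstral-Small-2505"; port=8000 ;;
    *) echo "unknown backend: $backend" >&2; return 1 ;;
  esac
  # Custom Model is the served model name prefixed with "openai/".
  echo "Custom Model: openai/${model}"
  # Base URL points at the host from inside the OpenHands container.
  echo "Base URL: http://host.docker.internal:${port}/v1"
}

openhands_llm_settings ollama
```

If a different `--served-model-name` or port was passed when starting the server, the values entered in the UI need to change accordingly.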
--- a/docs/usage/llms/openrouter.mdx
+++ b/docs/usage/llms/openrouter.mdx
@@ -9,6 +9,6 @@ When running OpenHands, you'll need to set the following in the OpenHands UI thr
 * `LLM Provider` to `OpenRouter`
 * `LLM Model` to the model you will be using.
 [Visit here to see a full list of OpenRouter models](https://openrouter.ai/models).
-If the model is not in the list, enable `Advanced` options, and enter it in 
+If the model is not in the list, enable `Advanced` options, and enter it in
 `Custom Model` (e.g. openrouter/&lt;model-name&gt; like `openrouter/anthropic/claude-3.5-sonnet`).
 * `API Key` to your OpenRouter API key.
--- a/docs/usage/llms/screenshots/01_lm_studio_open_model_hub.png
+++ b/docs/usage/llms/screenshots/01_lm_studio_open_model_hub.png
--- a/docs/usage/llms/screenshots/02_lm_studio_download_devstral.png
+++ b/docs/usage/llms/screenshots/02_lm_studio_download_devstral.png
--- a/docs/usage/llms/screenshots/03_lm_studio_open_load_model.png
+++ b/docs/usage/llms/screenshots/03_lm_studio_open_load_model.png
--- a/docs/usage/llms/screenshots/04_lm_studio_setup_devstral_part_1.png
+++ b/docs/usage/llms/screenshots/04_lm_studio_setup_devstral_part_1.png
--- a/docs/usage/llms/screenshots/05_lm_studio_setup_devstral_part_2.png
+++ b/docs/usage/llms/screenshots/05_lm_studio_setup_devstral_part_2.png
--- a/docs/usage/llms/screenshots/06_lm_studio_start_server.png
+++ b/docs/usage/llms/screenshots/06_lm_studio_start_server.png
--- a/docs/usage/llms/screenshots/07_openhands_open_advanced_settings.png
+++ b/docs/usage/llms/screenshots/07_openhands_open_advanced_settings.png
--- a/docs/usage/llms/screenshots/08_openhands_configure_local_llm_parameters.png
+++ b/docs/usage/llms/screenshots/08_openhands_configure_local_llm_parameters.png
--- a/docs/usage/llms/screenshots/1_select_power_user.png
+++ b/docs/usage/llms/screenshots/1_select_power_user.png
--- a/docs/usage/llms/screenshots/2_select_model.png
+++ b/docs/usage/llms/screenshots/2_select_model.png
--- a/docs/usage/llms/screenshots/3_select_devstral.png
+++ b/docs/usage/llms/screenshots/3_select_devstral.png
--- a/docs/usage/llms/screenshots/4_set_context_window.png
+++ b/docs/usage/llms/screenshots/4_set_context_window.png
--- a/docs/usage/llms/screenshots/5_copy_url.png
+++ b/docs/usage/llms/screenshots/5_copy_url.png
--- a/docs/usage/llms/screenshots/6_copy_to_get_model_name.png
+++ b/docs/usage/llms/screenshots/6_copy_to_get_model_name.png
--- a/docs/usage/local-setup.mdx
+++ b/docs/usage/local-setup.mdx
@@ -67,19 +67,21 @@ A system with a modern processor and a minimum of **4GB RAM** is recommended to
 ### Start the App

 ```bash
-docker pull docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.47-nikolaik

 docker run -it --rm --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.47-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
-    -v ~/.openhands-state:/.openhands-state \
+    -v ~/.openhands:/.openhands \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
-    docker.all-hands.dev/all-hands-ai/openhands:0.43
+    docker.all-hands.dev/all-hands-ai/openhands:0.47
 ```

+> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
+
 You'll find OpenHands running at http://localhost:3000!

 ### Setup
@@ -151,8 +153,6 @@ To enable search functionality in OpenHands:

 For more details, see the [Search Engine Setup](/usage/search-engine-setup) guide.

-Now you're ready to [get started with OpenHands](/usage/getting-started).
-
 ### Versions

 The [docker command above](/usage/local-setup#start-the-app) pulls the most recent stable release of OpenHands. You have other options as well:
--- a/docs/usage/mcp.mdx
+++ b/docs/usage/mcp.mdx
@@ -1,6 +1,7 @@
 ---
 title: Model Context Protocol (MCP)
-description: This page outlines how to configure and use the Model Context Protocol (MCP) in OpenHands, allowing you to extend the agent's capabilities with custom tools.
+description: This page outlines how to configure and use the Model Context Protocol (MCP) in OpenHands, allowing you
+  to extend the agent's capabilities with custom tools.
 ---

 ## Overview
--- a/docs/usage/prompting/microagents-org.mdx
+++ b/docs/usage/prompting/microagents-org.mdx
@@ -5,12 +5,14 @@ description: Organizations and users can define microagents that apply to all re

 ## Usage

-These microagents can be [any type of microagent](./microagents-overview#microagent-types) and will be loaded 
+These microagents can be [any type of microagent](./microagents-overview#microagent-types) and will be loaded
 accordingly. However, they are applied to all repositories belonging to the organization or user.

 Add a `.openhands` repository under the organization or user and create a `microagents` directory and place the
 microagents in that directory.

+For GitLab organizations, use `openhands-config` as the repository name instead of `.openhands`, since GitLab doesn't support repository names starting with non-alphanumeric characters.
+
 ## Example

 General microagent file example for organization `Great-Co` located inside the `.openhands` repository:
@@ -20,3 +22,5 @@ General microagent file example for organization `Great-Co` located inside the `
 * Document interfaces and public APIs; use implementation comments only for non-obvious logic.
 * Follow the same naming convention for variables, classes, constants, etc. already used in each repository.
 ```
+
+For GitLab organizations, the same microagent would be located inside the `openhands-config` repository.
--- a/docs/usage/runtimes/daytona.mdx
+++ b/docs/usage/runtimes/daytona.mdx
@@ -3,7 +3,6 @@ title: Daytona Runtime
 description: You can use [Daytona](https://www.daytona.io/) as a runtime provider.
 ---

-
 ## Step 1: Retrieve Your Daytona API Key
 1. Visit the [Daytona Dashboard](https://app.daytona.io/dashboard/keys).
 2. Click **"Create Key"**.
--- a/docs/usage/runtimes/docker.mdx
+++ b/docs/usage/runtimes/docker.mdx
@@ -3,8 +3,6 @@ title: Docker Runtime
 description: This is the default Runtime that's used when you start OpenHands.
 ---

-This is the default Runtime that's used when you start OpenHands.
-
 ## Image
 The `SANDBOX_RUNTIME_CONTAINER_IMAGE` from nikolaik is a pre-built runtime image
 that contains our Runtime server, as well as some basic utilities for Python and NodeJS.
@@ -128,3 +126,7 @@ docker network create openhands-network
 docker run # ... \
    --network openhands-network \
 ```
+
+<Note>
+**Docker Desktop Required**: Network isolation features, including custom networks and `host.docker.internal` routing, require Docker Desktop. Docker Engine alone does not support these features on localhost across custom networks. If you're using Docker Engine without Docker Desktop, network isolation may not work as expected.
+</Note>
--- a/docs/usage/runtimes/e2b.mdx
+++ b/docs/usage/runtimes/e2b.mdx
@@ -3,7 +3,8 @@ title: E2B Runtime
 description: E2B is an open-source secure cloud environment (sandbox) made for running AI-generated code and agents.
 ---

-[E2B](https://e2b.dev) offers [Python](https://pypi.org/project/e2b/) and [JS/TS](https://www.npmjs.com/package/e2b) SDK to spawn and control these sandboxes.
+[E2B](https://e2b.dev) offers [Python](https://pypi.org/project/e2b/) and [JS/TS](https://www.npmjs.com/package/e2b)
+SDK to spawn and control these sandboxes.

 ## Getting started

@@ -18,9 +19,13 @@ description: E2B is an open-source secure cloud environment (sandbox) made for r
    Full CLI API is [here](https://e2b.dev/docs/cli/installation).

 ## OpenHands sandbox
-You can use the E2B CLI to create a custom sandbox with a Dockerfile. Read the full guide [here](https://e2b.dev/docs/guide/custom-sandbox). The premade OpenHands sandbox for E2B is set up in the `containers` directory. and it's called `openhands`.
+
+You can use the E2B CLI to create a custom sandbox with a Dockerfile. Read the full guide
+[here](https://e2b.dev/docs/guide/custom-sandbox). The premade OpenHands sandbox for E2B is set up in the `containers`
+directory, and it's called `openhands`.

 ## Debugging
+
 You can connect to a running E2B sandbox with E2B CLI in your terminal.

 - List all running sandboxes (based on your API key)
@@ -34,5 +39,6 @@ You can connect to a running E2B sandbox with E2B CLI in your terminal.
    ```

 ## Links
+
 - [E2B Docs](https://e2b.dev/docs)
 - [E2B GitHub](https://github.com/e2b-dev/e2b)
--- a/docs/usage/runtimes/local.mdx
+++ b/docs/usage/runtimes/local.mdx
@@ -1,6 +1,8 @@
 ---
 title: Local Runtime
-description: The Local Runtime allows the OpenHands agent to execute actions directly on your local machine without using Docker. This runtime is primarily intended for controlled environments like CI pipelines or testing scenarios where Docker is not available.
+description: The Local Runtime allows the OpenHands agent to execute actions directly on your local machine without
+  using Docker. This runtime is primarily intended for controlled environments like CI pipelines or testing scenarios
+  where Docker is not available.
 ---

 <Warning>
@@ -15,7 +17,7 @@ Before using the Local Runtime, ensure that:
 1. You can run OpenHands using the [Development workflow](https://github.com/All-Hands-AI/OpenHands/blob/main/Development.md).
 2. For Linux and Mac, tmux is available on your system.
 3. For Windows, PowerShell is available on your system.
-    - Only [CLI mode](../how-to/cli-mode) and [headless mode](../how-to/headless-mode) are supported in Windows with Local Runtime. 
+    - Only [CLI mode](../how-to/cli-mode) and [headless mode](../how-to/headless-mode) are supported in Windows with Local Runtime.

 ## Configuration

--- a/docs/usage/runtimes/overview.mdx
+++ b/docs/usage/runtimes/overview.mdx
@@ -9,8 +9,6 @@ commands.
 By default, OpenHands uses a [Docker-based runtime](/usage/runtimes/docker), running on your local computer.
 This means you only have to pay for the LLM you're using, and your code is only ever sent to the LLM.

-We also support other runtimes, which are typically managed by third-parties.
-
 Additionally, we provide a [Local Runtime](/usage/runtimes/local) that runs directly on your machine without Docker,
 which can be useful in controlled environments like CI pipelines.

@@ -21,6 +19,18 @@ OpenHands supports several different runtime environments:
 - [Docker Runtime](/usage/runtimes/docker) - The default runtime that uses Docker containers for isolation (recommended for most users).
 - [OpenHands Remote Runtime](/usage/runtimes/remote) - Cloud-based runtime for parallel execution (beta).
 - [Local Runtime](/usage/runtimes/local) - Direct execution on your local machine without Docker.
- And more third-party runtimes:
-  - [Modal Runtime](/usage/runtimes/modal) - Runtime provided by our partners at Modal.
-  - [Daytona Runtime](/usage/runtimes/daytona) - Runtime provided by Daytona.
+
+### Third-Party Runtimes
+
+The following third-party runtimes are available when you install the `third_party_runtimes` extra:
+
+```bash
+pip install openhands-ai[third_party_runtimes]
+```
+
+- [E2B Runtime](/usage/runtimes/e2b) - Open source runtime using E2B sandboxes.
+- [Modal Runtime](/usage/runtimes/modal) - Serverless runtime using Modal infrastructure.
+- [Runloop Runtime](/usage/runtimes/runloop) - Cloud runtime using Runloop infrastructure.
+- [Daytona Runtime](/usage/runtimes/daytona) - Development environment runtime using Daytona.
+
+**Note**: These third-party runtimes are supported by their respective developers, not by the OpenHands team. For issues specific to these runtimes, please refer to their documentation or contact their support teams.
--- a/docs/usage/runtimes/remote.mdx
+++ b/docs/usage/runtimes/remote.mdx
@@ -1,7 +1,11 @@
 ---
 title: Remote Runtime
-description: This runtime is specifically designed for agent evaluation purposes only through the [OpenHands evaluation harness](https://github.com/All-Hands-AI/OpenHands/tree/main/evaluation). It should not be used to launch production OpenHands applications.
+description: This runtime is specifically designed for agent evaluation purposes only through the
+  [OpenHands evaluation harness](https://github.com/All-Hands-AI/OpenHands/tree/main/evaluation). It should not be
+  used to launch production OpenHands applications.
 ---

-OpenHands Remote Runtime is currently in beta (read [here](https://runtime.all-hands.dev/) for more details), it allows you to launch runtimes
-in parallel in the cloud. Fill out [this form](https://docs.google.com/forms/d/e/1FAIpQLSckVz_JFwg2_mOxNZjCtr7aoBFI2Mwdan3f75J_TrdMS1JV2g/viewform) to apply if you want to try this out!
+OpenHands Remote Runtime is currently in beta (read [here](https://runtime.all-hands.dev/) for more details).
+It allows you to launch runtimes in parallel in the cloud. Fill out
+[this form](https://docs.google.com/forms/d/e/1FAIpQLSckVz_JFwg2_mOxNZjCtr7aoBFI2Mwdan3f75J_TrdMS1JV2g/viewform) to
+apply if you want to try this out!
--- a/docs/usage/runtimes/runloop.mdx
+++ b/docs/usage/runtimes/runloop.mdx
@@ -1,6 +1,7 @@
 ---
 title: Runloop Runtime
-description: Runloop provides a fast, secure and scalable AI sandbox (Devbox). Check out the [runloop docs](https://docs.runloop.ai/overview/what-is-runloop) for more detail.
+description: Runloop provides a fast, secure and scalable AI sandbox (Devbox). Check out the
+  [runloop docs](https://docs.runloop.ai/overview/what-is-runloop) for more detail.
 ---

 ## Access
--- a/docs/usage/search-engine-setup.mdx
+++ b/docs/usage/search-engine-setup.mdx
@@ -1,6 +1,6 @@
 ---
 title: Search Engine Setup
-description: Configure OpenHands to use Tavily as a search engine
+description: Configure OpenHands to use Tavily as a search engine.
 ---

 ## Setting Up Search Engine in OpenHands
@@ -11,10 +11,10 @@ OpenHands can be configured to use [Tavily](https://tavily.com/) as a search eng

 To use the search functionality in OpenHands, you'll need to obtain a Tavily API key:

-1. Visit [Tavily's website](https://tavily.com/) and sign up for an account
-2. Navigate to the API section in your dashboard
-3. Generate a new API key
-4. Copy the API key (it should start with `tvly-`)
+1. Visit [Tavily's website](https://tavily.com/) and sign up for an account.
+2. Navigate to the API section in your dashboard.
+3. Generate a new API key.
+4. Copy the API key (it should start with `tvly-`).

 ### Configuring Search in OpenHands

@@ -22,13 +22,12 @@ Once you have your Tavily API key, you can configure OpenHands to use it:

 #### In the OpenHands UI

-1. Open OpenHands and navigate to the Settings page by clicking the gear icon
-2. In the LLM settings tab, locate the "Search API Key (Tavily)" field
-3. Enter your Tavily API key (starting with `tvly-`)
-4. Click "Save" to apply the changes
+1. Open OpenHands and navigate to the Settings page.
+2. Under the `LLM` tab, enter your Tavily API key (starting with `tvly-`) in the `Search API Key (Tavily)` field.
+3. Click `Save` to apply the changes.

 <Note>
-The search API key field is optional. If you don't provide a key, the search functionality will not be available to the agent.
+  The search API key field is optional. If you don't provide a key, the search functionality will not be available to the agent.
 </Note>

 #### Using Configuration Files
@@ -45,22 +44,23 @@ search_api_key = "tvly-your-api-key-here"

 When the search engine is configured:

-1. The agent can decide to search the web when it needs external information
-2. Search queries are sent to Tavily's API via [Tavily's MCP server](https://github.com/tavily-ai/tavily-mcp) which includes a variety of [tools](https://docs.tavily.com/documentation/api-reference/introduction) (search, extract, crawl, map).
-3. Results are returned and incorporated into the agent's context
-4. The agent can use this information to provide more accurate and up-to-date responses
+- The agent can decide to search the web when it needs external information.
+- Search queries are sent to Tavily's API via [Tavily's MCP server](https://github.com/tavily-ai/tavily-mcp) which
+  includes a variety of [tools](https://docs.tavily.com/documentation/api-reference/introduction) (search, extract, crawl, map).
+- Results are returned and incorporated into the agent's context.
+- The agent can use this information to provide more accurate and up-to-date responses.

 ### Limitations

- Search results depend on Tavily's coverage and freshness
- Usage may be subject to Tavily's rate limits and pricing tiers
- The agent will only search when it determines that external information is needed
+- Search results depend on Tavily's coverage and freshness.
+- Usage may be subject to Tavily's rate limits and pricing tiers.
+- The agent will only search when it determines that external information is needed.

 ### Troubleshooting

 If you encounter issues with the search functionality:

- Verify that your API key is correct and active
- Check that your API key starts with `tvly-`
- Ensure you have an active internet connection
- Check Tavily's status page for any service disruptions
+- Verify that your API key is correct and active.
+- Check that your API key starts with `tvly-`.
+- Ensure you have an active internet connection.
+- Check Tavily's status page for any service disruptions.
--- a/docs/usage/troubleshooting/troubleshooting.mdx
+++ b/docs/usage/troubleshooting/troubleshooting.mdx
@@ -31,9 +31,9 @@ On initial prompt, an error is seen with `Permission Denied` or `PermissionError

 **Resolution**

-* Check if the `~/.openhands-state` is owned by `root`. If so, you can:
-  * Change the directory's ownership: `sudo chown <user>:<user> ~/.openhands-state`.
-  * or update permissions on the directory: `sudo chmod 777 ~/.openhands-state`
+* Check if the `~/.openhands` directory is owned by `root`. If so, you can:
+  * Change the directory's ownership: `sudo chown <user>:<user> ~/.openhands`.
+  * or update permissions on the directory: `sudo chmod 777 ~/.openhands`.
  * or delete it if you don’t need previous data. OpenHands will recreate it. You'll need to re-enter LLM settings.
 * If mounting a local directory, ensure your `WORKSPACE_BASE` has the necessary permissions for the user running
  OpenHands.
@@ -56,13 +56,16 @@ To fix this:
       -e SANDBOX_VSCODE_PORT=41234 \
       -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:latest \
       -v /var/run/docker.sock:/var/run/docker.sock \
-       -v ~/.openhands-state:/.openhands-state \
+       -v ~/.openhands:/.openhands \
       -p 3000:3000 \
       -p 41234:41234 \
       --add-host host.docker.internal:host-gateway \
       --name openhands-app \
       docker.all-hands.dev/all-hands-ai/openhands:latest
   ```
+
+   > **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
+
 2. Make sure to expose the same port with `-p 41234:41234` in your Docker command.
 3. If running with the development workflow, you can set this in your `config.toml` file:
   ```toml
--- a/docs/usage/windows-without-wsl.mdx
+++ b/docs/usage/windows-without-wsl.mdx
@@ -133,13 +133,66 @@ This guide provides step-by-step instructions for running OpenHands on a Windows

   > **Note**: If you're running the frontend in development mode (using `npm run dev`), use port 3001 instead: `http://localhost:3001`

+## Installing and Running the CLI
+
+To install and run the OpenHands CLI on Windows without WSL, follow these steps:
+
+### 1. Install uv (Python Package Manager)
+
+Open PowerShell as Administrator and run:
+
+```powershell
+powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
+```
+
+### 2. Install .NET SDK (Required)
+
+The OpenHands CLI **requires** the .NET Core runtime for PowerShell integration. Without it, the CLI will fail to start with a `coreclr` error. Install the .NET SDK which includes the runtime:
+
+```powershell
+winget install Microsoft.DotNet.SDK.8
+```
+
+Alternatively, you can download and install the .NET SDK from the [official Microsoft website](https://dotnet.microsoft.com/download).
+
+After installation, restart your PowerShell session to ensure the environment variables are updated.
+
+### 3. Install and Run OpenHands
+
+After installing the prerequisites, you can install and run OpenHands with:
+
+```powershell
+uvx --python 3.12 --from openhands-ai openhands
+```
+
+### Troubleshooting CLI Issues
+
+#### CoreCLR Error
+
+If you encounter an error like `Failed to load CoreCLR` or `pythonnet.load('coreclr')` when running OpenHands CLI, this indicates that the .NET Core runtime is missing or not properly configured. To fix this:
+
+1. Install the .NET SDK as described in step 2 above
+2. Verify that your system PATH includes the .NET SDK directories
+3. Restart your PowerShell session completely after installing the .NET SDK
+4. Make sure you're using PowerShell 7 (pwsh) rather than Windows PowerShell
+
+To verify your .NET installation, run:
+
+```powershell
+dotnet --info
+```
+
+This should display information about your installed .NET SDKs and runtimes. If this command fails, the .NET SDK is not properly installed or not in your PATH.
+
+If the issue persists after installing the .NET SDK, try installing the specific .NET Runtime version 6.0 or later from the [.NET download page](https://dotnet.microsoft.com/download).
+
 ## Limitations on Windows

 When running OpenHands on Windows without WSL or Docker, be aware of the following limitations:

 1. **Browser Tool Not Supported**: The browser tool is not currently supported on Windows.

-2. **.NET Core Requirement**: The PowerShell integration requires .NET Core Runtime to be installed. If .NET Core is not available, OpenHands will automatically fall back to a more limited PowerShell implementation with reduced functionality.
+2. **.NET Core Requirement**: The PowerShell integration requires .NET Core Runtime to be installed. The CLI implementation attempts to load the CoreCLR at startup with `pythonnet.load('coreclr')` and will fail with an error if .NET Core is not properly installed.

 3. **Interactive Shell Commands**: Some interactive shell commands may not work as expected. The PowerShell session implementation has limitations compared to the bash session used on Linux/macOS.

--- a/evaluation/benchmarks/browsing_delegation/run_infer.py
+++ b/evaluation/benchmarks/browsing_delegation/run_infer.py
@@ -144,7 +144,7 @@ if __name__ == '__main__':
    llm_config = None
    if args.llm_config:
        llm_config = get_llm_config_arg(args.llm_config)
-        # modify_params must be False for evaluation purpose, for reproducibility and accurancy of results
+        # modify_params must be False for evaluation purpose, for reproducibility and accuracy of results
        llm_config.modify_params = False

    if llm_config is None:
--- a/evaluation/benchmarks/gaia/.gitignore
+++ b/evaluation/benchmarks/gaia/.gitignore
@@ -0,0 +1 @@
+data/
--- a/evaluation/benchmarks/gaia/README.md
+++ b/evaluation/benchmarks/gaia/README.md
@@ -6,6 +6,13 @@ This folder contains evaluation harness for evaluating agents on the [GAIA bench

 Please follow instruction [here](../../README.md#setup) to setup your local development environment and LLM.

+To enable the Tavily MCP Server, you can add the Tavily API key under the `core` section of your `config.toml` file, like below:
+
+```toml
+[core]
+search_api_key = "tvly-******"
+```
+
 ## Run the evaluation

 We are using the GAIA dataset hosted on [Hugging Face](https://huggingface.co/datasets/gaia-benchmark/GAIA).
--- a/evaluation/benchmarks/gaia/run_infer.py
+++ b/evaluation/benchmarks/gaia/run_infer.py
@@ -1,13 +1,22 @@
 import asyncio
+import copy
 import functools
 import os
 import re
+import shutil
+import zipfile

 import huggingface_hub
 import pandas as pd
 from datasets import load_dataset
+from PIL import Image
+from pydantic import SecretStr

 from evaluation.benchmarks.gaia.scorer import question_scorer
+from evaluation.benchmarks.gaia.utils import (
+    image_to_jpg_base64_url,
+    image_to_png_base64_url,
+)
 from evaluation.utils.shared import (
    EvalMetadata,
    EvalOutput,
@@ -24,6 +33,7 @@ from openhands.core.config import (
    OpenHandsConfig,
    get_llm_config_arg,
    get_parser,
+    load_from_toml,
 )
 from openhands.core.config.utils import get_agent_config_arg
 from openhands.core.logger import openhands_logger as logger
@@ -41,7 +51,7 @@ AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {
 }

 AGENT_CLS_TO_INST_SUFFIX = {
-    'CodeActAgent': 'When you think you have solved the question, please first send your answer to user through message and then exit.\n'
+    'CodeActAgent': 'When you think you have solved the question, please use the finish tool and include your final answer in the message parameter of the finish tool. Your final answer MUST be encapsulated within <solution> and </solution>.\n'
 }


@@ -49,7 +59,7 @@ def get_config(
    metadata: EvalMetadata,
 ) -> OpenHandsConfig:
    sandbox_config = get_default_sandbox_config_for_eval()
-    sandbox_config.base_container_image = 'python:3.12-bookworm'
+    sandbox_config.base_container_image = 'nikolaik/python-nodejs:python3.12-nodejs22'
    config = OpenHandsConfig(
        default_agent=metadata.agent_class,
        run_as_openhands=False,
@@ -67,6 +77,11 @@ def get_config(
        logger.info('Agent config not provided, using default settings')
        agent_config = config.get_agent_config(metadata.agent_class)
        agent_config.enable_prompt_extensions = False
+
+    config_copy = copy.deepcopy(config)
+    load_from_toml(config_copy)
+    if config_copy.search_api_key:
+        config.search_api_key = SecretStr(config_copy.search_api_key)
    return config


@@ -89,27 +104,44 @@ def initialize_runtime(
    if instance['file_name'] != '':
        # if this question comes with a file, we need to save it to the workspace
        assert metadata.data_split is not None
+        extension_name = instance['file_name'].split('.')[-1]
        src_file = os.path.join(
            DATASET_CACHE_DIR, '2023', metadata.data_split, instance['file_name']
        )
        assert os.path.exists(src_file)
-        dest_file = os.path.join('/workspace', instance['file_name'])
-        runtime.copy_to(src_file, dest_file)
+        if extension_name == 'zip':
+            temp_dir = os.path.join(
+                DATASET_CACHE_DIR, '2023', metadata.data_split, 'tmp_file'
+            )
+            os.makedirs(temp_dir, exist_ok=True)
+            with zipfile.ZipFile(src_file, 'r') as zip_ref:
+                zip_ref.extractall(temp_dir)
+            for root, dirs, files in os.walk(temp_dir):
+                for file in files:
+                    dest_file = '/workspace'
+                    runtime.copy_to(os.path.join(root, file), dest_file)
+            shutil.rmtree(temp_dir)
+        elif extension_name not in ['jpg', 'png']:
+            dest_file = '/workspace'
+            runtime.copy_to(src_file, dest_file)

-        # rename to file.extension_name
-        extension_name = instance['file_name'].split('.')[-1]
-        action = CmdRunAction(
-            command=f'mv /workspace/{instance["file_name"]} /workspace/file.{extension_name}'
-        )
-        logger.info(action, extra={'msg_type': 'ACTION'})
-        obs = runtime.run_action(action)
-        assert obs.exit_code == 0
+            # rename to file.extension_name
+            action = CmdRunAction(
+                command=f'mv /workspace/{instance["file_name"]} /workspace/file.{extension_name}'
+            )
+            logger.info(action, extra={'msg_type': 'ACTION'})
+            obs = runtime.run_action(action)
+            assert obs.exit_code == 0

    action = CmdRunAction(command='cd /workspace')
    logger.info(action, extra={'msg_type': 'ACTION'})
    obs = runtime.run_action(action)
    assert obs.exit_code == 0

+    action = CmdRunAction(
+        command='apt-get update && apt-get install -y ffmpeg'
+    )
+    runtime.run_action(action)
    logger.info(f'{"-" * 50} END Runtime Initialization Fn {"-" * 50}')
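Review note on the zip branch above: the extracted files are walked with `os.walk` and each one is copied to `/workspace`, which appears to flatten any subdirectories, while `process_instance` later builds the file list from `zip_ref.namelist()`, which keeps subdirectory prefixes and directory entries. A minimal sketch of a helper that lists the flattened paths, assuming `runtime.copy_to` places each file directly under the destination directory (the helper name `workspace_paths_for_zip` is hypothetical and not part of this PR):

```python
import os
import zipfile


def workspace_paths_for_zip(zip_path: str, workspace: str = '/workspace') -> list[str]:
    """List where each regular file in the archive would land after a flat copy.

    Assumption: every extracted file is copied directly into `workspace`,
    so only the base name of each archive member survives.
    """
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        return [
            f'{workspace}/{os.path.basename(info.filename)}'
            for info in zip_ref.infolist()
            if not info.is_dir()  # skip directory entries such as 'data/'
        ]
```

If archives can contain two files with the same base name in different folders, a flat copy would overwrite one of them, so preserving the directory layout may be the safer choice.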


@@ -134,16 +166,49 @@ def process_instance(
        dest_file = None

    # Prepare instruction
-    instruction = f'{instance["Question"]}\n'
+    instruction = """You have one question to answer. It is paramount that you provide a correct answer.
+Give it all you can: I know for a fact that you have access to all the relevant tools to solve it and find the correct answer (the answer does exist). Failure or 'I cannot answer' or 'None found' will not be tolerated, success will be rewarded.
+You must make sure you find the correct answer! You MUST strictly follow the task-specific formatting instructions for your final answer.
+Here is the task:
+{task_question}
+""".format(
+        task_question=instance['Question'],
+    )
    logger.info(f'Instruction: {instruction}')
+    image_urls = []
    if dest_file:
-        instruction += f'\n\nThe mentioned file is provided in the workspace at: {dest_file.split("/")[-1]}'
+        if extension_name not in ['jpg', 'png', 'zip']:
+            instruction += f'To solve this task you will have to use the attached file provided in the workspace at location: {dest_file}\n\n'
+        elif extension_name == 'zip':
+            filenames = []
+            src_file = os.path.join(
+                DATASET_CACHE_DIR, '2023', metadata.data_split, instance['file_name']
+            )
+            with zipfile.ZipFile(src_file, 'r') as zip_ref:
+                filenames = zip_ref.namelist()

-    instruction += 'IMPORTANT: You should ONLY interact with the environment provided to you AND NEVER ASK FOR HUMAN HELP.\n'
-    instruction += 'Please encapsulate your final answer (answer ONLY) within <solution> and </solution>.\n'
+            filenames = [f'/workspace/{file}' for file in filenames]
+            filenames = ', '.join(filenames)
+            instruction += f'To solve this task you will have to use the attached files provided in the workspace at locations: {filenames}\n\n'
+        else:  # Image files: jpg, png
+            src_file = os.path.join(
+                DATASET_CACHE_DIR, '2023', metadata.data_split, instance['file_name']
+            )
+            instruction += 'Image: To solve this task you will have to use the image shown below.\n\n'
+            image = Image.open(src_file)
+            if extension_name == 'jpg':
+                image_urls.append(image_to_jpg_base64_url(image))
+            else:
+                image_urls.append(image_to_png_base64_url(image))
+
+    instruction += """IMPORTANT: When seeking information from a website, REFRAIN from arbitrary URL navigation. You should utilize the designated search engine tool with precise keywords to obtain relevant URLs or use the specific website's search interface. DO NOT navigate directly to specific URLs as they may not exist.\n\nFor example: if you want to search for a research paper on Arxiv, either use the search engine tool with specific keywords or navigate to arxiv.org and then use its interface.\n"""
+    instruction += 'IMPORTANT: You should NEVER ask for Human Help.\n'
+    instruction += 'IMPORTANT: Please encapsulate your final answer (answer ONLY) within <solution> and </solution>. Your answer will be evaluated using string matching approaches so it is important that you STRICTLY adhere to the output formatting instructions specified in the task (e.g., alphabetization, sequencing, units, rounding, decimal places, etc.)\n'
    instruction += (
        'For example: The answer to the question is <solution> 42 </solution>.\n'
    )
+    instruction += "IMPORTANT: Your final answer should be a number OR as few words as possible OR a comma separated list of numbers and/or strings. If you are asked for a number, express it numerically (i.e., with digits rather than words), do not use commas, and do not include units such as $ or percent signs unless specified otherwise. If you are asked for a string, don't use articles, neither abbreviations (e.g. for cities). If you are asked for a comma separated list, apply the above rules depending on whether the element to be put in the list is a number or a string.\n"
+
    # NOTE: You can actually set slightly different instruction for different agents
    instruction += AGENT_CLS_TO_INST_SUFFIX.get(metadata.agent_class, '')
    logger.info(f'Instruction:\n{instruction}', extra={'msg_type': 'OBSERVATION'})
@@ -156,7 +221,9 @@ def process_instance(
    state: State | None = asyncio.run(
        run_controller(
            config=config,
-            initial_user_action=MessageAction(content=instruction),
+            initial_user_action=MessageAction(
+                content=instruction, image_urls=image_urls
+            ),
            runtime=runtime,
            fake_user_response_fn=AGENT_CLS_TO_FAKE_USER_RESPONSE_FN[
                metadata.agent_class
@@ -175,7 +242,7 @@ def process_instance(
    for event in reversed(state.history):
        if event.source == 'agent':
            if isinstance(event, AgentFinishAction):
-                model_answer_raw = event.thought
+                model_answer_raw = event.final_thought
                break
            elif isinstance(event, CmdRunAction):
                model_answer_raw = event.thought
@@ -222,6 +289,7 @@ def process_instance(
        error=state.last_error if state and state.last_error else None,
        test_result=test_result,
    )
+    runtime.close()
    return output


@@ -253,6 +321,8 @@ if __name__ == '__main__':
    if llm_config is None:
        raise ValueError(f'Could not find LLM config: --llm_config {args.llm_config}')

+    toml_config = OpenHandsConfig()
+    load_from_toml(toml_config)
    metadata = make_metadata(
        llm_config=llm_config,
        dataset_name='gaia',
@@ -261,7 +331,10 @@ if __name__ == '__main__':
        eval_note=args.eval_note,
        eval_output_dir=args.eval_output_dir,
        data_split=args.data_split,
-        details={'gaia-level': args.level},
+        details={
+            'gaia-level': args.level,
+            'mcp-servers': ['tavily'] if toml_config.search_api_key else [],
+        },
        agent_config=agent_config,
    )

--- a/evaluation/benchmarks/gaia/scripts/run_infer.sh
+++ b/evaluation/benchmarks/gaia/scripts/run_infer.sh
@@ -39,7 +39,7 @@ echo "LEVELS: $LEVELS"
 COMMAND="poetry run python ./evaluation/benchmarks/gaia/run_infer.py \
  --agent-cls $AGENT \
  --llm-config $MODEL_CONFIG \
-  --max-iterations 30 \
+  --max-iterations 60 \
  --level $LEVELS \
  --data-split validation \
  --eval-num-workers $NUM_WORKERS \
--- a/evaluation/benchmarks/gaia/utils.py
+++ b/evaluation/benchmarks/gaia/utils.py
@@ -0,0 +1,43 @@
+import base64
+import io
+
+import numpy as np
+from PIL import Image
+
+
+def image_to_png_base64_url(
+    image: np.ndarray | Image.Image, add_data_prefix: bool = True
+):
+    """Convert a numpy array or PIL image to a base64-encoded PNG data URL."""
+    if isinstance(image, np.ndarray):
+        image = Image.fromarray(image)
+    if image.mode in ('RGBA', 'LA'):
+        image = image.convert('RGB')
+    buffered = io.BytesIO()
+    image.save(buffered, format='PNG')
+
+    image_base64 = base64.b64encode(buffered.getvalue()).decode()
+    return (
+        f'data:image/png;base64,{image_base64}'
+        if add_data_prefix
+        else f'{image_base64}'
+    )
+
+
+def image_to_jpg_base64_url(
+    image: np.ndarray | Image.Image, add_data_prefix: bool = True
+):
+    """Convert a numpy array or PIL image to a base64-encoded JPEG data URL."""
+    if isinstance(image, np.ndarray):
+        image = Image.fromarray(image)
+    if image.mode in ('RGBA', 'LA'):
+        image = image.convert('RGB')
+    buffered = io.BytesIO()
+    image.save(buffered, format='JPEG')
+
+    image_base64 = base64.b64encode(buffered.getvalue()).decode()
+    return (
+        f'data:image/jpeg;base64,{image_base64}'
+        if add_data_prefix
+        else f'{image_base64}'
+    )
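Review note: both helpers above return a string of the form `data:<mime>;base64,<payload>` when `add_data_prefix` is true. A small standard-library sketch for checking that output in a test, without depending on PIL or numpy; `split_data_url` is a hypothetical helper and not part of this PR.

```python
import base64


def split_data_url(url: str) -> tuple[str, bytes]:
    """Split a base64 data URL into its MIME type and decoded bytes."""
    header, _, payload = url.partition(',')
    if not header.startswith('data:') or not header.endswith(';base64'):
        raise ValueError('not a base64 data URL')
    mime_type = header[len('data:'):-len(';base64')]
    return mime_type, base64.b64decode(payload)
```

In a unit test for `image_to_png_base64_url`, the returned bytes could then be checked for the PNG signature, and the MIME type compared against `image/png`.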
--- a/evaluation/benchmarks/miniwob/run_infer.py
+++ b/evaluation/benchmarks/miniwob/run_infer.py
@@ -223,7 +223,7 @@ if __name__ == '__main__':
    llm_config = None
    if args.llm_config:
        llm_config = get_llm_config_arg(args.llm_config)
-        # modify_params must be False for evaluation purpose, for reproducibility and accurancy of results
+        # modify_params must be False for evaluation purpose, for reproducibility and accuracy of results
        llm_config.modify_params = False
    if llm_config is None:
        raise ValueError(f'Could not find LLM config: --llm_config {args.llm_config}')
--- a/evaluation/benchmarks/swe_bench/README.md
+++ b/evaluation/benchmarks/swe_bench/README.md
@@ -2,6 +2,8 @@

 This folder contains the evaluation harness that we built on top of the original [SWE-Bench benchmark](https://www.swebench.com/) ([paper](https://arxiv.org/abs/2310.06770)).

+**UPDATE (6/15/2025): We now support running SWE-bench-Live evaluation (see the paper [here](https://arxiv.org/abs/2505.23419))! For how to run it, check out [this README](./SWE-bench-Live.md).**
+
 **UPDATE (5/26/2025): We now support running interactive SWE-Bench evaluation (see the paper [here](https://arxiv.org/abs/2502.13069))! For how to run it, checkout [this README](./SWE-Interact.md).**

 **UPDATE (4/8/2025): We now support running SWT-Bench evaluation! For more details, checkout [the corresponding section](#SWT-Bench-Evaluation).**
--- a/evaluation/benchmarks/swe_bench/SWE-bench-Live.md
+++ b/evaluation/benchmarks/swe_bench/SWE-bench-Live.md
@@ -0,0 +1,65 @@
+# SWE-bench-Live
+
+<p align="center">
+<a href="https://arxiv.org/abs/2505.23419">📃 Paper</a>
+•
+<a href="https://huggingface.co/SWE-bench-Live" >🤗 HuggingFace</a>
+•
+<a href="https://SWE-bench-Live.github.io" >📊 Leaderboard</a>
+</p>
+
+SWE-bench-Live is a live benchmark for issue resolving, providing a dataset that contains the latest issue tasks. This document explains how to run the evaluation of OpenHands on SWE-bench-Live.
+
+Since SWE-bench-Live has an almost identical setting to SWE-bench, you only need to change the dataset name to `SWE-bench-Live/SWE-bench-Live`; everything else is the same as running on SWE-bench.
+
+## Setting Up
+
+Set up the development environment and configure your LLM provider by following the [README](README.md).
+
+## Running Inference
+
+Use the same script, but change the dataset name to `SWE-bench-Live/SWE-bench-Live` and select the split (either `lite` or `full`). The lite split contains 300 instances from the past six months, while the full split includes 1,319 instances created after 2024.
+
+```shell
+./evaluation/benchmarks/swe_bench/scripts/run_infer.sh [model_config] [git-version] [agent] [eval_limit] [max_iter] [num_workers] [dataset] [dataset_split]
+```
+
+In the original SWE-bench-Live paper, the maximum number of iterations (`max_iter`) is set to 100, for example:
+
+```shell
+./evaluation/benchmarks/swe_bench/scripts/run_infer.sh llm.your_llm HEAD CodeActAgent 300 100 3 SWE-bench-Live/SWE-bench-Live lite
+```
+
+## Evaluating Results
+
+After OpenHands generates patch results for each issue, we evaluate the results using the [SWE-bench-Live evaluation harness](https://github.com/microsoft/SWE-bench-Live).
+
+Convert the OpenHands output to the prediction format expected by the evaluation harness:
+
+```shell
+# You can find output.jsonl in evaluation/evaluation_outputs
+python evaluation/benchmarks/swe_bench/scripts/live/convert.py --output_jsonl [path/to/evaluation/output.jsonl] > preds.jsonl
+```
+
+Please refer to the original [SWE-bench-Live repository](https://github.com/microsoft/SWE-bench-Live) to set up the evaluation harness and use the provided scripts to generate the evaluation report:
+
+```shell
+python -m swebench.harness.run_evaluation \
+    --dataset_name SWE-bench-Live/SWE-bench-Live \
+    --split lite \
+    --namespace starryzhang \
+    --predictions_path preds.jsonl \
+    --max_workers 10 \
+    --run_id openhands
+```
+
+## Citation
+
+```bibtex
+@article{zhang2025swebenchgoeslive,
+  title={SWE-bench Goes Live!},
+  author={Linghao Zhang and Shilin He and Chaoyun Zhang and Yu Kang and Bowen Li and Chengxing Xie and Junhao Wang and Maoquan Wang and Yufan Huang and Shengyu Fu and Elsie Nallipogu and Qingwei Lin and Yingnong Dang and Saravan Rajmohan and Dongmei Zhang},
+  journal={arXiv preprint arXiv:2505.23419},
+  year={2025}
+}
+```
--- a/evaluation/benchmarks/swe_bench/live_utils.py
+++ b/evaluation/benchmarks/swe_bench/live_utils.py
@@ -0,0 +1,80 @@
+from typing import Any
+
+import pandas as pd
+
+from evaluation.utils.shared import assert_and_raise
+from openhands.core.logger import openhands_logger as logger
+from openhands.events.action import CmdRunAction
+from openhands.events.observation import (
+    CmdOutputObservation,
+    ErrorObservation,
+)
+from openhands.runtime.base import Runtime
+from openhands.utils.shutdown_listener import sleep_if_should_continue
+
+
+def complete_runtime(
+    runtime: Runtime,
+    instance: pd.Series,
+) -> dict[str, Any]:
+    """Complete the runtime and export the git patch for SWE-bench-Live."""
+    logger.info('-' * 30)
+    logger.info('BEGIN Runtime Completion Fn')
+    logger.info('-' * 30)
+    obs: CmdOutputObservation
+    workspace_dir_name = instance.instance_id
+    action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
+        f'Failed to cd to /workspace/{workspace_dir_name}: {str(obs)}',
+    )
+    action = CmdRunAction(command='git config --global core.pager ""')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
+        f'Failed to git config --global core.pager "": {str(obs)}',
+    )
+    action = CmdRunAction(command='git add -A')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
+        f'Failed to git add -A: {str(obs)}',
+    )
+    n_retries = 0
+    git_patch = None
+    while n_retries < 5:
+        action = CmdRunAction(
+            command=f'git diff --no-color --cached {instance["base_commit"]}',
+        )
+        action.set_hard_timeout(100 + 10 * n_retries)
+        logger.info(action, extra={'msg_type': 'ACTION'})
+        obs = runtime.run_action(action)
+        logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+        n_retries += 1
+        if isinstance(obs, CmdOutputObservation):
+            if obs.exit_code == 0:
+                git_patch = obs.content.strip()
+                break
+            else:
+                logger.info('Failed to get git diff, retrying...')
+                sleep_if_should_continue(10)
+        elif isinstance(obs, ErrorObservation):
+            logger.error(f'Error occurred: {obs.content}. Retrying...')
+            sleep_if_should_continue(10)
+        else:
+            assert_and_raise(False, f'Unexpected observation type: {str(obs)}')
+    assert_and_raise(git_patch is not None, 'Failed to get git diff (None)')
+    logger.info('-' * 30)
+    logger.info('END Runtime Completion Fn')
+    logger.info('-' * 30)
+    return {'git_patch': git_patch}
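
The retry loop in `complete_runtime` can be read in isolation as the sketch below. It is a minimal illustration under stated assumptions, not the code in this PR: `run_with_retries` and `fake_git_diff` are hypothetical names, and the sleep function is injectable so the example finishes instantly.

```python
from typing import Callable, Optional


def run_with_retries(
    run: Callable[[int], Optional[str]],
    max_retries: int = 5,
    base_timeout: int = 100,
    step: int = 10,
    sleep: Callable[[float], None] = lambda _seconds: None,
) -> Optional[str]:
    """Call `run(timeout)` until it returns a result or the retries run out.

    The timeout grows by `step` seconds on every attempt, mirroring
    `100 + 10 * n_retries` in `complete_runtime`.
    """
    for attempt in range(max_retries):
        result = run(base_timeout + step * attempt)
        if result is not None:
            return result
        sleep(10)  # back off before the next attempt
    return None


# Hypothetical command that fails twice and succeeds on the third call.
seen_timeouts: list[int] = []


def fake_git_diff(timeout: int) -> Optional[str]:
    seen_timeouts.append(timeout)
    return 'patch-body' if len(seen_timeouts) == 3 else None


patch = run_with_retries(fake_git_diff)
```

Returning `None` after the last attempt leaves the decision to the caller, which is where `complete_runtime` raises through `assert_and_raise`.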
--- a/evaluation/benchmarks/swe_bench/loc_prompt.py
+++ b/evaluation/benchmarks/swe_bench/loc_prompt.py
@@ -1,4 +1,4 @@
-TASK_INSTRUECTION="""
+TASK_INSTRUECTION = """
 Given the following GitHub problem description, your objective is to localize the specific files, classes or functions, and lines of code that need modification or contain key information to resolve the issue.

 Follow these steps to localize the issue:
@@ -66,4 +66,4 @@ FAKE_USER_MSG_FOR_LOC = (
    'Verify that you have carefully analyzed the impact of the found locations on the repository, especially their dependencies. '
    'If you think you have solved the task, please send your final answer (including the former answer and reranking) to user through message and then call `finish` to finish.\n'
    'IMPORTANT: YOU SHOULD NEVER ASK FOR HUMAN HELP.\n'
-)
+)
--- a/evaluation/benchmarks/swe_bench/prompts/swe_gpt4.j2
+++ b/evaluation/benchmarks/swe_bench/prompts/swe_gpt4.j2
@@ -27,7 +27,7 @@ You MUST plan extensively before each function call, and reflect extensively on
 5. Debug as needed. Use debugging techniques to isolate and resolve issues.
 6. Test frequently. Run tests after each change to verify correctness.
 7. Iterate until the root cause is fixed and all tests pass.
-8. Reflect and validate comprehensively. After tests pass, think about the original intent, write additional tests to ensure correctness, 
+8. Reflect and validate comprehensively. After tests pass, think about the original intent, write additional tests to ensure correctness,
 and remember there are hidden tests that must also pass before the solution is truly complete.

 Refer to the detailed sections below for more information on each step.
--- a/evaluation/benchmarks/swe_bench/run_infer.py
+++ b/evaluation/benchmarks/swe_bench/run_infer.py
@@ -43,7 +43,7 @@ from openhands.core.config import (
    AgentConfig,
    OpenHandsConfig,
    get_llm_config_arg,
-    get_parser
+    get_parser,
 )
 from openhands.core.config.condenser_config import NoOpCondenserConfig
 from openhands.core.config.utils import get_condenser_config_arg
@@ -66,6 +66,26 @@ RUN_WITH_BROWSING = os.environ.get('RUN_WITH_BROWSING', 'false').lower() == 'tru
 ENABLE_LLM_EDITOR = os.environ.get('ENABLE_LLM_EDITOR', 'false').lower() == 'true'
 BenchMode = Literal['swe', 'swt', 'swt-ci']

+# Global variable to track dataset type
+DATASET_TYPE = 'SWE-bench'
+
+
+def set_dataset_type(dataset_name: str) -> None:
+    """Set dataset type based on dataset name."""
+    global DATASET_TYPE
+    name_lower = dataset_name.lower()
+
+    if 'swe-gym' in name_lower:
+        DATASET_TYPE = 'SWE-Gym'
+    elif 'swe-bench-live' in name_lower:
+        DATASET_TYPE = 'SWE-bench-Live'
+    elif 'multimodal' in name_lower:
+        DATASET_TYPE = 'Multimodal'
+    else:
+        DATASET_TYPE = 'SWE-bench'
+
+    logger.info(f'Dataset type set to: {DATASET_TYPE}')
+

 AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {
    'CodeActAgent': codeact_user_response,
@@ -73,7 +93,10 @@ AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {


 def _get_swebench_workspace_dir_name(instance: pd.Series) -> str:
-    return f'{instance.repo}__{instance.version}'.replace('/', '__')
+    if DATASET_TYPE == 'SWE-bench-Live':
+        return instance.instance_id
+    else:
+        return f'{instance.repo}__{instance.version}'.replace('/', '__')


 def get_instruction(instance: pd.Series, metadata: EvalMetadata) -> MessageAction:
@@ -92,10 +115,12 @@ def get_instruction(instance: pd.Series, metadata: EvalMetadata) -> MessageActio
        elif 'gpt-4.1' in llm_model:
            template_name = 'swe_gpt4.j2'
        else:
-            template_name = 'swe_default.j2'  # Default for 'swe' mode (regular swe-bench)
+            template_name = (
+                'swe_default.j2'  # Default for 'swe' mode (regular swe-bench)
+            )
    else:
        # Fallback or error handling if mode is unexpected
-        logger.error(f"Unexpected evaluation mode: {mode}. Falling back to default.")
+        logger.error(f'Unexpected evaluation mode: {mode}. Falling back to default.')
        template_name = 'swe_default.j2'

    # Set up Jinja2 environment
@@ -117,7 +142,7 @@ def get_instruction(instance: pd.Series, metadata: EvalMetadata) -> MessageActio
            f'The following command can be used to run the tests: `{list(MAP_REPO_TO_TEST_FRAMEWORK_VERBOSE[instance.repo].values())[0]}`. Make sure they fail in the expected way.\n'
        )
    else:
-        context['test_instructions'] = '' # Ensure it's defined for other modes
+        context['test_instructions'] = ''  # Ensure it's defined for other modes

    # Render the instruction
    instruction = template.render(context)
@@ -151,9 +176,13 @@ def get_instance_docker_image(
    if swebench_official_image:
        # Official SWE-Bench image
        # swebench/sweb.eval.x86_64.django_1776_django-11333:v1
-        docker_image_prefix = 'docker.io/swebench/'
+        # SWE-bench-Live uses the same naming convention as SWE-Bench
+        if DATASET_TYPE == 'SWE-bench-Live':
+            docker_image_prefix = 'docker.io/starryzhang/'
+        else:
+            docker_image_prefix = 'docker.io/swebench/'
        repo, name = instance_id.split('__')
-        image_name = f'swebench/sweb.eval.x86_64.{repo}_1776_{name}:latest'.lower()
+        image_name = f'{docker_image_prefix.rstrip("/")}/sweb.eval.x86_64.{repo}_1776_{name}:latest'.lower()
        logger.debug(f'Using official SWE-Bench image: {image_name}')
        return image_name
    else:
@@ -171,7 +200,8 @@ def get_config(
    metadata: EvalMetadata,
 ) -> OpenHandsConfig:
    # We use a different instance image for the each instance of swe-bench eval
-    use_swebench_official_image = 'swe-gym' not in metadata.dataset.lower()
+    use_swebench_official_image = DATASET_TYPE != 'SWE-Gym'
+
    base_container_image = get_instance_docker_image(
        instance['instance_id'],
        swebench_official_image=use_swebench_official_image,
@@ -288,8 +318,12 @@ def initialize_runtime(
        runtime.copy_to(temp_file_path, '/swe_util/eval_data/instances/')

        # inject the instance swe entry
+        if DATASET_TYPE == 'SWE-bench-Live':
+            entry_script_path = 'instance_swe_entry_live.sh'
+        else:
+            entry_script_path = 'instance_swe_entry.sh'
        runtime.copy_to(
-            str(os.path.join(script_dir, 'scripts/setup/instance_swe_entry.sh')),
+            str(os.path.join(script_dir, f'scripts/setup/{entry_script_path}')),
            '/swe_util/',
        )

@@ -309,14 +343,14 @@ def initialize_runtime(
        logger.error(f'Failed to source ~/.bashrc: {str(obs)}')
    assert_and_raise(obs.exit_code == 0, f'Failed to source ~/.bashrc: {str(obs)}')

-    action = CmdRunAction(command='source /swe_util/instance_swe_entry.sh')
+    action = CmdRunAction(command=f'source /swe_util/{entry_script_path}')
    action.set_hard_timeout(600)
    logger.info(action, extra={'msg_type': 'ACTION'})
    obs = runtime.run_action(action)
    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
    assert_and_raise(
        obs.exit_code == 0,
-        f'Failed to source /swe_util/instance_swe_entry.sh: {str(obs)}',
+        f'Failed to source /swe_util/{entry_script_path}: {str(obs)}',
    )

    action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
@@ -369,9 +403,9 @@ def initialize_runtime(
            obs = runtime.run_action(action)
            logger.info(obs, extra={'msg_type': 'OBSERVATION'})

-    if 'multimodal' not in metadata.dataset.lower():
+    if DATASET_TYPE != 'Multimodal' and DATASET_TYPE != 'SWE-bench-Live':
        # Only for non-multimodal datasets, we need to activate the testbed environment for Python
-        # SWE-Bench multimodal datasets are not using the testbed environment
+        # SWE-Bench multimodal datasets and SWE-bench-Live are not using the testbed environment
        action = CmdRunAction(command='which python')
        action.set_hard_timeout(600)
        logger.info(action, extra={'msg_type': 'ACTION'})
@@ -613,7 +647,13 @@ def process_instance(

        # ======= THIS IS SWE-Bench specific =======
        # Get git patch
-        return_val = complete_runtime(runtime, instance)
+        if DATASET_TYPE == 'SWE-bench-Live':
+            from evaluation.benchmarks.swe_bench.live_utils import (
+                complete_runtime as complete_runtime_fn,
+            )
+        else:
+            complete_runtime_fn = complete_runtime
+        return_val = complete_runtime_fn(runtime, instance)
        git_patch = return_val['git_patch']
        logger.info(
            f'Got git diff for instance {instance.instance_id}:\n--------\n{git_patch}\n--------'
@@ -718,11 +758,15 @@ if __name__ == '__main__':
    # NOTE: It is preferable to load datasets from huggingface datasets and perform post-processing
    # so we don't need to manage file uploading to OpenHands's repo
    dataset = load_dataset(args.dataset, split=args.split)
+
+    # Set the global dataset type based on dataset name
+    set_dataset_type(args.dataset)
+
    swe_bench_tests = filter_dataset(dataset.to_pandas(), 'instance_id')
    logger.info(
        f'Loaded dataset {args.dataset} with split {args.split}: {len(swe_bench_tests)} tasks'
    )
-    if 'SWE-Gym' in args.dataset:
+    if DATASET_TYPE == 'SWE-Gym':
        with open(
            os.path.join(
                os.path.dirname(os.path.abspath(__file__)),
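
The dataset-type dispatch and the image naming in `run_infer.py` can also be expressed as two pure functions, which makes them easy to test without a module-level global. The sketch below is illustrative only: `classify_dataset` and `official_image_name` are hypothetical names, the instance IDs are examples, and falling back to the `swebench` namespace for every type other than SWE-bench-Live is an assumption of this sketch.

```python
def classify_dataset(dataset_name: str) -> str:
    """Return the dataset family for a Hugging Face dataset name."""
    name_lower = dataset_name.lower()
    if 'swe-gym' in name_lower:
        return 'SWE-Gym'
    if 'swe-bench-live' in name_lower:
        return 'SWE-bench-Live'
    if 'multimodal' in name_lower:
        return 'Multimodal'
    return 'SWE-bench'


def official_image_name(instance_id: str, dataset_type: str) -> str:
    """Build the per-instance image name for official evaluation images."""
    # SWE-bench-Live images are published under a different namespace.
    namespace = 'starryzhang' if dataset_type == 'SWE-bench-Live' else 'swebench'
    repo, name = instance_id.split('__')
    return f'docker.io/{namespace}/sweb.eval.x86_64.{repo}_1776_{name}:latest'.lower()


live_type = classify_dataset('SWE-bench-Live/SWE-bench-Live')
default_type = classify_dataset('princeton-nlp/SWE-bench_Lite')
image = official_image_name('django__django-11333', default_type)
```

Passing the dataset type as an argument keeps worker processes from depending on a global that is only set in the `__main__` block.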
--- a/evaluation/benchmarks/swe_bench/run_localize.py
+++ b/evaluation/benchmarks/swe_bench/run_localize.py
@@ -192,6 +192,8 @@ def get_config(
        dataset_name=metadata.dataset,
        instance_id=instance['instance_id'],
    )
+    oh_aci_li_cmd = '/openhands/micromamba/bin/micromamba run -n openhands poetry run pip install openhands-aci[llama]'
+    sandbox_config.runtime_extra_deps = oh_aci_li_cmd
    workspace_dir_name = _get_swebench_workspace_dir_name(instance)
    sandbox_config.runtime_startup_env_vars = {
        'REPO_PATH': f'/workspace/{workspace_dir_name}/',
@@ -216,6 +218,7 @@ def get_config(
        enable_jupyter=False,
        enable_browsing=RUN_WITH_BROWSING,
        enable_llm_editor=False,
+        enable_mcp=os.environ.get('ENABLE_MCP', 'false').lower() == 'true',
        condenser=metadata.condenser_config,
        enable_prompt_extensions=False,
    )
--- a/evaluation/benchmarks/swe_bench/scripts/live/convert.py
+++ b/evaluation/benchmarks/swe_bench/scripts/live/convert.py
@@ -0,0 +1,33 @@
+import argparse
+import json
+import sys
+
+
+def main(output_jsonl: str):
+    with open(output_jsonl, 'r') as f:
+        for line_no, line in enumerate(f, start=1):
+            try:
+                output = json.loads(line)
+                pred = {
+                    'instance_id': output['instance_id'],
+                    'model_name_or_path': output['metadata']['llm_config']['model'],
+                    'model_patch': output['test_result']['git_patch'],
+                }
+            except Exception as e:
+                # Report on stderr and skip, so preds.jsonl only contains valid records.
+                print(f'Error while reading line {line_no}: {e}', file=sys.stderr)
+                continue
+            print(json.dumps(pred))
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        '--output_jsonl',
+        type=str,
+        required=True,
+        help='Path to the OpenHands output file (.../output.jsonl)',
+    )
+    args = parser.parse_args()
+
+    main(args.output_jsonl)
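
The conversion maps each line of `output.jsonl` to a prediction record with three keys. The sketch below shows that mapping on an in-memory record; `to_prediction` is a hypothetical helper, and the sample values are made up and contain only the fields the converter reads.

```python
import json


def to_prediction(output: dict) -> dict:
    """Map one OpenHands output record to a SWE-bench-style prediction."""
    return {
        'instance_id': output['instance_id'],
        'model_name_or_path': output['metadata']['llm_config']['model'],
        'model_patch': output['test_result']['git_patch'],
    }


# Made-up record standing in for one line of output.jsonl.
sample_line = json.dumps(
    {
        'instance_id': 'owner__repo-1',
        'metadata': {'llm_config': {'model': 'example-model'}},
        'test_result': {'git_patch': 'example patch text'},
    }
)

prediction = to_prediction(json.loads(sample_line))
print(json.dumps(prediction))
```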
--- a/evaluation/benchmarks/swe_bench/scripts/setup/instance_swe_entry_live.sh
+++ b/evaluation/benchmarks/swe_bench/scripts/setup/instance_swe_entry_live.sh
@@ -0,0 +1,41 @@
+#!/usr/bin/env bash
+
+source ~/.bashrc
+SWEUTIL_DIR=/swe_util
+
+# FIXME: Cannot read SWE_INSTANCE_ID from the environment variable
+# SWE_INSTANCE_ID=django__django-11099
+if [ -z "$SWE_INSTANCE_ID" ]; then
+    echo "Error: SWE_INSTANCE_ID is not set." >&2
+    exit 1
+fi
+
+# Read swe-bench-instance.json and extract the item matching the instance ID
+item=$(jq --arg INSTANCE_ID "$SWE_INSTANCE_ID" '.[] | select(.instance_id == $INSTANCE_ID)' $SWEUTIL_DIR/eval_data/instances/swe-bench-instance.json)
+
+if [[ -z "$item" ]]; then
+  echo "No item found for the provided instance ID."
+  exit 1
+fi
+
+
+echo "WORKSPACE_NAME: $SWE_INSTANCE_ID"
+
+# Clear the workspace
+if [ -d /workspace ]; then
+    rm -rf /workspace/*
+else
+    mkdir /workspace
+fi
+# Copy repo to workspace
+if [ -d /workspace/$SWE_INSTANCE_ID ]; then
+    rm -rf /workspace/$SWE_INSTANCE_ID
+fi
+mkdir -p /workspace
+cp -r /testbed /workspace/$SWE_INSTANCE_ID
+
+# SWE-bench-Live does not use conda to manage Python
+# if [ -d /opt/miniconda3 ]; then
+#     . /opt/miniconda3/etc/profile.d/conda.sh
+#     conda activate testbed
+# fi
--- a/evaluation/benchmarks/versicode/README.md
+++ b/evaluation/benchmarks/versicode/README.md
@@ -0,0 +1,102 @@
+# VersiCode benchmark
+
+This project evaluates model performance on VersiCode. It includes:
+
+- data: the required test data and the model outputs
+- inference_utils: inference scripts for our tasks and models
+- metric: scripts for calculating the various metrics
+- output_processing: scripts that process the model output to simplify metric calculation
+
+# Details
+
+1. **Prepare the environment**
+
+   ```shell
+   #create conda environment
+   conda create -n VersiCode python==3.12
+
+   #install requirements
+   pip install -r requirements.txt
+   ```
+
+2. **Experiment Data**
+
+    To obtain the experimental data, visit the Hugging Face dataset page: https://huggingface.co/datasets/AstoneNg/VersiCode.
+    Locate the files `VersiCode_block_completion.json` and `VersiCode_migration.json` under the `experiment_data` directory, and place them in the `data/test_data` directory of this project.
+
+
+3. **Model inference**
+
+   ```shell
+   #cd inference_utils directory
+   cd inference_utils
+
+   #Scripts whose names start with 'test' run a local model
+   #Scripts whose names start with 'api' call a model through an API
+
+   #block-level code completion
+   #Modify the 10th and 12th lines of code to specify the base URL and model name
+   python api_test_block_completion.py
+   #Modify the 30th line of code to specify the local model path
+   python test_block.py
+
+   # code migration (migration order is 'old_to_new')
+   #Modify the 10th and 12th lines of code to specify the base URL and model name
+   python api_code_migration.py
+   #Modify the 30th line of code to specify the local model path
+   python test_migration.py
+   ```
+
+4. **Process output**
+   Process the model output: remove redundant content and extract the specified content to simplify metric calculation.
+
+   ```shell
+   #cd output_processing
+   cd output_processing
+
+   #Extract the content between <start> and <end>
+   #Modify the 8th and 9th lines of code to specify the model and task granularity
+   python clear_ans.py
+
+   #In the block completion and migration tasks, cdc@k is calculated on key lines, so extract them first
+   #Modify lines 76 and 79 to specify the data path
+   python choose_core_line_from_block_versicode.py
+   python choose_core_line_from_migration_versicode.py
+   ```
+
+5. **Metric**
+   We have three metrics: pass@k, em@k, and cdc@k. Because we cannot automatically build a dynamic evaluation environment, we do not provide pass@k.
+
+   ```shell
+   #cd metric
+   cd metric
+
+   #To specify the data path and the k-value of the metric, modify lines 137-140 of compute_migration_cdc_score.py (migration task) or lines 143-145 of compute_versicode_cdc_score.py and compute_versicode_em_score.py (block and line completion tasks)
+   python compute_migration_cdc_score.py
+   python compute_versicode_cdc_score.py
+   python compute_versicode_em_score.py
+
+   #Notes
+   #We found limitations in the ISM@k and PM@k metrics for evaluating code generation, so they are used only as reference in our experiments.
+   #Modify lines 261-265 (block and line completion tasks) to specify the data path and the k-value of the metric
+   python compute_ism_pm_score.py
+   ```
+
+# Citation
+
+```bibtex
+@article{versicode,
+  author={Tongtong Wu and Weigang Wu and Xingyu Wang and Kang Xu and Suyu Ma and Bo Jiang and Ping Yang and Zhenchang Xing and Yuan-Fang Li and Gholamreza Haffari},
+  title        = {VersiCode: Towards Version-controllable Code Generation},
+  journal      = {CoRR},
+  volume       = {abs/2406.07411},
+  year         = {2024},
+  url          = {https://arxiv.org/abs/2406.07411},
+}
+```
+
+**GitHub URL**: https://github.com/wutong8023/VersiCode
+
+# Contributors
+
+[Tongtong Wu](https://scholar.google.com/citations?hl=zh-CN&user=u1Qp8lUAAAAJ&view_op=list_works&sortby=pubdate), [Weigang Wu](https://scholar.google.com/citations?hl=zh-CN&user=UneIZo8AAAAJ), [Xingyu Wang](https://scholar.google.com/citations?hl=zh-CN&user=wqPJcxcAAAAJ), [Kang Xu](https://scholar.google.com/citations?hl=zh-CN&user=N1UUDi0AAAAJ), [Suyu Ma](https://scholar.google.com/citations?hl=zh-CN&user=NJHR1ukAAAAJ), [Bo Jiang](https://wutong8023.site/VersiCode/), [Ping Yang](https://scholar.google.com/citations?view_op=list_works&hl=en&hl=en&user=hrogvxoAAAAJ), [Zhenchang Xing](https://scholar.google.com/citations?hl=zh-CN&user=0vCxuH4AAAAJ), [Yuan-Fang Li](https://scholar.google.com/citations?hl=zh-CN&user=wufXO1kAAAAJ), [Gholamreza Haffari](https://scholar.google.com/citations?hl=zh-CN&user=Perjx5EAAAAJ)
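
The inference prompts ask the model to enclose its code in `<start>` and `<end>`, and the output-processing step extracts that content. `clear_ans.py` is not shown here, so the following is only a guess at the idea rather than the project's implementation; `extract_answer` is a hypothetical name, and falling back to the whole output when the markers are missing is an assumption.

```python
import re

# Non-greedy match of everything between the first <start> and the next <end>.
_BLOCK = re.compile(r'<start>(.*?)<end>', re.DOTALL)


def extract_answer(model_output: str) -> str:
    """Return the text between <start> and <end>, or the stripped input if absent."""
    match = _BLOCK.search(model_output)
    return match.group(1).strip() if match else model_output.strip()


cleaned = extract_answer('Sure, here it is: <start>\nprint(1)\n<end> Hope this helps.')
```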
--- a/evaluation/benchmarks/versicode/inference_utils/api_code_migration.py
+++ b/evaluation/benchmarks/versicode/inference_utils/api_code_migration.py
@@ -0,0 +1,134 @@
+"""
+Runs the code migration task through the OpenAI chat API and truncates overly long prompts
+"""
+
+import json
+import os
+
+import tiktoken
+from openai import OpenAI
+
+max_tokens = 127000  # gpt-3.5 has a 16k-token context; gpt-4o has 128k
+model_name = ''
+
+os.environ['OPENAI_API_KEY'] = ''
+client = OpenAI()
+
+
+def truncate_text(text, max_tokens):
+    encoding = tiktoken.get_encoding('cl100k_base')
+    disallowed_special = ()
+
+    tokens = encoding.encode(text, disallowed_special=disallowed_special)
+    print(len(tokens))
+
+    if len(tokens) > max_tokens:
+        tokens = tokens[:max_tokens]
+
+    truncated_text = encoding.decode(tokens)
+
+    return truncated_text
+
+
+def predict(content, model_name):
+    response = client.chat.completions.create(
+        model=model_name,
+        messages=[{'role': 'user', 'content': content}],
+        frequency_penalty=0.1,
+        max_tokens=128,
+        logit_bias=None,
+        logprobs=None,
+        n=6,
+        presence_penalty=0.0,
+        seed=None,
+        stop=None,
+        stream=False,
+        temperature=0.8,
+        top_p=0.95,
+    )
+    ans_list = []
+    choices_list = response.choices
+    for c in choices_list:
+        content = c.message.content
+        ans_list.append(content)
+    final_ans = str(ans_list)
+    return final_ans
+
+
+def bulid_prompt(description, old_version, old_code, new_version) -> str:
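+    """
+    Build the code-migration prompt.
+    :param description: functional description of the code
+    :param old_version: dependency and its old version
+    :param old_code: code written against the old version
+    :param new_version: dependency and its target new version
+    :return: the prompt string
+    """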
+    prompt = f"""
+    You are now a professional Python programming engineer. I will provide you with a code snippet and a description of its functionality,
+    including the dependencies and versions used in the code. Then, I will provide the same dependencies but with a specified new version.
+    Your task is to refactor the code using the methods provided by the specified new version and return the refactored code.
+    Please note that you only need to return the refactored code and enclose it with <start> and <end>:
+    ###Functionality description of the code
+    {description}
+    ###Dependency and old version
+    {old_version}
+    ###Old version code
+    {old_code}
+    ###Dependency and new version
+    {new_version}
+    ###Refactored new code
+    """
+
+    return prompt
+
+
+json_path = '../data/test_data/VersiCode_migration.json'
+
+
+with open(json_path, 'r', encoding='utf-8') as fr:
+    lodict = json.load(fr)
+data_dict = lodict
+data_list = data_dict
+
+
+for data in data_list:
+    if 'model_output' in data:
+        print(
+            f'the {data_list.index(data) + 1} has already been predicted, skipping this data!'
+        )
+        continue
+    try:
+        print(f'Predicting {data_list.index(data) + 1} ')
+        old_version = data['dependency'] + data['old_version']  # package == x.x.x
+        new_version = data['dependency'] + data['new_version']  # package == x.x.x
+        description = data['description']  # functional description
+        old_code = data['old_code']  # old-version code
+
+        instruction = bulid_prompt(description, old_version, old_code, new_version)
+        truncated_text = truncate_text(instruction, max_tokens)
+        prediction = predict(truncated_text, model_name)
+
+        data['model_output'] = prediction
+    except Exception as e:
+        print(f'error: {e}')
+        print('save current data')
+        save_folder_path = os.path.join(
+            '../data/result_data/code_migration', model_name
+        )
+        if not os.path.exists(save_folder_path):
+            os.makedirs(save_folder_path)
+        save_json_path = os.path.join(save_folder_path, json_path.split('/')[-1])
+
+        with open(save_json_path, 'w', encoding='utf-8') as fw:
+            json.dump(data_dict, fw, indent=4, ensure_ascii=False)
+        break
+
+
+save_folder_path = os.path.join('../data/result_data/code_migration', model_name)
+if not os.path.exists(save_folder_path):
+    os.makedirs(save_folder_path)
+save_json_path = os.path.join(save_folder_path, json_path.split('/')[-1])
+
+with open(save_json_path, 'w', encoding='utf-8') as fw:
+    json.dump(data_dict, fw, indent=4, ensure_ascii=False)
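
The script above skips items that already carry a `model_output`, stops at the first failure, and writes the data to disk either way, so a rerun can pick up where the last one stopped. The sketch below isolates that pattern; `predict_all` and `fake_predict` are hypothetical names, and the stub stands in for the real API call.

```python
import json
import os
import tempfile


def predict_all(items: list, predict, save_path: str) -> None:
    """Fill in `model_output` for each item and always save the progress."""
    for item in items:
        if 'model_output' in item:
            continue  # already predicted in an earlier run
        try:
            item['model_output'] = predict(item)
        except Exception as e:
            print(f'error: {e}; saving current data')
            break
    # Reached after a full pass and after a failure alike.
    os.makedirs(os.path.dirname(save_path), exist_ok=True)
    with open(save_path, 'w', encoding='utf-8') as fw:
        json.dump(items, fw, indent=4, ensure_ascii=False)


def fake_predict(item: dict) -> str:
    # Stub in place of the chat-completion request.
    return item['description'].upper()


items = [{'description': 'a', 'model_output': 'done'}, {'description': 'b'}]
save_path = os.path.join(tempfile.mkdtemp(), 'results', 'example.json')
predict_all(items, fake_predict, save_path)
```

Loading the saved file as the input of the next run is what makes the skip check effective.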
--- a/evaluation/benchmarks/versicode/inference_utils/api_test_block_completion.py
+++ b/evaluation/benchmarks/versicode/inference_utils/api_test_block_completion.py
@@ -0,0 +1,141 @@
+"""
+Runs the block completion task through the OpenAI chat API and truncates overly long prompts
+"""
+
+import json
+import os
+
+import tiktoken
+from openai import OpenAI
+
+max_tokens = 127000  # gpt-3.5 has a 16k-token context; gpt-4o has 128k
+model_name = ''
+
+os.environ['OPENAI_API_KEY'] = ''
+client = OpenAI()
+
+
+def truncate_text(text, max_tokens):
+    encoding = tiktoken.get_encoding('cl100k_base')
+    disallowed_special = ()
+
+    tokens = encoding.encode(text, disallowed_special=disallowed_special)
+    print(len(tokens))
+
+    if len(tokens) > max_tokens:
+        tokens = tokens[:max_tokens]
+
+    truncated_text = encoding.decode(tokens)
+
+    return truncated_text
+
+
+def predict(content, model_name):
+    response = client.chat.completions.create(
+        model=model_name,
+        messages=[{'role': 'user', 'content': content}],
+        frequency_penalty=0.1,
+        max_tokens=128,
+        logit_bias=None,
+        logprobs=None,
+        n=6,
+        presence_penalty=0.0,
+        seed=None,
+        stop=None,
+        stream=False,
+        temperature=0.8,
+        top_p=0.95,
+    )
+    ans_list = []
+    choices_list = response.choices
+    for c in choices_list:
+        content = c.message.content
+        ans_list.append(content)
+    final_ans = str(ans_list)
+    return final_ans
+
+
+def bulid_prompt(version, description) -> str:
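+    """
+    Build the block-completion prompt.
+
+    :param version: dependency and version, in the form package==x.x.x
+    :param description: functional description of the code to generate
+    :return: the prompt string, which asks the model to enclose its code
+        in <start> and <end>
+    """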
+    prompt = f"""
+            You are a professional Python engineer, and I will provide functional descriptions and versions of specified dependency packages.
+            You need to write code in Python to implement this feature based on the functional description and using the dependency package and version I specified.
+            Please note that you only need to return the code that implements the function, and do not return any other content.
+            Please use <start> and <end> to enclose the generated code. Here is an example:
+            ###Function Description：
+            The function of this code is to print the results predicted by calling the model using vllm.
+            ###dependeny and version：
+            vllm==0.3.3
+            ###response:
+            <start>
+            for output in outputs:
+                prompt = output.prompt
+                generated_text = output.outputs[0].text
+                print("Prompt,Generated text")
+            <end>
+
+            ###Function Description：
+            {description}
+            ###dependeny and version：
+            {version}
+            ###response:
+
+
+        """
+    return prompt
+
+
+json_path = '../data/test_data/VersiCode_block_completion.json'
+
+
+with open(json_path, 'r', encoding='utf-8') as fr:
+    lodict = json.load(fr)
+data_dict = lodict
+data_list = data_dict
+
+
+for data in data_list:
+    if 'model_output' in data:
+        print(
+            f'the {data_list.index(data) + 1} has already been predicted, skipping this data!'
+        )
+        continue
+    try:
+        print(f'Predicting {data_list.index(data) + 1} ')
+        version = data['dependency'] + data['version']  # package == x.x.x
+        description = data['description']  # func description
+
+        instruction = bulid_prompt(version, description)
+        truncated_text = truncate_text(instruction, max_tokens)
+        prediction = predict(truncated_text, model_name)
+
+        data['model_output'] = prediction
+    except Exception as e:
+        print(f'error: {e}')
+        print('save current data')
+        save_folder_path = os.path.join(
+            '../data/result_data/block_completion', model_name
+        )
+        if not os.path.exists(save_folder_path):
+            os.makedirs(save_folder_path)
+        save_json_path = os.path.join(save_folder_path, json_path.split('/')[-1])
+
+        with open(save_json_path, 'w', encoding='utf-8') as fw:
+            json.dump(data_dict, fw, indent=4, ensure_ascii=False)
+        break
+
+
+save_folder_path = os.path.join('../data/result_data/block_completion', model_name)
+if not os.path.exists(save_folder_path):
+    os.makedirs(save_folder_path)
+save_json_path = os.path.join(save_folder_path, json_path.split('/')[-1])
+
+with open(save_json_path, 'w', encoding='utf-8') as fw:
+    json.dump(data_dict, fw, indent=4, ensure_ascii=False)
--- a/evaluation/benchmarks/versicode/inference_utils/test_block.py
+++ b/evaluation/benchmarks/versicode/inference_utils/test_block.py
@@ -0,0 +1,129 @@
+"""
+block completion
+"""
+
+import copy
+import gc
+import json
+import os
+import time
+from multiprocessing import Process
+
+import tiktoken
+import torch
+from vllm import LLM, SamplingParams
+
+# os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
+
+
+def truncate_text(text, max_tokens):
+    encoding = tiktoken.get_encoding('cl100k_base')
+    disallowed_special = ()
+
+    tokens = encoding.encode(text, disallowed_special=disallowed_special)
+    print(len(tokens))
+
+    if len(tokens) > max_tokens:
+        tokens = tokens[:max_tokens]
+
+    truncated_text = encoding.decode(tokens)
+
+    return truncated_text
+
+
+model_list = ['/data2/base models/starcoder2-15b', '/data2/base models/CodeGemma-7B']
+
+
+def run_inference(model_name, origin_data_list):
+    temp_data_list = copy.deepcopy(origin_data_list)
+    test_list = []
+    for data in temp_data_list:
+        version = data['dependency'] + data['version']  # package == x.x.x
+        description = data['description']  # func description
+
+        instruction = bulid_prompt(version, description)
+        test_list.append(instruction)
+
+    sampling_params = SamplingParams(n=6, temperature=0.8, top_p=0.95, max_tokens=64)
+    llm = LLM(
+        model=model_name,
+        tensor_parallel_size=4,
+        gpu_memory_utilization=0.9,
+        swap_space=20,
+    )
+
+    outputs = llm.generate(test_list, sampling_params)
+    for output in outputs:
+        requests_id = int(output.request_id)
+        temp_ans_list = []
+        output_list = output.outputs
+        for o in output_list:
+            text = o.text
+            temp_ans_list.append(text)
+
+        temp_data_list[requests_id]['model_output'] = str(temp_ans_list)
+
+    save_folder_path = os.path.join(
+        '../data/result_data/block_completion', model_name.split('/')[-1]
+    )
+    if not os.path.exists(save_folder_path):
+        os.makedirs(save_folder_path)
+
+    save_json_path = os.path.join(save_folder_path, json_path.split('/')[-1])
+
+    with open(save_json_path, 'w', encoding='utf-8') as fw:
+        json.dump(temp_data_list, fw, indent=4, ensure_ascii=False)
+
+    gc.collect()
+    torch.cuda.empty_cache()
+
+
+def bulid_prompt(version, description) -> str:
+    """
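+    Build the block-completion prompt.
+
+    :param version: dependency name and version, e.g. "vllm==0.3.3"
+    :param description: functional description of the code to generate
+    :return: prompt with one worked example; the model is asked to wrap
+        its code in <start> and <end>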
+    """
+    prompt = f"""
+            You are a professional Python engineer, and I will provide functional descriptions and versions of specified dependency packages.
+            You need to write code in Python to implement this feature based on the functional description and using the dependency package and version I specified.
+            Please note that you only need to return the code that implements the function, and do not return any other content.
+            Please use <start> and <end> to enclose the generated code. Here is an example:
+            ###Function Description：
+            The function of this code is to print the results predicted by calling the model using vllm.
+            ###dependeny and version：
+            vllm==0.3.3
+            ###response:
+            <start>
+            for output in outputs:
+                prompt = output.prompt
+                generated_text = output.outputs[0].text
+                print("Prompt,Generated text")
+            <end>
+
+            ###Function Description：
+            {description}
+            ###dependeny and version：
+            {version}
+            ###response:
+
+
+        """
+    return prompt
+
+
+json_path = '../data/test_data/VersiCode_block_completion.json'
+
+with open(json_path, 'r', encoding='utf-8') as fr:
+    lodict = json.load(fr)
+
+origin_data_list = lodict
+
+for model_name in model_list:
+    process = Process(target=run_inference, args=(model_name, origin_data_list))
+    process.start()
+    process.join()
+    time.sleep(120)
--- a/evaluation/benchmarks/versicode/inference_utils/test_migration.py
+++ b/evaluation/benchmarks/versicode/inference_utils/test_migration.py
@@ -0,0 +1,122 @@
+"""
+code migration
+"""
+
+import copy
+import gc
+import json
+import os
+import time
+from multiprocessing import Process
+
+import tiktoken
+import torch
+from vllm import LLM, SamplingParams
+
+# os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
+
+
+def truncate_text(text, max_tokens):
+    encoding = tiktoken.get_encoding('cl100k_base')
+    disallowed_special = ()
+
+    tokens = encoding.encode(text, disallowed_special=disallowed_special)
+    print(len(tokens))
+
+    if len(tokens) > max_tokens:
+        tokens = tokens[:max_tokens]
+
+    truncated_text = encoding.decode(tokens)
+
+    return truncated_text
+
+
+model_list = ['/data2/base models/starcoder2-15b', '/data2/base models/CodeGemma-7B']
+
+
+def run_inference(model_name, origin_data_list):
+    temp_data_list = copy.deepcopy(origin_data_list)
+    test_list = []
+    for data in temp_data_list:
+        old_version = data['dependency'] + data['old_version']  # package == x.x.x
+        new_version = data['dependency'] + data['new_version']  # package == x.x.x
+        description = data['description']  # functionality description
+        old_code = data['old_code']  # masked code
+
+        instruction = bulid_prompt(description, old_version, old_code, new_version)
+        test_list.append(instruction)
+
+    sampling_params = SamplingParams(n=6, temperature=0.8, top_p=0.95, max_tokens=512)
+    llm = LLM(
+        model=model_name,
+        tensor_parallel_size=4,
+        gpu_memory_utilization=0.6,
+        swap_space=40,
+    )
+
+    outputs = llm.generate(test_list, sampling_params)
+    for output in outputs:
+        requests_id = int(output.request_id)
+        temp_ans_list = []
+        output_list = output.outputs
+        for o in output_list:
+            text = o.text
+            temp_ans_list.append(text)
+
+        temp_data_list[requests_id]['model_output'] = str(temp_ans_list)
+
+    save_folder_path = os.path.join(
+        '../data/result_data/code_migration', model_name.split('/')[-1]
+    )
+    if not os.path.exists(save_folder_path):
+        os.makedirs(save_folder_path)
+
+    save_json_path = os.path.join(save_folder_path, json_path.split('/')[-1])
+
+    with open(save_json_path, 'w', encoding='utf-8') as fw:
+        json.dump(temp_data_list, fw, indent=4, ensure_ascii=False)
+
+    gc.collect()
+    torch.cuda.empty_cache()
+
+
+def bulid_prompt(description, old_version, old_code, new_version) -> str:
+    """
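+    Build the code-migration prompt.
+    :param description: functionality description of the code
+    :param old_version: dependency and old version, e.g. package==x.x.x
+    :param old_code: code written against the old version
+    :param new_version: dependency and new version, e.g. package==x.x.x
+    :return: the prompt string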
+    """
+    prompt = f"""
+    You are now a professional Python programming engineer. I will provide you with a code snippet and a description of its functionality,
+    including the dependencies and versions used in the code. Then, I will provide the same dependencies but with a specified new version.
+    Your task is to refactor the code using the methods provided by the specified new version and return the refactored code.
+    Please note that you only need to return the refactored code and enclose it with <start> and <end>:
+    ###Functionality description of the code
+    {description}
+    ###Dependency and old version
+    {old_version}
+    ###Old version code
+    {old_code}
+    ###Dependency and new version
+    {new_version}
+    ###Refactored new code
+    """
+
+    return prompt
+
+
+json_path = '../data/test_data/VersiCode_migration.json'
+
+with open(json_path, 'r', encoding='utf-8') as fr:
+    lodict = json.load(fr)
+
+origin_data_list = lodict
+
+for model_name in model_list:
+    process = Process(target=run_inference, args=(model_name, origin_data_list))
+    process.start()
+    process.join()
+    time.sleep(120)
--- a/evaluation/benchmarks/versicode/metric/compute_ism_pm_score.py
+++ b/evaluation/benchmarks/versicode/metric/compute_ism_pm_score.py
@@ -0,0 +1,356 @@
+"""
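+Evaluate block-level prediction ability:
+1. Check whether the output contains the correct function name
+2. Check whether the output is valid code
+3. Compute ISM and PM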
+"""
+
+import io
+import json
+import math
+import os
+import re
+import tokenize
+
+
+def is_code_valid(code):
+    try:
+        compile(code, '<string>', 'exec')
+        return True
+    except Exception:
+        return False
+
+
+def longest_common_prefix_between_lists_with_elements(list1, list2):
+    """
+    Compute the longest common prefix length between elements of two string lists
+    :param list1:
+    :param list2:
+    :return:
+    """
+    max_prefix_length = 0
+    max_prefix_elements = ()
+    for str1 in list1:
+        for str2 in list2:
+            prefix_length = 0
+            min_len = min(len(str1), len(str2))
+            for i in range(min_len):
+                if str1[i] == str2[i]:
+                    prefix_length += 1
+                else:
+                    break
+            if prefix_length > max_prefix_length:
+                max_prefix_length = prefix_length
+                max_prefix_elements = (str1, str2)
+    return max_prefix_length, max_prefix_elements
+
+
+def get_token(ans_code: str, output_code: str):
+    """
+    Tokenize both code strings into identifiers and return the two identifier lists
+    :param ans_code:
+    :param output_code:
+    :return:
+    """
+    output_flag = True
+    ans_flag = True
+    try:
+        tokens_ans = tokenize.tokenize(io.BytesIO(ans_code.encode('utf-8')).readline)
+    except Exception:
+        tokens_ans = ans_code.splitlines()
+        ans_flag = False
+
+    try:
+        tokens_output = tokenize.tokenize(
+            io.BytesIO(output_code.encode('utf-8')).readline
+        )
+    except Exception:
+        tokens_output = output_code.splitlines()
+        output_flag = False
+
+    identifiers_ans = []
+    identifiers_output = []
+    if ans_flag:
+        try:
+            for token in tokens_ans:
+                if token.type == tokenize.NAME:
+                    identifiers_ans.append(token.string)
+        except Exception:
+            identifiers_ans = tokens_ans
+    else:
+        identifiers_ans = tokens_ans
+
+    if output_flag:
+        try:
+            for to in tokens_output:
+                if to.type == tokenize.NAME:
+                    identifiers_output.append(to.string)
+        except Exception:
+            identifiers_output = tokens_output
+    else:
+        identifiers_output = tokens_output
+
+    return identifiers_ans, identifiers_output
+
+
+def get_token_per_line(code: str):
+    """
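+    Tokenize each line of code and record the identifiers on each line
+    :param code: code string
+    :return: list of per-line identifier lists
+    """
+    lines = code.split('\n')  # split the code into a list of lines
+    identifiers_per_line = []  # holds the identifier list for each line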
+
+    for line in lines:
+        tokens = tokenize.tokenize(io.BytesIO(line.encode('utf-8')).readline)
+        identifiers = []
+        try:
+            for token in tokens:
+                if token.type == tokenize.NAME:
+                    identifiers.append(token.string)
+        except Exception:
+            identifiers = line.split(' ')
+        identifiers_per_line.append(identifiers)
+
+    return identifiers_per_line
+
+
+def get_ISM(answer_code: str, model_output_list: list, answer_name: str) -> list:
+    """
+    Compute ISM and return the scores sorted in descending order
+    :return:
+    """
+    score_list = []
+    for code in model_output_list:
+        if '```python' in code:
+            code = code.replace('```python', '')
+            code = code.replace('```', '')
+        if not re.search(rf'\b{re.escape(answer_name)}\b', code) or not is_code_valid(
+            code
+        ):
+            score_list.append(0)
+            continue
+
+        # if answer_name not in code:
+        #     score_list.append(0)
+        #     continue
+
+        identifiers_ans, identifiers_output = get_token(answer_code, code)
+        max_len, elements = longest_common_prefix_between_lists_with_elements(
+            identifiers_ans, identifiers_output
+        )
+        if max_len != 0:
+            base_element_len = max(len(elements[0]), len(elements[1]))
+            temp_score = max_len / base_element_len
+            score_list.append(temp_score)
+        else:
+            score_list.append(0)
+        # base_element_len = max(len(elements[0]), len(elements[1]))
+        # temp_score = max_len/base_element_len
+        # score_list.append(temp_score)
+
+    score_list = sorted(score_list, reverse=True)
+    return score_list
+
+
+def get_ISM_without_verification(
+    answer_code: str, model_output_list: list, answer_name: str
+) -> list:
+    """
+    Compute ISM (without code validity checks) and return the scores sorted in descending order
+    :return:
+    """
+    score_list = []
+    for code in model_output_list:
+        if answer_name not in code:
+            score_list.append(0)
+            continue
+
+        # if answer_name not in code:
+        #     score_list.append(0)
+        #     continue
+
+        identifiers_ans, identifiers_output = get_token(answer_code, code)
+        max_len, elements = longest_common_prefix_between_lists_with_elements(
+            identifiers_ans, identifiers_output
+        )
+        if max_len != 0:
+            base_element_len = max(len(elements[0]), len(elements[1]))
+            temp_score = max_len / base_element_len
+            score_list.append(temp_score)
+        else:
+            score_list.append(0)
+        # base_element_len = max(len(elements[0]), len(elements[1]))
+        # temp_score = max_len/base_element_len
+        # score_list.append(temp_score)
+
+    score_list = sorted(score_list, reverse=True)
+    return score_list
+
+
+def longest_common_prefix_with_lengths(list1, list2):
+    """
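+    Compute the longest common prefix length between sublists of two 2D lists, and record the lengths of the two sublists that achieve it
+    :param list1: first 2D list
+    :param list2: second 2D list
+    :return: longest prefix match length and the lengths of the two sublists that achieve it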
+    """
+    max_length = 0
+    len_list1 = 0
+    len_list2 = 0
+    for i, sublist1 in enumerate(list1):
+        for j, sublist2 in enumerate(list2):
+            match_length = 0
+            min_length = min(len(sublist1), len(sublist2))
+            for k in range(min_length):
+                if sublist1[k] == sublist2[k]:
+                    match_length += 1
+                else:
+                    break
+            if match_length > max_length:
+                max_length = match_length
+                len_list1 = len(sublist1)
+                len_list2 = len(sublist2)
+    return max_length, len_list1, len_list2
+
+
+def get_PM(answer_code: str, model_output_list: list, answer_name: str) -> list:
+    """
+    Compute PM and return the scores sorted in descending order
+    :return:
+    """
+    score_list = []
+    for code in model_output_list:
+        if '```python' in code:
+            code = code.replace('```python', '')
+            code = code.replace('```', '')
+        if not re.search(rf'\b{re.escape(answer_name)}\b', code) or not is_code_valid(
+            code
+        ):
+            # if answer_name not in code or is_code_valid(code) == False:
+            score_list.append(0)
+            continue
+
+        # if answer_name not in code:
+        #     score_list.append(0)
+        #     continue
+
+        ans_list = get_token_per_line(answer_code)
+        output_token_list = get_token_per_line(code)
+        max_len, len1, len2 = longest_common_prefix_with_lengths(
+            ans_list, output_token_list
+        )
+        base_element_len = max(len1, len2)
+
+        if base_element_len != 0:
+            temp_score = max_len / base_element_len
+            score_list.append(temp_score)
+        else:
+            score_list.append(0)
+
+    score_list = sorted(score_list, reverse=True)
+    return score_list
+
+
+def get_score(score_list: list, k):
+    """
+    Compute score@n,k
+    :param score_list:
+    :param k:
+    :return:
+    """
+    n = len(score_list)
+    total = 0
+    final = n - k + 1
+    for i in range(1, final + 1):
+        total += math.comb(n - i, k - 1) * score_list[i - 1]
+
+    final_score = total / math.comb(n, k)
+
+    return final_score
+
+
+k = 1
+task = 'block'  # block or line
+json_name = f'Versicode_{task}_completion.json'
+
+folder_path = f'../data/result_data/{task}_completion'
+model_list = os.listdir(folder_path)
+
+for model in model_list:
+    model_json_path = os.path.join(folder_path, model, json_name)
+    with open(model_json_path, 'r', encoding='utf-8') as fr:
+        lodict = json.load(fr)
+    data_dict = lodict
+    data_list = data_dict
+    data_len = len(data_list)
+    sum_ISM = 0
+    sum_PM = 0
+
+    for data in data_list:
+        # model_output_list = eval(data['model_output'])
+        model_output_list = eval(data['model_output_clear'])[:1]
+        temp_list = []
+        for o in model_output_list:
+            temp_out = o.replace('```python', '')
+            temp_out = temp_out.replace('```', '')
+            temp_list.append(temp_out)
+        model_output_list = temp_list
+        answer_code = data['code']
+        answer_name = data['core_token']
+        #
+        # answer_code = data['new_code']  #code editing
+        # answer_name = data['new_name']    #code editing
+
+        # answer_code = data['old_code']  # code editing new to old
+        # answer_name = data['old_name']  # code editing new to old
+        #
+        ISM_score_list = get_ISM(answer_code, model_output_list, answer_name)
+        # ISM_score_without_verification_list = get_ISM_without_verification(answer_code, model_output_list, answer_name)  # newly added
+        PM_score_list = get_PM(answer_code, model_output_list, answer_name)
+
+        # if not ISM_score_without_verification_list == ISM_score_list:  # newly added
+        #     for s in ISM_score_list:  # newly added
+        #         if s != ISM_score_without_verification_list[ISM_score_list.index(s)]:  # newly added
+        #             print('Metadata:')  # newly added
+        #             print(data)  # newly added
+        #             print('Answer:')  # newly added
+        #             print(model_output_list[ISM_score_list.index(s)])  # newly added
+
+        # flag = int(input('Enter 1 to continue, 0 to exit'))  # newly added
+        # if flag == 1:
+        #     continue
+
+        ISM_score = get_score(ISM_score_list, k)
+        PM_score = get_score(PM_score_list, k)
+
+        sum_ISM += ISM_score
+        sum_PM += PM_score
+        # print(f"ISM score: {ISM_score}")
+        # print(f"PM score: {PM_score}")
+
+    print(f'{model}, {task} completion task, ISM@{k} score: {sum_ISM / data_len}')
+    print(f'{model}, {task} completion task, PM@{k} score: {sum_PM / data_len}')
+
+
+# def get_token(ans_code:str, output_code:str):
+#     """
+#     Tokenize both code strings into identifiers and return the two identifier lists
+#     :param ans_code:
+#     :param output_code:
+#     :return:
+#     """
+#     tokens_ans = tokenize.tokenize(io.BytesIO(ans_code.encode('utf-8')).readline)
+#     tokens_output = tokenize.tokenize(io.BytesIO(output_code.encode('utf-8')).readline)
+#     identifiers_ans = []
+#     identifiers_output = []
+#     for token in tokens_ans:
+#         if token.type == tokenize.NAME:
+#             identifiers_ans.append(token.string)
+#
+#     for to in tokens_output:
+#         if to.type == tokenize.NAME:
+#             identifiers_output.append(to.string)
+#
+#     return identifiers_ans, identifiers_output
--- a/evaluation/benchmarks/versicode/metric/compute_migration_cdc_score.py
+++ b/evaluation/benchmarks/versicode/metric/compute_migration_cdc_score.py
@@ -0,0 +1,198 @@
+"""
+Calculate the cdc score for migration
+"""
+
+import json
+import math
+import os
+import re
+
+# warnings.filterwarnings("ignore", category=SyntaxWarning)
+
+
+def is_correct_parameter_count(function_name, correct_code, test_code):
+    """
+    Check whether the number of parameters matches
+    :param function_name:
+    :param correct_code:
+    :param test_code:
+    :return:
+    """
+    # Get the parameter count in the correct code
+    # return True
+    pattern = rf'{re.escape(function_name)}\((.*?)\)'
+    correct_match = re.search(pattern, correct_code)
+
+    if correct_match:
+        correct_params = correct_match.group(1).strip()
+        correct_param_list = [p.strip() for p in correct_params.split(',') if p.strip()]
+        expected_count = len(correct_param_list)
+    else:
+        expected_count = 0  # no call with parentheses found, so expect zero parameters
+
+    # Look for the function call in the code under test
+    test_match = re.search(pattern, test_code)
+
+    if test_match:
+        test_params = test_match.group(1).strip()
+        test_param_list = [p.strip() for p in test_params.split(',') if p.strip()]
+        return len(test_param_list) == expected_count  # compare parameter counts
+    else:
+        # No parentheses: check whether the function name appears in the string
+        return expected_count == 0 and function_name in test_code
+
+
+def check_keyword_parameters(function_name, correct_code, test_code):
+    """
+    Check whether keyword arguments are used correctly
+    :param function_name:
+    :param correct_code:
+    :param test_code:
+    :return:
+    """
+    # Regular expression matching the function call in the correct code
+    # return True
+    pattern = rf'{re.escape(function_name)}\((.*?)\)'
+    correct_match = re.search(pattern, correct_code)
+
+    if correct_match:
+        correct_params = correct_match.group(1).strip()
+        correct_param_list = [p.strip() for p in correct_params.split(',') if p.strip()]
+
+        # Look for the function call in the code under test
+        test_match = re.search(pattern, test_code)
+
+        if test_match:
+            test_params = test_match.group(1).strip()
+            test_param_list = [p.strip() for p in test_params.split(',') if p.strip()]
+
+            # Ensure each keyword argument in the correct code is also passed by keyword in the code under test
+            for correct_param in correct_param_list:
+                if '=' in correct_param:  # only when the correct code uses a keyword argument
+                    param_name = correct_param.split('=')[0].strip()
+                    if not any(
+                        param_name in test_param and '=' in test_param
+                        for test_param in test_param_list
+                    ):
+                        return False  # the corresponding argument is not passed by keyword
+
+            return True  # all keyword arguments match
+
+    return False  # no matching call found
+
+
+def with_correct(answer_code: str, model_output: str) -> bool:
+    """
+    When the answer is a `with` statement, check whether the model output is also a `with` statement
+    :param answer_code:
+    :param model_output:
+    :return:
+    """
+    # return True
+    if not answer_code.startswith('with') and not model_output.startswith('with'):
+        return True
+    elif answer_code.startswith('with') and model_output.startswith('with'):
+        return True
+    else:
+        return False
+
+
+def compute_block_score_k(
+    answer: str,
+    model_output: list,
+    k: int,
+    model_filled_code,
+    core_line_in_core_block,
+    core_line_in_output_clear,
+):
+    """
+    CDC requires all five conditions to hold; EM only requires the first
+    """
+    c = 0
+    n = len(model_output)
+    for index, code in enumerate(model_output):
+        if (
+            re.search(rf'\b{re.escape(answer)}\b', code)
+            and is_code_valid(model_filled_code[index])
+            and is_correct_parameter_count(
+                answer, core_line_in_core_block, core_line_in_output_clear[index]
+            )
+            and with_correct(core_line_in_core_block, core_line_in_output_clear[index])
+            and check_keyword_parameters(
+                answer, core_line_in_core_block, core_line_in_output_clear[index]
+            )
+        ):  # block
+            # if re.search(rf'\b{re.escape(answer)}\b', code):#block
+            c += 1
+    if n - c < k:
+        return 1.0
+
+    score = 1 - (math.comb(n - c, k)) / (math.comb(n, k))
+
+    return score
+
+
+def is_code_valid(code):
+    try:
+        compile(code, '<string>', 'exec')
+        return True
+    except Exception:
+        return False
+
+
+def compute_score_k(answer: str, model_output: list, k: int):
+    c = 0
+    n = len(model_output)
+    for output in model_output:
+        if '```python' in output:
+            output = output.replace('```python', '')
+            output = output.replace('```', '')
+        # if answer == output:
+
+        if re.search(rf'\b{re.escape(answer)}\b', output) and is_code_valid(output):
+            c += 1
+    if n - c < k:
+        return 1.0
+
+    score = 1 - (math.comb(n - c, k)) / (math.comb(n, k))
+
+    return score
+
+
+k = 1  # cdc@k
+json_name = 'VersiCode_migration.json'
+task = 'migration'
+folder_path = '../data/result_data/code_migration'
+
+model_list = os.listdir(folder_path)
+for model in model_list:
+    # if model != 'gpt-4o':
+    #     continue
+    model_json_path = os.path.join(folder_path, model, json_name)
+    with open(model_json_path, 'r', encoding='utf-8') as fr:
+        lodict = json.load(fr)
+    data_list = lodict
+
+    score_list = []
+    for data in data_list:
+        answer = data['new_name']  # old -> new
+        model_output = data['model_output_clear']  # old -> new
+
+        model_filled_code = model_output
+        # core_line_in_core_block = data['core_line_in_new_core_block']# old -> new
+        core_line_in_core_block = data['core_line_in_code']  # old -> new
+        core_line_in_output_clear = data['core_line_in_output_clear']  # old -> new
+
+        score_list.append(
+            compute_block_score_k(
+                answer,
+                model_output,
+                k,
+                model_filled_code,
+                core_line_in_core_block,
+                core_line_in_output_clear,
+            )
+        )
+
+    final_score = sum(score_list) / len(score_list)
+    print(f'{model}, {task} task, cdc@{k} score: {final_score}')