Fix unit tests for new CLI LLM arguments

- Update test_help_message to include new CLI arguments (--llm-model, --llm-base-url, --llm-api-key)
- Update expected option count from 20 to 23
- Add tests for default and custom values of new CLI arguments
- All arg parser tests now pass
Fix formatting issues from pre-commit hooks
2026-04-29 03:00:45 -04:00 · 2025-06-17 14:34:37 +00:00 · 2025-06-16 12:11:14 +00:00 · 2025-06-16 11:03:17 +00:00 · 2025-06-15 21:21:33 -04:00 · 2025-06-15 15:41:08 -04:00
205 changed files with 9833 additions and 4311 deletions
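
The first commit message mentions tests for default and custom values of the new `--llm-model`, `--llm-base-url`, and `--llm-api-key` arguments. A minimal sketch of what such tests might look like follows. The `build_parser` helper and the `None` defaults are assumptions made for this illustration; the actual OpenHands argument parser may be structured differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the project's CLI parser (not the real one)."""
    parser = argparse.ArgumentParser(prog="openhands")
    # The three flags named in the commit message; None defaults are assumed.
    parser.add_argument("--llm-model", default=None, help="LLM model name")
    parser.add_argument("--llm-base-url", default=None, help="LLM API base URL")
    parser.add_argument("--llm-api-key", default=None, help="LLM API key")
    return parser


def test_llm_args_default_values() -> None:
    # With no flags given, every new option falls back to its default.
    args = build_parser().parse_args([])
    assert args.llm_model is None
    assert args.llm_base_url is None
    assert args.llm_api_key is None


def test_llm_args_custom_values() -> None:
    # argparse maps "--llm-base-url" to the attribute "llm_base_url".
    args = build_parser().parse_args(
        [
            "--llm-model", "anthropic/claude-sonnet-4-20250514",
            "--llm-base-url", "http://localhost:4000",
            "--llm-api-key", "test-key",
        ]
    )
    assert args.llm_model == "anthropic/claude-sonnet-4-20250514"
    assert args.llm_base_url == "http://localhost:4000"
    assert args.llm_api_key == "test-key"
```

Tests of this shape can be run with `poetry run pytest`, in line with the repository's convention of writing all tests with pytest.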
--- a/.devcontainer/devcontainer.json
+++ b/.devcontainer/devcontainer.json
@@ -12,5 +12,8 @@
 		"ghcr.io/devcontainers/features/node:1": {},
 	},
 	"postCreateCommand": ".devcontainer/setup.sh",
-	"runArgs": ["--network=host"],
+	"runArgs": ["--add-host=host.docker.internal:host-gateway"],
+	"containerEnv": {
+		"DOCKER_HOST_ADDR": "host.docker.internal"
+	},
 }
--- a/.github/workflows/lint-fix.yml
+++ b/.github/workflows/lint-fix.yml
@@ -74,7 +74,7 @@ jobs:
      - name: Fix python lint issues
        run: |
          # Run all pre-commit hooks and continue even if they modify files (exit code 1)
-          pre-commit run --config ./dev_config/python/.pre-commit-config.yaml --files openhands/**/* evaluation/**/* tests/**/* || true
+          pre-commit run --config ./dev_config/python/.pre-commit-config.yaml --all-files || true

      # Commit and push changes if any
      - name: Check for changes
--- a/.github/workflows/lint.yml
+++ b/.github/workflows/lint.yml
@@ -53,7 +53,7 @@ jobs:
      - name: Install pre-commit
        run: pip install pre-commit==3.7.0
      - name: Run pre-commit hooks
-        run: pre-commit run --files openhands/**/* evaluation/**/* tests/**/* --show-diff-on-failure --config ./dev_config/python/.pre-commit-config.yaml
+        run: pre-commit run --all-files --show-diff-on-failure --config ./dev_config/python/.pre-commit-config.yaml

  # Check version consistency across documentation
  check-version-consistency:
--- a/.github/workflows/py-unit-tests.yml
+++ b/.github/workflows/py-unit-tests.yml
@@ -81,4 +81,3 @@ jobs:
        env:
          TEST_RUNTIME: local
          DEBUG: "1"
-
--- a/.openhands/microagents/repo.md
+++ b/.openhands/microagents/repo.md
@@ -2,6 +2,8 @@ This repository contains the code for OpenHands, an automated AI software engine
 (in the `openhands` directory) and React frontend (in the `frontend` directory).

 ## General Setup:
+To set up the entire repo, including frontend and backend, run `make build`.
+You only need to do this if the user asks you to, or if you're trying to run the entire application.

 IMPORTANT: Before making any changes to the codebase, ALWAYS run `make install-pre-commit-hooks` to ensure pre-commit hooks are properly installed.

@@ -19,91 +21,13 @@ then re-run the command to ensure it passes. Common issues include:
 - Trailing whitespace
 - Missing newlines at end of files

-## Testing and Debugging
-
-### Environment Setup for Testing
- Run `make build` to install all dependencies (only necessary for running tests):
-  ```bash
-  make build
-  ```
-  **IMPORTANT**: When using `execute_bash` to run `make build` or similar long-running commands, set the `timeout` parameter to a high value (e.g., 600 seconds):
-  ```
-  execute_bash(command="make build", timeout=600)
-  ```
-
-#### Docker Installation
-**NOTE: Docker installation is ONLY required for running runtime tests with the Docker runtime.**
-
- Install Docker on Debian-based systems:
-  ```bash
-  sudo apt-get update
-  sudo apt-get install -y apt-transport-https ca-certificates curl gnupg lsb-release
-  curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
-  echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
-  sudo apt-get update
-  sudo apt-get install -y docker-ce docker-ce-cli containerd.io
-  ```
- Start Docker daemon (in container environments without systemd):
-  ```bash
-  sudo dockerd > /tmp/docker.log 2>&1 & sleep 5
-  ```
- Verify Docker installation:
-  ```bash
-  sudo docker run hello-world
-  ```
-
-#### Development Environment Setup
- Before running `make run`, ensure netcat is installed:
-  ```bash
-  sudo apt-get install -y netcat-openbsd
-  ```
-
-### Unit Tests
- All unit tests are in `tests/unit/test_*.py`
- To test new code, run `poetry run pytest tests/unit/test_xxx.py` where `xxx` is the appropriate file for the current functionality
- Write all tests with pytest
-
-### Runtime Tests
- Runtime tests are in `tests/runtime/test_*.py`
- Run tests with different runtime implementations by setting the `TEST_RUNTIME` environment variable:
-  ```bash
-  # Use Docker runtime (default)
-  DEBUG=1 poetry run pytest -vvxss tests/runtime/test_bash.py
-  
-  # Use CLI runtime (more reliable in some environments)
-  DEBUG=1 TEST_RUNTIME=cli poetry run pytest -vvxss tests/runtime/test_bash.py
-  
-  # Run a specific test
-  DEBUG=1 TEST_RUNTIME=cli poetry run pytest -vvxss tests/runtime/test_bash.py::test_bash_server
-  ```
- **IMPORTANT**: Runtime tests can take a long time to run, especially when building Docker images. Set a high timeout value:
-  ```
-  execute_bash(command="DEBUG=1 poetry run pytest -vvxss tests/runtime/test_bash.py", timeout=600)
-  ```
- The `DEBUG=1` flag enables more verbose logging
- The `-vvxss` flags make the test output more verbose and stop after the first failure
-
-### Debugging Docker Issues
- Check Docker container status:
-  ```bash
-  sudo docker ps -a
-  ```
- View Docker logs:
-  ```bash
-  sudo docker logs <container_id>
-  ```
- Check Docker daemon logs:
-  ```bash
-  sudo cat /tmp/docker.log | tail -n 100
-  ```
- Check OpenHands logs:
-  ```bash
-  cat logs/openhands_*.log | grep -i error | tail -n 20
-  ```
-
 ## Repository Structure
 Backend:
 - Located in the `openhands` directory
+- Testing:
+  - All tests are in `tests/unit/test_*.py`
+  - To test new code, run `poetry run pytest tests/unit/test_xxx.py` where `xxx` is the appropriate file for the current functionality
+  - Write all tests with pytest

 Frontend:
 - Located in the `frontend` directory
@@ -120,19 +44,18 @@ Frontend:
  - Available variables: VITE_BACKEND_HOST, VITE_USE_TLS, VITE_INSECURE_SKIP_VERIFY, VITE_FRONTEND_PORT
 - Internationalization:
  - Generate i18n declaration file: `npm run make-i18n`
-
+- Data Fetching & Cache Management:
+  - We use TanStack Query (fka React Query) for data fetching and cache management
+  - Data Access Layer: API client methods are located in `frontend/src/api` and should never be called directly from UI components - they must always be wrapped with TanStack Query
+  - Custom hooks are located in `frontend/src/hooks/query/` and `frontend/src/hooks/mutation/`
+  - Query hooks should follow the pattern use[Resource] (e.g., `useConversationMicroagents`)
+  - Mutation hooks should follow the pattern use[Action] (e.g., `useDeleteConversation`)
+  - Architecture rule: UI components → TanStack Query hooks → Data Access Layer (`frontend/src/api`) → API endpoints

 ## Template for Github Pull Request

 If you are starting a pull request (PR), please follow the template in `.github/pull_request_template.md`.

-## Runtime Architecture
- OpenHands uses a Docker-based runtime for secure execution of agent actions
- The runtime builds a custom Docker image based on a specified base image
- The image includes OpenHands-specific code and the runtime client
- The runtime client executes actions in the sandboxed environment and returns observations
- More details in the [runtime architecture documentation](https://docs.all-hands.dev/usage/architecture/runtime)
-
 ## Implementation Details

 These details may or may not be useful for your current task.
@@ -163,4 +86,4 @@ These details may or may not be useful for your current task.
     - Add the translation key to `frontend/src/i18n/declaration.ts`
  2. Add the setting to the backend:
     - Add the setting to the `Settings` model in `openhands/storage/data_models/settings.py`
-     - Update any relevant backend code to apply the setting (e.g., in session creation)
+     - Update any relevant backend code to apply the setting (e.g., in session creation)
--- a/Development.md
+++ b/Development.md
@@ -136,7 +136,7 @@ poetry run pytest ./tests/unit/test_*.py
 To reduce build time (e.g., if no changes were made to the client-runtime component), you can use an existing Docker
 container image by setting the SANDBOX_RUNTIME_CONTAINER_IMAGE environment variable to the desired Docker image.

-Example: `export SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.41-nikolaik`
+Example: `export SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.43-nikolaik`

 ## Develop inside Docker container

--- a/2
+++ b/2
@@ -189,7 +189,7 @@ install-pre-commit-hooks:

 lint-backend:
 	@echo "$(YELLOW)Running linters...$(RESET)"
-	@poetry run pre-commit run --files openhands/**/* evaluation/**/* tests/**/* --show-diff-on-failure --config $(PRE_COMMIT_CONFIG_PATH)
+	@poetry run pre-commit run --all-files --show-diff-on-failure --config $(PRE_COMMIT_CONFIG_PATH)

 lint-frontend:
 	@echo "$(YELLOW)Running linters for frontend...$(RESET)"
--- a/README.md
+++ b/README.md
@@ -18,6 +18,17 @@
  <a href="https://docs.all-hands.dev/usage/getting-started"><img src="https://img.shields.io/badge/Documentation-000?logo=googledocs&logoColor=FFE165&style=for-the-badge" alt="Check out the documentation"></a>
  <a href="https://arxiv.org/abs/2407.16741"><img src="https://img.shields.io/badge/Paper%20on%20Arxiv-000?logoColor=FFE165&logo=arxiv&style=for-the-badge" alt="Paper on Arxiv"></a>
  <a href="https://docs.google.com/spreadsheets/d/1wOUdFCMyY6Nt0AIqF705KN4JKOWgeI4wUGUP60krXXs/edit?gid=0#gid=0"><img src="https://img.shields.io/badge/Benchmark%20score-000?logoColor=FFE165&logo=huggingface&style=for-the-badge" alt="Evaluation Benchmark Score"></a>
+
+  <!-- Keep these links. Translations will automatically update with the README. -->
+  <a href="https://www.readme-i18n.com/All-Hands-AI/OpenHands?lang=de">Deutsch</a> |
+  <a href="https://www.readme-i18n.com/All-Hands-AI/OpenHands?lang=es">Español</a> |
+  <a href="https://www.readme-i18n.com/All-Hands-AI/OpenHands?lang=fr">français</a> |
+  <a href="https://www.readme-i18n.com/All-Hands-AI/OpenHands?lang=ja">日本語</a> |
+  <a href="https://www.readme-i18n.com/All-Hands-AI/OpenHands?lang=ko">한국어</a> |
+  <a href="https://www.readme-i18n.com/All-Hands-AI/OpenHands?lang=pt">Português</a> |
+  <a href="https://www.readme-i18n.com/All-Hands-AI/OpenHands?lang=ru">Русский</a> |
+  <a href="https://www.readme-i18n.com/All-Hands-AI/OpenHands?lang=zh">中文</a>
+
  <hr>
 </div>

@@ -51,17 +62,17 @@ system requirements and more information.


 ```bash
-docker pull docker.all-hands.dev/all-hands-ai/runtime:0.41-nikolaik
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik

 docker run -it --rm --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.41-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands-state:/.openhands-state \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
-    docker.all-hands.dev/all-hands-ai/openhands:0.41
+    docker.all-hands.dev/all-hands-ai/openhands:0.43
 ```

 You'll find OpenHands running at [http://localhost:3000](http://localhost:3000)!
--- a/README_CN.md
+++ b/README_CN.md
@@ -51,17 +51,17 @@ OpenHands也可以使用Docker在本地系统上运行。


 ```bash
-docker pull docker.all-hands.dev/all-hands-ai/runtime:0.41-nikolaik
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik

 docker run -it --rm --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.41-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands-state:/.openhands-state \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
-    docker.all-hands.dev/all-hands-ai/openhands:0.41
+    docker.all-hands.dev/all-hands-ai/openhands:0.43
 ```

 您将在[http://localhost:3000](http://localhost:3000)找到运行中的OpenHands！
--- a/containers/dev/compose.yml
+++ b/containers/dev/compose.yml
@@ -10,8 +10,9 @@ services:
    environment:
      - BACKEND_HOST=${BACKEND_HOST:-"0.0.0.0"}
      - SANDBOX_API_HOSTNAME=host.docker.internal
+      - DOCKER_HOST_ADDR=host.docker.internal
      #
-      - SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-ghcr.io/all-hands-ai/runtime:0.41-nikolaik}
+      - SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-ghcr.io/all-hands-ai/runtime:0.43-nikolaik}
      - SANDBOX_USER_ID=${SANDBOX_USER_ID:-1234}
      - WORKSPACE_MOUNT_PATH=${WORKSPACE_BASE:-$PWD/workspace}
    ports:
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -7,7 +7,7 @@ services:
    image: openhands:latest
    container_name: openhands-app-${DATE:-}
    environment:
-      - SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-docker.all-hands.dev/all-hands-ai/runtime:0.41-nikolaik}
+      - SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik}
      #- SANDBOX_USER_ID=${SANDBOX_USER_ID:-1234} # enable this only if you want a specific non-root sandbox user but you will have to manually adjust permissions of openhands-state for this user
      - WORKSPACE_MOUNT_PATH=${WORKSPACE_BASE:-$PWD/workspace}
    ports:
--- a/docs/README.md
+++ b/docs/README.md
@@ -0,0 +1,17 @@
+# Setup
+
+```
+npm install -g mint
+```
+
+or
+
+```
+yarn global add mint
+```
+
+# Preview
+
+```
+mint dev
+```
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -34,7 +34,8 @@
                "group": "Integrations",
                "pages": [
                  "usage/cloud/github-installation",
-                  "usage/cloud/gitlab-installation"
+                  "usage/cloud/gitlab-installation",
+                  "usage/cloud/slack-installation"
                ]
              },
              "usage/cloud/cloud-ui",
@@ -48,13 +49,62 @@
              "usage/how-to/gui-mode",
              "usage/how-to/cli-mode",
              "usage/how-to/headless-mode",
-              "usage/how-to/github-action"
+              "usage/how-to/github-action",
+              {
+                "group": "Advanced Configuration",
+                "pages": [
+                {
+                  "group": "LLM Configuration",
+                  "pages": [
+                    "usage/llms/llms",
+                    {
+                      "group": "Providers",
+                      "pages": [
+                        "usage/llms/azure-llms",
+                        "usage/llms/google-llms",
+                        "usage/llms/groq",
+                        "usage/llms/local-llms",
+                        "usage/llms/litellm-proxy",
+                        "usage/llms/openai-llms",
+                        "usage/llms/openrouter"
+                      ]
+                    }
+                  ]
+                },
+                {
+                  "group": "Runtime Configuration",
+                  "pages": [
+                    "usage/runtimes/overview",
+                    {
+                      "group": "Providers",
+                      "pages": [
+                        "usage/runtimes/docker",
+                        "usage/runtimes/remote",
+                        "usage/runtimes/local",
+                        {
+                          "group": "Third-Party Providers",
+                          "pages": [
+                            "usage/runtimes/modal",
+                            "usage/runtimes/daytona",
+                            "usage/runtimes/runloop",
+                            "usage/runtimes/e2b"
+                          ]
+                        }
+                      ]
+                    }
+                  ]
+                },
+                "usage/configuration-options",
+                "usage/how-to/custom-sandbox-guide",
+                "usage/search-engine-setup",
+                "usage/mcp"
+                ]
+              }
            ]
          },
          {
            "group": "Customization",
            "pages": [
-              "usage/prompting/prompting-best-practices",
              "usage/prompting/repository",
              {
                "group": "Microagents",
@@ -69,53 +119,9 @@
            ]
          },
          {
-            "group": "Advanced Configuration",
+            "group": "Tips and Tricks",
            "pages": [
-            {
-              "group": "LLM Configuration",
-              "pages": [
-                "usage/llms/llms",
-                {
-                  "group": "Providers",
-                  "pages": [
-                    "usage/llms/azure-llms",
-                    "usage/llms/google-llms",
-                    "usage/llms/groq",
-                    "usage/llms/local-llms",
-                    "usage/llms/litellm-proxy",
-                    "usage/llms/openai-llms",
-                    "usage/llms/openrouter"
-                  ]
-                }
-              ]
-            },
-            {
-              "group": "Runtime Configuration",
-              "pages": [
-                "usage/runtimes/overview",
-                {
-                  "group": "Providers",
-                  "pages": [
-                    "usage/runtimes/docker",
-                    "usage/runtimes/remote",
-                    "usage/runtimes/local",
-                    {
-                      "group": "Third-Party Providers",
-                      "pages": [
-                        "usage/runtimes/modal",
-                        "usage/runtimes/daytona",
-                        "usage/runtimes/runloop",
-                        "usage/runtimes/e2b"
-                      ]
-                    }
-                  ]
-                }
-              ]
-            },
-            "usage/configuration-options",
-            "usage/how-to/custom-sandbox-guide",
-            "usage/search-engine-setup",
-            "usage/mcp"
+              "usage/prompting/prompting-best-practices"
            ]
          },
          {
--- a/docs/static/img/slack-create-convo.png
+++ b/docs/static/img/slack-create-convo.png
--- a/docs/static/img/slack-pro-tip.png
+++ b/docs/static/img/slack-pro-tip.png
--- a/docs/static/img/slack-results-and-follow-up.png
+++ b/docs/static/img/slack-results-and-follow-up.png
--- a/docs/usage/cloud/gitlab-installation.mdx
+++ b/docs/usage/cloud/gitlab-installation.mdx
@@ -19,6 +19,12 @@ appropriate repository and branch you'd like OpenHands to work on. Then click on

 ![Connect Repo](/static/img/connect-repo.png)

+## Using Tokens with Reduced Scopes
+
+OpenHands requests an API-scoped token during OAuth authentication. By default, this token is provided to the agent.
+To restrict the agent's permissions, you can define a custom secret `GITLAB_TOKEN`, which will override the default token assigned to the agent.
+While the high-permission API token is still requested and used for other components of the application (e.g. opening merge requests), the agent will not have access to it.
+
 ## Next Steps

 - [Learn about the Cloud UI](/usage/cloud/cloud-ui).
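
The override rule described in "Using Tokens with Reduced Scopes" can be summarized as: the agent receives the custom `GITLAB_TOKEN` secret when one is defined, and the OAuth token otherwise. The sketch below illustrates that rule only; `resolve_agent_gitlab_token` and its parameters are hypothetical names, not part of the OpenHands codebase.

```python
from typing import Mapping, Optional


def resolve_agent_gitlab_token(
    oauth_token: Optional[str],
    custom_secrets: Mapping[str, str],
) -> Optional[str]:
    """Pick the GitLab token exposed to the agent (illustrative only).

    A non-empty custom secret named GITLAB_TOKEN takes precedence over the
    API-scoped token obtained during OAuth authentication.
    """
    override = custom_secrets.get("GITLAB_TOKEN")
    if override:
        return override
    return oauth_token


# The application itself would keep using the OAuth token for its own tasks;
# only the value handed to the agent changes.
agent_token = resolve_agent_gitlab_token(
    oauth_token="oauth-api-scoped-token",
    custom_secrets={"GITLAB_TOKEN": "reduced-scope-token"},
)
```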
--- a/docs/usage/cloud/slack-installation.mdx
+++ b/docs/usage/cloud/slack-installation.mdx
@@ -0,0 +1,50 @@
+---
+title: Slack Integration (Beta)
+description: This guide walks you through installing the OpenHands Slack app.
+---
+
+## Prerequisites
+
+- You are a Slack workspace admin.
+- You have access to OpenHands Cloud.
+
+## Installation Steps
+
+1. Log in to [OpenHands Cloud](https://app.all-hands.dev)
+2. Click the button below to install the OpenHands Slack App <a target="_blank" href="https://slack.com/oauth/v2/authorize?client_id=7477886716822.8729519890534&scope=app_mentions:read,chat:write,users:read,channels:history,groups:history,mpim:history,im:history&user_scope=channels:history,groups:history,im:history,mpim:history"><img alt="Add to Slack" height="40" width="139" src="https://platform.slack-edge.com/img/add_to_slack.png" srcSet="https://platform.slack-edge.com/img/add_to_slack.png 1x, https://platform.slack-edge.com/img/add_to_slack@2x.png 2x" /></a>
+3. In the top right corner, select the workspace in which to install the OpenHands Slack app.
+4. Review the permissions and click Allow.
+
+## Working With the Slack App
+
+To start a new conversation, you can mention `@openhands` in a new message or a thread inside any Slack channel.
+
+Once a conversation is started, all thread messages underneath it will be follow-up messages to OpenHands.
+
+To send follow-up messages for the same conversation, mention `@openhands` in a thread reply to the original message. You must be the user who started the conversation.
+
+## Example conversation
+
+### Start a new conversation, and select repo
+
+A conversation is started by mentioning `@openhands`.
+
+![slack-create-convo.png](/static/img/slack-create-convo.png)
+
+### See agent response and send follow-up messages
+
+The initial request is followed up by mentioning `@openhands` in a thread reply.
+
+![slack-results-and-follow-up.png](/static/img/slack-results-and-follow-up.png)
+
+## Pro tip
+
+You can mention a repo name when starting a new conversation, in either of the following formats:
+
+1. "My-Repo" repo (e.g., `@openhands in the openhands repo ...`)
+2. "All-Hands-AI/OpenHands" (e.g., `@openhands in All-Hands-AI/OpenHands ...`)
+
+The repo match is case-insensitive. If a single repo matches the name, the conversation starts right away.
+If the name partially matches multiple repos, you'll be asked to select a repo from the filtered list.
+
+![slack-pro-tip.png](/static/img/slack-pro-tip.png)
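
The matching behavior described in the "Pro tip" section of `slack-installation.mdx` (case-insensitive matching, with a selection prompt when several repos match partially) could be sketched roughly as below. This is an illustration under assumptions, not the Slack app's actual implementation: `match_repos` is a hypothetical helper, and treating an exact name match as taking priority over partial matches is a guess.

```python
from typing import List


def match_repos(query: str, repos: List[str]) -> List[str]:
    """Return candidate repos for a mentioned name (hypothetical logic).

    One result means the conversation can start immediately; several results
    mean the user should be asked to choose; none means no repo was found.
    """
    q = query.lower()

    # Exact match against either "owner/name" or the bare repo name.
    exact = [
        repo
        for repo in repos
        if repo.lower() == q or repo.lower().split("/")[-1] == q
    ]
    if len(exact) == 1:
        return exact

    # Otherwise fall back to case-insensitive substring matching.
    return [repo for repo in repos if q in repo.lower()]


available = [
    "All-Hands-AI/OpenHands",
    "All-Hands-AI/openhands-docs",
    "acme/widgets",
]
single = match_repos("openhands", available)   # one exact match
several = match_repos("hands", available)      # partial match on two repos
```

With the sample list above, the first call yields a single repo, while the second yields two candidates that would be offered to the user as a filtered list.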
--- a/docs/usage/how-to/cli-mode.mdx
+++ b/docs/usage/how-to/cli-mode.mdx
@@ -17,13 +17,14 @@ for scripting.
 pip install openhands-ai
 ```

-2. Set your model, API key, and other preferences using environment variables or with the [`config.toml`](https://github.com/All-Hands-AI/OpenHands/blob/main/config.template.toml) file.
-3. Launch an interactive OpenHands conversation from the command line:
+2. Launch an interactive OpenHands conversation from the command line:

 ```bash
 openhands
 ```

+3. Set your model, API key, and other preferences using the UI (or, alternatively, the environment variables described below).
+
 This command opens an interactive prompt where you can type tasks or commands and get responses from OpenHands.

 #### For Developers
@@ -46,7 +47,7 @@ poetry run python -m openhands.cli.main
 ```bash
 docker run -it \
    --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.41-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
    -e SANDBOX_USER_ID=$(id -u) \
    -e SANDBOX_VOLUMES=$SANDBOX_VOLUMES \
    -e LLM_API_KEY=$LLM_API_KEY \
@@ -55,7 +56,7 @@ docker run -it \
    -v ~/.openhands-state:/.openhands-state \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app-$(date +%Y%m%d%H%M%S) \
-    docker.all-hands.dev/all-hands-ai/openhands:0.41 \
+    docker.all-hands.dev/all-hands-ai/openhands:0.43 \
    python -m openhands.cli.main --override-cli-mode true
 ```

--- a/docs/usage/how-to/gui-mode.mdx
+++ b/docs/usage/how-to/gui-mode.mdx
@@ -27,7 +27,7 @@ You can use the Settings page at any time to:
 - [Configure MCP servers](/usage/mcp).
 - [Connect to GitHub](/usage/how-to/gui-mode#github-setup) and [connect to GitLab](/usage/how-to/gui-mode#gitlab-setup)
 - Set application settings like your preferred language, notifications and other preferences.
- Generate custom secrets.
+- [Manage custom secrets](/usage/how-to/gui-mode#secrets-management).

 #### GitHub Setup

@@ -100,6 +100,11 @@ OpenHands automatically exports a `GITLAB_TOKEN` to the shell environment if pro
   - In the Settings page, navigate to the `Git` tab.
   - Paste your token in the `GitLab Token` field.
   - Click `Save Changes` to apply the changes.
+
+  3. **(Optional): Restrict agent permissions**
+   - Create another PAT using Step 1 and exclude the `api` scope.
+   - In the Settings page, in the `Secrets` tab, create a new secret `GITLAB_TOKEN` and paste your lower-scope token.
+   - OpenHands will use the higher-scope token, and the agent will use the lower-scope token.
 </Accordion>

 <Accordion title="Troubleshooting">
@@ -117,6 +122,36 @@ OpenHands automatically exports a `GITLAB_TOKEN` to the shell environment if pro
 </Accordion>
 </AccordionGroup>

+#### Secrets Management
+
+OpenHands provides a secrets manager that allows you to securely store and manage sensitive information that can be accessed by the agent during runtime, such as API keys. These secrets are automatically exported as environment variables in the agent's runtime environment.
+
+1. **Accessing the Secrets Manager**:
+   - In the Settings page, navigate to the `Secrets` tab.
+   - You'll see a list of all your existing custom secrets (if any).
+
+2. **Adding a New Secret**:
+   - Click the `Add New Secret` button.
+   - Fill in the following fields:
+     - **Name**: A unique identifier for your secret (e.g., `AWS_ACCESS_KEY`). This will be the environment variable name.
+     - **Value**: The sensitive information you want to store.
+     - **Description** (optional): A brief description of what the secret is used for, which is also provided to the agent.
+   - Click `Add Secret` to save.
+
+3. **Editing a Secret**:
+   - Click the `Edit` button next to the secret you want to modify.
+   - You can update the name and description of the secret.
+   - Note: For security reasons, you cannot view or edit the value of an existing secret. If you need to change the value, delete the secret and create a new one.
+
+4. **Deleting a Secret**:
+   - Click the `Delete` button next to the secret you want to remove.
+   - Confirm the deletion when prompted.
+
+5. **Using Secrets in the Agent**:
+   - All custom secrets are automatically exported as environment variables in the agent's runtime environment.
+   - You can access them in your code using standard environment variable access methods (e.g., `os.environ['SECRET_NAME']` in Python).
+   - Example: If you create a secret named `OPENAI_API_KEY`, you can access it in your code as `process.env.OPENAI_API_KEY` in JavaScript or `os.environ['OPENAI_API_KEY']` in Python.
+
 #### Advanced Settings

 The `Advanced` settings allows configuration of additional LLM settings. Inside the Settings page, under the `LLM` tab,
@@ -132,10 +167,24 @@ toggle `Advanced` options to access additional settings.
 For an overview of the key features available inside a conversation, please refer to the [Key Features](/usage/key-features)
 section of the documentation.

+### Status Indicator
+
+The status indicator located in the bottom left of the screen will cycle through a number of states as a new conversation
+is loaded. Typically these include:
+
+* `Disconnected` : The frontend is not connected to any conversation.
+* `Connecting` : The frontend is connecting a websocket to a conversation.
+* `Building Runtime...` : The server is building a runtime. This typically happens only in development mode, while a Docker image is being built.
+* `Starting Runtime...` : The server is starting a new runtime instance, probably a new Docker container or remote runtime.
+* `Initializing Agent...` : The server is starting the agent loop. (This step does not currently appear with nested runtimes.)
+* `Setting up workspace...` : Usually this means a `git clone ...` operation.
+* `Setting up git hooks` : The server is setting up the git pre-commit hooks for the workspace.
+* `Agent is awaiting user input...` : Ready to go!
+
 ## Tips for Effective Use

 - Be specific in your requests to get the most accurate and helpful responses, as described in the [prompting best practices](../prompting/prompting-best-practices).
- Use one of the recommended models, as described in the [LLMs section](usage/llms/llms.md).
+- Use one of the recommended models, as described in the [LLMs section](/usage/llms/llms).

 ## Other Ways to Run Openhands
 - [Run OpenHands in a scriptable headless mode.](/usage/how-to/headless-mode)
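
The "Secrets Management" section added to `gui-mode.mdx` above says that custom secrets are exported as environment variables in the agent's runtime. A small sketch of reading one defensively in Python follows. `require_secret` is a hypothetical helper written for this example; only `os.environ` is standard.

```python
import os


def require_secret(name: str) -> str:
    """Read a secret from the environment, failing loudly if it is absent.

    Using os.environ.get avoids a bare KeyError and lets the error message
    name the missing secret without ever printing a secret value.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Secret {name!r} is not set in the environment")
    return value


# Example: a secret created in the Secrets tab under the name OPENAI_API_KEY
# would be read with require_secret("OPENAI_API_KEY").
```

Raising on a missing secret surfaces configuration problems early, rather than letting an empty credential reach an API call.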
--- a/docs/usage/how-to/headless-mode.mdx
+++ b/docs/usage/how-to/headless-mode.mdx
@@ -32,7 +32,7 @@ To run OpenHands in Headless mode with Docker:
 ```bash
 docker run -it \
    --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.41-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
    -e SANDBOX_USER_ID=$(id -u) \
    -e SANDBOX_VOLUMES=$SANDBOX_VOLUMES \
    -e LLM_API_KEY=$LLM_API_KEY \
@@ -42,7 +42,7 @@ docker run -it \
    -v ~/.openhands-state:/.openhands-state \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app-$(date +%Y%m%d%H%M%S) \
-    docker.all-hands.dev/all-hands-ai/openhands:0.41 \
+    docker.all-hands.dev/all-hands-ai/openhands:0.43 \
    python -m openhands.core.main -t "write a bash script that prints hi"
 ```

--- a/docs/usage/llms/google-llms.mdx
+++ b/docs/usage/llms/google-llms.mdx
@@ -8,7 +8,7 @@ description: OpenHands uses LiteLLM to make calls to Google's chat models. You c
 When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab:
 - `LLM Provider` to `Gemini`
 - `LLM Model` to the model you will be using.
-If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model` 
+If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model`
 (e.g. gemini/&lt;model-name&gt; like `gemini/gemini-2.0-flash`).
 - `API Key` to your Gemini API key

@@ -26,5 +26,5 @@ VERTEXAI_LOCATION="<your-gcp-location>"
 Then set the following in the OpenHands UI through the Settings under the `LLM` tab:
 - `LLM Provider` to `VertexAI`
 - `LLM Model` to the model you will be using.
-If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model` 
+If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model`
 (e.g. vertex_ai/&lt;model-name&gt;).
--- a/docs/usage/llms/groq.mdx
+++ b/docs/usage/llms/groq.mdx
@@ -8,7 +8,7 @@ description: OpenHands uses LiteLLM to make calls to chat models on Groq. You ca
 When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab:
 - `LLM Provider` to `Groq`
 - `LLM Model` to the model you will be using. [Visit here to see the list of
-models that Groq hosts](https://console.groq.com/docs/models). If the model is not in the list, 
+models that Groq hosts](https://console.groq.com/docs/models). If the model is not in the list,
 enable `Advanced` options, and enter it in `Custom Model` (e.g. groq/&lt;model-name&gt; like `groq/llama3-70b-8192`).
 - `API key` to your Groq API key. To find or create your Groq API Key, [see here](https://console.groq.com/keys).

--- a/docs/usage/llms/litellm-proxy.mdx
+++ b/docs/usage/llms/litellm-proxy.mdx
@@ -16,7 +16,7 @@ To use LiteLLM proxy with OpenHands, you need to:

 ## Supported Models

-The supported models depend on your LiteLLM proxy configuration. OpenHands supports any model that your LiteLLM proxy 
+The supported models depend on your LiteLLM proxy configuration. OpenHands supports any model that your LiteLLM proxy
 is configured to handle.

 Refer to your LiteLLM proxy configuration for the list of available models and their names.
--- a/docs/usage/llms/llms.mdx
+++ b/docs/usage/llms/llms.mdx
@@ -14,23 +14,28 @@ recommendations for model selection. Our latest benchmarking results can be foun

 Based on these findings and community feedback, these are the latest models that have been verified to work reasonably well with OpenHands:

+### Cloud / API-Based Models
+
 - [anthropic/claude-sonnet-4-20250514](https://www.anthropic.com/api) (recommended)
 - [openai/o4-mini](https://openai.com/index/introducing-o3-and-o4-mini/)
 - [gemini/gemini-2.5-pro](https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/)
 - [deepseek/deepseek-chat](https://api-docs.deepseek.com/)
- [all-hands/openhands-lm-32b-v0.1](https://www.all-hands.dev/blog/introducing-openhands-lm-32b----a-strong-open-coding-agent-model) -- available through [OpenRouter](https://openrouter.ai/all-hands/openhands-lm-32b-v0.1)

+If you have successfully run OpenHands with specific providers, we encourage you to open a PR to share your setup process
+to help others using the same provider!
+
+For a full list of the providers and models available, please consult the
+[litellm documentation](https://docs.litellm.ai/docs/providers).

 <Warning>
 OpenHands will issue many prompts to the LLM you configure. Most of these LLMs cost money, so be sure to set spending
 limits and monitor usage.
 </Warning>

-If you have successfully run OpenHands with specific providers, we encourage you to open a PR to share your setup process 
-to help others using the same provider!
+### Local / Self-Hosted Models

-For a full list of the providers and models available, please consult the
-[litellm documentation](https://docs.litellm.ai/docs/providers).
+- [mistralai/devstral-small](https://www.all-hands.dev/blog/devstral-a-new-state-of-the-art-open-model-for-coding-agents) (20 May 2025) -- also available through [OpenRouter](https://openrouter.ai/mistralai/devstral-small:free)
+- [all-hands/openhands-lm-32b-v0.1](https://www.all-hands.dev/blog/introducing-openhands-lm-32b----a-strong-open-coding-agent-model) (31 March 2025) -- also available through [OpenRouter](https://openrouter.ai/all-hands/openhands-lm-32b-v0.1)

 <Note>
 Most current local and open source models are not as powerful. When using such models, you may see long
--- a/docs/usage/llms/local-llms.mdx
+++ b/docs/usage/llms/local-llms.mdx
@@ -54,25 +54,25 @@ Check [the installation guide](/usage/local-setup) to make sure you have all the
 export LMSTUDIO_MODEL_NAME="imported-models/uncategorized/devstralq4_k_m.gguf" # <- Replace this with the model name you copied from LMStudio
 export LMSTUDIO_URL="http://host.docker.internal:1234"  # <- Replace this with the port from LMStudio

-docker pull docker.all-hands.dev/all-hands-ai/runtime:0.41-nikolaik
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik

 mkdir -p ~/.openhands-state && echo '{"language":"en","agent":"CodeActAgent","max_iterations":null,"security_analyzer":null,"confirmation_mode":false,"llm_model":"lm_studio/'$LMSTUDIO_MODEL_NAME'","llm_api_key":"dummy","llm_base_url":"'$LMSTUDIO_URL/v1'","remote_runtime_resource_factor":null,"github_token":null,"enable_default_condenser":true,"user_consents_to_analytics":true}' > ~/.openhands-state/settings.json

 docker run -it --rm --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.41-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands-state:/.openhands-state \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
-    docker.all-hands.dev/all-hands-ai/openhands:0.41
+    docker.all-hands.dev/all-hands-ai/openhands:0.43
 ```

 Once your server is running -- you can visit `http://localhost:3000` in your browser to use OpenHands with local Devstral model:
 ```
 Digest: sha256:e72f9baecb458aedb9afc2cd5bc935118d1868719e55d50da73190d3a85c674f
-Status: Image is up to date for docker.all-hands.dev/all-hands-ai/openhands:0.41
+Status: Image is up to date for docker.all-hands.dev/all-hands-ai/openhands:0.43
 Starting OpenHands...
 Running OpenHands as root
 14:22:13 - openhands:INFO: server_config.py:50 - Using config class None
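Review note on `docs/usage/llms/local-llms.mdx`: the one-line `echo` that writes `settings.json` depends on fragile shell quoting. The following is a minimal sketch, assuming the same keys and values as that command, of producing the file from Python instead. `build_settings` and `write_settings` are hypothetical helpers for illustration and are not part of OpenHands.

```python
import json
from pathlib import Path


def build_settings(model_name: str, base_url: str) -> dict:
    """Return the settings from the shell one-liner as a dict (keys copied from the guide)."""
    return {
        "language": "en",
        "agent": "CodeActAgent",
        "max_iterations": None,
        "security_analyzer": None,
        "confirmation_mode": False,
        "llm_model": f"lm_studio/{model_name}",
        "llm_api_key": "dummy",
        "llm_base_url": f"{base_url}/v1",
        "remote_runtime_resource_factor": None,
        "github_token": None,
        "enable_default_condenser": True,
        "user_consents_to_analytics": True,
    }


def write_settings(settings: dict, state_dir: Path) -> Path:
    """Write settings.json into state_dir, creating the directory if needed."""
    state_dir.mkdir(parents=True, exist_ok=True)
    path = state_dir / "settings.json"
    path.write_text(json.dumps(settings))
    return path
```

Calling `write_settings(build_settings(name, url), Path.home() / ".openhands-state")` would target the same location as the documented command; `json.dumps` handles the quoting and emits `null` for `None`.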
--- a/docs/usage/llms/openrouter.mdx
+++ b/docs/usage/llms/openrouter.mdx
@@ -9,6 +9,6 @@ When running OpenHands, you'll need to set the following in the OpenHands UI thr
 * `LLM Provider` to `OpenRouter`
 * `LLM Model` to the model you will be using.
 [Visit here to see a full list of OpenRouter models](https://openrouter.ai/models).
-If the model is not in the list, enable `Advanced` options, and enter it in 
+If the model is not in the list, enable `Advanced` options, and enter it in
 `Custom Model` (e.g. openrouter/&lt;model-name&gt; like `openrouter/anthropic/claude-3.5-sonnet`).
 * `API Key` to your OpenRouter API key.
--- a/docs/usage/local-setup.mdx
+++ b/docs/usage/local-setup.mdx
@@ -10,6 +10,7 @@ description: Getting started with running OpenHands on your own.
 - MacOS with [Docker Desktop support](https://docs.docker.com/desktop/setup/install/mac-install/#system-requirements)
 - Linux
 - Windows with [WSL](https://learn.microsoft.com/en-us/windows/wsl/install) and [Docker Desktop support](https://docs.docker.com/desktop/setup/install/windows-install/#system-requirements)
+- Windows without WSL (see [Windows Without WSL Guide](/usage/windows-without-wsl))

 A system with a modern processor and a minimum of **4GB RAM** is recommended to run OpenHands.

@@ -55,6 +56,10 @@ A system with a modern processor and a minimum of **4GB RAM** is recommended to
  The docker command below to start the app must be run inside the WSL terminal.
  </Note>

+  **Alternative: Windows without WSL**
+
+  If you prefer to run OpenHands on Windows without WSL or Docker, see our [Windows Without WSL Guide](/usage/windows-without-wsl).
+
 </Accordion>

 </AccordionGroup>
@@ -62,17 +67,17 @@ A system with a modern processor and a minimum of **4GB RAM** is recommended to
 ### Start the App

 ```bash
-docker pull docker.all-hands.dev/all-hands-ai/runtime:0.41-nikolaik
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik

 docker run -it --rm --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.41-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.43-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands-state:/.openhands-state \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
-    docker.all-hands.dev/all-hands-ai/openhands:0.41
+    docker.all-hands.dev/all-hands-ai/openhands:0.43
 ```

 You'll find OpenHands running at http://localhost:3000!
@@ -117,10 +122,24 @@ OpenHands requires an API key to access most language models. Here's how to get

 </Accordion>

+<Accordion title="Local LLM (e.g. LM Studio, llama.cpp, Ollama)">
+
+If your local LLM server isn’t behind an authentication proxy, you can enter any value as the API key (e.g. `local-key`, `test123`) — it won’t be used.
+
+</Accordion>
+
 </AccordionGroup>

 Consider setting usage limits to control costs.

+#### Using a Local LLM
+
+<Note>
+Effective use of local models for agent tasks requires capable hardware, along with models specifically tuned for instruction-following and agent-style behavior.
+</Note>
+
+To run OpenHands with a locally hosted language model instead of a cloud provider, see the [Local LLMs guide](/usage/llms/local-llms) for setup instructions.
+
 #### Setting Up Search Engine

 OpenHands can be configured to use a search engine to allow the agent to search the web for information when needed.
--- a/docs/usage/prompting/microagents-org.mdx
+++ b/docs/usage/prompting/microagents-org.mdx
@@ -5,7 +5,7 @@ description: Organizations and users can define microagents that apply to all re

 ## Usage

-These microagents can be [any type of microagent](./microagents-overview#microagent-types) and will be loaded 
+These microagents can be [any type of microagent](./microagents-overview#microagent-types) and will be loaded
 accordingly. However, they are applied to all repositories belonging to the organization or user.

 Add a `.openhands` repository under the organization or user and create a `microagents` directory and place the
--- a/docs/usage/runtimes/local.mdx
+++ b/docs/usage/runtimes/local.mdx
@@ -15,7 +15,7 @@ Before using the Local Runtime, ensure that:
 1. You can run OpenHands using the [Development workflow](https://github.com/All-Hands-AI/OpenHands/blob/main/Development.md).
 2. For Linux and Mac, tmux is available on your system.
 3. For Windows, PowerShell is available on your system.
-    - Only [CLI mode](../how-to/cli-mode) and [headless mode](../how-to/headless-mode) are supported in Windows with Local Runtime. 
+    - Only [CLI mode](../how-to/cli-mode) and [headless mode](../how-to/headless-mode) are supported in Windows with Local Runtime.

 ## Configuration

--- a/docs/usage/windows-without-wsl.mdx
+++ b/docs/usage/windows-without-wsl.mdx
@@ -0,0 +1,200 @@
+---
+title: Windows Without WSL
+description: Running OpenHands GUI on Windows without using WSL or Docker
+---
+
+# Running OpenHands GUI on Windows Without WSL
+
+This guide provides step-by-step instructions for running OpenHands on a Windows machine without using WSL or Docker.
+
+## Prerequisites
+
+1. **Windows 10/11** - A modern Windows operating system
+2. **PowerShell 7+** - While Windows PowerShell comes pre-installed on Windows 10/11, PowerShell 7+ is strongly recommended to avoid compatibility issues (see Troubleshooting section for "System.Management.Automation" errors)
+3. **.NET Core Runtime** - Required for the PowerShell integration via pythonnet
+4. **Python 3.12 or 3.13** - Required; Python 3.14 is not supported because of pythonnet compatibility issues
+5. **Git** - For cloning the repository and version control
+6. **Node.js and npm** - For running the frontend
+
+## Step 1: Install Required Software
+
+1. **Install Python 3.12 or 3.13**
+   - Download Python 3.12.x or 3.13.x from [python.org](https://www.python.org/downloads/)
+   - During installation, check "Add Python to PATH"
+   - Verify installation by opening PowerShell and running:
+     ```powershell
+     python --version
+     ```
+
+2. **Install PowerShell 7**
+   - Download and install PowerShell 7 from the [official PowerShell GitHub repository](https://github.com/PowerShell/PowerShell/releases)
+   - Choose the MSI installer appropriate for your system (x64 for most modern computers)
+   - Run the installer with default options
+   - Verify installation by opening a new terminal and running:
+     ```powershell
+     pwsh --version
+     ```
+   - Using PowerShell 7 (pwsh) instead of Windows PowerShell will help avoid "System.Management.Automation" errors
+
+3. **Install .NET Core Runtime**
+   - Download and install the .NET Core Runtime from [Microsoft's .NET download page](https://dotnet.microsoft.com/download)
+   - Choose the latest .NET Core Runtime (not SDK)
+   - Verify installation by opening PowerShell and running:
+     ```powershell
+     dotnet --info
+     ```
+   - This step is required for the PowerShell integration via pythonnet. Without it, OpenHands will fall back to a more limited PowerShell implementation.
+
+4. **Install Git**
+   - Download Git from [git-scm.com](https://git-scm.com/download/win)
+   - Use default installation options
+   - Verify installation:
+     ```powershell
+     git --version
+     ```
+
+5. **Install Node.js and npm**
+   - Download Node.js from [nodejs.org](https://nodejs.org/) (LTS version recommended)
+   - During installation, accept the default options which will install npm as well
+   - Verify installation:
+     ```powershell
+     node --version
+     npm --version
+     ```
+
+6. **Install Poetry**
+   - Open PowerShell as Administrator and run:
+     ```powershell
+     (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python -
+     ```
+   - Add Poetry to your PATH:
+     ```powershell
+     $env:Path += ";$env:APPDATA\Python\Scripts"
+     ```
+   - Verify installation:
+     ```powershell
+     poetry --version
+     ```
+
+## Step 2: Clone and Set Up OpenHands
+
+1. **Clone the Repository**
+   ```powershell
+   git clone https://github.com/All-Hands-AI/OpenHands.git
+   cd OpenHands
+   ```
+
+2. **Install Dependencies**
+   ```powershell
+   poetry install
+   ```
+
+   This will install all required dependencies, including:
+   - pythonnet - Required for Windows PowerShell integration
+   - All other OpenHands dependencies
+
+## Step 3: Run OpenHands
+
+1. **Build the Frontend**
+   ```powershell
+   cd frontend
+   npm install
+   npm run build
+   cd ..
+   ```
+
+   This will build the frontend files that the backend will serve.
+
+2. **Start the Backend**
+   ```powershell
+   # Make sure to use PowerShell 7 (pwsh) instead of Windows PowerShell
+   pwsh
+   $env:RUNTIME="local"; poetry run uvicorn openhands.server.listen:app --host 0.0.0.0 --port 3000 --reload --reload-exclude "./workspace"
+   ```
+
+   This will start the OpenHands app using the local runtime with PowerShell integration, available at `localhost:3000`.
+
+   > **Note**: If you encounter a `RuntimeError: Directory './frontend/build' does not exist` error, make sure you've built the frontend first using the command above.
+
+   > **Important**: Using PowerShell 7 (pwsh) instead of Windows PowerShell is recommended to avoid "System.Management.Automation" errors. If you encounter this error, see the Troubleshooting section below.
+
+3. **Alternatively, Run the Frontend in Development Mode (in a separate PowerShell window)**
+   ```powershell
+   cd frontend
+   npm run dev
+   ```
+
+4. **Access the OpenHands GUI**
+
+   Open your browser and navigate to:
+   ```
+   http://localhost:3000
+   ```
+
+   > **Note**: If you're running the frontend in development mode (using `npm run dev`), use port 3001 instead: `http://localhost:3001`
+
+## Limitations on Windows
+
+When running OpenHands on Windows without WSL or Docker, be aware of the following limitations:
+
+1. **Browser Tool Not Supported**: The browser tool is not currently supported on Windows.
+
+2. **.NET Core Requirement**: The PowerShell integration requires .NET Core Runtime to be installed. If .NET Core is not available, OpenHands will automatically fall back to a more limited PowerShell implementation with reduced functionality.
+
+3. **Interactive Shell Commands**: Some interactive shell commands may not work as expected. The PowerShell session implementation has limitations compared to the bash session used on Linux/macOS.
+
+4. **Path Handling**: Windows uses backslashes (`\`) in paths, which may require adjustments when working with code examples designed for Unix-like systems.
+
+## Troubleshooting
+
+### "System.Management.Automation" Not Found Error
+
+If you encounter an error message stating that "System.Management.Automation" was not found, this typically indicates that you have a minimal version of PowerShell installed or that the .NET components required for PowerShell integration are missing.
+
+> **IMPORTANT**: This error is most commonly caused by using the built-in Windows PowerShell (powershell.exe) instead of PowerShell 7 (pwsh.exe). Even if you installed PowerShell 7 during the prerequisites, you may still be using the older Windows PowerShell by default.
+
+To resolve this issue:
+
+1. **Install the latest version of PowerShell 7** from the official Microsoft repository:
+   - Visit [https://github.com/PowerShell/PowerShell/releases](https://github.com/PowerShell/PowerShell/releases)
+   - Download and install the latest MSI package for your system architecture (x64 for most systems)
+   - During installation, ensure you select the following options:
+     - "Add PowerShell to PATH environment variable"
+     - "Register Windows PowerShell 7 as the default shell"
+     - "Enable PowerShell remoting"
+   - The installer will place PowerShell 7 in `C:\Program Files\PowerShell\7` by default
+
+2. **Restart your terminal or command prompt** to ensure the new PowerShell is available
+
+3. **Verify the installation** by running:
+   ```powershell
+   pwsh --version
+   ```
+
+   You should see output indicating PowerShell 7.x.x
+
+4. **Run OpenHands using PowerShell 7** instead of Windows PowerShell:
+   ```powershell
+   pwsh
+   cd path\to\openhands
+   $env:RUNTIME="local"; poetry run uvicorn openhands.server.listen:app --host 0.0.0.0 --port 3000 --reload --reload-exclude "./workspace"
+   ```
+
+   > **Note**: Make sure you're explicitly using `pwsh` (PowerShell 7) and not `powershell` (Windows PowerShell). The command prompt or terminal title should say "PowerShell 7" rather than just "Windows PowerShell".
+
+5. **If the issue persists**, ensure that you have the .NET Runtime installed:
+   - Download and install the latest .NET Runtime from [Microsoft's .NET download page](https://dotnet.microsoft.com/download)
+   - Choose ".NET Runtime" (not SDK) version 6.0 or later
+   - After installation, verify it's properly installed by running:
+     ```powershell
+     dotnet --info
+     ```
+   - Restart your computer after installation
+   - Try running OpenHands again
+
+6. **Ensure that the .NET Framework is properly installed** on your system:
+   - Go to Control Panel > Programs > Programs and Features > Turn Windows features on or off
+   - Make sure ".NET Framework 4.8 Advanced Services" is enabled
+   - Click OK and restart if prompted
+
+This error occurs because OpenHands uses the pythonnet package to interact with PowerShell, which requires the System.Management.Automation assembly from the .NET framework. A minimal PowerShell installation or older Windows PowerShell (rather than PowerShell 7+) might not include all the necessary components for this integration.
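Review note on `docs/usage/windows-without-wsl.mdx`: the guide asks the reader to verify each prerequisite by hand. The following is a minimal sketch, assuming the command names used in the verification steps above (`pwsh`, `dotnet`, `git`, `node`, `npm`, `poetry`), of checking them in one pass. The function names are hypothetical, and the check only confirms that a command is on `PATH`, not that its version is suitable.

```python
import shutil
import sys

# Commands the guide asks you to verify; names taken from its verification steps.
REQUIRED_COMMANDS = ["pwsh", "dotnet", "git", "node", "npm", "poetry"]


def python_version_supported(version: tuple[int, int]) -> bool:
    """The guide requires Python 3.12 or 3.13."""
    return version in {(3, 12), (3, 13)}


def missing_commands(commands: list[str], which=shutil.which) -> list[str]:
    """Return the commands that cannot be found on PATH."""
    return [cmd for cmd in commands if which(cmd) is None]


if __name__ == "__main__":
    if not python_version_supported(sys.version_info[:2]):
        print(f"Unsupported Python version: {sys.version.split()[0]}")
    for cmd in missing_commands(REQUIRED_COMMANDS):
        print(f"Not found on PATH: {cmd}")
```

The `which` parameter exists only so the lookup can be replaced in tests; by default it uses `shutil.which`.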
--- a/evaluation/benchmarks/browsing_delegation/run_infer.py
+++ b/evaluation/benchmarks/browsing_delegation/run_infer.py
@@ -144,7 +144,7 @@ if __name__ == '__main__':
    llm_config = None
    if args.llm_config:
        llm_config = get_llm_config_arg(args.llm_config)
-        # modify_params must be False for evaluation purpose, for reproducibility and accurancy of results
+        # modify_params must be False for evaluation purpose, for reproducibility and accuracy of results
        llm_config.modify_params = False

    if llm_config is None:
--- a/evaluation/benchmarks/miniwob/run_infer.py
+++ b/evaluation/benchmarks/miniwob/run_infer.py
@@ -223,7 +223,7 @@ if __name__ == '__main__':
    llm_config = None
    if args.llm_config:
        llm_config = get_llm_config_arg(args.llm_config)
-        # modify_params must be False for evaluation purpose, for reproducibility and accurancy of results
+        # modify_params must be False for evaluation purpose, for reproducibility and accuracy of results
        llm_config.modify_params = False
    if llm_config is None:
        raise ValueError(f'Could not find LLM config: --llm_config {args.llm_config}')
--- a/evaluation/benchmarks/swe_bench/README.md
+++ b/evaluation/benchmarks/swe_bench/README.md
@@ -2,6 +2,8 @@

 This folder contains the evaluation harness that we built on top of the original [SWE-Bench benchmark](https://www.swebench.com/) ([paper](https://arxiv.org/abs/2310.06770)).

+**UPDATE (6/15/2025): We now support running SWE-bench-Live evaluation (see the paper [here](https://arxiv.org/abs/2505.23419))! For how to run it, check out [this README](./SWE-bench-Live.md).**
+
 **UPDATE (5/26/2025): We now support running interactive SWE-Bench evaluation (see the paper [here](https://arxiv.org/abs/2502.13069))! For how to run it, checkout [this README](./SWE-Interact.md).**

 **UPDATE (4/8/2025): We now support running SWT-Bench evaluation! For more details, checkout [the corresponding section](#SWT-Bench-Evaluation).**
--- a/evaluation/benchmarks/swe_bench/SWE-bench-Live.md
+++ b/evaluation/benchmarks/swe_bench/SWE-bench-Live.md
@@ -0,0 +1,65 @@
+# SWE-bench-Live
+
+<p align="center">
+<a href="https://arxiv.org/abs/2505.23419">📃 Paper</a>
+•
+<a href="https://huggingface.co/SWE-bench-Live" >🤗 HuggingFace</a>
+•
+<a href="https://SWE-bench-Live.github.io" >📊 Leaderboard</a>
+</p>
+
+SWE-bench-Live is a live benchmark for issue resolution that provides a dataset of recent issue tasks. This document explains how to evaluate OpenHands on SWE-bench-Live.
+
+Since SWE-bench-Live has an almost identical setting to SWE-bench, you only need to change the dataset name to `SWE-bench-Live/SWE-bench-Live`; everything else is essentially the same as running on SWE-bench.
+
+## Setting Up
+
+Set up the development environment and configure your LLM provider by following the [README](README.md).
+
+## Running Inference
+
+Use the same script, but change the dataset name to `SWE-bench-Live/SWE-bench-Live` and select the split (either `lite` or `full`). The lite split contains 300 instances from the past six months, while the full split includes 1,319 instances created after 2024.
+
+```shell
+./evaluation/benchmarks/swe_bench/scripts/run_infer.sh [model_config] [git-version] [agent] [eval_limit] [max_iter] [num_workers] [dataset] [dataset_split]
+```
+
+In the original SWE-bench-Live paper, max_iterations is set to 100.
+
+```shell
+./evaluation/benchmarks/swe_bench/scripts/run_infer.sh llm.your_llm HEAD CodeActAgent 300 100 3 SWE-bench-Live/SWE-bench-Live lite
+```
+
+## Evaluating Results
+
+After OpenHands generates patch results for each issue, we evaluate the results using the [SWE-bench-Live evaluation harness](https://github.com/microsoft/SWE-bench-Live).
+
+Convert the output to the prediction format used by SWE benchmarks:
+
+```shell
+# You can find output.jsonl in evaluation/evaluation_outputs
+python evaluation/benchmarks/swe_bench/scripts/live/convert.py --output_jsonl [path/to/evaluation/output.jsonl] > preds.jsonl
+```
+
+Please refer to the original [SWE-bench-Live repository](https://github.com/microsoft/SWE-bench-Live) to set up the evaluation harness and use the provided scripts to generate the evaluation report:
+
+```shell
+python -m swebench.harness.run_evaluation \
+    --dataset_name SWE-bench-Live/SWE-bench-Live \
+    --split lite \
+    --namespace starryzhang \
+    --predictions_path preds.jsonl \
+    --max_workers 10 \
+    --run_id openhands
+```
+
+## Citation
+
+```bibtex
+@article{zhang2025swebenchgoeslive,
+  title={SWE-bench Goes Live!},
+  author={Linghao Zhang and Shilin He and Chaoyun Zhang and Yu Kang and Bowen Li and Chengxing Xie and Junhao Wang and Maoquan Wang and Yufan Huang and Shengyu Fu and Elsie Nallipogu and Qingwei Lin and Yingnong Dang and Saravan Rajmohan and Dongmei Zhang},
+  journal={arXiv preprint arXiv:2505.23419},
+  year={2025}
+}
+```
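Review note on `SWE-bench-Live.md`: the conversion step is described only by its command line. The following is a minimal sketch of what such a conversion might look like, assuming each line of `output.jsonl` carries `instance_id` and a `test_result.git_patch` field, and that the harness expects `instance_id`, `model_name_or_path`, and `model_patch`. These field names are assumptions, and this is not the contents of `scripts/live/convert.py`.

```python
import json


def to_prediction(record: dict, model_name: str = "openhands") -> dict:
    """Map one output record to a SWE-bench-style prediction (assumed layout)."""
    test_result = record.get("test_result") or {}
    return {
        "instance_id": record["instance_id"],
        "model_name_or_path": model_name,
        "model_patch": test_result.get("git_patch", ""),
    }


def convert_lines(lines, model_name: str = "openhands"):
    """Yield one JSON prediction line per non-empty input line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.dumps(to_prediction(json.loads(line), model_name))
```

Records without a patch are kept with an empty `model_patch`, so a missing patch stays visible in the predictions file.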
--- a/evaluation/benchmarks/swe_bench/live_utils.py
+++ b/evaluation/benchmarks/swe_bench/live_utils.py
@@ -0,0 +1,80 @@
+from typing import Any
+
+import pandas as pd
+
+from evaluation.utils.shared import assert_and_raise
+from openhands.core.logger import openhands_logger as logger
+from openhands.events.action import CmdRunAction
+from openhands.events.observation import (
+    CmdOutputObservation,
+    ErrorObservation,
+)
+from openhands.runtime.base import Runtime
+from openhands.utils.shutdown_listener import sleep_if_should_continue
+
+
+def complete_runtime(
+    runtime: Runtime,
+    instance: pd.Series,
+) -> dict[str, Any]:
+    """Complete the runtime and export the git patch for SWE-bench-Live."""
+    logger.info('-' * 30)
+    logger.info('BEGIN Runtime Completion Fn')
+    logger.info('-' * 30)
+    obs: CmdOutputObservation
+    workspace_dir_name = instance.instance_id
+    action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
+        f'Failed to cd to /workspace/{workspace_dir_name}: {str(obs)}',
+    )
+    action = CmdRunAction(command='git config --global core.pager ""')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
+        f'Failed to git config --global core.pager "": {str(obs)}',
+    )
+    action = CmdRunAction(command='git add -A')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
+        f'Failed to git add -A: {str(obs)}',
+    )
+    n_retries = 0
+    git_patch = None
+    while n_retries < 5:
+        action = CmdRunAction(
+            command=f'git diff --no-color --cached {instance["base_commit"]}',
+        )
+        action.set_hard_timeout(100 + 10 * n_retries)
+        logger.info(action, extra={'msg_type': 'ACTION'})
+        obs = runtime.run_action(action)
+        logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+        n_retries += 1
+        if isinstance(obs, CmdOutputObservation):
+            if obs.exit_code == 0:
+                git_patch = obs.content.strip()
+                break
+            else:
+                logger.info('Failed to get git diff, retrying...')
+                sleep_if_should_continue(10)
+        elif isinstance(obs, ErrorObservation):
+            logger.error(f'Error occurred: {obs.content}. Retrying...')
+            sleep_if_should_continue(10)
+        else:
+            assert_and_raise(False, f'Unexpected observation type: {str(obs)}')
+    assert_and_raise(git_patch is not None, 'Failed to get git diff (None)')
+    logger.info('-' * 30)
+    logger.info('END Runtime Completion Fn')
+    logger.info('-' * 30)
+    return {'git_patch': git_patch}
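Review note on `live_utils.py`: the `git diff` loop in `complete_runtime` retries up to five times with a timeout that grows by 10 seconds per attempt. The following is a minimal standalone sketch of that pattern, assuming a hypothetical `run` callable that returns `None` on failure. It uses `time.sleep` in place of `sleep_if_should_continue` and does not depend on OpenHands.

```python
import time
from typing import Callable, Optional


def retry_with_growing_timeout(
    run: Callable[[int], Optional[str]],
    max_retries: int = 5,
    base_timeout: int = 100,
    step: int = 10,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call run(timeout) until it returns a non-None result.

    Attempt k (0-based) uses timeout base_timeout + step * k.
    Raises RuntimeError if every attempt returns None.
    """
    for attempt in range(max_retries):
        result = run(base_timeout + step * attempt)
        if result is not None:
            return result
        sleep(delay)  # pause before the next attempt, as the original loop does
    raise RuntimeError(f"no result after {max_retries} attempts")
```

Factoring the loop out this way would also let the three preceding `cd` / `git config` / `git add` steps share one helper, though that refactor is outside the scope of this change.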
--- a/evaluation/benchmarks/swe_bench/loc_prompt.py
+++ b/evaluation/benchmarks/swe_bench/loc_prompt.py
@@ -1,4 +1,4 @@
-TASK_INSTRUECTION="""
+TASK_INSTRUECTION = """
 Given the following GitHub problem description, your objective is to localize the specific files, classes or functions, and lines of code that need modification or contain key information to resolve the issue.

 Follow these steps to localize the issue:
@@ -66,4 +66,4 @@ FAKE_USER_MSG_FOR_LOC = (
    'Verify that you have carefully analyzed the impact of the found locations on the repository, especially their dependencies. '
    'If you think you have solved the task, please send your final answer (including the former answer and reranking) to user through message and then call `finish` to finish.\n'
    'IMPORTANT: YOU SHOULD NEVER ASK FOR HUMAN HELP.\n'
-)
+)
--- a/evaluation/benchmarks/swe_bench/prompts/swe_claude.j2
+++ b/evaluation/benchmarks/swe_bench/prompts/swe_claude.j2
@@ -0,0 +1,65 @@
+<uploaded_files>
+/workspace/{{ workspace_dir_name }}
+</uploaded_files>
+
+I've uploaded a python code repository in the directory {{ workspace_dir_name }}. Consider the following issue description:
+
+<issue_description>
+{{ instance.problem_statement }}
+</issue_description>
+
+Can you help me implement the necessary changes to the repository so that the requirements specified in the <issue_description> are met?
+I've already taken care of all changes to any of the test files described in the <issue_description>. This means you DON'T have to modify the testing logic or any of the tests in any way!
+Also the development Python environment is already set up for you (i.e., all dependencies already installed), so you don't need to install other packages.
+Your task is to make the minimal changes to non-test files in the /workspace/{{ workspace_dir_name }} directory to ensure the <issue_description> is satisfied.
+
+Follow these phases to resolve the issue:
+
+Phase 1. READING: read the problem and reword it in clearer terms
+   1.1 If there are code or config snippets, express in words any best practices or conventions in them.
+   1.2 Highlight error messages, method names, variables, file names, stack traces, and technical details.
+   1.3 Explain the problem in clear terms.
+   1.4 Enumerate the steps to reproduce the problem.
+   1.5 Highlight any best practices to take into account when testing and fixing the issue.
+
+Phase 2. RUNNING: install and run the tests on the repository
+   2.1 Follow the readme
+   2.2 Install the environment and anything needed
+   2.3 Iterate and figure out how to run the tests
+
+Phase 3. EXPLORATION: find the files that are related to the problem and possible solutions
+   3.1 Use `grep` to search for relevant methods, classes, keywords and error messages.
+   3.2 Identify all files related to the problem statement.
+   3.3 Propose the methods and files to fix the issue and explain why.
+   3.4 From the possible file locations, select the most likely location to fix the issue.
+
+Phase 4. TEST CREATION: before implementing any fix, create a script to reproduce and verify the issue.
+   4.1 Look at existing test files in the repository to understand the test format/structure.
+   4.2 Create a minimal reproduction script that reproduces the located issue.
+   4.3 Run the reproduction script to confirm you are reproducing the issue.
+   4.4 Adjust the reproduction script as necessary.
+
+Phase 5. FIX ANALYSIS: state clearly the problem and how to fix it
+   5.1 State clearly what the problem is.
+   5.2 State clearly where the problem is located.
+   5.3 State clearly how the test reproduces the issue.
+   5.4 State clearly the best practices to take into account in the fix.
+   5.5 State clearly how to fix the problem.
+
+Phase 6. FIX IMPLEMENTATION: Edit the source code to implement your chosen solution.
+   6.1 Make minimal, focused changes to fix the issue.
+
+Phase 7. VERIFICATION: Test your implementation thoroughly.
+   7.1 Run your reproduction script to verify the fix works.
+   7.2 Add edge cases to your test script to ensure comprehensive coverage.
+   7.3 Run existing tests related to the modified code to ensure you haven't broken anything.
+
+Phase 8. FINAL REVIEW: Carefully re-read the problem description and compare your changes with the base commit {{ instance.base_commit }}.
+   8.1 Ensure you've fully addressed all requirements.
+   8.2 Run any tests in the repository related to:
+     8.2.1 The issue you are fixing
+     8.2.2 The files you modified
+     8.2.3 The functions you changed
+   8.3 If any tests fail, revise your implementation until all tests pass
+
+Be thorough in your exploration, testing, and reasoning. It's fine if your thinking process is lengthy - quality and completeness are more important than brevity.
--- a/evaluation/benchmarks/swe_bench/prompts/swe_default.j2
+++ b/evaluation/benchmarks/swe_bench/prompts/swe_default.j2
@@ -0,0 +1,65 @@
+<uploaded_files>
+/workspace/{{ workspace_dir_name }}
+</uploaded_files>
+
+I've uploaded a python code repository in the directory {{ workspace_dir_name }}. Consider the following issue description:
+
+<issue_description>
+{{ instance.problem_statement }}
+</issue_description>
+
+Can you help me implement the necessary changes to the repository so that the requirements specified in the <issue_description> are met?
+I've already taken care of all changes to any of the test files described in the <issue_description>. This means you DON'T have to modify the testing logic or any of the tests in any way!
+Also the development Python environment is already set up for you (i.e., all dependencies already installed), so you don't need to install other packages.
+Your task is to make the minimal changes to non-test files in the /workspace/{{ workspace_dir_name }} directory to ensure the <issue_description> is satisfied.
+
+Follow these phases to resolve the issue:
+
+Phase 1. READING: read the problem and reword it in clearer terms
+   1.1 If there are code or config snippets, express in words any best practices or conventions in them.
+   1.2 Highlight error messages, method names, variables, file names, stack traces, and technical details.
+   1.3 Explain the problem in clear terms.
+   1.4 Enumerate the steps to reproduce the problem.
+   1.5 Highlight any best practices to take into account when testing and fixing the issue.
+
+Phase 2. RUNNING: install and run the tests on the repository
+   2.1 Follow the readme
+   2.2 Install the environment and anything needed
+   2.3 Iterate and figure out how to run the tests
+
+Phase 3. EXPLORATION: find the files that are related to the problem and possible solutions
+   3.1 Use `grep` to search for relevant methods, classes, keywords and error messages.
+   3.2 Identify all files related to the problem statement.
+   3.3 Propose the methods and files to fix the issue and explain why.
+   3.4 From the possible file locations, select the most likely location to fix the issue.
+
+Phase 4. TEST CREATION: before implementing any fix, create a script to reproduce and verify the issue.
+   4.1 Look at existing test files in the repository to understand the test format/structure.
+   4.2 Create a minimal reproduction script that reproduces the located issue.
+   4.3 Run the reproduction script to confirm you are reproducing the issue.
+   4.4 Adjust the reproduction script as necessary.
+
+Phase 5. FIX ANALYSIS: state clearly the problem and how to fix it
+   5.1 State clearly what the problem is.
+   5.2 State clearly where the problem is located.
+   5.3 State clearly how the test reproduces the issue.
+   5.4 State clearly the best practices to take into account in the fix.
+   5.5 State clearly how to fix the problem.
+
+Phase 6. FIX IMPLEMENTATION: Edit the source code to implement your chosen solution.
+   6.1 Make minimal, focused changes to fix the issue.
+
+Phase 7. VERIFICATION: Test your implementation thoroughly.
+   7.1 Run your reproduction script to verify the fix works.
+   7.2 Add edge cases to your test script to ensure comprehensive coverage.
+   7.3 Run existing tests related to the modified code to ensure you haven't broken anything.
+
+Phase 8. FINAL REVIEW: Carefully re-read the problem description and compare your changes with the base commit {{ instance.base_commit }}.
+   8.1 Ensure you've fully addressed all requirements.
+   8.2 Run any tests in the repository related to:
+     8.2.1 The issue you are fixing
+     8.2.2 The files you modified
+     8.2.3 The functions you changed
+   8.3 If any tests fail, revise your implementation until all tests pass
+
+Be thorough in your exploration, testing, and reasoning. It's fine if your thinking process is lengthy - quality and completeness are more important than brevity.
--- a/evaluation/benchmarks/swe_bench/prompts/swe_gemini.j2
+++ b/evaluation/benchmarks/swe_bench/prompts/swe_gemini.j2
@@ -0,0 +1,45 @@
+# Task: Fix Issue in Python Repository
+
+## Repository Context
+You are provided with a Python code repository that contains an issue requiring your attention. The repository is located in a sandboxed environment, and you have access to the codebase to implement the necessary changes.
+The code repository is located at: `/workspace/{{ workspace_dir_name }}`
+(This path is provided for context; use file system tools to confirm paths before access).
+
+## Goal
+Your goal is to fix the issue described in the **Issue Description** section below. Implement the necessary changes to **non-test files only** within the repository, ensuring that **all relevant tests pass** after your changes.
+
+## Key Requirements & Constraints
+
+1.  **Understand the problem** very well: it is a bug report, and you know humans don't always write good descriptions. Explore the codebase to understand the related code and the problem in depth. It is possible that the solution needs to be a bit more extensive than just the stated text. Don't exaggerate though: don't do unrelated refactoring, but also don't interpret the description too strictly.
+2.  **Focus on the issues:** Implement the fix focusing on non-test files related to the issue.
+3.  **Environment Ready:** The Python environment is pre-configured with all dependencies. Do not install packages.
+4.  **Mandatory Testing Procedure:**
+    *   **Create Test to Reproduce the Issue:** *Before* implementing any fix, you MUST create a *new test* (separate from existing tests) that specifically reproduces the issue.
+            * Take existing tests as example to understand the testing format/structure.
+            * Enhance this test with edge cases.
+            * Run this test to confirm reproduction.
+    *   **Verify Fix:** After implementing the fix, run your test again to verify the issue is resolved.
+    *   **Identify ALL Relevant Tests:** You MUST perform a **dedicated search and analysis** to identify **all** existing unit tests potentially affected by your changes. This includes:
+        *   Tests in the same module/directory as the changed files (e.g., `tests/` subdirectories).
+        *   Tests explicitly importing or using the modified code/classes/functions.
+        *   Tests mentioned in the issue description or related documentation.
+        *   Tests covering functionalities that *depend on* the modified code (analyze callers/dependencies if necessary).
+        **If you cannot confidently identify a specific subset, you MUST identify and plan to run the entire test suite for the modified application or module(s). State your identified test scope clearly.**
+    *   **Run Identified Relevant Tests:** You MUST execute the **complete set** of relevant existing unit tests you identified in the previous step. Ensure you are running the *correct and comprehensive set* of tests. You MUST NOT modify these existing tests.
+    *   **Final Check & Verification:** Before finishing, ensure **all** identified relevant existing tests pass. **Explicitly confirm that you have considered potential omissions in your test selection and believe the executed tests comprehensively cover the impact of your changes.** Failing to identify and run the *complete* relevant set constitutes a failure. If any identified tests fail, revise your fix. Passing all relevant tests is the primary measure of success.
+5.  **Defensive Programming:** Actively practice defensive programming: anticipate and handle potential edge cases, unexpected inputs, and different ways the affected code might be called **to ensure the fix works reliably and allows relevant tests to pass.** Analyze the potential impact on other parts of the codebase.
+6.  **Final Review:** Compare your solution against the original issue and the base commit ({{ instance.base_commit }}) to ensure completeness and test passage.
+
+## General Workflow Guidance
+
+*   Prioritize understanding the problem, exploring the code, planning your fix, implementing it carefully using the required diff format, and **thoroughly testing** according to the **Mandatory Testing Procedure**.
+*   Consider trade-offs between different solutions. The goal is a **robust change that makes the relevant tests pass.** Quality, correctness, and reliability are key.
+*   Actively practice defensive programming: anticipate and handle potential edge cases, unexpected inputs, and different ways the affected code might be called **to ensure the fix works reliably and allows relevant tests to pass.** Analyze the potential impact on other parts of the codebase.
+
+*   IMPORTANT: Your solution will be tested by additional hidden tests, so do not assume the task is complete just because visible tests pass! Refine the solution until you are confident that it is robust and comprehensive according to the **Defensive Programming** requirement.
+
+## Final Note
+Be thorough in your exploration, testing, and reasoning. It's fine if your thinking process is lengthy - quality and completeness are more important than brevity.
+
+## Issue Description
+{{ instance.problem_statement }}
--- a/evaluation/benchmarks/swe_bench/prompts/swe_gpt4.j2
+++ b/evaluation/benchmarks/swe_bench/prompts/swe_gpt4.j2
@@ -0,0 +1,80 @@
+You will be tasked to fix an issue from an open-source repository.
+
+Your thinking should be thorough, so it's fine if it's very long. You can think step by step before and after each action you decide to take.
+
+You MUST iterate and keep going until the problem is solved.
+
+You already have everything you need to solve this problem in the /workspace/{{ workspace_dir_name }} folder, even without internet connection. I want you to fully solve this autonomously before coming back to me.
+
+Only terminate your turn when you are sure that the problem is solved. Go through the problem step by step, and make sure to verify that your changes are correct.
+NEVER end your turn without having solved the problem, and when you say you are going to make a tool call, make sure you ACTUALLY make the tool call, instead of ending your turn.
+
+THE PROBLEM CAN DEFINITELY BE SOLVED WITHOUT THE INTERNET.
+
+Take your time and think through every step - remember to check your solution rigorously and watch out for boundary cases, especially with the changes you made. Your solution must be perfect. If not, continue working on it.
+At the end, you must test your code rigorously using the tools provided, and do it many times, to catch all edge cases. If it is not robust, iterate more and make it perfect. Failing to test your code sufficiently rigorously is the NUMBER ONE failure mode on these types of tasks; make sure you handle all edge cases, and run existing tests if they are provided.
+
+You MUST plan extensively before each function call, and reflect extensively on the outcomes of the previous function calls. DO NOT do this entire process by making function calls only, as this can impair your ability to solve the problem and think insightfully.
+
+# Workflow
+
+## High-Level Problem Solving Strategy
+
+1. Understand the problem deeply. Carefully read the issue and think critically about what is required.
+2. Investigate the codebase. Explore relevant files, search for key functions, and gather context.
+3. Develop a clear, step-by-step plan. Break down the fix into manageable, incremental steps.
+4. Implement the fix incrementally. Make small, testable code changes.
+5. Debug as needed. Use debugging techniques to isolate and resolve issues.
+6. Test frequently. Run tests after each change to verify correctness.
+7. Iterate until the root cause is fixed and all tests pass.
+8. Reflect and validate comprehensively. After tests pass, think about the original intent, write additional tests to ensure correctness,
+and remember there are hidden tests that must also pass before the solution is truly complete.
+
+Refer to the detailed sections below for more information on each step.
+
+## 1. Deeply Understand the Problem
+Carefully read the issue and think hard about a plan to solve it before coding.
+
+## 2. Codebase Investigation
+- Explore relevant files and directories.
+- Search for key functions, classes, or variables related to the issue.
+- Read and understand relevant code snippets.
+- Identify the root cause of the problem.
+- Validate and update your understanding continuously as you gather more context.
+
+## 3. Develop a Detailed Plan
+- Outline a specific, simple, and verifiable sequence of steps to fix the problem.
+- Break down the fix into small, incremental changes.
+
+## 4. Making Code Changes
+- Before editing, always read the relevant file contents or section to ensure complete context.
+- If a patch is not applied correctly, attempt to reapply it.
+- Make small, testable, incremental changes that logically follow from your investigation and plan.
+
+## 5. Debugging
+- Make code changes only if you have high confidence they can solve the problem.
+- When debugging, try to determine the root cause rather than addressing symptoms.
+- Debug for as long as needed to identify the root cause and identify a fix.
+- Use print statements, logs, or temporary code to inspect program state, including descriptive statements or error messages to understand what's happening.
+- To test hypotheses, you can also add test statements or functions.
+- Revisit your assumptions if unexpected behavior occurs.
+
+## 6. Testing
+- Run tests frequently using `python3 run_tests.py` (or equivalent).
+- After each change, verify correctness by running relevant tests.
+- If tests fail, analyze failures and revise your patch.
+- Write additional tests if needed to capture important behaviors or edge cases.
+- Ensure all tests pass before finalizing.
+
+## 7. Final Verification
+- Confirm the root cause is fixed.
+- Review your solution for logic correctness and robustness.
+- Iterate until you are extremely confident the fix is complete and all tests pass.
+
+## 8. Final Reflection and Additional Testing
+- Reflect carefully on the original intent of the user and the problem statement.
+- Think about potential edge cases or scenarios that may not be covered by existing tests.
+- Write additional tests that would need to pass to fully validate the correctness of your solution.
+- Run these new tests and ensure they all pass.
+- Be aware that there are additional hidden tests that must also pass for the solution to be successful.
+- Do not assume the task is complete just because the visible tests pass; continue refining until you are confident the fix is robust and comprehensive.
--- a/evaluation/benchmarks/swe_bench/prompts/swt.j2
+++ b/evaluation/benchmarks/swe_bench/prompts/swt.j2
@@ -0,0 +1,19 @@
+<uploaded_files>
+/workspace/{{ workspace_dir_name }}
+</uploaded_files>
+I've uploaded a python code repository in the directory {{ workspace_dir_name }}. Consider the following issue description:
+
+<issue_description>
+{{ instance.problem_statement }}
+</issue_description>
+
+
+Can you help me implement the necessary changes to the repository to test whether the issue in <issue_description> was resolved?
+I will take care of all changes to any of the non-test files. This means you DON'T have to modify the actual logic and ONLY have to update test logic and tests!
+Your task is to make the minimal changes to test files in the /workspace directory to reproduce the issue in the <issue_description>, i.e., such that the generated tests fail in the current state (where the issue is unresolved) and pass once the issue is resolved.
+Follow these steps to reproduce the issue:
+1. As a first step, it might be a good idea to explore the repo to familiarize yourself with its structure.
+2. Create a script `reproduction.py` to reproduce the error and execute it with `python reproduction.py` using the BashTool, to confirm the error
+3. Edit the source code of the repo to integrate your reproduction script into the test framework
+4. Run the test framework and make sure your tests fail! Only submit FAILING tests! Never submit passing tests.
+{{ test_instructions }}Your thinking should be thorough, so it's fine if it's very long.
--- a/evaluation/benchmarks/swe_bench/run_infer.py
+++ b/evaluation/benchmarks/swe_bench/run_infer.py
@@ -8,6 +8,7 @@ from typing import Any, Literal
 import pandas as pd
 import toml
 from datasets import load_dataset
+from jinja2 import Environment, FileSystemLoader

 import openhands.agenthub
 from evaluation.benchmarks.swe_bench.binary_patch_utils import (
@@ -42,7 +43,7 @@ from openhands.core.config import (
    AgentConfig,
    OpenHandsConfig,
    get_llm_config_arg,
-    get_parser
+    get_parser,
 )
 from openhands.core.config.condenser_config import NoOpCondenserConfig
 from openhands.core.config.utils import get_condenser_config_arg
@@ -65,6 +66,26 @@ RUN_WITH_BROWSING = os.environ.get('RUN_WITH_BROWSING', 'false').lower() == 'tru
 ENABLE_LLM_EDITOR = os.environ.get('ENABLE_LLM_EDITOR', 'false').lower() == 'true'
 BenchMode = Literal['swe', 'swt', 'swt-ci']

+# Global variable to track dataset type
+DATASET_TYPE = 'SWE-bench'
+
+
+def set_dataset_type(dataset_name: str) -> None:
+    """Set dataset type based on dataset name."""
+    global DATASET_TYPE
+    name_lower = dataset_name.lower()
+
+    if 'swe-gym' in name_lower:
+        DATASET_TYPE = 'SWE-Gym'
+    elif 'swe-bench-live' in name_lower:
+        DATASET_TYPE = 'SWE-bench-Live'
+    elif 'multimodal' in name_lower:
+        DATASET_TYPE = 'Multimodal'
+    else:
+        DATASET_TYPE = 'SWE-bench'
+
+    logger.info(f'Dataset type set to: {DATASET_TYPE}')
+

 AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {
    'CodeActAgent': codeact_user_response,
@@ -72,107 +93,59 @@ AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {


 def _get_swebench_workspace_dir_name(instance: pd.Series) -> str:
-    return f'{instance.repo}__{instance.version}'.replace('/', '__')
+    if DATASET_TYPE == 'SWE-bench-Live':
+        return instance.instance_id
+    else:
+        return f'{instance.repo}__{instance.version}'.replace('/', '__')


 def get_instruction(instance: pd.Series, metadata: EvalMetadata) -> MessageAction:
    workspace_dir_name = _get_swebench_workspace_dir_name(instance)
    mode = metadata.details['mode']
+    llm_model = metadata.llm_config.model
+
+    # Determine the template file based on mode and LLM
    if mode.startswith('swt'):
-        test_instructions = (
-            f'The following command can be used to run the tests: `{list(MAP_REPO_TO_TEST_FRAMEWORK_VERBOSE[instance.repo].values())[0]}`. Make sure they fail in the expected way.\n'
-            if mode.endswith('ci')
-            else ''
-        )
-        instruction = f"""\
-<uploaded_files>
-/workspace/{workspace_dir_name}
-</uploaded_files>
-I've uploaded a python code repository in the directory {workspace_dir_name}. Consider the following issue description:
-
-<issue_description>
-{instance.problem_statement}
-</issue_description>
-
-
-Can you help me implement the necessary changes to the repository to test whether the issue in <issue_description> was resolved?
-I will take care of all changes to any of the non-test files. This means you DON'T have to modify the actual logic and ONLY have to update test logic and tests!
-Your task is to make the minimal changes to tests files in the /workspace directory to reproduce the issue in the <issue_description>, i.e., such that the generated tests fail in the current state (where the issue is unresolved) and pass when the issue will be resolved.
-Follow these steps to reproduce the issue:
-1. As a first step, it might be a good idea to explore the repo to familiarize yourself with its structure.
-2. Create a script `reproduction.py` to reproduce the error and execute it with `python reproduction.py` using the BashTool, to confirm the error
-3. Edit the sourcecode of the repo to integrate your reproduction script into the test framework
-4. Run the test framework and make sure your tests fail! Only submit FAILING tests! Never submit passing tests.
-{test_instructions}Your thinking should be thorough and so it's fine if it's very long.
-"""
+        template_name = 'swt.j2'
+    elif mode == 'swe':
+        if 'claude' in llm_model:
+            template_name = 'swe_claude.j2'
+        elif 'gemini' in llm_model:
+            template_name = 'swe_gemini.j2'
+        elif 'gpt-4.1' in llm_model:
+            template_name = 'swe_gpt4.j2'
+        else:
+            template_name = (
+                'swe_default.j2'  # Default for 'swe' mode (regular swe-bench)
+            )
    else:
-        instruction = f"""
-<uploaded_files>
-/workspace/{workspace_dir_name}
-</uploaded_files>
+        # Fallback or error handling if mode is unexpected
+        logger.error(f'Unexpected evaluation mode: {mode}. Falling back to default.')
+        template_name = 'swe_default.j2'

-I've uploaded a python code repository in the directory {workspace_dir_name}. Consider the following issue description:
+    # Set up Jinja2 environment
+    # Templates live in the 'prompts' directory next to this script
+    prompts_dir = os.path.join(os.path.dirname(__file__), 'prompts')
+    env = Environment(loader=FileSystemLoader(prompts_dir))
+    template = env.get_template(template_name)

-<issue_description>
-{instance.problem_statement}
-</issue_description>
+    # Prepare context for rendering
+    context = {
+        'instance': instance,
+        'workspace_dir_name': workspace_dir_name,
+        'metadata': metadata,  # Pass metadata if needed in templates
+    }

-Can you help me implement the necessary changes to the repository so that the requirements specified in the <issue_description> are met?
-I've already taken care of all changes to any of the test files described in the <issue_description>. This means you DON'T have to modify the testing logic or any of the tests in any way!
-Also the development Python environment is already set up for you (i.e., all dependencies already installed), so you don't need to install other packages.
-Your task is to make the minimal changes to non-test files in the /workspace/{workspace_dir_name} directory to ensure the <issue_description> is satisfied.
+    # Add specific context for swt-ci mode if needed
+    if mode == 'swt-ci':
+        context['test_instructions'] = (
+            f'The following command can be used to run the tests: `{list(MAP_REPO_TO_TEST_FRAMEWORK_VERBOSE[instance.repo].values())[0]}`. Make sure they fail in the expected way.\n'
+        )
+    else:
+        context['test_instructions'] = ''  # Ensure it's defined for other modes

-Follow these phases to resolve the issue:
-
-Phase 1. READING: read the problem and reword it in clearer terms
-   1.1 If there are code or config snippets. Express in words any best practices or conventions in them.
-   1.2 Hightlight message errors, method names, variables, file names, stack traces, and technical details.
-   1.3 Explain the problem in clear terms.
-   1.4 Enumerate the steps to reproduce the problem.
-   1.5 Hightlight any best practices to take into account when testing and fixing the issue
-
-Phase 2. RUNNING: install and run the tests on the repository
-   2.1 Follow the readme
-   2.2 Install the environment and anything needed
-   2.2 Iterate and figure out how to run the tests
-
-Phase 3. EXPLORATION: find the files that are related to the problem and possible solutions
-   3.1 Use `grep` to search for relevant methods, classes, keywords and error messages.
-   3.2 Identify all files related to the problem statement.
-   3.3 Propose the methods and files to fix the issue and explain why.
-   3.4 From the possible file locations, select the most likely location to fix the issue.
-
-Phase 4. TEST CREATION: before implementing any fix, create a script to reproduce and verify the issue.
-   4.1 Look at existing test files in the repository to understand the test format/structure.
-   4.2 Create a minimal reproduction script that reproduces the located issue.
-   4.3 Run the reproduction script to confirm you are reproducing the issue.
-   4.4 Adjust the reproduction script as necessary.
-
-Phase 5. FIX ANALYSIS: state clearly the problem and how to fix it
-   5.1 State clearly what the problem is.
-   5.2 State clearly where the problem is located.
-   5.3 State clearly how the test reproduces the issue.
-   5.4 State clearly the best practices to take into account in the fix.
-   5.5 State clearly how to fix the problem.
-
-Phase 6. FIX IMPLEMENTATION: Edit the source code to implement your chosen solution.
-   6.1 Make minimal, focused changes to fix the issue.
-
-Phase 7. VERIFICATION: Test your implementation thoroughly.
-   7.1 Run your reproduction script to verify the fix works.
-   7.2 Add edge cases to your test script to ensure comprehensive coverage.
-   7.3 Run existing tests related to the modified code to ensure you haven't broken anything.
-
-8. FINAL REVIEW: Carefully re-read the problem description and compare your changes with the base commit {instance['base_commit']}.
-   8.1 Ensure you've fully addressed all requirements.
-   8.2 Run any tests in the repository related to:
-     8.2.1 The issue you are fixing
-     8.2.2 The files you modified
-     8.2.3 The functions you changed
-   8.3 If any tests fail, revise your implementation until all tests pass
-
-Be thorough in your exploration, testing, and reasoning. It's fine if your thinking process is lengthy - quality and completeness are more important than brevity.
-"""
+    # Render the instruction
+    instruction = template.render(context)

    if RUN_WITH_BROWSING:
        instruction += (
@@ -203,9 +176,13 @@ def get_instance_docker_image(
    if swebench_official_image:
        # Official SWE-Bench image
        # swebench/sweb.eval.x86_64.django_1776_django-11333:v1
-        docker_image_prefix = 'docker.io/swebench/'
+        # SWE-bench-Live uses the same naming convention as SWE-Bench
+        if DATASET_TYPE == 'SWE-bench-Live':
+            docker_image_prefix = 'docker.io/starryzhang/'
+        else:
+            docker_image_prefix = 'docker.io/swebench/'
        repo, name = instance_id.split('__')
-        image_name = f'swebench/sweb.eval.x86_64.{repo}_1776_{name}:latest'.lower()
+        image_name = f'{docker_image_prefix.rstrip("/")}/sweb.eval.x86_64.{repo}_1776_{name}:latest'.lower()
        logger.debug(f'Using official SWE-Bench image: {image_name}')
        return image_name
    else:
@@ -223,7 +200,8 @@ def get_config(
    metadata: EvalMetadata,
 ) -> OpenHandsConfig:
    # We use a different instance image for the each instance of swe-bench eval
-    use_swebench_official_image = 'swe-gym' not in metadata.dataset.lower()
+    use_swebench_official_image = DATASET_TYPE != 'SWE-Gym'
+
    base_container_image = get_instance_docker_image(
        instance['instance_id'],
        swebench_official_image=use_swebench_official_image,
@@ -340,8 +318,12 @@ def initialize_runtime(
        runtime.copy_to(temp_file_path, '/swe_util/eval_data/instances/')

        # inject the instance swe entry
+        if DATASET_TYPE == 'SWE-bench-Live':
+            entry_script_path = 'instance_swe_entry_live.sh'
+        else:
+            entry_script_path = 'instance_swe_entry.sh'
        runtime.copy_to(
-            str(os.path.join(script_dir, 'scripts/setup/instance_swe_entry.sh')),
+            str(os.path.join(script_dir, f'scripts/setup/{entry_script_path}')),
            '/swe_util/',
        )

@@ -361,14 +343,14 @@ def initialize_runtime(
        logger.error(f'Failed to source ~/.bashrc: {str(obs)}')
    assert_and_raise(obs.exit_code == 0, f'Failed to source ~/.bashrc: {str(obs)}')

-    action = CmdRunAction(command='source /swe_util/instance_swe_entry.sh')
+    action = CmdRunAction(command=f'source /swe_util/{entry_script_path}')
    action.set_hard_timeout(600)
    logger.info(action, extra={'msg_type': 'ACTION'})
    obs = runtime.run_action(action)
    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
    assert_and_raise(
        obs.exit_code == 0,
-        f'Failed to source /swe_util/instance_swe_entry.sh: {str(obs)}',
+        f'Failed to source /swe_util/{entry_script_path}: {str(obs)}',
    )

    action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
@@ -421,9 +403,9 @@ def initialize_runtime(
            obs = runtime.run_action(action)
            logger.info(obs, extra={'msg_type': 'OBSERVATION'})

-    if 'multimodal' not in metadata.dataset.lower():
+    if DATASET_TYPE != 'Multimodal' and DATASET_TYPE != 'SWE-bench-Live':
        # Only for non-multimodal datasets, we need to activate the testbed environment for Python
-        # SWE-Bench multimodal datasets are not using the testbed environment
+        # SWE-Bench multimodal datasets and SWE-bench-Live are not using the testbed environment
        action = CmdRunAction(command='which python')
        action.set_hard_timeout(600)
        logger.info(action, extra={'msg_type': 'ACTION'})
@@ -665,7 +647,13 @@ def process_instance(

        # ======= THIS IS SWE-Bench specific =======
        # Get git patch
-        return_val = complete_runtime(runtime, instance)
+        if DATASET_TYPE == 'SWE-bench-Live':
+            from evaluation.benchmarks.swe_bench.live_utils import (
+                complete_runtime as complete_runtime_fn,
+            )
+        else:
+            complete_runtime_fn = complete_runtime
+        return_val = complete_runtime_fn(runtime, instance)
        git_patch = return_val['git_patch']
        logger.info(
            f'Got git diff for instance {instance.instance_id}:\n--------\n{git_patch}\n--------'
@@ -770,11 +758,15 @@ if __name__ == '__main__':
    # NOTE: It is preferable to load datasets from huggingface datasets and perform post-processing
    # so we don't need to manage file uploading to OpenHands's repo
    dataset = load_dataset(args.dataset, split=args.split)
+
+    # Set the global dataset type based on dataset name
+    set_dataset_type(args.dataset)
+
    swe_bench_tests = filter_dataset(dataset.to_pandas(), 'instance_id')
    logger.info(
        f'Loaded dataset {args.dataset} with split {args.split}: {len(swe_bench_tests)} tasks'
    )
-    if 'SWE-Gym' in args.dataset:
+    if DATASET_TYPE == 'SWE-Gym':
        with open(
            os.path.join(
                os.path.dirname(os.path.abspath(__file__)),
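Review note: the dataset-type detection and template selection introduced in this file can be read as two pure functions, which makes them easy to unit test without a module-level global. The sketch below is an illustration only; the function names `classify_dataset` and `select_template` are hypothetical and do not appear in the patch, and the dataset and model names in the usage lines are examples rather than values taken from the repository.

```python
def classify_dataset(dataset_name: str) -> str:
    """Map a dataset name to a coarse dataset type.

    Mirrors the substring checks in set_dataset_type(); the order of the
    checks matters because the first match wins.
    """
    name_lower = dataset_name.lower()
    if 'swe-gym' in name_lower:
        return 'SWE-Gym'
    if 'swe-bench-live' in name_lower:
        return 'SWE-bench-Live'
    if 'multimodal' in name_lower:
        return 'Multimodal'
    return 'SWE-bench'


def select_template(mode: str, llm_model: str) -> str:
    """Pick a prompt template file name from the eval mode and model name.

    Mirrors the branching in get_instruction(); unknown modes fall back to
    the default template.
    """
    if mode.startswith('swt'):
        return 'swt.j2'
    if mode == 'swe':
        model_templates = (
            ('claude', 'swe_claude.j2'),
            ('gemini', 'swe_gemini.j2'),
            ('gpt-4.1', 'swe_gpt4.j2'),
        )
        for marker, template_name in model_templates:
            if marker in llm_model:
                return template_name
    return 'swe_default.j2'


if __name__ == '__main__':
    # Example names are illustrative only.
    print(classify_dataset('example-org/SWE-bench-Live'))
    print(select_template('swe', 'example/gemini-model'))
```

Returning the value instead of mutating a global would also let `get_instance_docker_image` and `initialize_runtime` take the dataset type as an explicit argument, though that is a larger refactor than this patch attempts.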
--- a/evaluation/benchmarks/swe_bench/run_localize.py
+++ b/evaluation/benchmarks/swe_bench/run_localize.py
@@ -192,6 +192,8 @@ def get_config(
        dataset_name=metadata.dataset,
        instance_id=instance['instance_id'],
    )
+    oh_aci_li_cmd = '/openhands/micromamba/bin/micromamba run -n openhands poetry run pip install openhands-aci[llama]'
+    sandbox_config.runtime_extra_deps = oh_aci_li_cmd
    workspace_dir_name = _get_swebench_workspace_dir_name(instance)
    sandbox_config.runtime_startup_env_vars = {
        'REPO_PATH': f'/workspace/{workspace_dir_name}/',
@@ -216,6 +218,7 @@ def get_config(
        enable_jupyter=False,
        enable_browsing=RUN_WITH_BROWSING,
        enable_llm_editor=False,
+        enable_mcp=os.environ.get('ENABLE_MCP', 'false').lower() == 'true',
        condenser=metadata.condenser_config,
        enable_prompt_extensions=False,
    )
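Review note: values read from `os.environ` are always strings, so any non-empty value, including `'false'` or `'0'`, is truthy when used directly as a boolean. A minimal sketch of a parsing helper follows; `env_flag` is a hypothetical name, and the accepted spellings are an assumption that is broader than the `.lower() == 'true'` comparison used elsewhere in this repository.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean flag from the environment.

    Unset variables yield `default`; set variables are true only for a
    small set of affirmative spellings (an assumption of this sketch).
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {'1', 'true', 'yes', 'on'}


if __name__ == '__main__':
    os.environ['ENABLE_MCP'] = 'false'
    # The raw string is truthy even though it spells "false".
    print(bool(os.environ.get('ENABLE_MCP', False)))
    print(env_flag('ENABLE_MCP'))
```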
--- a/evaluation/benchmarks/swe_bench/scripts/live/convert.py
+++ b/evaluation/benchmarks/swe_bench/scripts/live/convert.py
@@ -0,0 +1,33 @@
+import argparse
+import json
+
+
+def main(output_jsonl: str):
+    with open(output_jsonl, 'r') as f:
+        for line in f:
+            try:
+                output = json.loads(line)
+                pred = {
+                    'instance_id': output['instance_id'],
+                    'model_name_or_path': output['metadata']['llm_config']['model'],
+                    'model_patch': output['test_result']['git_patch'],
+                }
+            except Exception as e:
+                # Skip malformed lines; `output` may be unbound if json.loads failed
+                print(f'Error while reading output line: {e}')
+                continue
+
+            print(json.dumps(pred))
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        '--output_jsonl',
+        type=str,
+        required=True,
+        help='Path to the prediction file (.../outputs.jsonl)',
+    )
+    args = parser.parse_args()
+
+    main(args.output_jsonl)
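Review note: the conversion loop can be written as a generator that skips lines it cannot parse, so a bad record is never followed by a stale or undefined prediction. The sketch below is one possible shape, not the script's actual code; `to_predictions` is a hypothetical name, and sending diagnostics to stderr is an assumption made so that stdout contains only prediction JSON.

```python
import json
import sys
from typing import Iterable, Iterator


def to_predictions(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one prediction dict per well-formed output line.

    Lines that are not valid JSON, or that lack the expected keys, are
    reported on stderr and skipped.
    """
    for line_no, line in enumerate(lines, start=1):
        try:
            output = json.loads(line)
            yield {
                'instance_id': output['instance_id'],
                'model_name_or_path': output['metadata']['llm_config']['model'],
                'model_patch': output['test_result']['git_patch'],
            }
        except (json.JSONDecodeError, KeyError, TypeError) as e:
            print(f'Skipping line {line_no}: {e!r}', file=sys.stderr)


if __name__ == '__main__':
    sample = [
        json.dumps({
            'instance_id': 'demo__demo-1',
            'metadata': {'llm_config': {'model': 'demo-model'}},
            'test_result': {'git_patch': 'diff --git a/x b/x'},
        }),
        'not json',
    ]
    for pred in to_predictions(sample):
        print(json.dumps(pred))
```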
--- a/evaluation/benchmarks/swe_bench/scripts/setup/instance_swe_entry_live.sh
+++ b/evaluation/benchmarks/swe_bench/scripts/setup/instance_swe_entry_live.sh
@@ -0,0 +1,41 @@
+#!/usr/bin/env bash
+
+source ~/.bashrc
+SWEUTIL_DIR=/swe_util
+
+# FIXME: Cannot read SWE_INSTANCE_ID from the environment variable
+# SWE_INSTANCE_ID=django__django-11099
+if [ -z "$SWE_INSTANCE_ID" ]; then
+    echo "Error: SWE_INSTANCE_ID is not set." >&2
+    exit 1
+fi
+
+# Read swe-bench-instance.json and extract the item whose instance_id matches SWE_INSTANCE_ID
+item=$(jq --arg INSTANCE_ID "$SWE_INSTANCE_ID" '.[] | select(.instance_id == $INSTANCE_ID)' $SWEUTIL_DIR/eval_data/instances/swe-bench-instance.json)
+
+if [[ -z "$item" ]]; then
+  echo "No item found for the provided instance ID."
+  exit 1
+fi
+
+
+echo "WORKSPACE_NAME: $SWE_INSTANCE_ID"
+
+# Clear the workspace
+if [ -d /workspace ]; then
+    rm -rf /workspace/*
+else
+    mkdir /workspace
+fi
+# Copy repo to workspace
+if [ -d /workspace/$SWE_INSTANCE_ID ]; then
+    rm -rf /workspace/$SWE_INSTANCE_ID
+fi
+mkdir -p /workspace
+cp -r /testbed /workspace/$SWE_INSTANCE_ID
+
+# SWE-bench-Live does not use conda to manage Python
+# if [ -d /opt/miniconda3 ]; then
+#     . /opt/miniconda3/etc/profile.d/conda.sh
+#     conda activate testbed
+# fi
--- a/evaluation/benchmarks/testgeneval/constants.py
+++ b/evaluation/benchmarks/testgeneval/constants.py
@@ -921,7 +921,7 @@ SPECS_PYDICOM.update(

 SPECS_HUMANEVAL = {k: {'python': '3.9', 'test_cmd': 'python'} for k in ['1.0']}

-# Constants - Task Instance Instllation Environment
+# Constants - Task Instance Installation Environment
 MAP_REPO_VERSION_TO_SPECS: dict[str, dict[str, Any]] = {
    'astropy/astropy': SPECS_ASTROPY,
    'dbt-labs/dbt-core': SPECS_DBT_CORE,
--- a/evaluation/benchmarks/testgeneval/run_infer.py
+++ b/evaluation/benchmarks/testgeneval/run_infer.py
@@ -539,7 +539,7 @@ if __name__ == '__main__':
    if args.llm_config:
        llm_config = get_llm_config_arg(args.llm_config)
        llm_config.log_completions = True
-        # modify_params must be False for evaluation purpose, for reproducibility and accurancy of results
+        # modify_params must be False for evaluation purpose, for reproducibility and accuracy of results
        llm_config.modify_params = False

    if llm_config is None:
--- a/evaluation/benchmarks/versicode/README.md
+++ b/evaluation/benchmarks/versicode/README.md
@@ -0,0 +1,102 @@
+# VersiCode benchmark
+
+This project is used to evaluate the performance of the model on VersiCode. It includes:
+
+- data: the test data needed and the model outputs
+- inference_utils: inference scripts for our tasks and models
+- metric: scripts for calculating the metrics
+- output_processing: scripts that process the model output so the metrics can be calculated
+
+# Details
+
+1. **Prepare the environment**
+
+   ```shell
+   #create and activate the conda environment
+   conda create -n VersiCode python==3.12
+   conda activate VersiCode
+   #install requirements
+   pip install -r requirements.txt
+   ```
+
+2. **Experiment Data**
+
+    To obtain the experimental data, please visit the Hugging Face link: https://huggingface.co/datasets/AstoneNg/VersiCode.
+    Locate the files `VersiCode_block_completion.json` and `VersiCode_migration.json` under the `experiment_data` directory, and place them in the `/data/test_data` directory of this project.
+
+
+3. **Model inference**
+
+   ```shell
+   #cd inference_utils directory
+   cd inference_utils
+
+   #Scripts whose names start with 'test' run a local model
+   #Scripts whose names start with 'api' call a model through the API
+
+   #block-level code completion
+   #Modify the 10th and 12th lines of code to specify the base URL and model name
+   python api_test_block_completion.py
+   #Modify the 30th line of code to specify the local model path
+   python test_block.py
+
+   # code migration (migration order is 'old_to_new')
+   #Modify the 10th and 12th lines of code to specify the base URL and model name
+   python api_code_migration.py
+   #Modify the 30th line of code to specify the local model path
+   python test_migration.py
+   ```
+
+4. **Process output**
+   Process the model output: remove redundant content and extract the specified content so that the metrics can be calculated.
+
+   ```shell
+   #cd output_processing
+   cd output_processing
+
+   #Extract the content between <start> and <end>
+   #Modify the 8th and 9th lines of code to specify the model and task granularity
+   python clear_ans.py
+
+   #For the block completion and migration tasks, cdc@k is computed on the core lines, so extract them first.
+   #Modify lines 76 and 79 to specify the data path
+   python choose_core_line_from_block_versicode.py
+   python choose_core_line_from_migration_versicode.py
+   ```
+
+5. **Metric**
+   We have three metrics: pass@k, em@k and cdc@k. Because we cannot automatically build a dynamic evaluation environment, we do not provide pass@k.
+
+   ```shell
+   #cd metric
+   cd metric
+
+   #Modify lines 137-140 in migration task (compute_migration_cdc_score.py) or 143-145 in block and line completion task (compute_versicode_cdc_score.py and compute_versicode_em_score.py) of the code to specify the data path and calculate the k-value of the metric
+   python compute_migration_cdc_score.py
+   python compute_versicode_cdc_score.py
+   python compute_versicode_em_score.py
+
+   #Notes
+   #We found limitations in the ISM@k and PM@k metrics for evaluating code generation, so they are used only as reference in our experiments.
+   #Modify lines 261-265 in block and line completion task of the code to specify the data path and calculate the k-value of the metric
+   python compute_ism_pm_score.py
+   ```
+
+# Citation
+
+```
+@article{versicode,
+  author={Tongtong Wu and Weigang Wu and Xingyu Wang and Kang Xu and Suyu Ma and Bo Jiang and Ping Yang and Zhenchang Xing and Yuan-Fang Li and Gholamreza Haffari},
+  title        = {VersiCode: Towards Version-controllable Code Generation},
+  journal      = {CoRR},
+  volume       = {abs/2406.07411},
+  year         = {2024},
+  url          = {https://arxiv.org/abs/2406.07411},
+}
+```
+
+**Github url**: https://github.com/wutong8023/VersiCode
+
+# Contributors
+
+[Tongtong Wu](https://scholar.google.com/citations?hl=zh-CN&user=u1Qp8lUAAAAJ&view_op=list_works&sortby=pubdate), [Weigang Wu](https://scholar.google.com/citations?hl=zh-CN&user=UneIZo8AAAAJ), [Xingyu Wang](https://scholar.google.com/citations?hl=zh-CN&user=wqPJcxcAAAAJ), [Kang Xu](https://scholar.google.com/citations?hl=zh-CN&user=N1UUDi0AAAAJ), [Suyu Ma](https://scholar.google.com/citations?hl=zh-CN&user=NJHR1ukAAAAJ), [Bo Jiang](https://wutong8023.site/VersiCode/), [Ping Yang](https://scholar.google.com/citations?view_op=list_works&hl=en&hl=en&user=hrogvxoAAAAJ), [Zhenchang Xing](https://scholar.google.com/citations?hl=zh-CN&user=0vCxuH4AAAAJ), [Yuan-Fang Li](https://scholar.google.com/citations?hl=zh-CN&user=wufXO1kAAAAJ), [Gholamreza Haffari](https://scholar.google.com/citations?hl=zh-CN&user=Perjx5EAAAAJ)
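Review note on the README's metric step: the README names the @k metrics but does not state how per-sample scores are aggregated. The `get_score` helper added in `metric/compute_ism_pm_score.py` sums `comb(n - i, k - 1) * score_i` over scores sorted in descending order and divides by `comb(n, k)`. One reading of that formula is "the expected best score among k of the n samples, drawn without replacement". The sketch below restates it and checks that reading by brute force. The names `score_at_k` and `brute_force_score_at_k` are hypothetical, and whether the em@k and cdc@k scripts use the same aggregation is not shown in this part of the diff.

```python
import math
from itertools import combinations


def score_at_k(scores: list[float], k: int) -> float:
    """Expected best score among k samples drawn without replacement."""
    ranked = sorted(scores, reverse=True)
    n = len(ranked)
    if not 1 <= k <= n:
        raise ValueError('k must satisfy 1 <= k <= len(scores)')
    # The i-th best score (1-based) is the maximum of a k-subset exactly when
    # the other k - 1 members all come from the n - i lower-ranked scores.
    total = sum(
        math.comb(n - i, k - 1) * ranked[i - 1] for i in range(1, n - k + 2)
    )
    return total / math.comb(n, k)


def brute_force_score_at_k(scores: list[float], k: int) -> float:
    """Average the best score over every k-subset (for checking only)."""
    subsets = list(combinations(scores, k))
    return sum(max(subset) for subset in subsets) / len(subsets)
```

With `k = 1` this reduces to the mean of the scores, and with `k = n` to the best single score.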
--- a/evaluation/benchmarks/versicode/inference_utils/api_code_migration.py
+++ b/evaluation/benchmarks/versicode/inference_utils/api_code_migration.py
@@ -0,0 +1,134 @@
+"""
+Code migration with an OpenAI chat model; overly long prompts are truncated
+"""
+
+import json
+import os
+
+import tiktoken
+from openai import OpenAI
+
+max_tokens = 127000  # prompt token budget: gpt-3.5 has a 16k context, gpt-4o has 128k
+model_name = ''
+
+os.environ['OPENAI_API_KEY'] = ''
+client = OpenAI()
+
+
+def truncate_text(text, max_tokens):
+    encoding = tiktoken.get_encoding('cl100k_base')
+    disallowed_special = ()
+
+    tokens = encoding.encode(text, disallowed_special=disallowed_special)
+    print(len(tokens))
+
+    if len(tokens) > max_tokens:
+        tokens = tokens[:max_tokens]
+
+    truncated_text = encoding.decode(tokens)
+
+    return truncated_text
+
+
+def predict(content, model_name):
+    response = client.chat.completions.create(
+        model=model_name,
+        messages=[{'role': 'user', 'content': content}],
+        frequency_penalty=0.1,
+        max_tokens=128,
+        logit_bias=None,
+        logprobs=None,
+        n=6,
+        presence_penalty=0.0,
+        seed=None,
+        stop=None,
+        stream=False,
+        temperature=0.8,
+        top_p=0.95,
+    )
+    ans_list = []
+    choices_list = response.choices
+    for c in choices_list:
+        content = c.message.content
+        ans_list.append(content)
+    final_ans = str(ans_list)
+    return final_ans
+
+
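+def bulid_prompt(description, old_version, old_code, new_version) -> str:
+    """
+    Build the code migration prompt.
+    :param description: functional description of the code
+    :param old_version: dependency name followed by its old version
+    :param old_code: code written against the old version
+    :param new_version: dependency name followed by its new version
+    :return: the prompt string
+    """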
+    prompt = f"""
+    You are now a professional Python programming engineer. I will provide you with a code snippet and a description of its functionality,
+    including the dependencies and versions used in the code. Then, I will provide the same dependencies but with a specified new version.
+    Your task is to refactor the code using the methods provided by the specified new version and return the refactored code.
+    Please note that you only need to return the refactored code and enclose it with <start> and <end>:
+    ###Functionality description of the code
+    {description}
+    ###Dependency and old version
+    {old_version}
+    ###Old version code
+    {old_code}
+    ###Dependency and new version
+    {new_version}
+    ###Refactored new code
+    """
+
+    return prompt
+
+
+json_path = '../data/test_data/VersiCode_migration.json'
+
+
+with open(json_path, 'r', encoding='utf-8') as fr:
+    lodict = json.load(fr)
+data_dict = lodict
+data_list = data_dict
+
+
+for idx, data in enumerate(data_list, start=1):
+    if 'model_output' in data:
+        print(
+            f'Item {idx} has already been predicted, skipping it.'
+        )
+        continue
+    try:
+        print(f'Predicting item {idx}')
+        old_version = data['dependency'] + data['old_version']  # package == x.x.x
+        new_version = data['dependency'] + data['new_version']  # package == x.x.x
+        description = data['description']  # functional description
+        old_code = data['old_code']  # old-version code to migrate
+
+        instruction = bulid_prompt(description, old_version, old_code, new_version)
+        truncated_text = truncate_text(instruction, max_tokens)
+        prediction = predict(truncated_text, model_name)
+
+        data['model_output'] = prediction
+    except Exception as e:
+        print(f'error: {e}')
+        print('save current data')
+        save_folder_path = os.path.join(
+            '../data/result_data/code_migration', model_name
+        )
+        if not os.path.exists(save_folder_path):
+            os.makedirs(save_folder_path)
+        save_json_path = os.path.join(save_folder_path, json_path.split('/')[-1])
+
+        with open(save_json_path, 'w', encoding='utf-8') as fw:
+            json.dump(data_dict, fw, indent=4, ensure_ascii=False)
+        break
+
+
+save_folder_path = os.path.join('../data/result_data/code_migration', model_name)
+if not os.path.exists(save_folder_path):
+    os.makedirs(save_folder_path)
+save_json_path = os.path.join(save_folder_path, json_path.split('/')[-1])
+
+with open(save_json_path, 'w', encoding='utf-8') as fw:
+    json.dump(data_dict, fw, indent=4, ensure_ascii=False)
--- a/evaluation/benchmarks/versicode/inference_utils/api_test_block_completion.py
+++ b/evaluation/benchmarks/versicode/inference_utils/api_test_block_completion.py
@@ -0,0 +1,141 @@
+"""
+Block-level code completion with an OpenAI chat model; overly long prompts are truncated
+"""
+
+import json
+import os
+
+import tiktoken
+from openai import OpenAI
+
+max_tokens = 127000  # prompt token budget: gpt-3.5 has a 16k context, gpt-4o has 128k
+model_name = ''
+
+os.environ['OPENAI_API_KEY'] = ''
+client = OpenAI()
+
+
+def truncate_text(text, max_tokens):
+    encoding = tiktoken.get_encoding('cl100k_base')
+    disallowed_special = ()
+
+    tokens = encoding.encode(text, disallowed_special=disallowed_special)
+    print(len(tokens))
+
+    if len(tokens) > max_tokens:
+        tokens = tokens[:max_tokens]
+
+    truncated_text = encoding.decode(tokens)
+
+    return truncated_text
+
+
+def predict(content, model_name):
+    response = client.chat.completions.create(
+        model=model_name,
+        messages=[{'role': 'user', 'content': content}],
+        frequency_penalty=0.1,
+        max_tokens=128,
+        logit_bias=None,
+        logprobs=None,
+        n=6,
+        presence_penalty=0.0,
+        seed=None,
+        stop=None,
+        stream=False,
+        temperature=0.8,
+        top_p=0.95,
+    )
+    ans_list = []
+    choices_list = response.choices
+    for c in choices_list:
+        content = c.message.content
+        ans_list.append(content)
+    final_ans = str(ans_list)
+    return final_ans
+
+
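+def bulid_prompt(version, description) -> str:
+    """
+    Build the block completion prompt.
+    The prompt contains one worked example based on vllm.
+    :param version: dependency name followed by its version
+    :param description: functional description of the code to generate
+    :return: the prompt string
+    """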
+    prompt = f"""
+            You are a professional Python engineer, and I will provide functional descriptions and versions of specified dependency packages.
+            You need to write code in Python to implement this feature based on the functional description and using the dependency package and version I specified.
+            Please note that you only need to return the code that implements the function, and do not return any other content.
+            Please use <start> and <end> to enclose the generated code. Here is an example:
+            ###Function Description：
+            The function of this code is to print the results predicted by calling the model using vllm.
+            ###dependeny and version：
+            vllm==0.3.3
+            ###response:
+            <start>
+            for output in outputs:
+                prompt = output.prompt
+                generated_text = output.outputs[0].text
+                print("Prompt,Generated text")
+            <end>
+
+            ###Function Description：
+            {description}
+            ###dependeny and version：
+            {version}
+            ###response:
+
+
+        """
+    return prompt
+
+
+json_path = '../data/test_data/VersiCode_block_completion.json'
+
+
+with open(json_path, 'r', encoding='utf-8') as fr:
+    lodict = json.load(fr)
+data_dict = lodict
+data_list = data_dict
+
+
+for idx, data in enumerate(data_list, start=1):
+    if 'model_output' in data:
+        print(
+            f'Item {idx} has already been predicted, skipping it.'
+        )
+        continue
+    try:
+        print(f'Predicting item {idx}')
+        version = data['dependency'] + data['version']  # package == x.x.x
+        description = data['description']  # func description
+
+        instruction = bulid_prompt(version, description)
+        truncated_text = truncate_text(instruction, max_tokens)
+        prediction = predict(truncated_text, model_name)
+
+        data['model_output'] = prediction
+    except Exception as e:
+        print(f'error: {e}')
+        print('save current data')
+        save_folder_path = os.path.join(
+            '../data/result_data/block_completion', model_name
+        )
+        if not os.path.exists(save_folder_path):
+            os.makedirs(save_folder_path)
+        save_json_path = os.path.join(save_folder_path, json_path.split('/')[-1])
+
+        with open(save_json_path, 'w', encoding='utf-8') as fw:
+            json.dump(data_dict, fw, indent=4, ensure_ascii=False)
+        break
+
+
+save_folder_path = os.path.join('../data/result_data/block_completion', model_name)
+if not os.path.exists(save_folder_path):
+    os.makedirs(save_folder_path)
+save_json_path = os.path.join(save_folder_path, json_path.split('/')[-1])
+
+with open(save_json_path, 'w', encoding='utf-8') as fw:
+    json.dump(data_dict, fw, indent=4, ensure_ascii=False)
--- a/evaluation/benchmarks/versicode/inference_utils/test_block.py
+++ b/evaluation/benchmarks/versicode/inference_utils/test_block.py
@@ -0,0 +1,129 @@
+"""
+block completion
+"""
+
+import copy
+import gc
+import json
+import os
+import time
+from multiprocessing import Process
+
+import tiktoken
+import torch
+from vllm import LLM, SamplingParams
+
+# os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
+
+
+def truncate_text(text, max_tokens):
+    encoding = tiktoken.get_encoding('cl100k_base')
+    disallowed_special = ()
+
+    tokens = encoding.encode(text, disallowed_special=disallowed_special)
+    print(len(tokens))
+
+    if len(tokens) > max_tokens:
+        tokens = tokens[:max_tokens]
+
+    truncated_text = encoding.decode(tokens)
+
+    return truncated_text
+
+
+model_list = ['/data2/base models/starcoder2-15b', '/data2/base models/CodeGemma-7B']
+
+
+def run_inference(model_name, origin_data_list):
+    temp_data_list = copy.deepcopy(origin_data_list)
+    test_list = []
+    for data in temp_data_list:
+        version = data['dependency'] + data['version']  # package == x.x.x
+        description = data['description']  # func description
+
+        instruction = bulid_prompt(version, description)
+        test_list.append(instruction)
+
+    sampling_params = SamplingParams(n=6, temperature=0.8, top_p=0.95, max_tokens=64)
+    llm = LLM(
+        model=model_name,
+        tensor_parallel_size=4,
+        gpu_memory_utilization=0.9,
+        swap_space=20,
+    )
+
+    outputs = llm.generate(test_list, sampling_params)
+    for output in outputs:
+        requests_id = int(output.request_id)
+        temp_ans_list = []
+        output_list = output.outputs
+        for o in output_list:
+            text = o.text
+            temp_ans_list.append(text)
+
+        temp_data_list[requests_id]['model_output'] = str(temp_ans_list)
+
+    save_folder_path = os.path.join(
+        '../data/result_data/block_completion', model_name.split('/')[-1]
+    )
+    if not os.path.exists(save_folder_path):
+        os.makedirs(save_folder_path)
+
+    save_json_path = os.path.join(save_folder_path, json_path.split('/')[-1])
+
+    with open(save_json_path, 'w', encoding='utf-8') as fw:
+        json.dump(temp_data_list, fw, indent=4, ensure_ascii=False)
+
+    gc.collect()
+    torch.cuda.empty_cache()
+
+
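+def bulid_prompt(version, description) -> str:
+    """
+    Build the block completion prompt.
+    The prompt contains one worked example based on vllm.
+    :param version: dependency name followed by its version
+    :param description: functional description of the code to generate
+    :return: the prompt string
+    """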
+    prompt = f"""
+            You are a professional Python engineer, and I will provide functional descriptions and versions of specified dependency packages.
+            You need to write code in Python to implement this feature based on the functional description and using the dependency package and version I specified.
+            Please note that you only need to return the code that implements the function, and do not return any other content.
+            Please use <start> and <end> to enclose the generated code. Here is an example:
+            ###Function Description：
+            The function of this code is to print the results predicted by calling the model using vllm.
+            ###dependeny and version：
+            vllm==0.3.3
+            ###response:
+            <start>
+            for output in outputs:
+                prompt = output.prompt
+                generated_text = output.outputs[0].text
+                print("Prompt,Generated text")
+            <end>
+
+            ###Function Description：
+            {description}
+            ###dependeny and version：
+            {version}
+            ###response:
+
+
+        """
+    return prompt
+
+
+json_path = '../data/test_data/VersiCode_block_completion.json'
+
+with open(json_path, 'r', encoding='utf-8') as fr:
+    lodict = json.load(fr)
+
+origin_data_list = lodict
+
+for model_name in model_list:
+    process = Process(target=run_inference, args=(model_name, origin_data_list))
+    process.start()
+    process.join()
+    time.sleep(120)
--- a/evaluation/benchmarks/versicode/inference_utils/test_migration.py
+++ b/evaluation/benchmarks/versicode/inference_utils/test_migration.py
@@ -0,0 +1,122 @@
+"""
+code migration
+"""
+
+import copy
+import gc
+import json
+import os
+import time
+from multiprocessing import Process
+
+import tiktoken
+import torch
+from vllm import LLM, SamplingParams
+
+# os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
+
+
+def truncate_text(text, max_tokens):
+    encoding = tiktoken.get_encoding('cl100k_base')
+    disallowed_special = ()
+
+    tokens = encoding.encode(text, disallowed_special=disallowed_special)
+    print(len(tokens))
+
+    if len(tokens) > max_tokens:
+        tokens = tokens[:max_tokens]
+
+    truncated_text = encoding.decode(tokens)
+
+    return truncated_text
+
+
+model_list = ['/data2/base models/starcoder2-15b', '/data2/base models/CodeGemma-7B']
+
+
+def run_inference(model_name, origin_data_list):
+    temp_data_list = copy.deepcopy(origin_data_list)
+    test_list = []
+    for data in temp_data_list:
+        old_version = data['dependency'] + data['old_version']  # package == x.x.x
+        new_version = data['dependency'] + data['new_version']  # package == x.x.x
+        description = data['description']  # functional description
+        old_code = data['old_code']  # old-version code to migrate
+
+        instruction = bulid_prompt(description, old_version, old_code, new_version)
+        test_list.append(instruction)
+
+    sampling_params = SamplingParams(n=6, temperature=0.8, top_p=0.95, max_tokens=512)
+    llm = LLM(
+        model=model_name,
+        tensor_parallel_size=4,
+        gpu_memory_utilization=0.6,
+        swap_space=40,
+    )
+
+    outputs = llm.generate(test_list, sampling_params)
+    for output in outputs:
+        requests_id = int(output.request_id)
+        temp_ans_list = []
+        output_list = output.outputs
+        for o in output_list:
+            text = o.text
+            temp_ans_list.append(text)
+
+        temp_data_list[requests_id]['model_output'] = str(temp_ans_list)
+
+    save_folder_path = os.path.join(
+        '../data/result_data/code_migration', model_name.split('/')[-1]
+    )
+    if not os.path.exists(save_folder_path):
+        os.makedirs(save_folder_path)
+
+    save_json_path = os.path.join(save_folder_path, json_path.split('/')[-1])
+
+    with open(save_json_path, 'w', encoding='utf-8') as fw:
+        json.dump(temp_data_list, fw, indent=4, ensure_ascii=False)
+
+    gc.collect()
+    torch.cuda.empty_cache()
+
+
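+def bulid_prompt(description, old_version, old_code, new_version) -> str:
+    """
+    Build the code migration prompt.
+    :param description: functional description of the code
+    :param old_version: dependency name followed by its old version
+    :param old_code: code written against the old version
+    :param new_version: dependency name followed by its new version
+    :return: the prompt string
+    """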
+    prompt = f"""
+    You are now a professional Python programming engineer. I will provide you with a code snippet and a description of its functionality,
+    including the dependencies and versions used in the code. Then, I will provide the same dependencies but with a specified new version.
+    Your task is to refactor the code using the methods provided by the specified new version and return the refactored code.
+    Please note that you only need to return the refactored code and enclose it with <start> and <end>:
+    ###Functionality description of the code
+    {description}
+    ###Dependency and old version
+    {old_version}
+    ###Old version code
+    {old_code}
+    ###Dependency and new version
+    {new_version}
+    ###Refactored new code
+    """
+
+    return prompt
+
+
+json_path = '../data/test_data/VersiCode_migration.json'
+
+with open(json_path, 'r', encoding='utf-8') as fr:
+    lodict = json.load(fr)
+
+origin_data_list = lodict
+
+for model_name in model_list:
+    process = Process(target=run_inference, args=(model_name, origin_data_list))
+    process.start()
+    process.join()
+    time.sleep(120)
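Review note on the `model_output` format: the inference scripts store the sampled completions as `str(list_of_strings)`, and the metric script added below reads them back with `eval`. A hedged alternative is sketched here with `ast.literal_eval`, which only accepts Python literals and so does not execute arbitrary expressions found in a result file. The helper name `parse_model_output` and the choice to return an empty list on malformed input are assumptions for illustration, not part of this change.

```python
import ast


def parse_model_output(raw: str) -> list[str]:
    """Parse a `model_output` string that was produced by str(list_of_strings)."""
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Not a Python literal (for example a function call or free text).
        return []
    if not isinstance(value, list):
        return []
    # Keep only string entries; anything else is not a completion.
    return [item for item in value if isinstance(item, str)]
```

Storing the list directly as a JSON array would avoid the round trip through `str` altogether, but that would change the result file format that the downstream scripts expect.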
--- a/evaluation/benchmarks/versicode/metric/compute_ism_pm_score.py
+++ b/evaluation/benchmarks/versicode/metric/compute_ism_pm_score.py
@@ -0,0 +1,356 @@
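+"""
+Evaluate block-level prediction ability:
+1. Check whether the output contains the correct function name
+2. Check whether the output is valid code
+3. Compute ISM and PM
+"""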
+
+import io
+import json
+import math
+import os
+import re
+import tokenize
+
+
+def is_code_valid(code):
+    try:
+        compile(code, '<string>', 'exec')
+        return True
+    except Exception:
+        return False
+
+
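+def longest_common_prefix_between_lists_with_elements(list1, list2):
+    """
+    Compute the longest common prefix length over all pairs of elements from two string lists.
+    :param list1: first list of strings
+    :param list2: second list of strings
+    :return: the longest prefix length and the pair of elements that produced it
+    """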
+    max_prefix_length = 0
+    max_prefix_elements = ()
+    for str1 in list1:
+        for str2 in list2:
+            prefix_length = 0
+            min_len = min(len(str1), len(str2))
+            for i in range(min_len):
+                if str1[i] == str2[i]:
+                    prefix_length += 1
+                else:
+                    break
+            if prefix_length > max_prefix_length:
+                max_prefix_length = prefix_length
+                max_prefix_elements = (str1, str2)
+    return max_prefix_length, max_prefix_elements
+
+
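+def get_token(ans_code: str, output_code: str):
+    """
+    Lexically analyze both code strings and return their two identifier lists.
+    :param ans_code: reference code
+    :param output_code: model output code
+    :return: identifiers of the reference code, identifiers of the output code
+    """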
+    output_flag = True
+    ans_flag = True
+    try:
+        tokens_ans = tokenize.tokenize(io.BytesIO(ans_code.encode('utf-8')).readline)
+    except Exception:
+        tokens_ans = ans_code.splitlines()
+        ans_flag = False
+
+    try:
+        tokens_output = tokenize.tokenize(
+            io.BytesIO(output_code.encode('utf-8')).readline
+        )
+    except Exception:
+        tokens_output = output_code.splitlines()
+        output_flag = False
+
+    identifiers_ans = []
+    identifiers_output = []
+    if ans_flag:
+        try:
+            for token in tokens_ans:
+                if token.type == tokenize.NAME:
+                    identifiers_ans.append(token.string)
+        except Exception:
+            identifiers_ans = tokens_ans
+    else:
+        identifiers_ans = tokens_ans
+
+    if output_flag:
+        try:
+            for to in tokens_output:
+                if to.type == tokenize.NAME:
+                    identifiers_output.append(to.string)
+        except Exception:
+            identifiers_output = tokens_output
+    else:
+        identifiers_output = tokens_output
+
+    return identifiers_ans, identifiers_output
+
+
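+def get_token_per_line(code: str):
+    """
+    Lexically analyze the code line by line and record the identifiers of each line.
+    :param code: code string
+    :return: a list containing one identifier list per line
+    """
+    lines = code.split('\n')  # split the code into a list of lines
+    identifiers_per_line = []  # one identifier list per line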
+
+    for line in lines:
+        tokens = tokenize.tokenize(io.BytesIO(line.encode('utf-8')).readline)
+        identifiers = []
+        try:
+            for token in tokens:
+                if token.type == tokenize.NAME:
+                    identifiers.append(token.string)
+        except Exception:
+            identifiers = line.split(' ')
+        identifiers_per_line.append(identifiers)
+
+    return identifiers_per_line
+
+
+def get_ISM(answer_code: str, model_output_list: list, answer_name: str) -> list:
+    """
+    Compute ISM and return the scores sorted in descending order.
+    :return: list of scores, highest first
+    """
+    score_list = []
+    for code in model_output_list:
+        if '```python' in code:
+            code = code.replace('```python', '')
+            code = code.replace('```', '')
+        if not re.search(rf'\b{re.escape(answer_name)}\b', code) or not is_code_valid(
+            code
+        ):
+            score_list.append(0)
+            continue
+
+        # if answer_name not in code:
+        #     score_list.append(0)
+        #     continue
+
+        identifiers_ans, identifiers_output = get_token(answer_code, code)
+        max_len, elements = longest_common_prefix_between_lists_with_elements(
+            identifiers_ans, identifiers_output
+        )
+        if max_len != 0:
+            base_element_len = max(len(elements[0]), len(elements[1]))
+            temp_score = max_len / base_element_len
+            score_list.append(temp_score)
+        else:
+            score_list.append(0)
+        # base_element_len = max(len(elements[0]), len(elements[1]))
+        # temp_score = max_len/base_element_len
+        # score_list.append(temp_score)
+
+    score_list = sorted(score_list, reverse=True)
+    return score_list
+
+
+def get_ISM_without_verification(
+    answer_code: str, model_output_list: list, answer_name: str
+) -> list:
+    """
+    Compute ISM without the code validity check and return the scores sorted in descending order.
+    :return: list of scores, highest first
+    """
+    score_list = []
+    for code in model_output_list:
+        if answer_name not in code:
+            score_list.append(0)
+            continue
+
+        # if answer_name not in code:
+        #     score_list.append(0)
+        #     continue
+
+        identifiers_ans, identifiers_output = get_token(answer_code, code)
+        max_len, elements = longest_common_prefix_between_lists_with_elements(
+            identifiers_ans, identifiers_output
+        )
+        if max_len != 0:
+            base_element_len = max(len(elements[0]), len(elements[1]))
+            temp_score = max_len / base_element_len
+            score_list.append(temp_score)
+        else:
+            score_list.append(0)
+        # base_element_len = max(len(elements[0]), len(elements[1]))
+        # temp_score = max_len/base_element_len
+        # score_list.append(temp_score)
+
+    score_list = sorted(score_list, reverse=True)
+    return score_list
+
+
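+def longest_common_prefix_with_lengths(list1, list2):
+    """
+    Compute the longest common prefix length over all pairs of sublists from two 2-D lists, and record the lengths of the two sublists that produced it.
+    :param list1: first 2-D list
+    :param list2: second 2-D list
+    :return: the longest prefix length and the lengths of the two sublists that produced it
+    """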
+    max_length = 0
+    len_list1 = 0
+    len_list2 = 0
+    for i, sublist1 in enumerate(list1):
+        for j, sublist2 in enumerate(list2):
+            match_length = 0
+            min_length = min(len(sublist1), len(sublist2))
+            for k in range(min_length):
+                if sublist1[k] == sublist2[k]:
+                    match_length += 1
+                else:
+                    break
+            if match_length > max_length:
+                max_length = match_length
+                len_list1 = len(sublist1)
+                len_list2 = len(sublist2)
+    return max_length, len_list1, len_list2
+
+
+def get_PM(answer_code: str, model_output_list: list, answer_name: str) -> list:
+    """
+    Compute PM and return the scores sorted in descending order.
+    :return: list of scores, highest first
+    """
+    score_list = []
+    for code in model_output_list:
+        if '```python' in code:
+            code = code.replace('```python', '')
+            code = code.replace('```', '')
+        if not re.search(rf'\b{re.escape(answer_name)}\b', code) or not is_code_valid(
+            code
+        ):
+            # if answer_name not in code or is_code_valid(code) == False:
+            score_list.append(0)
+            continue
+
+        # if answer_name not in code:
+        #     score_list.append(0)
+        #     continue
+
+        ans_list = get_token_per_line(answer_code)
+        output_token_list = get_token_per_line(code)
+        max_len, len1, len2 = longest_common_prefix_with_lengths(
+            ans_list, output_token_list
+        )
+        base_element_len = max(len1, len2)
+
+        if base_element_len != 0:
+            temp_score = max_len / base_element_len
+            score_list.append(temp_score)
+        else:
+            score_list.append(0)
+
+    score_list = sorted(score_list, reverse=True)
+    return score_list
+
+
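+def get_score(score_list: list, k):
+    """
+    Compute score@k over n samples; score_list must be sorted in descending order.
+    :param score_list: per-sample scores, highest first
+    :param k: number of samples drawn
+    :return: expected best score among k of the n samples
+    """
+    n = len(score_list)
+    total = 0
+    final = n - k + 1
+    for i in range(1, final + 1):
+        total += math.comb(n - i, k - 1) * score_list[i - 1]
+
+    final_score = total / math.comb(n, k)
+
+    return final_score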
+
+
+k = 1
+task = 'block'  # block or line
+json_name = f'Versicode_{task}_completion.json'
+
+folder_path = f'../data/result_data/{task}_completion'
+model_list = os.listdir(folder_path)
+
+for model in model_list:
+    model_json_path = os.path.join(folder_path, model, json_name)
+    with open(model_json_path, 'r', encoding='utf-8') as fr:
+        lodict = json.load(fr)
+    data_dict = lodict
+    data_list = data_dict
+    data_len = len(data_list)
+    sum_ISM = 0
+    sum_PM = 0
+
+    for data in data_list:
+        # model_output_list = eval(data['model_output'])
+        model_output_list = eval(data['model_output_clear'])[:1]
+        temp_list = []
+        for o in model_output_list:
+            temp_out = o.replace('```python', '')
+            temp_out = temp_out.replace('```', '')
+            temp_list.append(temp_out)
+        model_output_list = temp_list
+        answer_code = data['code']
+        answer_name = data['core_token']
+        #
+        # answer_code = data['new_code']  #code editing
+        # answer_name = data['new_name']    #code editing
+
+        # answer_code = data['old_code']  # code editing new to old
+        # answer_name = data['old_name']  # code editing new to old
+        #
+        ISM_score_list = get_ISM(answer_code, model_output_list, answer_name)
+        # ISM_score_without_verification_list = get_ISM_without_verification(answer_code, model_output_list, answer_name)  # newly added
+        PM_score_list = get_PM(answer_code, model_output_list, answer_name)
+
+        # if not ISM_score_without_verification_list == ISM_score_list:  # newly added
+        #     for s in ISM_score_list:  # newly added
+        #         if s != ISM_score_without_verification_list[ISM_score_list.index(s)]:  # newly added
+        #             print('Metadata:')  # newly added
+        #             print(data)  # newly added
+        #             print('Answer:')  # newly added
+        #             print(model_output_list[ISM_score_list.index(s)])  # newly added
+
+        # flag = int(input('Enter 1 to continue, 0 to exit'))  # newly added
+        # if flag == 1:
+        #     continue
+
+        ISM_score = get_score(ISM_score_list, k)
+        PM_score = get_score(PM_score_list, k)
+
+        sum_ISM += ISM_score
+        sum_PM += PM_score
+        # print(f"ISM score: {ISM_score}")
+        # print(f"PM score: {PM_score}")
+
+    print(f'{model}, {task} completion task, ISM@{k} score: {sum_ISM / data_len}')
+    print(f'{model}, {task} completion task, PM@{k} score: {sum_PM / data_len}')
+
+
+# def get_token(ans_code:str, output_code:str):
+#     """
+#     Tokenize both code snippets and return the two lists of identifiers
+#     :param ans_code:
+#     :param output_code:
+#     :return:
+#     """
+#     tokens_ans = tokenize.tokenize(io.BytesIO(ans_code.encode('utf-8')).readline)
+#     tokens_output = tokenize.tokenize(io.BytesIO(output_code.encode('utf-8')).readline)
+#     identifiers_ans = []
+#     identifiers_output = []
+#     for token in tokens_ans:
+#         if token.type == tokenize.NAME:
+#             identifiers_ans.append(token.string)
+#
+#     for to in tokens_output:
+#         if to.type == tokenize.NAME:
+#             identifiers_output.append(to.string)
+#
+#     return identifiers_ans, identifiers_output
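The `get_score` helper above appears to compute the expected best score when k of the n outputs are drawn without replacement. The following is a minimal sketch of that reading, checked against a brute-force average over all k-subsets. The names `score_at_k` and `brute_force_at_k` are illustrative and not part of this PR.

```python
import itertools
import math


def score_at_k(sorted_scores, k):
    """Closed form in the style of get_score; scores must be sorted in descending order."""
    n = len(sorted_scores)
    total = 0.0
    for i in range(1, n - k + 2):
        # The i-th best score is the maximum of a k-subset in C(n - i, k - 1) ways.
        total += math.comb(n - i, k - 1) * sorted_scores[i - 1]
    return total / math.comb(n, k)


def brute_force_at_k(scores, k):
    """Average of the best score over every k-subset (hypothetical checker)."""
    subsets = list(itertools.combinations(scores, k))
    return sum(max(s) for s in subsets) / len(subsets)


scores = sorted([0.2, 0.9, 0.5, 0.0], reverse=True)
closed = score_at_k(scores, 2)
brute = brute_force_at_k(scores, 2)
```

With k = 1 the formula reduces to the plain mean of the scores, which is the setting the script uses.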
--- a/evaluation/benchmarks/versicode/metric/compute_migration_cdc_score.py
+++ b/evaluation/benchmarks/versicode/metric/compute_migration_cdc_score.py
@@ -0,0 +1,198 @@
+"""
+Calculate the cdc score for migration
+"""
+
+import json
+import math
+import os
+import re
+
+# warnings.filterwarnings("ignore", category=SyntaxWarning)
+
+
+def is_correct_parameter_count(function_name, correct_code, test_code):
+    """
+    Check whether the call in the test code passes the same number of parameters as the correct code.
+    :param function_name:
+    :param correct_code:
+    :param test_code:
+    :return:
+    """
+    # Count the parameters in the correct code
+    # return True
+    pattern = rf'{re.escape(function_name)}\((.*?)\)'
+    correct_match = re.search(pattern, correct_code)
+
+    if correct_match:
+        correct_params = correct_match.group(1).strip()
+        correct_param_list = [p.strip() for p in correct_params.split(',') if p.strip()]
+        expected_count = len(correct_param_list)
+    else:
+        expected_count = 0  # no call found in the correct code, so expect zero parameters
+
+    # Look for the function call in the code under test
+    test_match = re.search(pattern, test_code)
+
+    if test_match:
+        test_params = test_match.group(1).strip()
+        test_param_list = [p.strip() for p in test_params.split(',') if p.strip()]
+        return len(test_param_list) == expected_count  # compare parameter counts
+    else:
+        # No parenthesized call: check that the function name appears in the code
+        return expected_count == 0 and function_name in test_code
+
+
+def check_keyword_parameters(function_name, correct_code, test_code):
+    """
+    Check whether keyword arguments are used correctly.
+    :param function_name:
+    :param correct_code:
+    :param test_code:
+    :return:
+    """
+    # Match the function call in the correct code with a regex
+    # return True
+    pattern = rf'{re.escape(function_name)}\((.*?)\)'
+    correct_match = re.search(pattern, correct_code)
+
+    if correct_match:
+        correct_params = correct_match.group(1).strip()
+        correct_param_list = [p.strip() for p in correct_params.split(',') if p.strip()]
+
+        # Find the function call in the code under test
+        test_match = re.search(pattern, test_code)
+
+        if test_match:
+            test_params = test_match.group(1).strip()
+            test_param_list = [p.strip() for p in test_params.split(',') if p.strip()]
+
+            # Every keyword argument in the correct code must also be passed by keyword in the test code
+            for correct_param in correct_param_list:
+                if '=' in correct_param:  # only when the correct code uses a keyword argument
+                    param_name = correct_param.split('=')[0].strip()
+                    if not any(
+                        param_name in test_param and '=' in test_param
+                        for test_param in test_param_list
+                    ):
+                        return False  # the matching argument is not passed by keyword
+
+            return True  # all keyword arguments match
+
+    return False  # no matching call found
+
+
+def with_correct(answer_code: str, model_output: str) -> bool:
+    """
+    When the answer is a `with` statement, check that the model output is one too.
+    :param answer_code:
+    :param model_output:
+    :return:
+    """
+    # return True
+    if not answer_code.startswith('with') and not model_output.startswith('with'):
+        return True
+    elif answer_code.startswith('with') and model_output.startswith('with'):
+        return True
+    else:
+        return False
+
+
+def compute_block_score_k(
+    answer: str,
+    model_output: list,
+    k: int,
+    model_filled_code,
+    core_line_in_core_block,
+    core_line_in_output_clear,
+):
+    """
+    CDC requires all five conditions to hold; EM requires only the first.
+    """
+    c = 0
+    n = len(model_output)
+    for index, code in enumerate(model_output):
+        if (
+            re.search(rf'\b{re.escape(answer)}\b', code)
+            and is_code_valid(model_filled_code[index])
+            and is_correct_parameter_count(
+                answer, core_line_in_core_block, core_line_in_output_clear[index]
+            )
+            and with_correct(core_line_in_core_block, core_line_in_output_clear[index])
+            and check_keyword_parameters(
+                answer, core_line_in_core_block, core_line_in_output_clear[index]
+            )
+        ):  # block
+            # if re.search(rf'\b{re.escape(answer)}\b', code):#block
+            c += 1
+    if n - c < k:
+        return 1.0
+
+    score = 1 - (math.comb(n - c, k)) / (math.comb(n, k))
+
+    return score
+
+
+def is_code_valid(code):
+    try:
+        compile(code, '<string>', 'exec')
+        return True
+    except Exception:
+        return False
+
+
+def compute_score_k(answer: str, model_output: list, k: int):
+    c = 0
+    n = len(model_output)
+    for output in model_output:
+        if '```python' in output:
+            output = output.replace('```python', '')
+            output = output.replace('```', '')
+        # if answer == output:
+
+        if re.search(rf'\b{re.escape(answer)}\b', output) and is_code_valid(output):
+            c += 1
+    if n - c < k:
+        return 1.0
+
+    score = 1 - (math.comb(n - c, k)) / (math.comb(n, k))
+
+    return score
+
+
+k = 1  # cdc@k
+json_name = 'VersiCode_migration.json'
+task = 'migration'
+folder_path = '../data/result_data/code_migration'
+
+model_list = os.listdir(folder_path)
+for model in model_list:
+    # if model != 'gpt-4o':
+    #     continue
+    model_json_path = os.path.join(folder_path, model, json_name)
+    with open(model_json_path, 'r', encoding='utf-8') as fr:
+        lodict = json.load(fr)
+    data_list = lodict
+
+    score_list = []
+    for data in data_list:
+        answer = data['new_name']  # old -> new
+        model_output = eval(data['model_output_clear'])  # old -> new; stored as a stringified list
+
+        model_filled_code = model_output
+        # core_line_in_core_block = data['core_line_in_new_core_block']# old -> new
+        core_line_in_core_block = data['core_line_in_code']  # old -> new
+        core_line_in_output_clear = data['core_line_in_output_clear']  # old -> new
+
+        score_list.append(
+            compute_block_score_k(
+                answer,
+                model_output,
+                k,
+                model_filled_code,
+                core_line_in_core_block,
+                core_line_in_output_clear,
+            )
+        )
+
+    final_score = sum(score_list) / len(score_list)
+    print(f'{model}, {task} task, cdc@{k} score: {final_score}')
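The `compute_*_score_k` functions above share the estimator `1 - C(n - c, k) / C(n, k)`, which reads as the probability that at least one of k outputs drawn from n is among the c accepted ones. A minimal sketch with small worked values follows; the helper name `at_k` is an assumption made for illustration.

```python
import math


def at_k(n, c, k):
    """Chance that k samples drawn from n outputs include at least one of the c accepted ones."""
    if n - c < k:
        # Fewer than k rejected outputs exist, so every k-subset contains an accepted one.
        return 1.0
    return 1 - math.comb(n - c, k) / math.comb(n, k)


examples = {
    'none_pass': at_k(n=6, c=0, k=3),
    'half_pass': at_k(n=6, c=3, k=3),
    'all_pass': at_k(n=6, c=6, k=3),
}
```

In the `half_pass` case only one of the 20 possible triples consists entirely of rejected outputs, so the score is 0.95.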
--- a/evaluation/benchmarks/versicode/metric/compute_versicode_cdc_score.py
+++ b/evaluation/benchmarks/versicode/metric/compute_versicode_cdc_score.py
@@ -0,0 +1,225 @@
+"""
+Calculate the cdc score for line and block
+"""
+
+import json
+import math
+import os
+import re
+
+# warnings.filterwarnings("ignore", category=SyntaxWarning)
+
+
+def is_code_valid(code):
+    try:
+        compile(code, '<string>', 'exec')
+        return True
+    except Exception:
+        return False
+
+
+def is_correct_parameter_count(function_name, correct_code, test_code):
+    """
+    Check whether the call in the test code passes the same number of parameters as the correct code.
+    :param function_name:
+    :param correct_code:
+    :param test_code:
+    :return:
+    """
+    # Count the parameters in the correct code
+    # return True
+    pattern = rf'{re.escape(function_name)}\((.*?)\)'
+    correct_match = re.search(pattern, correct_code)
+
+    if correct_match:
+        correct_params = correct_match.group(1).strip()
+        correct_param_list = [p.strip() for p in correct_params.split(',') if p.strip()]
+        expected_count = len(correct_param_list)
+    else:
+        expected_count = 0  # no call found in the correct code, so expect zero parameters
+
+    # Look for the function call in the code under test
+    test_match = re.search(pattern, test_code)
+
+    if test_match:
+        test_params = test_match.group(1).strip()
+        test_param_list = [p.strip() for p in test_params.split(',') if p.strip()]
+        return len(test_param_list) == expected_count  # compare parameter counts
+    else:
+        # No parenthesized call: check that the function name appears in the code
+        return expected_count == 0 and function_name in test_code
+
+
+def check_keyword_parameters(function_name, correct_code, test_code):
+    """
+    Check whether keyword arguments are used correctly.
+    :param function_name:
+    :param correct_code:
+    :param test_code:
+    :return:
+    """
+    # Match the function call in the correct code with a regex
+    # return True
+    pattern = rf'{re.escape(function_name)}\((.*?)\)'
+    correct_match = re.search(pattern, correct_code)
+
+    if correct_match:
+        correct_params = correct_match.group(1).strip()
+        correct_param_list = [p.strip() for p in correct_params.split(',') if p.strip()]
+
+        # Find the function call in the code under test
+        test_match = re.search(pattern, test_code)
+
+        if test_match:
+            test_params = test_match.group(1).strip()
+            test_param_list = [p.strip() for p in test_params.split(',') if p.strip()]
+
+            # Every keyword argument in the correct code must also be passed by keyword in the test code
+            for correct_param in correct_param_list:
+                if '=' in correct_param:  # only when the correct code uses a keyword argument
+                    param_name = correct_param.split('=')[0].strip()
+                    if not any(
+                        param_name in test_param and '=' in test_param
+                        for test_param in test_param_list
+                    ):
+                        return False  # the matching argument is not passed by keyword
+
+            return True  # all keyword arguments match
+
+    return False  # no matching call found
+
+
+def with_correct(answer_code: str, model_output: str) -> bool:
+    """
+    When the answer is a `with` statement, check that the model output is one too.
+    :param answer_code:
+    :param model_output:
+    :return:
+    """
+    # return True
+    if not answer_code.startswith('with') and not model_output.startswith('with'):
+        return True
+    elif answer_code.startswith('with') and model_output.startswith('with'):
+        return True
+    else:
+        return False
+
+
+def compute_line_score_k(
+    answer: str, model_output: list, k: int, model_filled_code, core_line
+):
+    c = 0
+    n = len(model_output)
+    for index, code in enumerate(model_output):
+        if (
+            re.search(rf'\b{re.escape(answer)}\b', code)
+            and is_code_valid(model_filled_code[index])
+            and is_correct_parameter_count(answer, core_line, code)
+            and with_correct(core_line, code)
+            and check_keyword_parameters(answer, core_line, code)
+        ):  # line
+            c += 1
+    if n - c < k:
+        return 1.0
+
+    score = 1 - (math.comb(n - c, k)) / (math.comb(n, k))
+
+    return score
+
+
+def compute_block_score_k(
+    answer: str,
+    model_output: list,
+    k: int,
+    model_filled_code,
+    core_line_in_core_block,
+    core_line_in_output_clear,
+):
+    c = 0
+    n = len(model_output)
+    for index, code in enumerate(model_output):
+        if (
+            re.search(rf'\b{re.escape(answer)}\b', code)
+            and is_code_valid(model_filled_code[index])
+            and is_correct_parameter_count(
+                answer, core_line_in_core_block, core_line_in_output_clear[index]
+            )
+            and with_correct(core_line_in_core_block, core_line_in_output_clear[index])
+            and check_keyword_parameters(
+                answer, core_line_in_core_block, core_line_in_output_clear[index]
+            )
+        ):  # block
+            c += 1
+    if n - c < k:
+        return 1.0
+
+    score = 1 - (math.comb(n - c, k)) / (math.comb(n, k))
+
+    return score
+
+
+def compute_score_k(answer: str, model_output: list, k: int):
+    c = 0
+    n = len(model_output)
+    for index, code in enumerate(model_output):
+        if re.search(rf'\b{re.escape(answer)}\b', code) and is_code_valid(
+            code
+        ):  # block
+            # if re.search(rf'\b{re.escape(answer)}\b', code):#line
+            c += 1
+    if n - c < k:
+        return 1.0
+
+    score = 1 - (math.comb(n - c, k)) / (math.comb(n, k))
+
+    return score
+
+
+k = 3  # cdc@k
+task = 'block'  # line or block
+json_name = f'Versicode_{task}_completion.json'
+
+folder_path = f'../data/result_data/{task}_completion'
+model_list = os.listdir(folder_path)
+
+for model in model_list:
+    model_json_path = os.path.join(folder_path, model, json_name)
+    with open(model_json_path, 'r', encoding='utf-8') as fr:
+        lodict = json.load(fr)
+    data_list = lodict
+
+    if task == 'line':
+        score_list = []
+        for data in data_list:
+            answer = data['core_token']
+            model_output = eval(data['model_output_clear'])
+            model_filled_code = [
+                data['masked_code'].replace('<mask>', i) for i in model_output
+            ]
+            core_line = data['core_line']
+            score_list.append(
+                compute_line_score_k(
+                    answer, model_output, k, model_filled_code, core_line
+                )
+            )
+    else:
+        score_list = []
+        for data in data_list:
+            answer = data['core_token']
+            model_output = eval(data['model_output_clear'])
+            model_filled_code = eval(data['model_output_clear'])
+            core_line = data['core_line']
+            core_line_in_output_clear = data['core_line_in_output_clear']
+            score_list.append(
+                compute_block_score_k(
+                    answer,
+                    model_output,
+                    k,
+                    model_filled_code,
+                    core_line,
+                    core_line_in_output_clear,
+                )
+            )
+
+    final_score = sum(score_list) / len(score_list)
+    print(f'{model}, {task} completion task, cdc@{k} score: {final_score}')
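The parameter-count check in this file relies on a non-greedy regex followed by a comma split. The sketch below approximates that behaviour on made-up call strings to show where it is reliable and where it is only approximate: a nested call is cut at the first closing parenthesis, so its arguments are undercounted. The helper `count_params` and the sample strings are hypothetical.

```python
import re


def count_params(function_name, code):
    """Approximate the regex-and-split counting; returns None when no call is found."""
    match = re.search(rf'{re.escape(function_name)}\((.*?)\)', code)
    if match is None:
        return None
    # Split on commas and ignore empty pieces, as the metric scripts do.
    return len([p for p in match.group(1).split(',') if p.strip()])


flat = count_params('np.linspace', 'x = np.linspace(0, 1, num=50)')
nested = count_params('np.array', 'a = np.array(list(range(3)), dtype=float)')
missing = count_params('np.zeros', 'a = np.ones(3)')
```

Here `flat` is 3 as expected, while `nested` comes out as 1 although the call has two arguments, because the capture stops at the `)` that closes `range(3`.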
--- a/evaluation/benchmarks/versicode/metric/compute_versicode_em_score.py
+++ b/evaluation/benchmarks/versicode/metric/compute_versicode_em_score.py
@@ -0,0 +1,209 @@
+"""
+Calculate the em score for line and block
+"""
+
+import json
+import math
+import os
+import re
+
+# warnings.filterwarnings("ignore", category=SyntaxWarning)
+
+
+def is_code_valid(code):
+    try:
+        compile(code, '<string>', 'exec')
+        return True
+    except Exception:
+        return False
+
+
+def is_correct_parameter_count(function_name, correct_code, test_code):
+    """
+    Check whether the call in the test code passes the same number of parameters as the correct code.
+    :param function_name:
+    :param correct_code:
+    :param test_code:
+    :return:
+    """
+    # Count the parameters in the correct code
+    # return True
+    pattern = rf'{re.escape(function_name)}\((.*?)\)'
+    correct_match = re.search(pattern, correct_code)
+
+    if correct_match:
+        correct_params = correct_match.group(1).strip()
+        correct_param_list = [p.strip() for p in correct_params.split(',') if p.strip()]
+        expected_count = len(correct_param_list)
+    else:
+        expected_count = 0  # no call found in the correct code, so expect zero parameters
+
+    # Look for the function call in the code under test
+    test_match = re.search(pattern, test_code)
+
+    if test_match:
+        test_params = test_match.group(1).strip()
+        test_param_list = [p.strip() for p in test_params.split(',') if p.strip()]
+        return len(test_param_list) == expected_count  # compare parameter counts
+    else:
+        # No parenthesized call: check that the function name appears in the code
+        return expected_count == 0 and function_name in test_code
+
+
+def check_keyword_parameters(function_name, correct_code, test_code):
+    """
+    Check whether keyword arguments are used correctly.
+    :param function_name:
+    :param correct_code:
+    :param test_code:
+    :return:
+    """
+    # Match the function call in the correct code with a regex
+    # return True
+    pattern = rf'{re.escape(function_name)}\((.*?)\)'
+    correct_match = re.search(pattern, correct_code)
+
+    if correct_match:
+        correct_params = correct_match.group(1).strip()
+        correct_param_list = [p.strip() for p in correct_params.split(',') if p.strip()]
+
+        # Find the function call in the code under test
+        test_match = re.search(pattern, test_code)
+
+        if test_match:
+            test_params = test_match.group(1).strip()
+            test_param_list = [p.strip() for p in test_params.split(',') if p.strip()]
+
+            # Every keyword argument in the correct code must also be passed by keyword in the test code
+            for correct_param in correct_param_list:
+                if '=' in correct_param:  # only when the correct code uses a keyword argument
+                    param_name = correct_param.split('=')[0].strip()
+                    if not any(
+                        param_name in test_param and '=' in test_param
+                        for test_param in test_param_list
+                    ):
+                        return False  # the matching argument is not passed by keyword
+
+            return True  # all keyword arguments match
+
+    return False  # no matching call found
+
+
+def with_correct(answer_code: str, model_output: str) -> bool:
+    """
+    When the answer is a `with` statement, check that the model output is one too.
+    :param answer_code:
+    :param model_output:
+    :return:
+    """
+    # return True
+    if not answer_code.startswith('with') and not model_output.startswith('with'):
+        return True
+    elif answer_code.startswith('with') and model_output.startswith('with'):
+        return True
+    else:
+        return False
+
+
+def compute_line_score_k(
+    answer: str, model_output: list, k: int, model_filled_code, core_line
+):
+    c = 0
+    n = len(model_output)
+    for index, code in enumerate(model_output):
+        if re.search(rf'\b{re.escape(answer)}\b', code):  # line
+            c += 1
+    if n - c < k:
+        return 1.0
+
+    score = 1 - (math.comb(n - c, k)) / (math.comb(n, k))
+
+    return score
+
+
+def compute_block_score_k(
+    answer: str,
+    model_output: list,
+    k: int,
+    model_filled_code,
+    core_line_in_core_block,
+    core_line_in_output_clear,
+):
+    c = 0
+    n = len(model_output)
+    for index, code in enumerate(model_output):
+        if re.search(rf'\b{re.escape(answer)}\b', code):  # block
+            c += 1
+    if n - c < k:
+        return 1.0
+
+    score = 1 - (math.comb(n - c, k)) / (math.comb(n, k))
+
+    return score
+
+
+def compute_score_k(answer: str, model_output: list, k: int):
+    c = 0
+    n = len(model_output)
+    for index, code in enumerate(model_output):
+        if re.search(rf'\b{re.escape(answer)}\b', code) and is_code_valid(
+            code
+        ):  # block
+            # if re.search(rf'\b{re.escape(answer)}\b', code):#line
+            c += 1
+    if n - c < k:
+        return 1.0
+
+    score = 1 - (math.comb(n - c, k)) / (math.comb(n, k))
+
+    return score
+
+
+k = 3  # em@k
+task = 'block'  # line or block
+json_name = f'Versicode_{task}_completion.json'
+
+folder_path = f'../data/result_data/{task}_completion'
+model_list = os.listdir(folder_path)
+
+for model in model_list:
+    model_json_path = os.path.join(folder_path, model, json_name)
+    with open(model_json_path, 'r', encoding='utf-8') as fr:
+        lodict = json.load(fr)
+    data_list = lodict
+
+    if task == 'line':
+        score_list = []
+        for data in data_list:
+            answer = data['core_token']
+            model_output = eval(data['model_output_clear'])
+            model_filled_code = [
+                data['masked_code'].replace('<mask>', i) for i in model_output
+            ]
+            core_line = data['core_line']
+            score_list.append(
+                compute_line_score_k(
+                    answer, model_output, k, model_filled_code, core_line
+                )
+            )
+    else:
+        score_list = []
+        for data in data_list:
+            answer = data['core_token']
+            model_output = eval(data['model_output_clear'])
+            model_filled_code = eval(data['model_output_clear'])
+            core_line = data['core_line']
+            core_line_in_output_clear = data['core_line_in_output_clear']
+            score_list.append(
+                compute_block_score_k(
+                    answer,
+                    model_output,
+                    k,
+                    model_filled_code,
+                    core_line,
+                    core_line_in_output_clear,
+                )
+            )
+
+    final_score = sum(score_list) / len(score_list)
+    print(f'{model}, {task} completion task, em@{k} score: {final_score}')
--- a/evaluation/benchmarks/versicode/output_processing/choose_core_line_from_block_versicode.py
+++ b/evaluation/benchmarks/versicode/output_processing/choose_core_line_from_block_versicode.py
@@ -0,0 +1,99 @@
+"""
+Extract the core line (the line that uses the core token) from the reference code and from each model output in the VersiCode block completion results
+"""
+
+import json
+import os
+import random
+import re
+
+
+def process_line_mask(code_snippet, core_token):
+    if not core_token:
+        return None, None
+
+    replaced_lines = {}
+    lines = code_snippet.split('\n')
+
+    in_multi_line_comment = False
+
+    for i, line in enumerate(lines):
+        if in_multi_line_comment:
+            if ('"""' in line or "'''" in line) and not re.findall(
+                r"'''(.*?)'''|\"\"\"(.*?)\"\"\"", line
+            ):
+                in_multi_line_comment = False
+            continue
+        elif line.strip().startswith('#'):
+            continue
+        elif re.findall(r"'''(.*?)'''|\"\"\"(.*?)\"\"\"", line):
+            continue
+        elif ('"""' in line or "'''" in line) and not re.findall(
+            r"'''(.*?)'''|\"\"\"(.*?)\"\"\"", line
+        ):
+            in_multi_line_comment = True
+            continue
+        else:
+            if re.search(r'\bdef\s+task_function\b', line):
+                continue
+
+            if re.search(r'\b{}\b(?!\s*=)'.format(re.escape(core_token)), line):
+                replaced_lines.update({i: line})
+
+    if replaced_lines:
+        random_line_location = random.choice(list(replaced_lines.keys()))
+
+        masked_line = lines[random_line_location]
+        leading_spaces = re.match(r'^\s*', masked_line).group(0)
+        masked_line = masked_line.strip()
+        lines[random_line_location] = leading_spaces + '<line_mask>'
+
+        masked_code = '\n'.join(lines)
+
+        return masked_code, masked_line
+
+    return None, None
+
+
+def load_json(file_path):
+    with open(file_path, 'r', encoding='utf-8') as f:
+        data = json.load(f)
+    return data
+
+
+def save_json(file_path, data):
+    with open(file_path, 'w', encoding='utf-8') as f:
+        json.dump(data, f, ensure_ascii=False, indent=4)
+
+
+if __name__ == '__main__':
+    model_list = os.listdir('../data/result_data/block_completion')
+    for model in model_list:
+        input_json_file = f'../data/result_data/block_completion/{model}/VersiCode_block_completion.json'
+        output_json_file = input_json_file
+        data = load_json(input_json_file)
+
+        for item in data:
+            core_token = item['core_token']
+            code = item['code']
+
+            _, core_line_in_code = process_line_mask(code, core_token)
+            if core_line_in_code:
+                item['core_line_in_code'] = core_line_in_code
+            else:
+                item['core_line_in_code'] = 'N/A'
+
+            model_output_clear = item['model_output_clear']
+            core_line_in_output_list = []
+
+            for entry in eval(model_output_clear):
+                _, core_line_in_output = process_line_mask(entry, core_token)
+                if core_line_in_output:
+                    core_line_in_output_list.append(core_line_in_output)
+                else:
+                    core_line_in_output_list.append('N/A')
+
+            item['core_line_in_output_clear'] = core_line_in_output_list
+
+        save_json(output_json_file, data)
+        print('Done!')
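The core-line search in `process_line_mask` hinges on the pattern `\b<token>\b(?!\s*=)`, which accepts uses of the token but rejects lines where it is directly followed by `=`. The sketch below isolates that matching step on a made-up snippet. It deliberately omits the docstring tracking and the random choice among candidates, and the helper name `candidate_lines` is hypothetical.

```python
import re


def candidate_lines(code, core_token):
    """Return stripped non-comment lines where core_token appears and is not followed by '='."""
    pattern = re.compile(r'\b{}\b(?!\s*=)'.format(re.escape(core_token)))
    return [
        line.strip()
        for line in code.split('\n')
        if not line.strip().startswith('#') and pattern.search(line)
    ]


snippet = '\n'.join([
    '# load is mentioned in a comment',
    'load = None',
    'data = json.load(fh)',
])
found = candidate_lines(snippet, 'load')
```

Only the `json.load(fh)` line survives: the comment is skipped and the assignment is rejected by the negative lookahead. Because the full function picks randomly when several lines qualify, repeated runs over the same file may record different core lines unless the random seed is fixed.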
--- a/evaluation/benchmarks/versicode/output_processing/choose_core_line_from_migration_versicode.py
+++ b/evaluation/benchmarks/versicode/output_processing/choose_core_line_from_migration_versicode.py
@@ -0,0 +1,102 @@
+"""
+Extract the core line (the line that uses the core token) from the reference code and from each model output in the VersiCode code migration results
+"""
+
+import json
+import os
+import random
+import re
+
+
+def process_line_mask(code_snippet, core_token):
+    if not core_token:
+        return None, None
+
+    replaced_lines = {}
+    lines = code_snippet.split('\n')
+
+    in_multi_line_comment = False
+
+    for i, line in enumerate(lines):
+        if in_multi_line_comment:
+            if ('"""' in line or "'''" in line) and not re.findall(
+                r"'''(.*?)'''|\"\"\"(.*?)\"\"\"", line
+            ):
+                in_multi_line_comment = False
+            continue
+        elif line.strip().startswith('#'):
+            continue
+        elif re.findall(r"'''(.*?)'''|\"\"\"(.*?)\"\"\"", line):
+            continue
+        elif ('"""' in line or "'''" in line) and not re.findall(
+            r"'''(.*?)'''|\"\"\"(.*?)\"\"\"", line
+        ):
+            in_multi_line_comment = True
+            continue
+        else:
+            if re.search(r'\bdef\s+task_function\b', line):
+                continue
+
+            if re.search(r'\b{}\b(?!\s*=)'.format(re.escape(core_token)), line):
+                replaced_lines.update({i: line})
+
+    if replaced_lines:
+        random_line_location = random.choice(list(replaced_lines.keys()))
+
+        masked_line = lines[random_line_location]
+        leading_spaces = re.match(r'^\s*', masked_line).group(0)
+        masked_line = masked_line.strip()
+        lines[random_line_location] = leading_spaces + '<line_mask>'
+
+        masked_code = '\n'.join(lines)
+
+        return masked_code, masked_line
+
+    return None, None
+
+
+def load_json(file_path):
+    with open(file_path, 'r', encoding='utf-8') as f:
+        data = json.load(f)
+    return data
+
+
+def save_json(file_path, data):
+    with open(file_path, 'w', encoding='utf-8') as f:
+        json.dump(data, f, ensure_ascii=False, indent=4)
+
+
+if __name__ == '__main__':
+    model_list = os.listdir('../data/result_data/code_migration')
+    for model in model_list:
+        input_json_file = (
+            f'../data/result_data/code_migration/{model}/VersiCode_migration.json'
+        )
+        output_json_file = input_json_file
+        data = load_json(input_json_file)
+
+        for item in data:
+            core_token = item['old_name']
+            code = item['old_code']
+
+            _, core_line_in_code = process_line_mask(code, core_token)
+            if core_line_in_code:
+                item['core_line_in_code'] = core_line_in_code
+            else:
+                item['core_line_in_code'] = 'N/A'
+
+            model_output_clear = item['model_output_clear']
+            core_line_in_output_list = []
+
+            core_token = item['new_name']
+            for entry in eval(model_output_clear):
+                _, core_line_in_output = process_line_mask(entry, core_token)
+                if core_line_in_output:
+                    core_line_in_output_list.append(core_line_in_output)
+                else:
+                    core_line_in_output_list.append('N/A')
+
+            item['core_line_in_output_clear'] = core_line_in_output_list
+
+        save_json(output_json_file, data)
+        print('Done!')
--- a/evaluation/benchmarks/versicode/output_processing/clear_ans.py
+++ b/evaluation/benchmarks/versicode/output_processing/clear_ans.py
@@ -0,0 +1,38 @@
+"""
+Strip the <start> and <end> markers that the model emits during inference
+"""
+
+import json
+
+model_name = ''
+task = 'block_completion'
+
+result_path = f'../data/result_data/{task}/{model_name}/VersiCode_block_completion.json'  # Adjust the file name to match the task
+
+
+with open(result_path, 'r', encoding='utf-8') as fr:
+    lodict = json.load(fr)
+data_dict = lodict
+data_list = data_dict
+
+for data in data_list:
+    temp_list = []
+    model_output_list = eval(data['model_output'])
+    for output in model_output_list:
+        if '<start>' in output and '<end>' in output:
+            start_index = output.find('<start>') + len('<start>')
+            end_index = output.find('<end>')
+            content = (
+                output[start_index:end_index]
+                .replace('```python', '')
+                .replace('```', '')
+            )
+        else:
+            content = 'no_answer'
+
+        temp_list.append(content)
+
+    data['model_output_clear'] = str(temp_list)
+
+with open(result_path, 'w', encoding='utf-8') as fw:
+    json.dump(data_dict, fw, indent=4, ensure_ascii=False)
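The cleaning step in `clear_ans.py` can be sketched as a small pure function, which may make it easier to test in isolation. This is an illustrative rewrite and not the script itself. The name `extract_answer` and the `FENCE` constant are assumptions, and the fence string is built indirectly only so the example stays readable.

```python
FENCE = '`' * 3  # three backticks, i.e. a Markdown code fence


def extract_answer(output):
    """Return the text between the first <start> and the first <end>, or 'no_answer'."""
    if '<start>' in output and '<end>' in output:
        start_index = output.find('<start>') + len('<start>')
        end_index = output.find('<end>')
        body = output[start_index:end_index]
        # Drop Markdown fences so only the code itself remains.
        return body.replace(FENCE + 'python', '').replace(FENCE, '')
    return 'no_answer'


wrapped = extract_answer('Sure! <start>' + FENCE + 'python\nx = 1\n' + FENCE + '<end> done')
plain = extract_answer('I cannot answer that.')
```

As in the script, an output where `<end>` appears before `<start>` yields an empty or truncated slice and not `no_answer`, which may be worth guarding against.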
--- a/evaluation/benchmarks/versicode/requirements.txt
+++ b/evaluation/benchmarks/versicode/requirements.txt
@@ -0,0 +1,146 @@
+aiohappyeyeballs==2.6.1
+aiohttp==3.11.18
+aiosignal==1.3.2
+airportsdata==20250224
+annotated-types==0.7.0
+anyio==4.9.0
+astor==0.8.1
+attrs==25.3.0
+blake3==1.0.4
+cachetools==5.5.2
+certifi==2025.1.31
+charset-normalizer==3.4.1
+click==8.1.8
+cloudpickle==3.1.1
+compressed-tensors==0.9.3
+cupy-cuda12x==13.4.1
+Deprecated==1.2.18
+depyf==0.18.0
+dill==0.4.0
+diskcache==5.6.3
+distro==1.9.0
+dnspython==2.7.0
+einops==0.8.1
+email_validator==2.2.0
+fastapi==0.115.12
+fastapi-cli==0.0.7
+fastrlock==0.8.3
+filelock==3.18.0
+frozenlist==1.6.0
+fsspec==2025.3.2
+gguf==0.16.2
+googleapis-common-protos==1.70.0
+grpcio==1.71.0
+h11==0.14.0
+hf-xet==1.0.3
+httpcore==1.0.8
+httptools==0.6.4
+httpx==0.28.1
+huggingface-hub==0.30.2
+idna==3.10
+importlib_metadata==8.0.0
+interegular==0.3.3
+Jinja2==3.1.6
+jiter==0.9.0
+jsonschema==4.23.0
+jsonschema-specifications==2024.10.1
+lark==1.2.2
+llguidance==0.7.16
+llvmlite==0.44.0
+lm-format-enforcer==0.10.11
+markdown-it-py==3.0.0
+MarkupSafe==3.0.2
+mdurl==0.1.2
+mistral_common==1.5.4
+mpmath==1.3.0
+msgpack==1.1.0
+msgspec==0.19.0
+multidict==6.4.3
+nest-asyncio==1.6.0
+networkx==3.4.2
+ninja==1.11.1.4
+numba==0.61.2
+numpy==2.2.5
+nvidia-cublas-cu12==12.4.5.8
+nvidia-cuda-cupti-cu12==12.4.127
+nvidia-cuda-nvrtc-cu12==12.4.127
+nvidia-cuda-runtime-cu12==12.4.127
+nvidia-cudnn-cu12==9.1.0.70
+nvidia-cufft-cu12==11.2.1.3
+nvidia-curand-cu12==10.3.5.147
+nvidia-cusolver-cu12==11.6.1.9
+nvidia-cusparse-cu12==12.3.1.170
+nvidia-cusparselt-cu12==0.6.2
+nvidia-nccl-cu12==2.21.5
+nvidia-nvjitlink-cu12==12.4.127
+nvidia-nvtx-cu12==12.4.127
+openai==1.75.0
+opencv-python-headless==4.11.0.86
+opentelemetry-api==1.26.0
+opentelemetry-exporter-otlp==1.26.0
+opentelemetry-exporter-otlp-proto-common==1.26.0
+opentelemetry-exporter-otlp-proto-grpc==1.26.0
+opentelemetry-exporter-otlp-proto-http==1.26.0
+opentelemetry-proto==1.26.0
+opentelemetry-sdk==1.26.0
+opentelemetry-semantic-conventions==0.47b0
+opentelemetry-semantic-conventions-ai==0.4.3
+outlines==0.1.11
+outlines_core==0.1.26
+packaging==25.0
+partial-json-parser==0.2.1.1.post5
+pillow==11.2.1
+prometheus-fastapi-instrumentator==7.1.0
+prometheus_client==0.21.1
+propcache==0.3.1
+protobuf==4.25.6
+psutil==7.0.0
+py-cpuinfo==9.0.0
+pycountry==24.6.1
+pydantic==2.11.3
+pydantic_core==2.33.1
+Pygments==2.19.1
+python-dotenv==1.1.0
+python-json-logger==3.3.0
+python-multipart==0.0.20
+PyYAML==6.0.2
+pyzmq==26.4.0
+ray==2.43.0
+referencing==0.36.2
+regex==2024.11.6
+requests==2.32.3
+rich==14.0.0
+rich-toolkit==0.14.1
+rpds-py==0.24.0
+safetensors==0.5.3
+scipy==1.15.2
+sentencepiece==0.2.0
+setuptools==75.8.0
+shellingham==1.5.4
+six==1.17.0
+sniffio==1.3.1
+starlette==0.46.2
+sympy==1.13.1
+tiktoken==0.9.0
+tokenizers==0.21.1
+torch==2.6.0
+torchaudio==2.6.0
+torchvision==0.21.0
+tqdm==4.67.1
+transformers==4.51.3
+triton==3.2.0
+typer==0.15.2
+typing-inspection==0.4.0
+typing_extensions==4.13.2
+urllib3==2.4.0
+uvicorn==0.34.2
+uvloop==0.21.0
+vllm==0.8.4
+watchfiles==1.0.5
+websockets==15.0.1
+wheel==0.45.1
+wrapt==1.17.2
+xformers==0.0.29.post2
+xgrammar==0.1.18
+yarl==1.20.0
+zipp==3.21.0
--- a/evaluation/benchmarks/webarena/run_infer.py
+++ b/evaluation/benchmarks/webarena/run_infer.py
@@ -212,7 +212,7 @@ if __name__ == '__main__':
    llm_config = None
    if args.llm_config:
        llm_config = get_llm_config_arg(args.llm_config)
-        # modify_params must be False for evaluation purpose, for reproducibility and accurancy of results
+        # modify_params must be False for evaluation purpose, for reproducibility and accuracy of results
        llm_config.modify_params = False
    if llm_config is None:
        raise ValueError(f'Could not find LLM config: --llm_config {args.llm_config}')
--- a/evaluation/utils/shared.py
+++ b/evaluation/utils/shared.py
@@ -263,8 +263,19 @@ def prepare_dataset(
            f'Randomly sampling {eval_n_limit} unique instances with random seed 42.'
        )

+    def make_serializable(instance: pd.Series) -> dict:
+        import numpy as np
+
+        instance_dict = instance.to_dict()
+        for k, v in instance_dict.items():
+            if isinstance(v, np.ndarray):
+                instance_dict[k] = v.tolist()
+            elif isinstance(v, pd.Timestamp):
+                instance_dict[k] = str(v)
+        return instance_dict
+
    new_dataset = [
-        instance
+        make_serializable(instance)
        for _, instance in dataset.iterrows()
        if str(instance[id_column]) not in finished_ids
    ]
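Note on `make_serializable`: the likely motivation is that `json.dumps` cannot encode `numpy.ndarray` or `pandas.Timestamp` values. That the instances are later written out as JSON is an assumption here, since the diff does not show the consumer. A minimal sketch with a made-up row (the field names `tests` and `created` are illustrative only):

```python
import json

import numpy as np
import pandas as pd


def make_serializable(instance: pd.Series) -> dict:
    # Same conversion as the helper added in the diff, lifted to module level.
    instance_dict = instance.to_dict()
    for k, v in instance_dict.items():
        if isinstance(v, np.ndarray):
            instance_dict[k] = v.tolist()  # ndarray -> plain Python list
        elif isinstance(v, pd.Timestamp):
            instance_dict[k] = str(v)  # Timestamp -> string
    return instance_dict


# Made-up row; an object-dtype Series can hold an array and a Timestamp.
row = pd.Series(
    {
        'instance_id': 'a',
        'tests': np.array([1, 2, 3]),
        'created': pd.Timestamp('2025-06-15 12:00:00'),
    }
)

try:
    json.dumps(row.to_dict())
    raw_ok = True
except TypeError:
    # The unconverted dict still contains an ndarray and a Timestamp.
    raw_ok = False

converted = make_serializable(row)
encoded = json.dumps(converted)
```

Only top-level values are converted; arrays or timestamps nested inside lists or dicts would pass through unchanged.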
--- a/frontend/tests/components/features/conversation-panel/conversation-card.test.tsx
+++ b/frontend/tests/components/features/conversation-panel/conversation-card.test.tsx
@@ -478,7 +478,7 @@ describe("ConversationCard", () => {
          title="Conversation 1"
          selectedRepository={null}
          lastUpdatedAt="2021-10-01T12:00:00Z"
-          status="RUNNING"
+          conversationStatus="RUNNING"
        />,
      );

--- a/frontend/tests/components/features/conversation-panel/conversation-panel.test.tsx
+++ b/frontend/tests/components/features/conversation-panel/conversation-panel.test.tsx
@@ -48,6 +48,7 @@ describe("ConversationPanel", () => {
      last_updated_at: "2021-10-01T12:00:00Z",
      created_at: "2021-10-01T12:00:00Z",
      status: "STOPPED" as const,
+      runtime_status: null,
      url: null,
      session_api_key: null,
    },
@@ -60,6 +61,7 @@ describe("ConversationPanel", () => {
      last_updated_at: "2021-10-02T12:00:00Z",
      created_at: "2021-10-02T12:00:00Z",
      status: "STOPPED" as const,
+      runtime_status: null,
      url: null,
      session_api_key: null,
    },
@@ -72,6 +74,7 @@ describe("ConversationPanel", () => {
      last_updated_at: "2021-10-03T12:00:00Z",
      created_at: "2021-10-03T12:00:00Z",
      status: "STOPPED" as const,
+      runtime_status: null,
      url: null,
      session_api_key: null,
    },
@@ -158,6 +161,7 @@ describe("ConversationPanel", () => {
        last_updated_at: "2021-10-01T12:00:00Z",
        created_at: "2021-10-01T12:00:00Z",
        status: "STOPPED" as const,
+        runtime_status: null,
        url: null,
        session_api_key: null,
      },
@@ -170,6 +174,7 @@ describe("ConversationPanel", () => {
        last_updated_at: "2021-10-02T12:00:00Z",
        created_at: "2021-10-02T12:00:00Z",
        status: "STOPPED" as const,
+        runtime_status: null,
        url: null,
        session_api_key: null,
      },
@@ -182,6 +187,7 @@ describe("ConversationPanel", () => {
        last_updated_at: "2021-10-03T12:00:00Z",
        created_at: "2021-10-03T12:00:00Z",
        status: "STOPPED" as const,
+        runtime_status: null,
        url: null,
        session_api_key: null,
      },
--- a/frontend/tests/components/features/settings/api-keys-manager.test.tsx
+++ b/frontend/tests/components/features/settings/api-keys-manager.test.tsx
@@ -16,8 +16,8 @@ vi.mock("react-i18next", async () => {
      if (i18nKey === "SETTINGS$API_KEYS_DESCRIPTION") {
        return (
          <span>
-            API keys allow you to authenticate with the OpenHands API programmatically. 
-            Keep your API keys secure; anyone with your API key can access your account. 
+            API keys allow you to authenticate with the OpenHands API programmatically.
+            Keep your API keys secure; anyone with your API key can access your account.
            For more information on how to use the API, see our {components.a}
          </span>
        );
@@ -48,7 +48,7 @@ describe("ApiKeysManager", () => {

  it("should render the API documentation link", () => {
    renderComponent();
-    
+
    // Find the link to the API documentation
    const link = screen.getByRole("link");
    expect(link).toBeInTheDocument();
@@ -56,4 +56,4 @@ describe("ApiKeysManager", () => {
    expect(link).toHaveAttribute("target", "_blank");
    expect(link).toHaveAttribute("rel", "noopener noreferrer");
  });
-});
+});
--- a/frontend/tests/context/ws-client-provider.test.tsx
+++ b/frontend/tests/context/ws-client-provider.test.tsx
@@ -65,6 +65,7 @@ describe("WsClientProvider", () => {
        last_updated_at: "2021-10-01T12:00:00Z",
        created_at: "2021-10-01T12:00:00Z",
        status: "RUNNING" as const,
+        runtime_status: "STATUS$READY",
        url: null,
        session_api_key: null,
      }}},
--- a/frontend/tests/routes/home-screen.test.tsx
+++ b/frontend/tests/routes/home-screen.test.tsx
@@ -334,10 +334,7 @@ describe("Settings 404", () => {

    renderHomeScreen();

-    // small hack to wait for the modal to not appear
-    await expect(
-      screen.findByTestId("ai-config-modal", {}, { timeout: 1000 }),
-    ).rejects.toThrow();
+    expect(screen.queryByTestId("ai-config-modal")).not.toBeInTheDocument();
  });
 });

--- a/frontend/tests/utils/check-home-hardcoded-strings.test.tsx
+++ b/frontend/tests/utils/check-home-hardcoded-strings.test.tsx
@@ -39,4 +39,4 @@ describe("Check for hardcoded English strings in Home components", () => {
      expect(text).not.toContain(str);
    });
  });
-});
+});
--- a/frontend/hero.ts
+++ b/frontend/hero.ts
@@ -0,0 +1,18 @@
+import { heroui } from "@heroui/react";
+
+export default heroui({
+  defaultTheme: "dark",
+  layout: {
+    radius: {
+      small: "5px",
+      large: "20px",
+    },
+  },
+  themes: {
+    dark: {
+      colors: {
+        primary: "#4465DB",
+      },
+    },
+  },
+});
--- a/frontend/package-lock.json
+++ b/frontend/package-lock.json
--- a/frontend/package.json
+++ b/frontend/package.json
@@ -1,37 +1,39 @@
 {
  "name": "openhands-frontend",
-  "version": "0.41.0",
+  "version": "0.43.0",
  "private": true,
  "type": "module",
  "engines": {
    "node": ">=20.0.0"
  },
  "dependencies": {
-    "@heroui/react": "2.7.8",
+    "@heroui/react": "^2.8.0-beta.7",
    "@microlink/react-json-view": "^1.26.2",
    "@monaco-editor/react": "^4.7.0-rc.0",
-    "@react-router/node": "^7.6.1",
-    "@react-router/serve": "^7.6.1",
+    "@react-router/node": "^7.6.2",
+    "@react-router/serve": "^7.6.2",
    "@react-types/shared": "^3.29.1",
    "@reduxjs/toolkit": "^2.8.2",
    "@stripe/react-stripe-js": "^3.7.0",
-    "@stripe/stripe-js": "^7.3.0",
-    "@tanstack/react-query": "^5.77.2",
-    "@vitejs/plugin-react": "^4.4.0",
+    "@stripe/stripe-js": "^7.3.1",
+    "@tailwindcss/postcss": "^4.1.10",
+    "@tailwindcss/vite": "^4.1.10",
+    "@tanstack/react-query": "^5.80.7",
+    "@vitejs/plugin-react": "^4.5.2",
    "@xterm/addon-fit": "^0.10.0",
    "@xterm/xterm": "^5.4.0",
    "axios": "^1.9.0",
    "clsx": "^2.1.1",
    "eslint-config-airbnb-typescript": "^18.0.0",
-    "framer-motion": "^12.14.0",
+    "framer-motion": "^12.17.3",
    "i18next": "^25.2.1",
-    "i18next-browser-languagedetector": "^8.1.0",
+    "i18next-browser-languagedetector": "^8.2.0",
    "i18next-http-backend": "^3.0.2",
    "isbot": "^5.1.28",
    "jose": "^6.0.11",
-    "lucide-react": "^0.511.0",
+    "lucide-react": "^0.514.0",
    "monaco-editor": "^0.52.2",
-    "posthog-js": "^1.245.2",
+    "posthog-js": "^1.251.0",
    "react": "^19.1.0",
    "react-dom": "^19.1.0",
    "react-highlight": "^0.15.0",
@@ -40,15 +42,15 @@
    "react-icons": "^5.5.0",
    "react-markdown": "^10.1.0",
    "react-redux": "^9.2.0",
-    "react-router": "^7.6.1",
+    "react-router": "^7.6.2",
    "react-syntax-highlighter": "^15.6.1",
    "react-textarea-autosize": "^8.5.9",
    "remark-gfm": "^4.0.1",
    "sirv-cli": "^3.0.1",
    "socket.io-client": "^4.8.1",
-    "tailwind-merge": "^3.3.0",
+    "tailwind-merge": "^3.3.1",
    "vite": "^6.3.5",
-    "web-vitals": "^5.0.1",
+    "web-vitals": "^5.0.3",
    "ws": "^8.18.2"
  },
  "scripts": {
@@ -82,23 +84,23 @@
    "@babel/traverse": "^7.27.1",
    "@babel/types": "^7.27.0",
    "@mswjs/socket.io-binding": "^0.1.1",
-    "@playwright/test": "^1.52.0",
-    "@react-router/dev": "^7.6.1",
+    "@playwright/test": "^1.53.0",
+    "@react-router/dev": "^7.6.2",
    "@tailwindcss/typography": "^0.5.16",
    "@tanstack/eslint-plugin-query": "^5.78.0",
    "@testing-library/dom": "^10.4.0",
    "@testing-library/jest-dom": "^6.6.1",
    "@testing-library/react": "^16.3.0",
    "@testing-library/user-event": "^14.6.1",
-    "@types/node": "^22.15.21",
-    "@types/react": "^19.1.5",
-    "@types/react-dom": "^19.1.5",
+    "@types/node": "^24.0.1",
+    "@types/react": "^19.1.8",
+    "@types/react-dom": "^19.1.6",
    "@types/react-highlight": "^0.12.8",
    "@types/react-syntax-highlighter": "^15.5.13",
    "@types/ws": "^8.18.1",
    "@typescript-eslint/eslint-plugin": "^7.18.0",
    "@typescript-eslint/parser": "^7.18.0",
-    "@vitest/coverage-v8": "^3.1.4",
+    "@vitest/coverage-v8": "^3.2.3",
    "autoprefixer": "^10.4.21",
    "cross-env": "^7.0.3",
    "eslint": "^8.57.0",
@@ -113,12 +115,11 @@
    "eslint-plugin-unused-imports": "^4.1.4",
    "husky": "^9.1.7",
    "jsdom": "^26.1.0",
-    "lint-staged": "^16.0.0",
+    "lint-staged": "^16.1.0",
    "msw": "^2.6.6",
-    "postcss": "^8.5.2",
    "prettier": "^3.5.3",
-    "stripe": "^18.1.1",
-    "tailwindcss": "^3.4.17",
+    "stripe": "^18.2.1",
+    "tailwindcss": "^4.1.8",
    "typescript": "^5.8.3",
    "vite-plugin-svgr": "^4.2.0",
    "vite-tsconfig-paths": "^5.1.4",
--- a/frontend/postcss.config.js
+++ b/frontend/postcss.config.js
@@ -1,6 +1,5 @@
 export default {
  plugins: {
-    tailwindcss: {},
-    autoprefixer: {},
+    "@tailwindcss/postcss": {},
  },
-}
+};
--- a/frontend/public/mockServiceWorker.js
+++ b/frontend/public/mockServiceWorker.js
@@ -5,24 +5,23 @@
 * Mock Service Worker.
 * @see https://github.com/mswjs/msw
 * - Please do NOT modify this file.
- * - Please do NOT serve this file on production.
 */

-const PACKAGE_VERSION = '2.8.4'
-const INTEGRITY_CHECKSUM = '00729d72e3b82faf54ca8b9621dbb96f'
+const PACKAGE_VERSION = '2.10.2'
+const INTEGRITY_CHECKSUM = 'f5825c521429caf22a4dd13b66e243af'
 const IS_MOCKED_RESPONSE = Symbol('isMockedResponse')
 const activeClientIds = new Set()

-self.addEventListener('install', function () {
+addEventListener('install', function () {
  self.skipWaiting()
 })

-self.addEventListener('activate', function (event) {
+addEventListener('activate', function (event) {
  event.waitUntil(self.clients.claim())
 })

-self.addEventListener('message', async function (event) {
-  const clientId = event.source.id
+addEventListener('message', async function (event) {
+  const clientId = Reflect.get(event.source || {}, 'id')

  if (!clientId || !self.clients) {
    return
@@ -94,17 +93,18 @@ self.addEventListener('message', async function (event) {
  }
 })

-self.addEventListener('fetch', function (event) {
-  const { request } = event
-
+addEventListener('fetch', function (event) {
  // Bypass navigation requests.
-  if (request.mode === 'navigate') {
+  if (event.request.mode === 'navigate') {
    return
  }

  // Opening the DevTools triggers the "only-if-cached" request
  // that cannot be handled by the worker. Bypass such requests.
-  if (request.cache === 'only-if-cached' && request.mode !== 'same-origin') {
+  if (
+    event.request.cache === 'only-if-cached' &&
+    event.request.mode !== 'same-origin'
+  ) {
    return
  }

@@ -115,48 +115,62 @@ self.addEventListener('fetch', function (event) {
    return
  }

-  // Generate unique request ID.
  const requestId = crypto.randomUUID()
  event.respondWith(handleRequest(event, requestId))
 })

+/**
+ * @param {FetchEvent} event
+ * @param {string} requestId
+ */
 async function handleRequest(event, requestId) {
  const client = await resolveMainClient(event)
+  const requestCloneForEvents = event.request.clone()
  const response = await getResponse(event, client, requestId)

  // Send back the response clone for the "response:*" life-cycle events.
  // Ensure MSW is active and ready to handle the message, otherwise
  // this message will pend indefinitely.
  if (client && activeClientIds.has(client.id)) {
-    ;(async function () {
-      const responseClone = response.clone()
+    const serializedRequest = await serializeRequest(requestCloneForEvents)

-      sendToClient(
-        client,
-        {
-          type: 'RESPONSE',
-          payload: {
-            requestId,
-            isMockedResponse: IS_MOCKED_RESPONSE in response,
+    // Clone the response so both the client and the library could consume it.
+    const responseClone = response.clone()
+
+    sendToClient(
+      client,
+      {
+        type: 'RESPONSE',
+        payload: {
+          isMockedResponse: IS_MOCKED_RESPONSE in response,
+          request: {
+            id: requestId,
+            ...serializedRequest,
+          },
+          response: {
            type: responseClone.type,
            status: responseClone.status,
            statusText: responseClone.statusText,
-            body: responseClone.body,
            headers: Object.fromEntries(responseClone.headers.entries()),
+            body: responseClone.body,
          },
        },
-        [responseClone.body],
-      )
-    })()
+      },
+      responseClone.body ? [serializedRequest.body, responseClone.body] : [],
+    )
  }

  return response
 }

-// Resolve the main client for the given event.
-// Client that issues a request doesn't necessarily equal the client
-// that registered the worker. It's with the latter the worker should
-// communicate with during the response resolving phase.
+/**
+ * Resolve the main client for the given event.
+ * Client that issues a request doesn't necessarily equal the client
+ * that registered the worker. It's with the latter the worker should
+ * communicate with during the response resolving phase.
+ * @param {FetchEvent} event
+ * @returns {Promise<Client | undefined>}
+ */
 async function resolveMainClient(event) {
  const client = await self.clients.get(event.clientId)

@@ -184,12 +198,16 @@ async function resolveMainClient(event) {
    })
 }

+/**
+ * @param {FetchEvent} event
+ * @param {Client | undefined} client
+ * @param {string} requestId
+ * @returns {Promise<Response>}
+ */
 async function getResponse(event, client, requestId) {
-  const { request } = event
-
  // Clone the request because it might've been already used
  // (i.e. its body has been read and sent to the client).
-  const requestClone = request.clone()
+  const requestClone = event.request.clone()

  function passthrough() {
    // Cast the request headers to a new Headers instance
@@ -230,29 +248,17 @@ async function getResponse(event, client, requestId) {
  }

  // Notify the client that a request has been intercepted.
-  const requestBuffer = await request.arrayBuffer()
+  const serializedRequest = await serializeRequest(event.request)
  const clientMessage = await sendToClient(
    client,
    {
      type: 'REQUEST',
      payload: {
        id: requestId,
-        url: request.url,
-        mode: request.mode,
-        method: request.method,
-        headers: Object.fromEntries(request.headers.entries()),
-        cache: request.cache,
-        credentials: request.credentials,
-        destination: request.destination,
-        integrity: request.integrity,
-        redirect: request.redirect,
-        referrer: request.referrer,
-        referrerPolicy: request.referrerPolicy,
-        body: requestBuffer,
-        keepalive: request.keepalive,
+        ...serializedRequest,
      },
    },
-    [requestBuffer],
+    [serializedRequest.body],
  )

  switch (clientMessage.type) {
@@ -268,6 +274,12 @@ async function getResponse(event, client, requestId) {
  return passthrough()
 }

+/**
+ * @param {Client} client
+ * @param {any} message
+ * @param {Array<Transferable>} transferrables
+ * @returns {Promise<any>}
+ */
 function sendToClient(client, message, transferrables = []) {
  return new Promise((resolve, reject) => {
    const channel = new MessageChannel()
@@ -280,14 +292,18 @@ function sendToClient(client, message, transferrables = []) {
      resolve(event.data)
    }

-    client.postMessage(
-      message,
-      [channel.port2].concat(transferrables.filter(Boolean)),
-    )
+    client.postMessage(message, [
+      channel.port2,
+      ...transferrables.filter(Boolean),
+    ])
  })
 }

-async function respondWithMock(response) {
+/**
+ * @param {Response} response
+ * @returns {Response}
+ */
+function respondWithMock(response) {
  // Setting response status code to 0 is a no-op.
  // However, when responding with a "Response.error()", the produced Response
  // instance will have status code set to 0. Since it's not possible to create
@@ -305,3 +321,24 @@ async function respondWithMock(response) {

  return mockedResponse
 }
+
+/**
+ * @param {Request} request
+ */
+async function serializeRequest(request) {
+  return {
+    url: request.url,
+    mode: request.mode,
+    method: request.method,
+    headers: Object.fromEntries(request.headers.entries()),
+    cache: request.cache,
+    credentials: request.credentials,
+    destination: request.destination,
+    integrity: request.integrity,
+    redirect: request.redirect,
+    referrer: request.referrer,
+    referrerPolicy: request.referrerPolicy,
+    body: await request.arrayBuffer(),
+    keepalive: request.keepalive,
+  }
+}
--- a/frontend/scripts/check-translation-completeness.cjs
+++ b/frontend/scripts/check-translation-completeness.cjs
@@ -60,11 +60,11 @@ Object.entries(translationJson).forEach(([key, translations]) => {
 if (Object.keys(missingTranslations).length > 0) {
  console.error('\x1b[31m%s\x1b[0m', 'ERROR: Missing translations detected');
  console.error(`Found ${Object.keys(missingTranslations).length} translation keys with missing languages:`);
-  
+
  Object.entries(missingTranslations).forEach(([key, langs]) => {
    console.error(`- Key "${key}" is missing translations for: ${langs.join(', ')}`);
  });
-  
+
  console.error('\nPlease add the missing translations before committing.');
 }

@@ -72,11 +72,11 @@ if (Object.keys(missingTranslations).length > 0) {
 if (Object.keys(extraLanguages).length > 0) {
  console.error('\x1b[31m%s\x1b[0m', 'ERROR: Extra languages detected');
  console.error(`Found ${Object.keys(extraLanguages).length} translation keys with extra languages not in AvailableLanguages:`);
-  
+
  Object.entries(extraLanguages).forEach(([key, langs]) => {
    console.error(`- Key "${key}" has translations for unsupported languages: ${langs.join(', ')}`);
  });
-  
+
  console.error('\nPlease remove the extra languages before committing.');
 }

@@ -85,4 +85,4 @@ if (hasErrors) {
  process.exit(1);
 } else {
  console.log('\x1b[32m%s\x1b[0m', 'All translation keys have complete language coverage!');
-}
+}
--- a/frontend/scripts/check-unlocalized-strings.cjs
+++ b/frontend/scripts/check-unlocalized-strings.cjs
@@ -117,6 +117,9 @@ const EXCLUDED_TECHNICAL_STRINGS = [
  "edit-secret-form", // Test ID for secret form
  "search-api-key-input", // Input name for search API key
  "noopener,noreferrer", // Options for window.open
+  "STATUS$READY",
+  "STATUS$STOPPED",
+  "STATUS$ERROR",
 ];

 function isExcludedTechnicalString(str) {
--- a/frontend/src/api/open-hands.ts
+++ b/frontend/src/api/open-hands.ts
@@ -11,6 +11,8 @@ import {
  GetTrajectoryResponse,
  GitChangeDiff,
  GitChange,
+  GetMicroagentsResponse,
+  GetMicroagentPromptResponse,
 } from "./open-hands.types";
 import { openHands } from "./open-hands-axios";
 import { ApiSettings, PostApiSettings, Provider } from "#/types/settings";
@@ -393,6 +395,35 @@ class OpenHands {

    return data;
  }
+
+  /**
+   * Get the available microagents associated with a conversation
+   * @param conversationId The ID of the conversation
+   * @returns The available microagents associated with the conversation
+   */
+  static async getMicroagents(
+    conversationId: string,
+  ): Promise<GetMicroagentsResponse> {
+    const url = `${this.getConversationUrl(conversationId)}/microagents`;
+    const { data } = await openHands.get<GetMicroagentsResponse>(url, {
+      headers: this.getConversationHeaders(),
+    });
+    return data;
+  }
+
+  static async getMicroagentPrompt(
+    conversationId: string,
+    eventId: number,
+  ): Promise<string> {
+    const { data } = await openHands.get<GetMicroagentPromptResponse>(
+      `/api/conversations/${conversationId}/remember_prompt`,
+      {
+        params: { event_id: eventId },
+      },
+    );
+
+    return data.prompt;
+  }
 }

 export default OpenHands;
--- a/frontend/src/api/open-hands.types.ts
+++ b/frontend/src/api/open-hands.types.ts
@@ -1,4 +1,5 @@
-import { ProjectStatus } from "#/components/features/conversation-panel/conversation-state-indicator";
+import { ConversationStatus } from "#/types/conversation-status";
+import { RuntimeStatus } from "#/types/runtime-status";

 export interface ErrorResponse {
  error: string;
@@ -80,7 +81,8 @@ export interface Conversation {
  git_provider: string | null;
  last_updated_at: string;
  created_at: string;
-  status: ProjectStatus;
+  status: ConversationStatus;
+  runtime_status: RuntimeStatus | null;
  trigger?: ConversationTrigger;
  url: string | null;
  session_api_key: string | null;
@@ -102,3 +104,24 @@ export interface GitChangeDiff {
  modified: string;
  original: string;
 }
+
+export interface InputMetadata {
+  name: string;
+  description: string;
+}
+
+export interface Microagent {
+  name: string;
+  type: "repo" | "knowledge";
+  content: string;
+  triggers: string[];
+}
+
+export interface GetMicroagentsResponse {
+  microagents: Microagent[];
+}
+
+export interface GetMicroagentPromptResponse {
+  status: string;
+  prompt: string;
+}
--- a/frontend/src/components/agent-status-map.constant.ts
+++ b/frontend/src/components/agent-status-map.constant.ts
@@ -1,68 +0,0 @@
-import { I18nKey } from "#/i18n/declaration";
-import { AgentState } from "#/types/agent-state";
-
-export enum IndicatorColor {
-  BLUE = "bg-blue-500",
-  GREEN = "bg-green-500",
-  ORANGE = "bg-orange-500",
-  YELLOW = "bg-yellow-500",
-  RED = "bg-red-500",
-  DARK_ORANGE = "bg-orange-800",
-}
-
-export const AGENT_STATUS_MAP: {
-  [k: string]: { message: string; indicator: IndicatorColor };
-} = {
-  [AgentState.INIT]: {
-    message: I18nKey.CHAT_INTERFACE$AGENT_INIT_MESSAGE,
-    indicator: IndicatorColor.BLUE,
-  },
-  [AgentState.RUNNING]: {
-    message: I18nKey.CHAT_INTERFACE$AGENT_RUNNING_MESSAGE,
-    indicator: IndicatorColor.GREEN,
-  },
-  [AgentState.AWAITING_USER_INPUT]: {
-    message: I18nKey.CHAT_INTERFACE$AGENT_AWAITING_USER_INPUT_MESSAGE,
-    indicator: IndicatorColor.BLUE,
-  },
-  [AgentState.PAUSED]: {
-    message: I18nKey.CHAT_INTERFACE$AGENT_PAUSED_MESSAGE,
-    indicator: IndicatorColor.YELLOW,
-  },
-  [AgentState.LOADING]: {
-    message: I18nKey.CHAT_INTERFACE$INITIALIZING_AGENT_LOADING_MESSAGE,
-    indicator: IndicatorColor.DARK_ORANGE,
-  },
-  [AgentState.STOPPED]: {
-    message: I18nKey.CHAT_INTERFACE$AGENT_STOPPED_MESSAGE,
-    indicator: IndicatorColor.RED,
-  },
-  [AgentState.FINISHED]: {
-    message: I18nKey.CHAT_INTERFACE$AGENT_FINISHED_MESSAGE,
-    indicator: IndicatorColor.GREEN,
-  },
-  [AgentState.REJECTED]: {
-    message: I18nKey.CHAT_INTERFACE$AGENT_REJECTED_MESSAGE,
-    indicator: IndicatorColor.YELLOW,
-  },
-  [AgentState.ERROR]: {
-    message: I18nKey.CHAT_INTERFACE$AGENT_ERROR_MESSAGE,
-    indicator: IndicatorColor.RED,
-  },
-  [AgentState.AWAITING_USER_CONFIRMATION]: {
-    message: I18nKey.CHAT_INTERFACE$AGENT_AWAITING_USER_CONFIRMATION_MESSAGE,
-    indicator: IndicatorColor.ORANGE,
-  },
-  [AgentState.USER_CONFIRMED]: {
-    message: I18nKey.CHAT_INTERFACE$AGENT_ACTION_USER_CONFIRMED_MESSAGE,
-    indicator: IndicatorColor.GREEN,
-  },
-  [AgentState.USER_REJECTED]: {
-    message: I18nKey.CHAT_INTERFACE$AGENT_ACTION_USER_REJECTED_MESSAGE,
-    indicator: IndicatorColor.RED,
-  },
-  [AgentState.RATE_LIMITED]: {
-    message: I18nKey.CHAT_INTERFACE$AGENT_RATE_LIMITED_MESSAGE,
-    indicator: IndicatorColor.YELLOW,
-  },
-};
--- a/frontend/src/components/features/chat/chat-input.tsx
+++ b/frontend/src/components/features/chat/chat-input.tsx
@@ -132,7 +132,7 @@ export function ChatInput({
        maxRows={maxRows}
        data-dragging-over={isDraggingOver}
        className={cn(
-          "grow text-sm self-center placeholder:text-neutral-400 text-white resize-none outline-none ring-0",
+          "grow text-sm self-center placeholder:text-neutral-400 text-white resize-none outline-hidden ring-0",
          "transition-all duration-200 ease-in-out",
          isDraggingOver
            ? "bg-neutral-600/50 rounded-lg px-2"
--- a/frontend/src/components/features/chat/expandable-message.tsx
+++ b/frontend/src/components/features/chat/expandable-message.tsx
@@ -114,7 +114,7 @@ export function ExpandableMessage({
            {t(I18nKey.STATUS$ERROR_LLM_OUT_OF_CREDITS)}
          </div>
          <Link
-            className="mt-2 mb-2 w-full h-10 rounded flex items-center justify-center gap-2 bg-primary text-[#0D0F11]"
+            className="mt-2 mb-2 w-full h-10 rounded-sm flex items-center justify-center gap-2 bg-primary text-[#0D0F11]"
            to="/settings/billing"
          >
            {t(I18nKey.BILLING$CLICK_TO_TOP_UP)}
--- a/frontend/src/components/features/controls/agent-status-bar.tsx
+++ b/frontend/src/components/features/controls/agent-status-bar.tsx
@@ -1,18 +1,14 @@
 import React from "react";
 import { useTranslation } from "react-i18next";
 import { useSelector } from "react-redux";
-import { I18nKey } from "#/i18n/declaration";
 import { showErrorToast } from "#/utils/error-handler";
 import { RootState } from "#/store";
 import { AgentState } from "#/types/agent-state";
-import {
-  AGENT_STATUS_MAP,
-  IndicatorColor,
-} from "../../agent-status-map.constant";
 import { useWsClient } from "#/context/ws-client-provider";
 import { useNotification } from "#/hooks/useNotification";
 import { browserTab } from "#/utils/browser-tab";
 import { useActiveConversation } from "#/hooks/query/use-active-conversation";
+import { getIndicatorColor, getStatusCode } from "#/utils/status";

 const notificationStates = [
  AgentState.AWAITING_USER_INPUT,
@@ -24,39 +20,61 @@ export function AgentStatusBar() {
  const { t, i18n } = useTranslation();
  const { curAgentState } = useSelector((state: RootState) => state.agent);
  const { curStatusMessage } = useSelector((state: RootState) => state.status);
-  const { status } = useWsClient();
-  const { notify } = useNotification();
+  const { webSocketStatus } = useWsClient();
  const { data: conversation } = useActiveConversation();
+  const indicatorColor = getIndicatorColor(
+    webSocketStatus,
+    conversation?.status || null,
+    conversation?.runtime_status || null,
+    curAgentState,
+  );
+  const statusCode = getStatusCode(
+    curStatusMessage,
+    webSocketStatus,
+    conversation?.status || null,
+    conversation?.runtime_status || null,
+    curAgentState,
+  );
+  const { notify } = useNotification();

-  const [statusMessage, setStatusMessage] = React.useState<string>("");
-
-  const updateStatusMessage = () => {
+  // Show error toast if required
+  React.useEffect(() => {
+    if (curStatusMessage?.type !== "error") {
+      return;
+    }
    let message = curStatusMessage.message || "";
    if (curStatusMessage?.id) {
      const id = curStatusMessage.id.trim();
+      if (id === "STATUS$READY") {
+        message = "awaiting_user_input";
+      }
      if (i18n.exists(id)) {
        message = t(curStatusMessage.id.trim()) || message;
      }
    }
-    if (curStatusMessage?.type === "error") {
-      showErrorToast({
-        message,
-        source: "agent-status",
-        metadata: { ...curStatusMessage },
-      });
-      return;
-    }
-    if (message.trim()) {
-      setStatusMessage(message);
-    } else {
-      setStatusMessage(AGENT_STATUS_MAP[curAgentState].message);
-    }
-  };
-
-  React.useEffect(() => {
-    updateStatusMessage();
+    showErrorToast({
+      message,
+      source: "agent-status",
+      metadata: { ...curStatusMessage },
+    });
  }, [curStatusMessage.id]);

+  // Handle notify
+  React.useEffect(() => {
+    if (notificationStates.includes(curAgentState)) {
+      const message = t(statusCode);
+      notify(message, {
+        body: t(`Agent state changed to ${curAgentState}`),
+        playSound: true,
+      });
+
+      // Update browser tab if window exists and is not focused
+      if (typeof document !== "undefined" && !document.hasFocus()) {
+        browserTab.startNotification(message);
+      }
+    }
+  }, [curAgentState, statusCode]);
+
  // Handle window focus/blur
  React.useEffect(() => {
    if (typeof window === "undefined") return undefined;
@@ -72,45 +90,13 @@ export function AgentStatusBar() {
    };
  }, []);

-  const [indicatorColor, setIndicatorColor] = React.useState<string>(
-    AGENT_STATUS_MAP[curAgentState].indicator,
-  );
-
-  React.useEffect(() => {
-    if (conversation?.status === "CONNECTING") {
-      setStatusMessage(t(I18nKey.STATUS$CONNECTING_TO_RUNTIME));
-      setIndicatorColor(IndicatorColor.YELLOW);
-    } else if (conversation?.status === "STARTING") {
-      setStatusMessage(t(I18nKey.STATUS$STARTING_RUNTIME));
-      setIndicatorColor(IndicatorColor.RED);
-    } else if (status === "DISCONNECTED") {
-      setStatusMessage(t(I18nKey.STATUS$WEBSOCKET_CLOSED));
-      setIndicatorColor(IndicatorColor.RED);
-    } else {
-      setStatusMessage(AGENT_STATUS_MAP[curAgentState].message);
-      setIndicatorColor(AGENT_STATUS_MAP[curAgentState].indicator);
-      if (notificationStates.includes(curAgentState)) {
-        const message = t(AGENT_STATUS_MAP[curAgentState].message);
-        notify(t(AGENT_STATUS_MAP[curAgentState].message), {
-          body: t(`Agent state changed to ${curAgentState}`),
-          playSound: true,
-        });
-
-        // Update browser tab if window exists and is not focused
-        if (typeof document !== "undefined" && !document.hasFocus()) {
-          browserTab.startNotification(message);
-        }
-      }
-    }
-  }, [curAgentState, status, notify, t, conversation?.status]);
-
  return (
    <div className="flex flex-col items-center">
      <div className="flex items-center bg-base-secondary px-2 py-1 text-gray-400 rounded-[100px] text-sm gap-[6px]">
        <div
          className={`w-2 h-2 rounded-full animate-pulse ${indicatorColor}`}
        />
-        <span className="text-sm text-stone-400">{t(statusMessage)}</span>
+        <span className="text-sm text-stone-400">{t(statusCode)}</span>
      </div>
    </div>
  );
--- a/frontend/src/components/features/controls/controls.tsx
+++ b/frontend/src/components/features/controls/controls.tsx
@@ -30,7 +30,7 @@ export function Controls({ setSecurityOpen, showSecurityLock }: ControlsProps) {
        title={conversation?.title ?? ""}
        lastUpdatedAt={conversation?.created_at ?? ""}
        selectedRepository={conversation?.selected_repository ?? null}
-        status={conversation?.status}
+        conversationStatus={conversation?.status}
        conversationId={conversation?.conversation_id}
      />
    </div>
--- a/frontend/src/components/features/conversation-panel/conversation-card-context-menu.tsx
+++ b/frontend/src/components/features/conversation-panel/conversation-card-context-menu.tsx
@@ -1,7 +1,9 @@
+import { useTranslation } from "react-i18next";
 import { useClickOutsideElement } from "#/hooks/use-click-outside-element";
 import { cn } from "#/utils/utils";
 import { ContextMenu } from "../context-menu/context-menu";
 import { ContextMenuListItem } from "../context-menu/context-menu-list-item";
+import { I18nKey } from "#/i18n/declaration";

 interface ConversationCardContextMenuProps {
  onClose: () => void;
@@ -9,6 +11,7 @@ interface ConversationCardContextMenuProps {
  onEdit?: (event: React.MouseEvent<HTMLButtonElement>) => void;
  onDisplayCost?: (event: React.MouseEvent<HTMLButtonElement>) => void;
  onShowAgentTools?: (event: React.MouseEvent<HTMLButtonElement>) => void;
+  onShowMicroagents?: (event: React.MouseEvent<HTMLButtonElement>) => void;
  onDownloadViaVSCode?: (event: React.MouseEvent<HTMLButtonElement>) => void;
  position?: "top" | "bottom";
 }
@@ -19,9 +22,11 @@ export function ConversationCardContextMenu({
  onEdit,
  onDisplayCost,
  onShowAgentTools,
+  onShowMicroagents,
  onDownloadViaVSCode,
  position = "bottom",
 }: ConversationCardContextMenuProps) {
+  const { t } = useTranslation();
  const ref = useClickOutsideElement<HTMLUListElement>(onClose);

  return (
@@ -68,6 +73,14 @@ export function ConversationCardContextMenu({
          Show Agent Tools & Metadata
        </ContextMenuListItem>
      )}
+      {onShowMicroagents && (
+        <ContextMenuListItem
+          testId="show-microagents-button"
+          onClick={onShowMicroagents}
+        >
+          {t(I18nKey.CONVERSATION$SHOW_MICROAGENTS)}
+        </ContextMenuListItem>
+      )}
    </ContextMenu>
  );
 }
--- a/frontend/src/components/features/conversation-panel/conversation-card.tsx
+++ b/frontend/src/components/features/conversation-panel/conversation-card.tsx
@@ -4,13 +4,11 @@ import posthog from "posthog-js";
 import { useTranslation } from "react-i18next";
 import { formatTimeDelta } from "#/utils/format-time-delta";
 import { ConversationRepoLink } from "./conversation-repo-link";
-import {
-  ProjectStatus,
-  ConversationStateIndicator,
-} from "./conversation-state-indicator";
+import { ConversationStateIndicator } from "./conversation-state-indicator";
 import { EllipsisButton } from "./ellipsis-button";
 import { ConversationCardContextMenu } from "./conversation-card-context-menu";
 import { SystemMessageModal } from "./system-message-modal";
+import { MicroagentsModal } from "./microagents-modal";
 import { cn } from "#/utils/utils";
 import { BaseModal } from "../../shared/modals/base-modal/base-modal";
 import { RootState } from "#/store";
@@ -19,6 +17,7 @@ import { transformVSCodeUrl } from "#/utils/vscode-url-helper";
 import OpenHands from "#/api/open-hands";
 import { useWsClient } from "#/context/ws-client-provider";
 import { isSystemMessage } from "#/types/core/guards";
+import { ConversationStatus } from "#/types/conversation-status";

 interface ConversationCardProps {
  onClick?: () => void;
@@ -30,7 +29,7 @@ interface ConversationCardProps {
  selectedRepository: string | null;
  lastUpdatedAt: string; // ISO 8601
  createdAt?: string; // ISO 8601
-  status?: ProjectStatus;
+  conversationStatus?: ConversationStatus;
  variant?: "compact" | "default";
  conversationId?: string; // Optional conversation ID for VS Code URL
 }
@@ -49,7 +48,7 @@ export function ConversationCard({
  // eslint-disable-next-line @typescript-eslint/no-unused-vars
  lastUpdatedAt,
  createdAt,
-  status = "STOPPED",
+  conversationStatus = "STOPPED",
  variant = "default",
  conversationId,
 }: ConversationCardProps) {
@@ -59,6 +58,8 @@ export function ConversationCard({
  const [titleMode, setTitleMode] = React.useState<"view" | "edit">("view");
  const [metricsModalVisible, setMetricsModalVisible] = React.useState(false);
  const [systemModalVisible, setSystemModalVisible] = React.useState(false);
+  const [microagentsModalVisible, setMicroagentsModalVisible] =
+    React.useState(false);
  const inputRef = React.useRef<HTMLInputElement>(null);

  const systemMessage = parsedEvents.find(isSystemMessage);
@@ -142,6 +143,13 @@ export function ConversationCard({
    setSystemModalVisible(true);
  };

+  const handleShowMicroagents = (
+    event: React.MouseEvent<HTMLButtonElement>,
+  ) => {
+    event.stopPropagation();
+    setMicroagentsModalVisible(true);
+  };
+
  React.useEffect(() => {
    if (titleMode === "edit") {
      inputRef.current?.focus();
@@ -196,7 +204,9 @@ export function ConversationCard({
          </div>

          <div className="flex items-center">
-            <ConversationStateIndicator status={status} />
+            <ConversationStateIndicator
+              conversationStatus={conversationStatus}
+            />
            {hasContextMenu && (
              <div className="pl-2">
                <EllipsisButton
@@ -225,6 +235,11 @@ export function ConversationCard({
                      ? handleShowAgentTools
                      : undefined
                  }
+                  onShowMicroagents={
+                    showOptions && conversationId
+                      ? handleShowMicroagents
+                      : undefined
+                  }
                  position={variant === "compact" ? "top" : "bottom"}
                />
              )}
@@ -367,6 +382,13 @@ export function ConversationCard({
        onClose={() => setSystemModalVisible(false)}
        systemMessage={systemMessage ? systemMessage.args : null}
      />
+
+      {microagentsModalVisible && (
+        <MicroagentsModal
+          onClose={() => setMicroagentsModalVisible(false)}
+          conversationId={conversationId}
+        />
+      )}
    </>
  );
 }
--- a/frontend/src/components/features/conversation-panel/conversation-panel.tsx
+++ b/frontend/src/components/features/conversation-panel/conversation-panel.tsx
@@ -91,7 +91,7 @@ export function ConversationPanel({ onClose }: ConversationPanelProps) {
              selectedRepository={project.selected_repository}
              lastUpdatedAt={project.last_updated_at}
              createdAt={project.created_at}
-              status={project.status}
+              conversationStatus={project.status}
              conversationId={project.conversation_id}
            />
          )}
--- a/frontend/src/components/features/conversation-panel/conversation-state-indicator.tsx
+++ b/frontend/src/components/features/conversation-panel/conversation-state-indicator.tsx
@@ -1,38 +1,27 @@
-import ColdIcon from "./state-indicators/cold.svg?react";
+import { ConversationStatus } from "#/types/conversation-status";
 import RunningIcon from "./state-indicators/running.svg?react";
+import StartingIcon from "./state-indicators/starting.svg?react";
+import StoppedIcon from "./state-indicators/stopped.svg?react";

 type SVGIcon = React.FunctionComponent<React.SVGProps<SVGSVGElement>>;
-export type ProjectStatus =
-  | "RUNNING"
-  | "STOPPED"
-  | "STARTING"
-  | "CONNECTING"
-  | "CONNECTED"
-  | "DISCONNECTED";

-type ProjectStatusWithIcon = Exclude<
-  ProjectStatus,
-  "CONNECTING" | "CONNECTED" | "DISCONNECTED"
->;
-
-const INDICATORS: Record<ProjectStatusWithIcon, SVGIcon> = {
-  STOPPED: ColdIcon,
+const CONVERSATION_STATUS_INDICATORS: Record<ConversationStatus, SVGIcon> = {
+  STOPPED: StoppedIcon,
  RUNNING: RunningIcon,
-  STARTING: ColdIcon,
+  STARTING: StartingIcon,
 };

 interface ConversationStateIndicatorProps {
-  status: ProjectStatus;
+  conversationStatus: ConversationStatus;
 }

 export function ConversationStateIndicator({
-  status,
+  conversationStatus,
 }: ConversationStateIndicatorProps) {
-  // @ts-expect-error - Type 'ProjectStatus' is not assignable to type 'ProjectStatusWithIcon'.
-  const StateIcon = INDICATORS[status];
+  const StateIcon = CONVERSATION_STATUS_INDICATORS[conversationStatus];

  return (
-    <div data-testid={`${status}-indicator`}>
+    <div data-testid={`${conversationStatus}-indicator`}>
      <StateIcon />
    </div>
  );
--- a/frontend/src/components/features/conversation-panel/microagents-modal.tsx
+++ b/frontend/src/components/features/conversation-panel/microagents-modal.tsx
@@ -0,0 +1,142 @@
+import React, { useState } from "react";
+import { useTranslation } from "react-i18next";
+import { ChevronDown, ChevronRight } from "lucide-react";
+import { BaseModalTitle } from "#/components/shared/modals/confirmation-modals/base-modal";
+import { ModalBackdrop } from "#/components/shared/modals/modal-backdrop";
+import { ModalBody } from "#/components/shared/modals/modal-body";
+import { I18nKey } from "#/i18n/declaration";
+import { useConversationMicroagents } from "#/hooks/query/use-conversation-microagents";
+
+interface MicroagentsModalProps {
+  onClose: () => void;
+  conversationId: string | undefined;
+}
+
+export function MicroagentsModal({
+  onClose,
+  conversationId,
+}: MicroagentsModalProps) {
+  const { t } = useTranslation();
+  const [expandedAgents, setExpandedAgents] = useState<Record<string, boolean>>(
+    {},
+  );
+
+  const {
+    data: microagents,
+    isLoading,
+    isError,
+  } = useConversationMicroagents({
+    conversationId,
+    enabled: true,
+  });
+
+  const toggleAgent = (agentName: string) => {
+    setExpandedAgents((prev) => ({
+      ...prev,
+      [agentName]: !prev[agentName],
+    }));
+  };
+
+  return (
+    <ModalBackdrop onClose={onClose}>
+      <ModalBody
+        width="medium"
+        className="max-h-[80vh] flex flex-col items-start"
+        testID="microagents-modal"
+      >
+        <div className="flex flex-col gap-6 w-full">
+          <BaseModalTitle title={t(I18nKey.MICROAGENTS_MODAL$TITLE)} />
+        </div>
+
+        <div className="w-full h-[60vh] overflow-auto rounded-md">
+          {isLoading && (
+            <div className="flex justify-center items-center py-8">
+              <div className="animate-spin rounded-full h-8 w-8 border-t-2 border-b-2 border-primary" />
+            </div>
+          )}
+
+          {!isLoading &&
+            (isError || !microagents || microagents.length === 0) && (
+              <div className="flex items-center justify-center h-full p-4">
+                <p className="text-gray-400">
+                  {isError
+                    ? t(I18nKey.MICROAGENTS_MODAL$FETCH_ERROR)
+                    : t(I18nKey.CONVERSATION$NO_MICROAGENTS)}
+                </p>
+              </div>
+            )}
+
+          {!isLoading && microagents && microagents.length > 0 && (
+            <div className="p-2 space-y-3">
+              {microagents.map((agent) => {
+                const isExpanded = expandedAgents[agent.name] || false;
+
+                return (
+                  <div key={agent.name} className="rounded-md overflow-hidden">
+                    <button
+                      type="button"
+                      onClick={() => toggleAgent(agent.name)}
+                      className="w-full py-3 px-2 text-left flex items-center justify-between hover:bg-gray-700 transition-colors"
+                    >
+                      <div className="flex items-center">
+                        <h3 className="font-bold text-gray-100">
+                          {agent.name}
+                        </h3>
+                      </div>
+                      <div className="flex items-center">
+                        <span className="px-2 py-1 text-xs rounded-full bg-gray-800 mr-2">
+                          {agent.type === "repo" ? "Repository" : "Knowledge"}
+                        </span>
+                        <span className="text-gray-300">
+                          {isExpanded ? (
+                            <ChevronDown size={18} />
+                          ) : (
+                            <ChevronRight size={18} />
+                          )}
+                        </span>
+                      </div>
+                    </button>
+
+                    {isExpanded && (
+                      <div className="px-2 pb-3 pt-1">
+                        {agent.triggers && agent.triggers.length > 0 && (
+                          <div className="mt-2 mb-3">
+                            <h4 className="text-sm font-semibold text-gray-300 mb-2">
+                              {t(I18nKey.MICROAGENTS_MODAL$TRIGGERS)}
+                            </h4>
+                            <div className="flex flex-wrap gap-1">
+                              {agent.triggers.map((trigger) => (
+                                <span
+                                  key={trigger}
+                                  className="px-2 py-1 text-xs rounded-full bg-blue-900"
+                                >
+                                  {trigger}
+                                </span>
+                              ))}
+                            </div>
+                          </div>
+                        )}
+
+                        <div className="mt-2">
+                          <h4 className="text-sm font-semibold text-gray-300 mb-2">
+                            {t(I18nKey.MICROAGENTS_MODAL$CONTENT)}
+                          </h4>
+                          <div className="text-sm mt-2 p-3 bg-gray-900 rounded-md overflow-auto text-gray-300 max-h-[400px] shadow-inner">
+                            <pre className="whitespace-pre-wrap font-mono text-sm leading-relaxed">
+                              {agent.content ||
+                                t(I18nKey.MICROAGENTS_MODAL$NO_CONTENT)}
+                            </pre>
+                          </div>
+                        </div>
+                      </div>
+                    )}
+                  </div>
+                );
+              })}
+            </div>
+          )}
+        </div>
+      </ModalBody>
+    </ModalBackdrop>
+  );
+}
--- a/frontend/src/components/features/conversation-panel/state-indicators/finished.svg
+++ b/frontend/src/components/features/conversation-panel/state-indicators/finished.svg
@@ -1,4 +0,0 @@
-<svg width="18" height="18" viewBox="0 0 18 18" fill="none" xmlns="http://www.w3.org/2000/svg">
-<path d="M9 16.8599C13.4183 16.8599 17 13.2781 17 8.85986C17 4.44159 13.4183 0.859863 9 0.859863C4.58172 0.859863 1 4.44159 1 8.85986C1 13.2781 4.58172 16.8599 9 16.8599Z" fill="#779FD4"/>
-<path d="M4.61035 8.43014L7.86035 12.0301L13.3904 6.64014" stroke="#231F20" stroke-width="2" stroke-miterlimit="10" stroke-linecap="round"/>
-</svg>
--- a/frontend/src/components/features/conversation-panel/state-indicators/starting.svg
+++ b/frontend/src/components/features/conversation-panel/state-indicators/starting.svg
--- a/frontend/src/components/features/conversation-panel/state-indicators/stopped.svg
+++ b/frontend/src/components/features/conversation-panel/state-indicators/stopped.svg
--- a/frontend/src/components/features/conversation-panel/state-indicators/waiting.svg
+++ b/frontend/src/components/features/conversation-panel/state-indicators/waiting.svg
@@ -1,4 +0,0 @@
-<svg width="18" height="18" viewBox="0 0 18 18" fill="none" xmlns="http://www.w3.org/2000/svg">
-<path d="M6.76039 6.99002C8.478 6.99002 9.87039 5.59763 9.87039 3.88002C9.87039 2.16241 8.478 0.77002 6.76039 0.77002C5.04279 0.77002 3.65039 2.16241 3.65039 3.88002C3.65039 5.59763 5.04279 6.99002 6.76039 6.99002Z" fill="#FFE165"/>
-<path d="M1.0802 17.0799C1.0802 17.0799 0.610196 11.5499 3.0102 9.67992C4.7902 8.29992 7.3302 9.44992 9.7802 7.95992C11.5802 6.86992 13.6102 4.10992 14.5202 2.49992C14.9302 1.77992 15.9102 1.62992 16.6102 2.05992C17.3802 2.51992 17.6102 3.53992 17.1102 4.28992C16.2302 5.58992 14.1802 8.85992 13.1202 10.3699C10.7602 13.7599 11.4302 17.0799 11.4302 17.0799H1.0702H1.0802Z" fill="#FFE165"/>
-</svg>
--- a/frontend/src/components/features/conversation-panel/state-indicators/warm.svg
+++ b/frontend/src/components/features/conversation-panel/state-indicators/warm.svg
@@ -1,4 +0,0 @@
-<svg width="18" height="18" viewBox="0 0 18 18" fill="none" xmlns="http://www.w3.org/2000/svg">
-<path d="M9.87012 2.08984C9.87012 1.53756 9.4224 1.08984 8.87012 1.08984C8.31783 1.08984 7.87012 1.53756 7.87012 2.08984V8.08984C7.87012 8.64213 8.31783 9.08984 8.87012 9.08984C9.4224 9.08984 9.87012 8.64213 9.87012 8.08984V2.08984Z" fill="#60BB46"/>
-<path d="M10.8702 2.50988V2.64988C10.8702 3.01988 11.0702 3.36988 11.4102 3.51988C13.6802 4.51988 15.2202 6.88988 14.9702 9.56988C14.7002 12.5599 12.1002 14.9599 9.09021 15.0099C5.74021 15.0599 2.99021 12.3499 2.99021 9.00988C2.99021 6.65988 4.35021 4.62988 6.31021 3.64988C6.64021 3.48988 6.86021 3.16988 6.86021 2.80988V2.63988C6.86021 1.94988 6.14021 1.51988 5.51021 1.81988C2.42021 3.30988 0.430214 6.71988 1.12021 10.5199C1.69021 13.6799 4.22021 16.2499 7.37021 16.8699C12.4902 17.8699 16.9802 13.9699 16.9802 9.02988C16.9802 5.71988 14.9702 2.88988 12.1002 1.66988C11.5102 1.41988 10.8502 1.88988 10.8502 2.52988L10.8702 2.50988Z" fill="#60BB46"/>
-</svg>
--- a/frontend/src/components/features/feedback/feedback-form.tsx
+++ b/frontend/src/components/features/feedback/feedback-form.tsx
@@ -101,7 +101,7 @@ export function FeedbackForm({ onClose, polarity }: FeedbackFormProps) {
          name="email"
          type="email"
          placeholder={t(I18nKey.FEEDBACK$EMAIL_PLACEHOLDER)}
-          className="bg-[#27272A] px-3 py-[10px] rounded"
+          className="bg-[#27272A] px-3 py-[10px] rounded-sm"
        />
      </label>

--- a/frontend/src/components/features/home/repository-selection/branch-error-state.tsx
+++ b/frontend/src/components/features/home/repository-selection/branch-error-state.tsx
@@ -6,7 +6,7 @@ export function BranchErrorState() {
  return (
    <div
      data-testid="branch-dropdown-error"
-      className="flex items-center gap-2 max-w-[500px] h-10 px-3 bg-tertiary border border-[#717888] rounded text-red-500"
+      className="flex items-center gap-2 max-w-[500px] h-10 px-3 bg-tertiary border border-[#717888] rounded-sm text-red-500"
    >
      <span className="text-sm">{t("HOME$FAILED_TO_LOAD_BRANCHES")}</span>
    </div>
--- a/frontend/src/components/features/home/repository-selection/branch-loading-state.tsx
+++ b/frontend/src/components/features/home/repository-selection/branch-loading-state.tsx
@@ -7,7 +7,7 @@ export function BranchLoadingState() {
  return (
    <div
      data-testid="branch-dropdown-loading"
-      className="flex items-center gap-2 max-w-[500px] h-10 px-3 bg-tertiary border border-[#717888] rounded"
+      className="flex items-center gap-2 max-w-[500px] h-10 px-3 bg-tertiary border border-[#717888] rounded-sm"
    >
      <Spinner size="sm" />
      <span className="text-sm">{t("HOME$LOADING_BRANCHES")}</span>
--- a/frontend/src/components/features/home/repository-selection/repository-error-state.tsx
+++ b/frontend/src/components/features/home/repository-selection/repository-error-state.tsx
@@ -6,7 +6,7 @@ export function RepositoryErrorState() {
  return (
    <div
      data-testid="repo-dropdown-error"
-      className="flex items-center gap-2 max-w-[500px] h-10 px-3 bg-tertiary border border-[#717888] rounded text-red-500"
+      className="flex items-center gap-2 max-w-[500px] h-10 px-3 bg-tertiary border border-[#717888] rounded-sm text-red-500"
    >
      <span className="text-sm">{t("HOME$FAILED_TO_LOAD_REPOSITORIES")}</span>
    </div>
--- a/frontend/src/components/features/home/repository-selection/repository-loading-state.tsx
+++ b/frontend/src/components/features/home/repository-selection/repository-loading-state.tsx
@@ -7,7 +7,7 @@ export function RepositoryLoadingState() {
  return (
    <div
      data-testid="repo-dropdown-loading"
-      className="flex items-center gap-2 max-w-[500px] h-10 px-3 bg-tertiary border border-[#717888] rounded"
+      className="flex items-center gap-2 max-w-[500px] h-10 px-3 bg-tertiary border border-[#717888] rounded-sm"
    >
      <Spinner size="sm" />
      <span className="text-sm">{t("HOME$LOADING_REPOSITORIES")}</span>
--- a/frontend/src/components/features/images/thumbnail.tsx
+++ b/frontend/src/components/features/images/thumbnail.tsx
@@ -12,7 +12,7 @@ export function Thumbnail({ src, size = "small" }: ThumbnailProps) {
      alt=""
      src={src}
      className={cn(
-        "rounded object-cover",
+        "rounded-sm object-cover",
        size === "small" && "w-[62px] h-[62px]",
        size === "large" && "w-[100px] h-[100px]",
      )}
--- a/frontend/src/components/features/payment/payment-form.tsx
+++ b/frontend/src/components/features/payment/payment-form.tsx
@@ -43,7 +43,7 @@ export function PaymentForm() {
    >
      <div
        className={cn(
-          "flex items-center justify-between w-[680px] bg-[#7F7445] rounded px-3 py-2",
+          "flex items-center justify-between w-[680px] bg-[#7F7445] rounded-sm px-3 py-2",
          "text-[28px] leading-8 -tracking-[0.02em] font-bold",
        )}
      >
Author	SHA1	Message	Date
openhands	4cbe46b56c	Fix unit tests for new CLI LLM arguments - Update test_help_message to include new CLI arguments (--llm-model, --llm-base-url, --llm-api-key) - Update expected option count from 20 to 23 - Add tests for default and custom values of new CLI arguments - All arg parser tests now pass	2025-06-17 14:34:37 +00:00
openhands	f777029546	Fix formatting issues from pre-commit hooks	2025-06-16 12:11:14 +00:00
better629	432d8829dc	disable mcp in run_localize and install oh-aci[llama] for issue 9150 (#9151)	2025-06-16 11:03:17 +00:00
Graham Neubig	24f891687d	Fix CLI displaying claude-2 as default model for anthropic provider (#9101) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-15 21:21:33 -04:00
Graham Neubig	2d2ccf1329	Fix conversation URL format in pull request links (#9143)	2025-06-15 15:41:08 -04:00
FT	e5bff91e8e	Fix Typo: Change "accurancy" to "accuracy" in Evaluation Benchmark Comments (#9139)	2025-06-15 12:48:26 +00:00
Linghao Zhang	a93b0457c6	feat(eval): Support evaluation on SWE-bench-Live (#9137)	2025-06-15 12:30:47 +00:00
Graham Neubig	98e0f5509c	Update CLI mode docs to accurately reflect settings workflow (#9134)	2025-06-14 19:21:18 +00:00
kilavvy	4e99aabcb2	Minor Code Comment Corrections and Clarifications (#9129)	2025-06-14 18:57:14 +00:00
Graham Neubig	0c307ea12e	Lint all files in the repo (#9131) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2025-06-14 16:25:59 +00:00
Graham Neubig	5134a7d938	Add secrets manager documentation to GUI mode docs (#9084) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-14 12:13:24 -04:00
Graham Neubig	a1627914ad	Fix broken link to LLMs section in GUI mode documentation (#9121) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-14 23:26:41 +08:00
Graham Neubig	ccdd86e476	docs: remove 'coming soon' mentions from Slack app installation page (#9112) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Rohit Malhotra <rohitvinodmalhotra@gmail.com>	2025-06-14 14:35:04 +00:00
ASTONE	be62ba6b35	add_versicode (#8221)	2025-06-14 13:17:18 +00:00
leopardracer	13c298d35f	Minor Typo Fixes in Comments and Documentation (#9058)	2025-06-14 12:51:38 +00:00
llamantino	47b0dc548e	feat: support dev container networking without host mode (#9122)	2025-06-14 08:38:18 -04:00
Graham Neubig	90ae4bda0d	Restore Windows without WSL documentation (#9090) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-14 08:35:30 -04:00
dependabot[bot]	8963644fb4	chore(deps): bump the version-all group across 1 directory with 14 updates (#9107) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-06-14 07:58:24 -04:00
Engel Nyst	fd3b4ac8e6	Refactor SWE-bench instruction (#8010)	2025-06-13 23:27:52 +02:00
Rohit Malhotra	53623c76b5	[Fix]: allow agent to configure draft status for opened prs/mrs via git mcp (#9117)	2025-06-13 21:06:23 +00:00
Ray Myers	e6036b8346	Bump version for 0.43.0 release (#9109)	2025-06-13 14:47:26 -05:00
jpelletier1	144d09a578	Code review microagent (#9093) Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2025-06-13 01:35:44 +00:00
llamantino	f97a837d46	fix: fix unreachable runtime container in make docker-dev (#9072)	2025-06-12 12:46:10 -04:00
dependabot[bot]	eadec4ce9e	chore(deps): bump the version-all group in /frontend with 8 updates (#9095) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-06-12 15:17:45 +00:00
dependabot[bot]	49e8737779	chore(deps): bump the version-all group across 1 directory with 24 updates (#9066) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: amanape <83104063+amanape@users.noreply.github.com>	2025-06-12 14:31:35 +00:00
Graham Neubig	4711e74101	Fix default provider in CLI to be 'anthropic' instead of 'openai' (#9004) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2025-06-12 03:02:03 +00:00
mamoodi	c87f1cc8c0	Move Advanced Configurations under Running OpenHands on your Own (#9082)	2025-06-11 16:36:17 -04:00
Rohit Malhotra	33b64786b0	[Docs]: add info about lower scope tokens for gitlab (#9017) Co-authored-by: mamoodi <mamoodiha@gmail.com>	2025-06-11 19:34:06 +00:00
Rohit Malhotra	12fc50299b	[Docs]: add slack integration docs (#8903) Co-authored-by: mamoodi <mamoodiha@gmail.com> Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-11 19:32:54 +00:00
Tim O'Farrell	57fee17348	Fix VSCode workspace dir (#9080)	2025-06-11 13:31:59 -06:00
Engel Nyst	77517d8ba0	Save CLI settings directly under ~/.openhands (#9079)	2025-06-11 21:07:40 +02:00
Calvin Smith	a356f56237	fix: Context window truncation makes progress (#9052) Co-authored-by: Calvin Smith <calvin@all-hands.dev> Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-11 12:47:34 -06:00
chuckbutkus	7dede37fd8	Make sure redirect URI is HTTPS unless it is for localhost (#9076)	2025-06-11 18:19:15 +00:00
Ray Myers	c11dcad309	Add more log context on key events (#9056)	2025-06-11 11:34:16 -05:00
Tim O'Farrell	47209e794a	Runtime Status Fixes (#9050) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-11 09:28:17 -06:00
Xingyao Wang	3f50eb0079	feat: Add microagents UI to conversation context menu (#8984) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>	2025-06-11 23:12:27 +08:00
sp.wack	f27b02411b	chore: Add deprecated tag to `ActionMessage` type (#9063)	2025-06-11 18:34:07 +04:00
llamantino	d151093872	docs: added devstral to llms list, added local llms in local setup (#9062) Co-authored-by: mamoodi <mamoodiha@gmail.com>	2025-06-11 10:22:15 -04:00
neo	ea7294b7f9	docs: add links to other language versions of README (#9038) Co-authored-by: mamoodi <mamoodiha@gmail.com>	2025-06-11 09:49:40 -04:00
Xingyao Wang	9097f487a6	Move get_agent_obs_text function to browser utils and add return_all option (#9019) Co-authored-by: openhands <openhands@all-hands.dev>	2025-06-11 12:32:38 +08:00
Rohit Malhotra	fd921a4f88	[Fix]: model tracking in convo metadata (#9053)	2025-06-10 22:19:33 -04:00
Xingyao Wang	96fe5a50d6	Update repo.md (#9054)	2025-06-10 21:51:13 -04:00
Howie Zhou	b634e10b45	Add JSON serialization for array and object parameters when converting tools (#8780)	2025-06-10 16:48:49 -04:00
Xingyao Wang	73f01657eb	docs: Add TanStack Query state management documentation (#9047)	2025-06-10 16:44:00 -04:00
mamoodi	5d328183d5	Release 0.42.0 (#9046)	2025-06-10 16:34:10 -04:00
Mislav Lukach	b7da65d373	chore(ui): update tailwind (#9049) Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>	2025-06-10 18:20:04 +00:00
sp.wack	dca9c7bdc6	feat(backend): New "update microagent prompt" API (#8357) Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Engel Nyst <engel.nyst@gmail.com>	2025-06-10 22:10:55 +04:00