Merge branch 'main' into chuck-build

test
issue #9388, this will fix the issue (#10450)
2026-04-29 03:00:45 -04:00 · 2025-09-23 14:25:16 -04:00 · 2025-09-23 14:19:20 -04:00 · 2025-09-22 16:56:53 -04:00 · 2025-09-22 20:35:30 +00:00 · 2025-09-22 15:56:26 -04:00
381 changed files with 11852 additions and 5897 deletions
@@ -15,7 +15,7 @@ jobs:
          stale-issue-message: 'This issue is stale because it has been open for 40 days with no activity. Remove the stale label or leave a comment, otherwise it will be closed in 10 days.'
          stale-pr-message: 'This PR is stale because it has been open for 40 days with no activity. Remove the stale label or leave a comment, otherwise it will be closed in 10 days.'
          days-before-stale: 40
-          exempt-issue-labels: roadmap,backlog
+          exempt-issue-labels: roadmap,backlog,app-team
          close-issue-message: 'This issue was automatically closed due to 50 days of inactivity. We do this to help keep the issues somewhat manageable and focus on active issues.'
          close-pr-message: 'This PR was closed because it had no activity for 50 days. If you feel this was closed in error, and you would like to continue the PR, please resubmit or let us know.'
          days-before-close: 10
@@ -159,7 +159,7 @@ poetry run pytest ./tests/unit/test_*.py
 To reduce build time (e.g., if no changes were made to the client-runtime component), you can use an existing Docker
 container image by setting the SANDBOX_RUNTIME_CONTAINER_IMAGE environment variable to the desired Docker image.

-Example: `export SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.56-nikolaik`
+Example: `export SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.57-nikolaik`

 ## Develop inside Docker container

@@ -79,17 +79,17 @@ You'll find OpenHands running at [http://localhost:3000](http://localhost:3000)
 You can also run OpenHands directly with Docker:

 ```bash
-docker pull docker.all-hands.dev/all-hands-ai/runtime:0.56-nikolaik
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.57-nikolaik

 docker run -it --rm --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.56-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.57-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands:/.openhands \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
-    docker.all-hands.dev/all-hands-ai/openhands:0.56
+    docker.all-hands.dev/all-hands-ai/openhands:0.57
 ```

 </details>
@@ -51,17 +51,17 @@ OpenHands can also be run on your local system using Docker.


 ```bash
-docker pull docker.all-hands.dev/all-hands-ai/runtime:0.56-nikolaik
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.57-nikolaik

 docker run -it --rm --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.56-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.57-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands:/.openhands \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
-    docker.all-hands.dev/all-hands-ai/openhands:0.56
+    docker.all-hands.dev/all-hands-ai/openhands:0.57
 ```

 > **Note**: If you used OpenHands before version 0.44, you may need to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
@@ -42,17 +42,17 @@ OpenHands can also be run in a local environment using Docker.
 > Running on a public network? See the [Hardened Docker Installation Guide](https://docs.all-hands.dev/usage/runtimes/docker#hardened-docker-installation) to restrict network binding and apply additional security measures.

 ```bash
-docker pull docker.all-hands.dev/all-hands-ai/runtime:0.56-nikolaik
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.57-nikolaik

 docker run -it --rm --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.56-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.57-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands:/.openhands \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
-    docker.all-hands.dev/all-hands-ai/openhands:0.56
+    docker.all-hands.dev/all-hands-ai/openhands:0.57
 ```

 **Note**: If you used OpenHands version 0.44 or earlier, run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history.
@@ -12,7 +12,7 @@ services:
      - SANDBOX_API_HOSTNAME=host.docker.internal
      - DOCKER_HOST_ADDR=host.docker.internal
      #
-      - SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-ghcr.io/all-hands-ai/runtime:0.56-nikolaik}
+      - SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-ghcr.io/all-hands-ai/runtime:0.57-nikolaik}
      - SANDBOX_USER_ID=${SANDBOX_USER_ID:-1234}
      - WORKSPACE_MOUNT_PATH=${WORKSPACE_BASE:-$PWD/workspace}
    ports:
@@ -7,7 +7,7 @@ services:
    image: openhands:latest
    container_name: openhands-app-${DATE:-}
    environment:
-      - SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-docker.all-hands.dev/all-hands-ai/runtime:0.56-nikolaik}
+      - SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-docker.all-hands.dev/all-hands-ai/runtime:0.57-nikolaik}
      #- SANDBOX_USER_ID=${SANDBOX_USER_ID:-1234} # enable this only if you want a specific non-root sandbox user but you will have to manually adjust permissions of ~/.openhands for this user
      - WORKSPACE_MOUNT_PATH=${WORKSPACE_BASE:-$PWD/workspace}
    ports:
@@ -8,6 +8,11 @@ description: This page outlines all available configuration options for OpenHand
   In GUI Mode, any settings applied through the Settings UI will take precedence.
 </Note>

+<Note>
+   **Looking for Environment Variables?** All configuration options can also be set using environment variables. 
+   See the [Environment Variables Reference](./environment-variables) for a complete list with examples.
+</Note>
+
 ## Location of the `config.toml` File

 When running OpenHands in CLI, headless, or development mode, you can use a project-specific `config.toml` file for configuration, which must be
@@ -18,6 +23,11 @@ specify a different path to the `config.toml` file.

 The core configuration options are defined in the `[core]` section of the `config.toml` file.

+Core configuration options can be set as environment variables by converting the option name to uppercase. For example:
+- `debug` → `DEBUG`
+- `cache_dir` → `CACHE_DIR`
+- `runtime` → `RUNTIME`
+
 ### Workspace
 - `workspace_base` **(Deprecated)**
  - Type: `str`
@@ -141,6 +151,11 @@ The LLM (Large Language Model) configuration options are defined in the `[llm]`

 To use these with the docker command, pass in `-e LLM_<option>`. Example: `-e LLM_NUM_RETRIES`.

+All LLM configuration options can be set as environment variables by prefixing the option name with `LLM_` and converting it to uppercase. For example:
+- `model` → `LLM_MODEL`
+- `api_key` → `LLM_API_KEY`
+- `base_url` → `LLM_BASE_URL`
+
 <Note>
 For development setups, you can also define custom named LLM configurations. See [Custom LLM Configurations](./llms/custom-llm-configs) for details.
 </Note>
@@ -277,6 +292,11 @@ For development setups, you can also define custom named LLM configurations. See

 The agent configuration options are defined in the `[agent]` and `[agent.<agent_name>]` sections of the `config.toml` file.

+Agent configuration options can be set as environment variables by prefixing the option name with `AGENT_` and converting it to uppercase. For example:
+- `enable_browsing` → `AGENT_ENABLE_BROWSING`
+- `function_calling` → `AGENT_FUNCTION_CALLING`
+- `llm_config` → `AGENT_LLM_CONFIG`
+
 ### LLM Configuration
 - `llm_config`
  - Type: `str`
@@ -328,6 +348,11 @@ The sandbox configuration options are defined in the `[sandbox]` section of the

 To use these with the docker command, pass in `-e SANDBOX_<option>`. Example: `-e SANDBOX_TIMEOUT`.

+All sandbox configuration options can be set as environment variables by prefixing the option name with `SANDBOX_` and converting it to uppercase. For example:
+- `timeout` → `SANDBOX_TIMEOUT`
+- `user_id` → `SANDBOX_USER_ID`
+- `base_container_image` → `SANDBOX_BASE_CONTAINER_IMAGE`
+
 ### Execution
 - `timeout`
  - Type: `int`
@@ -390,6 +415,10 @@ The security configuration options are defined in the `[security]` section of th

 To use these with the docker command, pass in `-e SECURITY_<option>`. Example: `-e SECURITY_CONFIRMATION_MODE`.

+All security configuration options can be set as environment variables by prefixing the option name with `SECURITY_` and converting it to uppercase. For example:
+- `confirmation_mode` → `SECURITY_CONFIRMATION_MODE`
+- `security_analyzer` → `SECURITY_SECURITY_ANALYZER`
+
 ### Confirmation Mode
 - `confirmation_mode`
  - Type: `bool`
@@ -0,0 +1,251 @@
+---
+title: Environment Variables Reference
+description: Complete reference of all environment variables supported by OpenHands
+---
+
+This page provides a reference of environment variables that can be used to configure OpenHands. Environment variables provide an alternative to TOML configuration files and are particularly useful for containerized deployments, CI/CD pipelines, and cloud environments.
+
+## Environment Variable Naming Convention
+
+OpenHands follows a consistent naming pattern for environment variables:
+
+- **Core settings**: Direct uppercase mapping (e.g., `debug` → `DEBUG`)
+- **LLM settings**: Prefixed with `LLM_` (e.g., `model` → `LLM_MODEL`)
+- **Agent settings**: Prefixed with `AGENT_` (e.g., `enable_browsing` → `AGENT_ENABLE_BROWSING`)
+- **Sandbox settings**: Prefixed with `SANDBOX_` (e.g., `timeout` → `SANDBOX_TIMEOUT`)
+- **Security settings**: Prefixed with `SECURITY_` (e.g., `confirmation_mode` → `SECURITY_CONFIRMATION_MODE`)
+
+## Core Configuration Variables
+
+These variables correspond to the `[core]` section in `config.toml`:
+
+| Environment Variable | Type | Default | Description |
+|---------------------|------|---------|-------------|
+| `DEBUG` | boolean | `false` | Enable debug logging throughout the application |
+| `DISABLE_COLOR` | boolean | `false` | Disable colored output in terminal |
+| `CACHE_DIR` | string | `"/tmp/cache"` | Directory path for caching |
+| `SAVE_TRAJECTORY_PATH` | string | `"./trajectories"` | Path to store conversation trajectories |
+| `REPLAY_TRAJECTORY_PATH` | string | `""` | Path to load and replay a trajectory file |
+| `FILE_STORE_PATH` | string | `"/tmp/file_store"` | File store directory path |
+| `FILE_STORE` | string | `"memory"` | File store type (`memory`, `local`, etc.) |
+| `FILE_UPLOADS_MAX_FILE_SIZE_MB` | integer | `0` | Maximum file upload size in MB (0 = no limit) |
+| `FILE_UPLOADS_RESTRICT_FILE_TYPES` | boolean | `false` | Whether to restrict file upload types |
+| `FILE_UPLOADS_ALLOWED_EXTENSIONS` | list | `[".*"]` | List of allowed file extensions for uploads |
+| `MAX_BUDGET_PER_TASK` | float | `0.0` | Maximum budget per task (0.0 = no limit) |
+| `MAX_ITERATIONS` | integer | `100` | Maximum number of iterations per task |
+| `RUNTIME` | string | `"docker"` | Runtime environment (`docker`, `local`, `cli`, etc.) |
+| `DEFAULT_AGENT` | string | `"CodeActAgent"` | Default agent class to use |
+| `JWT_SECRET` | string | auto-generated | JWT secret for authentication |
+| `RUN_AS_OPENHANDS` | boolean | `true` | Whether to run as the openhands user |
+| `VOLUMES` | string | `""` | Volume mounts in format `host:container[:mode]` |
+
+## LLM Configuration Variables
+
+These variables correspond to the `[llm]` section in `config.toml`:
+
+| Environment Variable | Type | Default | Description |
+|---------------------|------|---------|-------------|
+| `LLM_MODEL` | string | `"claude-3-5-sonnet-20241022"` | LLM model to use |
+| `LLM_API_KEY` | string | `""` | API key for the LLM provider |
+| `LLM_BASE_URL` | string | `""` | Custom API base URL |
+| `LLM_API_VERSION` | string | `""` | API version to use |
+| `LLM_TEMPERATURE` | float | `0.0` | Sampling temperature |
+| `LLM_TOP_P` | float | `1.0` | Top-p sampling parameter |
+| `LLM_MAX_INPUT_TOKENS` | integer | `0` | Maximum input tokens (0 = no limit) |
+| `LLM_MAX_OUTPUT_TOKENS` | integer | `0` | Maximum output tokens (0 = no limit) |
+| `LLM_MAX_MESSAGE_CHARS` | integer | `30000` | Maximum characters that will be sent to the model in observation content |
+| `LLM_TIMEOUT` | integer | `0` | API timeout in seconds (0 = no timeout) |
+| `LLM_NUM_RETRIES` | integer | `8` | Number of retry attempts |
+| `LLM_RETRY_MIN_WAIT` | integer | `15` | Minimum wait time between retries (seconds) |
+| `LLM_RETRY_MAX_WAIT` | integer | `120` | Maximum wait time between retries (seconds) |
+| `LLM_RETRY_MULTIPLIER` | float | `2.0` | Exponential backoff multiplier |
+| `LLM_DROP_PARAMS` | boolean | `false` | Drop unsupported parameters without error |
+| `LLM_CACHING_PROMPT` | boolean | `true` | Enable prompt caching if supported |
+| `LLM_DISABLE_VISION` | boolean | `false` | Disable vision capabilities for cost reduction |
+| `LLM_CUSTOM_LLM_PROVIDER` | string | `""` | Custom LLM provider name |
+| `LLM_OLLAMA_BASE_URL` | string | `""` | Base URL for Ollama API |
+| `LLM_INPUT_COST_PER_TOKEN` | float | `0.0` | Cost per input token |
+| `LLM_OUTPUT_COST_PER_TOKEN` | float | `0.0` | Cost per output token |
+| `LLM_REASONING_EFFORT` | string | `""` | Reasoning effort for o-series models (`low`, `medium`, `high`) |
+
+### AWS Configuration
+| Environment Variable | Type | Default | Description |
+|---------------------|------|---------|-------------|
+| `LLM_AWS_ACCESS_KEY_ID` | string | `""` | AWS access key ID |
+| `LLM_AWS_SECRET_ACCESS_KEY` | string | `""` | AWS secret access key |
+| `LLM_AWS_REGION_NAME` | string | `""` | AWS region name |
+
+## Agent Configuration Variables
+
+These variables correspond to the `[agent]` section in `config.toml`:
+
+| Environment Variable | Type | Default | Description |
+|---------------------|------|---------|-------------|
+| `AGENT_LLM_CONFIG` | string | `""` | Name of LLM config group to use |
+| `AGENT_FUNCTION_CALLING` | boolean | `true` | Enable function calling |
+| `AGENT_ENABLE_BROWSING` | boolean | `false` | Enable browsing delegate |
+| `AGENT_ENABLE_LLM_EDITOR` | boolean | `false` | Enable LLM-based editor |
+| `AGENT_ENABLE_JUPYTER` | boolean | `false` | Enable Jupyter integration |
+| `AGENT_ENABLE_HISTORY_TRUNCATION` | boolean | `true` | Enable history truncation |
+| `AGENT_ENABLE_PROMPT_EXTENSIONS` | boolean | `true` | Enable microagents (prompt extensions) |
+| `AGENT_DISABLED_MICROAGENTS` | list | `[]` | List of microagents to disable |
+
+## Sandbox Configuration Variables
+
+These variables correspond to the `[sandbox]` section in `config.toml`:
+
+| Environment Variable | Type | Default | Description |
+|---------------------|------|---------|-------------|
+| `SANDBOX_TIMEOUT` | integer | `120` | Sandbox timeout in seconds |
+| `SANDBOX_USER_ID` | integer | `1000` | User ID for sandbox processes |
+| `SANDBOX_BASE_CONTAINER_IMAGE` | string | `"nikolaik/python-nodejs:python3.12-nodejs22"` | Base container image |
+| `SANDBOX_USE_HOST_NETWORK` | boolean | `false` | Use host networking |
+| `SANDBOX_RUNTIME_BINDING_ADDRESS` | string | `"0.0.0.0"` | Runtime binding address |
+| `SANDBOX_ENABLE_AUTO_LINT` | boolean | `false` | Enable automatic linting |
+| `SANDBOX_INITIALIZE_PLUGINS` | boolean | `true` | Initialize sandbox plugins |
+| `SANDBOX_RUNTIME_EXTRA_DEPS` | string | `""` | Extra dependencies to install |
+| `SANDBOX_RUNTIME_STARTUP_ENV_VARS` | dict | `{}` | Environment variables for runtime |
+| `SANDBOX_BROWSERGYM_EVAL_ENV` | string | `""` | BrowserGym evaluation environment |
+| `SANDBOX_VOLUMES` | string | `""` | Volume mounts (replaces deprecated workspace settings) |
+| `SANDBOX_RUNTIME_CONTAINER_IMAGE` | string | `""` | Pre-built runtime container image |
+| `SANDBOX_KEEP_RUNTIME_ALIVE` | boolean | `false` | Keep runtime alive after session ends |
+| `SANDBOX_PAUSE_CLOSED_RUNTIMES` | boolean | `false` | Pause instead of stopping closed runtimes |
+| `SANDBOX_CLOSE_DELAY` | integer | `300` | Delay before closing idle runtimes (seconds) |
+| `SANDBOX_RM_ALL_CONTAINERS` | boolean | `false` | Remove all containers when stopping |
+| `SANDBOX_ENABLE_GPU` | boolean | `false` | Enable GPU support |
+| `SANDBOX_CUDA_VISIBLE_DEVICES` | string | `""` | Specify GPU devices by ID |
+| `SANDBOX_VSCODE_PORT` | integer | auto | Specific port for VSCode server |
+
+### Sandbox Environment Variables
+Variables prefixed with `SANDBOX_ENV_` are passed through to the sandbox environment:
+
+| Environment Variable | Description |
+|---------------------|-------------|
+| `SANDBOX_ENV_*` | Any variable with this prefix is passed to the sandbox (e.g., `SANDBOX_ENV_OPENAI_API_KEY`) |
+
+## Security Configuration Variables
+
+These variables correspond to the `[security]` section in `config.toml`:
+
+| Environment Variable | Type | Default | Description |
+|---------------------|------|---------|-------------|
+| `SECURITY_CONFIRMATION_MODE` | boolean | `false` | Enable confirmation mode for actions |
+| `SECURITY_SECURITY_ANALYZER` | string | `"llm"` | Security analyzer to use (`llm`, `invariant`) |
+| `SECURITY_ENABLE_SECURITY_ANALYZER` | boolean | `true` | Enable security analysis |
+
+## Debug and Logging Variables
+
+| Environment Variable | Type | Default | Description |
+|---------------------|------|---------|-------------|
+| `DEBUG` | boolean | `false` | Enable general debug logging |
+| `DEBUG_LLM` | boolean | `false` | Enable LLM-specific debug logging |
+| `DEBUG_RUNTIME` | boolean | `false` | Enable runtime debug logging |
+| `LOG_TO_FILE` | boolean | auto | Log to file (auto-enabled when `DEBUG=true`) |
+
+## Runtime-Specific Variables
+
+### Docker Runtime
+| Environment Variable | Type | Default | Description |
+|---------------------|------|---------|-------------|
+| `SANDBOX_VOLUME_OVERLAYS` | string | `""` | Volume overlay configurations |
+
+### Remote Runtime
+| Environment Variable | Type | Default | Description |
+|---------------------|------|---------|-------------|
+| `SANDBOX_API_KEY` | string | `""` | API key for remote runtime |
+| `SANDBOX_REMOTE_RUNTIME_API_URL` | string | `""` | Remote runtime API URL |
+
+### Local Runtime
+| Environment Variable | Type | Default | Description |
+|---------------------|------|---------|-------------|
+| `RUNTIME_URL` | string | `""` | Runtime URL for local runtime |
+| `RUNTIME_URL_PATTERN` | string | `""` | Runtime URL pattern |
+| `RUNTIME_ID` | string | `""` | Runtime identifier |
+| `LOCAL_RUNTIME_MODE` | string | `""` | Enable local runtime mode (`1` to enable) |
+
+## Integration Variables
+
+### GitHub Integration
+| Environment Variable | Type | Default | Description |
+|---------------------|------|---------|-------------|
+| `GITHUB_TOKEN` | string | `""` | GitHub personal access token |
+
+### Third-Party API Keys
+| Environment Variable | Type | Default | Description |
+|---------------------|------|---------|-------------|
+| `OPENAI_API_KEY` | string | `""` | OpenAI API key |
+| `ANTHROPIC_API_KEY` | string | `""` | Anthropic API key |
+| `GOOGLE_API_KEY` | string | `""` | Google API key |
+| `AZURE_API_KEY` | string | `""` | Azure API key |
+| `TAVILY_API_KEY` | string | `""` | Tavily search API key |
+
+## Server Configuration Variables
+
+These are primarily used when running OpenHands as a server:
+
+| Environment Variable | Type | Default | Description |
+|---------------------|------|---------|-------------|
+| `FRONTEND_PORT` | integer | `3000` | Frontend server port |
+| `BACKEND_PORT` | integer | `8000` | Backend server port |
+| `FRONTEND_HOST` | string | `"localhost"` | Frontend host address |
+| `BACKEND_HOST` | string | `"localhost"` | Backend host address |
+| `WEB_HOST` | string | `"localhost"` | Web server host |
+| `SERVE_FRONTEND` | boolean | `true` | Whether to serve frontend |
+
+## Deprecated Variables
+
+These variables are deprecated and should be replaced:
+
+| Environment Variable | Replacement | Description |
+|---------------------|-------------|-------------|
+| `WORKSPACE_BASE` | `SANDBOX_VOLUMES` | Use volume mounting instead |
+| `WORKSPACE_MOUNT_PATH` | `SANDBOX_VOLUMES` | Use volume mounting instead |
+| `WORKSPACE_MOUNT_PATH_IN_SANDBOX` | `SANDBOX_VOLUMES` | Use volume mounting instead |
+| `WORKSPACE_MOUNT_REWRITE` | `SANDBOX_VOLUMES` | Use volume mounting instead |
+
+## Usage Examples
+
+### Basic Setup with OpenAI
+```bash
+export LLM_MODEL="gpt-4o"
+export LLM_API_KEY="your-openai-api-key"
+export DEBUG=true
+```
+
+### Docker Deployment with Custom Volumes
+```bash
+export RUNTIME="docker"
+export SANDBOX_VOLUMES="/host/workspace:/workspace:rw,/host/data:/data:ro"
+export SANDBOX_TIMEOUT=300
+```
+
+### Remote Runtime Configuration
+```bash
+export RUNTIME="remote"
+export SANDBOX_API_KEY="your-remote-api-key"
+export SANDBOX_REMOTE_RUNTIME_API_URL="https://your-runtime-api.com"
+```
+
+### Security-Enhanced Setup
+```bash
+export SECURITY_CONFIRMATION_MODE=true
+export SECURITY_SECURITY_ANALYZER="llm"
+export DEBUG_RUNTIME=true
+```
+
+## Notes
+
+1. **Boolean Values**: Environment variables expecting boolean values accept `true`/`false`, `1`/`0`, or `yes`/`no` (case-insensitive).
+
+2. **List Values**: Lists should be provided as Python literal strings, e.g., `AGENT_DISABLED_MICROAGENTS='["microagent1", "microagent2"]'`.
+
+3. **Dictionary Values**: Dictionaries should be provided as Python literal strings, e.g., `SANDBOX_RUNTIME_STARTUP_ENV_VARS='{"KEY": "value"}'`.
+
+4. **Precedence**: Environment variables take precedence over TOML configuration files.
+
+5. **Docker Usage**: When using Docker, pass environment variables with the `-e` flag:
+   ```bash
+   docker run -e LLM_API_KEY="your-key" -e DEBUG=true docker.all-hands.dev/all-hands-ai/openhands:0.57
+   ```
+
+6. **Validation**: Invalid environment variable values are logged as errors, and the affected options fall back to their defaults.
@@ -113,7 +113,7 @@ The conversation history will be saved in `~/.openhands/sessions`.
 ```bash
 docker run -it \
    --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.56-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.57-nikolaik \
    -e SANDBOX_USER_ID=$(id -u) \
    -e SANDBOX_VOLUMES=$SANDBOX_VOLUMES \
    -e LLM_API_KEY=$LLM_API_KEY \
@@ -122,7 +122,7 @@ docker run -it \
    -v ~/.openhands:/.openhands \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app-$(date +%Y%m%d%H%M%S) \
-    docker.all-hands.dev/all-hands-ai/openhands:0.56 \
+    docker.all-hands.dev/all-hands-ai/openhands:0.57 \
    python -m openhands.cli.entry --override-cli-mode true
 ```

@@ -61,7 +61,7 @@ export GITHUB_TOKEN="your-token"  # Required for repository operations
 # Run OpenHands
 docker run -it \
    --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.56-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.57-nikolaik \
    -e SANDBOX_USER_ID=$(id -u) \
    -e SANDBOX_VOLUMES=$SANDBOX_VOLUMES \
    -e LLM_API_KEY=$LLM_API_KEY \
@@ -73,7 +73,7 @@ docker run -it \
    -v ~/.openhands:/.openhands \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app-$(date +%Y%m%d%H%M%S) \
-    docker.all-hands.dev/all-hands-ai/openhands:0.56 \
+    docker.all-hands.dev/all-hands-ai/openhands:0.57 \
    python -m openhands.core.main -t "write a bash script that prints hi"
 ```

@@ -68,23 +68,23 @@ Download and install the LM Studio desktop app from [lmstudio.ai](https://lmstud
 1. Check [the installation guide](/usage/local-setup) and ensure all prerequisites are met before running OpenHands, then run:

 ```bash
-docker pull docker.all-hands.dev/all-hands-ai/runtime:0.56-nikolaik
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.57-nikolaik

 docker run -it --rm --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.56-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.57-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands:/.openhands \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
-    docker.all-hands.dev/all-hands-ai/openhands:0.56
+    docker.all-hands.dev/all-hands-ai/openhands:0.57
 ```

 2. Wait until the server is running (see log below):
 ```
 Digest: sha256:e72f9baecb458aedb9afc2cd5bc935118d1868719e55d50da73190d3a85c674f
-Status: Image is up to date for docker.all-hands.dev/all-hands-ai/openhands:0.56
+Status: Image is up to date for docker.all-hands.dev/all-hands-ai/openhands:0.57
 Starting OpenHands...
 Running OpenHands as root
 14:22:13 - openhands:INFO: server_config.py:50 - Using config class None
@@ -30,6 +30,20 @@ When running OpenHands, you'll need to set the following in the OpenHands UI thr

 ## Pricing

-Pricing follows official API provider rates. [You can view model prices here.](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json)
+Pricing follows official API provider rates. Below are the current pricing details for OpenHands models:

-For `qwen3-coder-480b`, we charge the cheapest FP8 rate available on openrouter: \$0.4 per million input tokens and \$1.6 per million output tokens.
+| Model | Input Cost (per 1M tokens) | Cached Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | Max Input Tokens | Max Output Tokens |
+|-------|----------------------------|-----------------------------------|------------------------------|------------------|-------------------|
+| claude-opus-4-20250514 | $15.00 | $1.50 | $75.00 | 200,000 | 32,000 |
+| claude-sonnet-4-20250514 | $3.00 | $0.30 | $15.00 | 200,000 | 64,000 |
+| devstral-medium-2507 | $0.40 | N/A | $2.00 | 128,000 | 128,000 |
+| devstral-small-2505 | $0.10 | N/A | $0.30 | 128,000 | 128,000 |
+| devstral-small-2507 | $0.10 | N/A | $0.30 | 128,000 | 128,000 |
+| gemini-2.5-pro | $1.25 | $0.31 | $10.00 | 1,048,576 | 65,535 |
+| gpt-5-2025-08-07 | $1.25 | $0.125 | $10.00 | 400,000 | 128,000 |
+| gpt-5-mini-2025-08-07 | $0.25 | $0.025 | $2.00 | 400,000 | 128,000 |
+| o3 | $2.00 | $0.50 | $8.00 | 200,000 | 100,000 |
+| o4-mini | $1.10 | $0.28 | $4.40 | 200,000 | 100,000 |
+| qwen3-coder-480b | $0.40 | N/A | $1.60 | N/A | N/A |
+
+**Note:** Cached input tokens are charged at a reduced rate when the same content is reused across requests. Models that don't support prompt caching show "N/A" for cached input cost.
@@ -116,17 +116,17 @@ Note that you'll still need `uv` installed for the default MCP servers to work p
 <Accordion title="Docker Command (Click to expand)">

 ```bash
-docker pull docker.all-hands.dev/all-hands-ai/runtime:0.56-nikolaik
+docker pull docker.all-hands.dev/all-hands-ai/runtime:0.57-nikolaik

 docker run -it --rm --pull=always \
-    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.56-nikolaik \
+    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.57-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands:/.openhands \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
-    docker.all-hands.dev/all-hands-ai/openhands:0.56
+    docker.all-hands.dev/all-hands-ai/openhands:0.57
 ```

 </Accordion>
@@ -7,14 +7,28 @@ LABEL com.datadoghq.tags.service="deploy"
 LABEL com.datadoghq.tags.env="${DD_ENV}"

 # Install Node.js v20+ and npm (which includes npx)
+# Apply security updates to fix CVEs
 RUN apt-get update && \
    apt-get install -y curl && \
    curl -fsSL https://deb.nodesource.com/setup_20.x | bash - && \
    apt-get install -y nodejs && \
    apt-get install -y jq gettext && \
-    apt-get clean
+    # Apply security updates for packages with available fixes
+    apt-get upgrade -y \
+        libc-bin \
+        libc6 \
+        libgnutls30 \
+        libsqlite3-0 \
+        perl-base && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*

-RUN pip install alembic psycopg2-binary cloud-sql-python-connector pg8000 gspread stripe python-keycloak asyncpg sqlalchemy[asyncio] resend tenacity slack-sdk ddtrace posthog "limits==5.2.0" coredis prometheus-client shap scikit-learn pandas numpy
+# Install Python packages with security fixes
+RUN pip install alembic psycopg2-binary cloud-sql-python-connector pg8000 gspread stripe python-keycloak asyncpg sqlalchemy[asyncio] resend tenacity slack-sdk ddtrace posthog "limits==5.2.0" coredis prometheus-client shap scikit-learn pandas numpy && \
+    # Update packages with known CVE fixes
+    pip install --upgrade \
+        "mcp>=1.10.0" \
+        "pillow>=11.3.0"

 WORKDIR /app
 COPY enterprise .
@@ -46,7 +46,8 @@ repos:
          - types-toml
          - types-redis
          - lxml
-          # TODO: Add OpenHands in parent
+          # OpenHands package in repo root
+          - ./
          - stripe==11.5.0
          - pygithub==2.6.1
        # To see gaps add `--html-report mypy-report/`
@@ -7,15 +7,11 @@ warn_unreachable = True
 warn_redundant_casts = True
 no_implicit_optional = True
 strict_optional = True
-exclude = (^enterprise/migrations/.*|^openhands/.*)
+disable_error_code = type-abstract
+exclude = (^enterprise/migrations/.*)

 [mypy-enterprise.tests.unit.test_auth_routes.*]
 disable_error_code = union-attr

 [mypy-enterprise.sync.install_gitlab_webhooks.*]
 disable_error_code = redundant-cast
-
-# Let the other config check base openhands packages
-[mypy-openhands.*]
-follow_imports = skip
-ignore_missing_imports = True
@@ -2,7 +2,6 @@ from experiments.constants import (
    ENABLE_EXPERIMENT_MANAGER,
 )
 from experiments.experiment_versions import (
-    handle_claude4_vs_gpt5_experiment,
    handle_condenser_max_step_experiment,
    handle_system_prompt_experiment,
 )
@@ -44,9 +43,6 @@ class SaaSExperimentManager(ExperimentManager):
            return conversation_settings

        # Apply conversation-scoped experiments
-        conversation_settings = handle_claude4_vs_gpt5_experiment(
-            user_id, conversation_id, conversation_settings
-        )
        conversation_settings = handle_condenser_max_step_experiment(
            user_id, conversation_id, conversation_settings
        )
@@ -55,7 +51,7 @@ class SaaSExperimentManager(ExperimentManager):

    @staticmethod
    def run_config_variant_test(
-        user_id: str, conversation_id: str, config: OpenHandsConfig
+        user_id: str | None, conversation_id: str, config: OpenHandsConfig
    ) -> OpenHandsConfig:
        """
        Run agent config variant test and potentially modify the OpenHands config
@@ -62,7 +62,13 @@ class GitlabManager(Manager):
             logger.warning(f'Got invalid keycloak user id for GitLab User {user_id}')
            return False

-        gitlab_service = GitLabServiceImpl(external_auth_id=keycloak_user_id)
+        # Importing here prevents circular import
+        from integrations.gitlab.gitlab_service import SaaSGitLabService
+
+        gitlab_service: SaaSGitLabService = GitLabServiceImpl(
+            external_auth_id=keycloak_user_id
+        )
+
        return await gitlab_service.user_has_write_access(project_id)

    async def receive_message(self, message: Message):
@@ -119,7 +125,13 @@ class GitlabManager(Manager):
            gitlab_view: The GitLab view object containing issue/PR/comment info
        """
        keycloak_user_id = gitlab_view.user_info.keycloak_user_id
-        gitlab_service = GitLabServiceImpl(external_auth_id=keycloak_user_id)
+
+        # Importing here prevents circular import
+        from integrations.gitlab.gitlab_service import SaaSGitLabService
+
+        gitlab_service: SaaSGitLabService = GitLabServiceImpl(
+            external_auth_id=keycloak_user_id
+        )

        outgoing_message = message.message

@@ -47,14 +47,14 @@ class GitlabIssue(ResolverViewInterface):
        )

        self.previous_comments = await gitlab_service.get_issue_or_mr_comments(
-            self.project_id, self.issue_number, is_mr=self.is_mr
+            str(self.project_id), self.issue_number, is_mr=self.is_mr
        )

        (
            self.title,
            self.description,
        ) = await gitlab_service.get_issue_or_mr_title_and_body(
-            self.project_id, self.issue_number, is_mr=self.is_mr
+            str(self.project_id), self.issue_number, is_mr=self.is_mr
        )

    async def _get_instructions(self, jinja_env: Environment) -> tuple[str, str]:
@@ -199,11 +199,11 @@ class GitlabInlineMRComment(GitlabMRComment):
            self.title,
            self.description,
        ) = await gitlab_service.get_issue_or_mr_title_and_body(
-            self.project_id, self.issue_number, is_mr=self.is_mr
+            str(self.project_id), self.issue_number, is_mr=self.is_mr
        )

        self.previous_comments = await gitlab_service.get_review_thread_comments(
-            self.project_id, self.issue_number, self.discussion_id
+            str(self.project_id), self.issue_number, self.discussion_id
        )

    async def _get_instructions(self, jinja_env: Environment) -> tuple[str, str]:
@@ -172,6 +172,17 @@ def get_summary_for_agent_state(

        return f'OpenHands encountered an error: **{reason}**.\n\n[See the conversation]({conversation_link}) for more information.'

+    if state == AgentState.AWAITING_USER_INPUT:
+        logger.info(
+            'Agent is awaiting user input',
+            extra={
+                'agent_state': state.value,
+                'conversation_link': conversation_link,
+                'observation_reason': getattr(observation, 'reason', None),
+            },
+        )
+        return f'OpenHands is waiting for your input. [Continue the conversation]({conversation_link}) to provide additional instructions.'
+
    # Log unknown agent state as error
    logger.error(
        'Unknown error: Unhandled agent state',
@@ -0,0 +1,50 @@
+"""add cancellation fields to subscription_access
+
+Revision ID: 075
+Revises: 074
+Create Date: 2025-01-11
+
+"""
+
+from typing import Sequence, Union
+
+import sqlalchemy as sa
+from alembic import op
+
+# revision identifiers, used by Alembic.
+revision: str = '075'
+down_revision: Union[str, None] = '074'
+branch_labels: Union[str, Sequence[str], None] = None
+depends_on: Union[str, Sequence[str], None] = None
+
+
+def upgrade() -> None:
+    # Add cancelled_at field to track cancellation timestamp
+    op.add_column(
+        'subscription_access',
+        sa.Column('cancelled_at', sa.DateTime(timezone=True), nullable=True),
+    )
+
+    # Add stripe_subscription_id field to enable cancellation via Stripe API
+    op.add_column(
+        'subscription_access',
+        sa.Column('stripe_subscription_id', sa.String(), nullable=True),
+    )
+
+    # Create index on stripe_subscription_id for efficient lookups
+    op.create_index(
+        'ix_subscription_access_stripe_subscription_id',
+        'subscription_access',
+        ['stripe_subscription_id'],
+    )
+
+
+def downgrade() -> None:
+    # Drop index
+    op.drop_index(
+        'ix_subscription_access_stripe_subscription_id', 'subscription_access'
+    )
+
+    # Drop columns
+    op.drop_column('subscription_access', 'stripe_subscription_id')
+    op.drop_column('subscription_access', 'cancelled_at')
@@ -17,11 +17,13 @@ from server.constants import (
    STRIPE_API_KEY,
    STRIPE_WEBHOOK_SECRET,
    SUBSCRIPTION_PRICE_DATA,
+    get_default_litellm_model,
 )
 from server.logger import logger
 from storage.billing_session import BillingSession
 from storage.database import session_maker
 from storage.subscription_access import SubscriptionAccess
+from storage.user_settings import UserSettings

 from openhands.server.user_auth import get_user_id

@@ -42,6 +44,8 @@ class SubscriptionAccessResponse(BaseModel):
    start_at: datetime
    end_at: datetime
    created_at: datetime
+    cancelled_at: datetime | None = None
+    stripe_subscription_id: str | None = None


 class CreateCheckoutSessionRequest(BaseModel):
@@ -85,7 +89,7 @@ async def get_credits(user_id: str = Depends(get_user_id)) -> GetCreditsResponse
 async def get_subscription_access(
    user_id: str = Depends(get_user_id),
 ) -> SubscriptionAccessResponse | None:
-    """Get details of the currently valid subscription for the user"""
+    """Get details of the currently valid subscription for the user."""
    with session_maker() as session:
        now = datetime.now(UTC)
        subscription_access = (
@@ -102,6 +106,8 @@ async def get_subscription_access(
            start_at=subscription_access.start_at,
            end_at=subscription_access.end_at,
            created_at=subscription_access.created_at,
+            cancelled_at=subscription_access.cancelled_at,
+            stripe_subscription_id=subscription_access.stripe_subscription_id,
        )


@@ -113,6 +119,78 @@ async def has_payment_method(user_id: str = Depends(get_user_id)) -> bool:
    return await stripe_service.has_payment_method(user_id)


+# Endpoint to cancel user's subscription
+@billing_router.post('/cancel-subscription')
+async def cancel_subscription(user_id: str = Depends(get_user_id)) -> JSONResponse:
+    """Cancel user's active subscription at the end of the current billing period."""
+    if not user_id:
+        raise HTTPException(status.HTTP_401_UNAUTHORIZED)
+
+    with session_maker() as session:
+        # Find the user's active subscription
+        now = datetime.now(UTC)
+        subscription_access = (
+            session.query(SubscriptionAccess)
+            .filter(SubscriptionAccess.status == 'ACTIVE')
+            .filter(SubscriptionAccess.user_id == user_id)
+            .filter(SubscriptionAccess.start_at <= now)
+            .filter(SubscriptionAccess.end_at >= now)
+            .filter(SubscriptionAccess.cancelled_at.is_(None))  # Not already cancelled
+            .first()
+        )
+
+        if not subscription_access:
+            raise HTTPException(
+                status_code=status.HTTP_404_NOT_FOUND,
+                detail='No active subscription found',
+            )
+
+        if not subscription_access.stripe_subscription_id:
+            raise HTTPException(
+                status_code=status.HTTP_400_BAD_REQUEST,
+                detail='Cannot cancel subscription: missing Stripe subscription ID',
+            )
+
+        try:
+            # Cancel the subscription in Stripe at period end
+            await stripe.Subscription.modify_async(
+                subscription_access.stripe_subscription_id, cancel_at_period_end=True
+            )
+
+            # Update local database
+            subscription_access.cancelled_at = datetime.now(UTC)
+            session.merge(subscription_access)
+            session.commit()
+
+            logger.info(
+                'subscription_cancelled',
+                extra={
+                    'user_id': user_id,
+                    'stripe_subscription_id': subscription_access.stripe_subscription_id,
+                    'subscription_access_id': subscription_access.id,
+                    'end_at': subscription_access.end_at,
+                },
+            )
+
+            return JSONResponse(
+                {'status': 'success', 'message': 'Subscription cancelled successfully'}
+            )
+
+        except stripe.StripeError as e:
+            logger.error(
+                'stripe_cancellation_failed',
+                extra={
+                    'user_id': user_id,
+                    'stripe_subscription_id': subscription_access.stripe_subscription_id,
+                    'error': str(e),
+                },
+            )
+            raise HTTPException(
+                status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+                detail=f'Failed to cancel subscription: {e}',
+            )
+
+
 # Endpoint to create a new setup intent in stripe
 @billing_router.post('/create-customer-setup-session')
 async def create_customer_setup_session(
@@ -190,9 +268,27 @@ async def create_subscription_checkout_session(
    billing_session_type: BillingSessionType = BillingSessionType.MONTHLY_SUBSCRIPTION,
    user_id: str = Depends(get_user_id),
 ) -> CreateBillingSessionResponse:
+    # Prevent duplicate subscriptions for the same user
+    with session_maker() as session:
+        now = datetime.now(UTC)
+        existing_active_subscription = (
+            session.query(SubscriptionAccess)
+            .filter(SubscriptionAccess.status == 'ACTIVE')
+            .filter(SubscriptionAccess.user_id == user_id)
+            .filter(SubscriptionAccess.start_at <= now)
+            .filter(SubscriptionAccess.end_at >= now)
+            .filter(SubscriptionAccess.cancelled_at.is_(None))  # Not cancelled
+            .first()
+        )
+
+        if existing_active_subscription:
+            raise HTTPException(
+                status_code=status.HTTP_400_BAD_REQUEST,
+                detail='Cannot create subscription: User already has an active subscription that has not been cancelled',
+            )
+
    customer_id = await stripe_service.find_or_create_customer(user_id)
    subscription_price_data = SUBSCRIPTION_PRICE_DATA[billing_session_type.value]
-    # TODO: Prevent duplicate subscriptions for the same user
    checkout_session = await stripe.checkout.Session.create_async(
        customer=customer_id,
        line_items=[
@@ -246,7 +342,7 @@ async def create_subscription_checkout_session_via_get(
    billing_session_type: BillingSessionType = BillingSessionType.MONTHLY_SUBSCRIPTION,
    user_id: str = Depends(get_user_id),
 ) -> RedirectResponse:
-    """Create a subscription checkout session using a GET request (For easier copy / paste to URL bar)"""
+    """Create a subscription checkout session using a GET request (For easier copy / paste to URL bar)."""
    response = await create_subscription_checkout_session(
        request, billing_session_type, user_id
    )
@@ -278,7 +374,7 @@ async def success_callback(session_id: str, request: Request):
            != BillingSessionType.DIRECT_PAYMENT.value
        ):
            return RedirectResponse(
-                f'{request.base_url}settings/billing?checkout=success', status_code=302
+                f'{request.base_url}settings?checkout=success', status_code=302
            )

        stripe_session = stripe.checkout.Session.retrieve(session_id)
@@ -348,14 +444,29 @@ async def cancel_callback(session_id: str, request: Request):
            session.merge(billing_session)
            session.commit()

+            # Redirect credit purchases to billing screen, subscriptions to LLM settings
+            if (
+                billing_session.billing_session_type
+                == BillingSessionType.DIRECT_PAYMENT.value
+            ):
+                return RedirectResponse(
+                    f'{request.base_url}settings/billing?checkout=cancel',
+                    status_code=302,
+                )
+            else:
+                return RedirectResponse(
+                    f'{request.base_url}settings?checkout=cancel', status_code=302
+                )
+
+    # If no billing session found, default to LLM settings (subscription flow)
    return RedirectResponse(
-        f'{request.base_url}settings/billing?checkout=cancel', status_code=302
+        f'{request.base_url}settings?checkout=cancel', status_code=302
    )


 @billing_router.post('/stripe-webhook')
 async def stripe_webhook(request: Request) -> JSONResponse:
-    """Endpoint for stripe webhooks"""
+    """Endpoint for stripe webhooks."""
    payload = await request.body()
    sig_header = request.headers.get('stripe-signature')

@@ -397,15 +508,111 @@ async def stripe_webhook(request: Request) -> JSONResponse:
                end_at=end_at,
                amount_paid=amount_paid,
                stripe_invoice_payment_id=invoice.payment_intent,
+                stripe_subscription_id=invoice.subscription,  # Store Stripe subscription ID
            )
            session.add(subscription_access)
            session.commit()
+    elif event_type == 'customer.subscription.updated':
+        subscription = event['data']['object']
+        subscription_id = subscription['id']
+
+        # Handle subscription cancellation
+        if subscription.get('cancel_at_period_end') is True:
+            with session_maker() as session:
+                subscription_access = (
+                    session.query(SubscriptionAccess)
+                    .filter(
+                        SubscriptionAccess.stripe_subscription_id == subscription_id
+                    )
+                    .filter(SubscriptionAccess.status == 'ACTIVE')
+                    .first()
+                )
+
+                if subscription_access and not subscription_access.cancelled_at:
+                    subscription_access.cancelled_at = datetime.now(UTC)
+                    session.merge(subscription_access)
+                    session.commit()
+
+                    logger.info(
+                        'subscription_cancelled_via_webhook',
+                        extra={
+                            'stripe_subscription_id': subscription_id,
+                            'user_id': subscription_access.user_id,
+                            'subscription_access_id': subscription_access.id,
+                        },
+                    )
+    elif event_type == 'customer.subscription.deleted':
+        subscription = event['data']['object']
+        subscription_id = subscription['id']
+
+        with session_maker() as session:
+            subscription_access = (
+                session.query(SubscriptionAccess)
+                .filter(SubscriptionAccess.stripe_subscription_id == subscription_id)
+                .filter(SubscriptionAccess.status == 'ACTIVE')
+                .first()
+            )
+
+            if subscription_access:
+                subscription_access.status = 'DISABLED'
+                subscription_access.updated_at = datetime.now(UTC)
+                session.merge(subscription_access)
+                session.commit()
+
+                # Reset user settings to free tier defaults
+                reset_user_to_free_tier_settings(subscription_access.user_id)
+
+                logger.info(
+                    'subscription_expired_reset_to_free_tier',
+                    extra={
+                        'stripe_subscription_id': subscription_id,
+                        'user_id': subscription_access.user_id,
+                        'subscription_access_id': subscription_access.id,
+                    },
+                )
    else:
        logger.info('stripe_webhook_unhandled_event_type', extra={'type': event_type})

    return JSONResponse({'status': 'success'})


+def reset_user_to_free_tier_settings(user_id: str) -> None:
+    """Reset user settings to free tier defaults when subscription ends."""
+    with session_maker() as session:
+        user_settings = (
+            session.query(UserSettings)
+            .filter(UserSettings.keycloak_user_id == user_id)
+            .first()
+        )
+
+        if user_settings:
+            user_settings.llm_model = get_default_litellm_model()
+            user_settings.llm_api_key = None
+            user_settings.llm_api_key_for_byor = None
+            user_settings.llm_base_url = LITE_LLM_API_URL
+            user_settings.max_budget_per_task = None
+            user_settings.confirmation_mode = False
+            user_settings.enable_solvability_analysis = False
+            user_settings.security_analyzer = 'llm'
+            user_settings.agent = 'CodeActAgent'
+            user_settings.language = 'en'
+            user_settings.enable_default_condenser = True
+            user_settings.enable_sound_notifications = False
+            user_settings.enable_proactive_conversation_starters = True
+            user_settings.user_consents_to_analytics = False
+
+            session.merge(user_settings)
+            session.commit()
+
+            logger.info(
+                'user_settings_reset_to_free_tier',
+                extra={
+                    'user_id': user_id,
+                    'reset_timestamp': datetime.now(UTC).isoformat(),
+                },
+            )
+
+
 async def _get_litellm_user(client: httpx.AsyncClient, user_id: str) -> dict:
    """Get a user from litellm with the id matching that given.

@@ -234,7 +234,7 @@ def _get_user_id(conversation_id: str) -> str:
        return conversation_metadata.user_id


-async def _get_session_api_key(user_id: str, conversation_id: str) -> str:
+async def _get_session_api_key(user_id: str, conversation_id: str) -> str | None:
    agent_loop_info = await conversation_manager.get_agent_loop_info(
        user_id, filter_to_sids={conversation_id}
    )
@@ -7,7 +7,7 @@ from storage.base import Base
 class SubscriptionAccess(Base):  # type: ignore
    """
    Represents a user's subscription access record.
-    Tracks subscription status, duration, and payment information.
+    Tracks subscription status, duration, payment information, and cancellation status.
    """

    __tablename__ = 'subscription_access'
@@ -27,6 +27,8 @@ class SubscriptionAccess(Base):  # type: ignore
    end_at = Column(DateTime(timezone=True), nullable=True)
    amount_paid = Column(DECIMAL(19, 4), nullable=True)
    stripe_invoice_payment_id = Column(String, nullable=False)
+    cancelled_at = Column(DateTime(timezone=True), nullable=True)
+    stripe_subscription_id = Column(String, nullable=True, index=True)
    created_at = Column(
        DateTime(timezone=True),
        default=lambda: datetime.now(UTC),  # type: ignore[attr-defined]
@@ -276,12 +276,12 @@ class VerifyWebhookStatus:
                    webhook
                )

-                gitlab_service = GitLabServiceImpl(external_auth_id=user_id)
+                gitlab_service_impl = GitLabServiceImpl(external_auth_id=user_id)

-                if not isinstance(gitlab_service, SaaSGitLabService):
+                if not isinstance(gitlab_service_impl, SaaSGitLabService):
                    raise Exception('Only SaaSGitLabService is supported')
                # Cast needed when mypy can see OpenHands
-                gitlab_service = cast(type[SaaSGitLabService], gitlab_service)
+                gitlab_service = cast(type[SaaSGitLabService], gitlab_service_impl)

                await self.verify_conditions_are_met(
                    gitlab_service=gitlab_service,
@@ -0,0 +1,159 @@
+"""Tests for enterprise integrations utils module."""
+
+import pytest
+from integrations.utils import get_summary_for_agent_state
+
+from openhands.core.schema.agent import AgentState
+from openhands.events.observation.agent import AgentStateChangedObservation
+
+
+class TestGetSummaryForAgentState:
+    """Test cases for get_summary_for_agent_state function."""
+
+    def setup_method(self):
+        """Set up test fixtures."""
+        self.conversation_link = 'https://example.com/conversation/123'
+
+    def test_empty_observations_list(self):
+        """Test handling of empty observations list."""
+        result = get_summary_for_agent_state([], self.conversation_link)
+
+        assert 'unknown error' in result.lower()
+        assert self.conversation_link in result
+
+    @pytest.mark.parametrize(
+        'state,expected_text,includes_link',
+        [
+            (AgentState.RATE_LIMITED, 'rate limited', False),
+            (AgentState.AWAITING_USER_INPUT, 'waiting for your input', True),
+        ],
+    )
+    def test_handled_agent_states(self, state, expected_text, includes_link):
+        """Test handling of states with specific behavior."""
+        observation = AgentStateChangedObservation(
+            content=f'Agent state: {state.value}', agent_state=state
+        )
+
+        result = get_summary_for_agent_state([observation], self.conversation_link)
+
+        assert expected_text in result.lower()
+        if includes_link:
+            assert self.conversation_link in result
+        else:
+            assert self.conversation_link not in result
+
+    @pytest.mark.parametrize(
+        'state',
+        [
+            AgentState.FINISHED,
+            AgentState.PAUSED,
+            AgentState.STOPPED,
+            AgentState.AWAITING_USER_CONFIRMATION,
+        ],
+    )
+    def test_unhandled_agent_states(self, state):
+        """Test handling of unhandled states (should all return unknown error)."""
+        observation = AgentStateChangedObservation(
+            content=f'Agent state: {state.value}', agent_state=state
+        )
+
+        result = get_summary_for_agent_state([observation], self.conversation_link)
+
+        assert 'unknown error' in result.lower()
+        assert self.conversation_link in result
+
+    @pytest.mark.parametrize(
+        'error_code,expected_text',
+        [
+            (
+                'STATUS$ERROR_LLM_AUTHENTICATION',
+                'authentication with the llm provider failed',
+            ),
+            (
+                'STATUS$ERROR_LLM_SERVICE_UNAVAILABLE',
+                'llm service is temporarily unavailable',
+            ),
+            (
+                'STATUS$ERROR_LLM_INTERNAL_SERVER_ERROR',
+                'llm provider encountered an internal error',
+            ),
+            ('STATUS$ERROR_LLM_OUT_OF_CREDITS', "you've run out of credits"),
+            ('STATUS$ERROR_LLM_CONTENT_POLICY_VIOLATION', 'content policy violation'),
+        ],
+    )
+    def test_error_state_readable_reasons(self, error_code, expected_text):
+        """Test all readable error reason mappings."""
+        observation = AgentStateChangedObservation(
+            content=f'Agent encountered error: {error_code}',
+            agent_state=AgentState.ERROR,
+            reason=error_code,
+        )
+
+        result = get_summary_for_agent_state([observation], self.conversation_link)
+
+        assert 'encountered an error' in result.lower()
+        assert expected_text in result.lower()
+        assert self.conversation_link in result
+
+    def test_error_state_with_custom_reason(self):
+        """Test handling of ERROR state with a custom reason."""
+        observation = AgentStateChangedObservation(
+            content='Agent encountered an error',
+            agent_state=AgentState.ERROR,
+            reason='Test error message',
+        )
+
+        result = get_summary_for_agent_state([observation], self.conversation_link)
+
+        assert 'encountered an error' in result.lower()
+        assert 'test error message' in result.lower()
+        assert self.conversation_link in result
+
+    def test_multiple_observations_uses_first(self):
+        """Test that when multiple observations are provided, only the first is used."""
+        observation1 = AgentStateChangedObservation(
+            content='Agent is awaiting user input',
+            agent_state=AgentState.AWAITING_USER_INPUT,
+        )
+        observation2 = AgentStateChangedObservation(
+            content='Agent encountered an error',
+            agent_state=AgentState.ERROR,
+            reason='Should not be used',
+        )
+
+        result = get_summary_for_agent_state(
+            [observation1, observation2], self.conversation_link
+        )
+
+        # Should handle the first observation (AWAITING_USER_INPUT), not the second (ERROR)
+        assert 'waiting for your input' in result.lower()
+        assert 'error' not in result.lower()
+
+    def test_awaiting_user_input_specific_message(self):
+        """Test that AWAITING_USER_INPUT returns the specific expected message."""
+        observation = AgentStateChangedObservation(
+            content='Agent is awaiting user input',
+            agent_state=AgentState.AWAITING_USER_INPUT,
+        )
+
+        result = get_summary_for_agent_state([observation], self.conversation_link)
+
+        # Check the key phrases of the message (substring match, not an exact-format comparison)
+        assert 'waiting for your input' in result.lower()
+        assert 'continue the conversation' in result.lower()
+        assert self.conversation_link in result
+        assert 'unknown error' not in result.lower()
+
+    def test_rate_limited_specific_message(self):
+        """Test that RATE_LIMITED returns the specific expected message."""
+        observation = AgentStateChangedObservation(
+            content='Agent was rate limited', agent_state=AgentState.RATE_LIMITED
+        )
+
+        result = get_summary_for_agent_state([observation], self.conversation_link)
+
+        # Check the key phrases of the message (substring match, not an exact-format comparison)
+        assert 'rate limited' in result.lower()
+        assert 'try again later' in result.lower()
+        # RATE_LIMITED doesn't include conversation link in response
+        assert self.conversation_link not in result
@@ -5,16 +5,16 @@ import pytest
 import stripe
 from fastapi import HTTPException, Request, status
 from httpx import HTTPStatusError, Response
-from server.routes import billing
+from integrations.stripe_service import has_payment_method
 from server.routes.billing import (
    CreateBillingSessionResponse,
    CreateCheckoutSessionRequest,
    GetCreditsResponse,
    cancel_callback,
+    cancel_subscription,
    create_checkout_session,
-    create_customer_setup_session,
+    create_subscription_checkout_session,
    get_credits,
-    has_payment_method,
    success_callback,
 )
 from sqlalchemy import create_engine
@@ -362,8 +362,7 @@ async def test_cancel_callback_session_not_found():
        response = await cancel_callback('test_session_id', mock_request)
        assert response.status_code == 302
        assert (
-            response.headers['location']
-            == 'http://test.com/settings/billing?checkout=cancel'
+            response.headers['location'] == 'http://test.com/settings?checkout=cancel'
        )

        # Verify no database updates occurred
@@ -389,8 +388,7 @@ async def test_cancel_callback_success():

        assert response.status_code == 302
        assert (
-            response.headers['location']
-            == 'http://test.com/settings/billing?checkout=cancel'
+            response.headers['location'] == 'http://test.com/settings?checkout=cancel'
        )

        # Verify database updates
@@ -402,51 +400,312 @@ async def test_cancel_callback_success():
 @pytest.mark.asyncio
 async def test_has_payment_method_with_payment_method():
    """Test has_payment_method returns True when user has a payment method."""
-
-    mock_has_payment_method = AsyncMock(return_value=True)
-    with patch(
-        'integrations.stripe_service.has_payment_method', mock_has_payment_method
+    with (
+        patch('integrations.stripe_service.session_maker') as mock_session_maker,
+        patch(
+            'stripe.Customer.list_payment_methods_async',
+            AsyncMock(return_value=MagicMock(data=[MagicMock()])),
+        ) as mock_list_payment_methods,
    ):
+        # Setup mock session
+        mock_session = MagicMock()
+        mock_session_maker.return_value.__enter__.return_value = mock_session
+        mock_session.query.return_value.filter.return_value.first.return_value = (
+            MagicMock(stripe_customer_id='cus_test123')
+        )
+
        result = await has_payment_method('mock_user')
        assert result is True
-    mock_has_payment_method.assert_called_once_with('mock_user')
+        mock_list_payment_methods.assert_called_once_with('cus_test123')


 @pytest.mark.asyncio
 async def test_has_payment_method_without_payment_method():
    """Test has_payment_method returns False when user has no payment method."""
-    mock_has_payment_method = AsyncMock(return_value=False)
-    with patch(
-        'integrations.stripe_service.has_payment_method', mock_has_payment_method
+    with (
+        patch('integrations.stripe_service.session_maker') as mock_session_maker,
+        patch(
+            'stripe.Customer.list_payment_methods_async',
+            AsyncMock(return_value=MagicMock(data=[])),
+        ) as mock_list_payment_methods,
    ):
-        mock_has_payment_method.return_value = False
+        # Setup mock session
+        mock_session = MagicMock()
+        mock_session_maker.return_value.__enter__.return_value = mock_session
+        mock_session.query.return_value.filter.return_value.first.return_value = (
+            MagicMock(stripe_customer_id='cus_test123')
+        )
+
        result = await has_payment_method('mock_user')
        assert result is False
-    mock_has_payment_method.assert_called_once_with('mock_user')
+        mock_list_payment_methods.assert_called_once_with('cus_test123')


 @pytest.mark.asyncio
-async def test_create_customer_setup_session_success():
-    """Test successful creation of customer setup session."""
-    mock_request = Request(
-        scope={'type': 'http', 'state': {'user_id': 'mock_user'}, 'headers': []}
+async def test_cancel_subscription_success():
+    """Test successful subscription cancellation."""
+    from datetime import UTC, datetime
+
+    from storage.subscription_access import SubscriptionAccess
+
+    # Mock active subscription
+    mock_subscription_access = SubscriptionAccess(
+        id=1,
+        status='ACTIVE',
+        user_id='test_user',
+        start_at=datetime.now(UTC),
+        end_at=datetime.now(UTC),
+        amount_paid=2000,
+        stripe_invoice_payment_id='pi_test',
+        stripe_subscription_id='sub_test123',
+        cancelled_at=None,
    )

-    mock_customer = stripe.Customer(
-        id='mock-customer', metadata={'user_id': 'mock-user'}
-    )
-    mock_session = MagicMock()
-    mock_session.url = 'https://checkout.stripe.com/test-session'
-    mock_create = AsyncMock(return_value=mock_session)
+    # Mock Stripe subscription response
+    mock_stripe_subscription = MagicMock()
+    mock_stripe_subscription.cancel_at_period_end = True

    with (
+        patch('server.routes.billing.session_maker') as mock_session_maker,
+        patch(
+            'stripe.Subscription.modify_async',
+            AsyncMock(return_value=mock_stripe_subscription),
+        ) as mock_stripe_modify,
+    ):
+        # Setup mock session
+        mock_session = MagicMock()
+        mock_session_maker.return_value.__enter__.return_value = mock_session
+        mock_session.query.return_value.filter.return_value.filter.return_value.filter.return_value.filter.return_value.filter.return_value.first.return_value = mock_subscription_access
+
+        # Call the function
+        result = await cancel_subscription('test_user')
+
+        # Verify Stripe API was called
+        mock_stripe_modify.assert_called_once_with(
+            'sub_test123', cancel_at_period_end=True
+        )
+
+        # Verify database was updated
+        assert mock_subscription_access.cancelled_at is not None
+        mock_session.merge.assert_called_once_with(mock_subscription_access)
+        mock_session.commit.assert_called_once()
+
+        # Verify response
+        assert result.status_code == 200
+
+
+@pytest.mark.asyncio
+async def test_cancel_subscription_no_active_subscription():
+    """Test cancellation when no active subscription exists."""
+    with (
+        patch('server.routes.billing.session_maker') as mock_session_maker,
+    ):
+        # Setup mock session with no subscription found
+        mock_session = MagicMock()
+        mock_session_maker.return_value.__enter__.return_value = mock_session
+        mock_session.query.return_value.filter.return_value.filter.return_value.filter.return_value.filter.return_value.filter.return_value.first.return_value = None
+
+        # Call the function and expect HTTPException
+        with pytest.raises(HTTPException) as exc_info:
+            await cancel_subscription('test_user')
+
+        assert exc_info.value.status_code == 404
+        assert 'No active subscription found' in str(exc_info.value.detail)
+
+
+@pytest.mark.asyncio
+async def test_cancel_subscription_missing_stripe_id():
+    """Test cancellation when subscription has no Stripe ID."""
+    from datetime import UTC, datetime
+
+    from storage.subscription_access import SubscriptionAccess
+
+    # Mock subscription without Stripe ID
+    mock_subscription_access = SubscriptionAccess(
+        id=1,
+        status='ACTIVE',
+        user_id='test_user',
+        start_at=datetime.now(UTC),
+        end_at=datetime.now(UTC),
+        amount_paid=2000,
+        stripe_invoice_payment_id='pi_test',
+        stripe_subscription_id=None,  # Missing Stripe ID
+        cancelled_at=None,
+    )
+
+    with (
+        patch('server.routes.billing.session_maker') as mock_session_maker,
+    ):
+        # Setup mock session
+        mock_session = MagicMock()
+        mock_session_maker.return_value.__enter__.return_value = mock_session
+        mock_session.query.return_value.filter.return_value.filter.return_value.filter.return_value.filter.return_value.filter.return_value.first.return_value = mock_subscription_access
+
+        # Call the function and expect HTTPException
+        with pytest.raises(HTTPException) as exc_info:
+            await cancel_subscription('test_user')
+
+        assert exc_info.value.status_code == 400
+        assert 'missing Stripe subscription ID' in str(exc_info.value.detail)
+
+
+@pytest.mark.asyncio
+async def test_cancel_subscription_stripe_error():
+    """Test cancellation when Stripe API fails."""
+    from datetime import UTC, datetime
+
+    from storage.subscription_access import SubscriptionAccess
+
+    # Mock active subscription
+    mock_subscription_access = SubscriptionAccess(
+        id=1,
+        status='ACTIVE',
+        user_id='test_user',
+        start_at=datetime.now(UTC),
+        end_at=datetime.now(UTC),
+        amount_paid=2000,
+        stripe_invoice_payment_id='pi_test',
+        stripe_subscription_id='sub_test123',
+        cancelled_at=None,
+    )
+
+    with (
+        patch('server.routes.billing.session_maker') as mock_session_maker,
+        patch(
+            'stripe.Subscription.modify_async',
+            AsyncMock(side_effect=stripe.StripeError('API Error')),
+        ),
+    ):
+        # Setup mock session
+        mock_session = MagicMock()
+        mock_session_maker.return_value.__enter__.return_value = mock_session
+        mock_session.query.return_value.filter.return_value.filter.return_value.filter.return_value.filter.return_value.filter.return_value.first.return_value = mock_subscription_access
+
+        # Call the function and expect HTTPException
+        with pytest.raises(HTTPException) as exc_info:
+            await cancel_subscription('test_user')
+
+        assert exc_info.value.status_code == 500
+        assert 'Failed to cancel subscription' in str(exc_info.value.detail)
+
+
+@pytest.mark.asyncio
+async def test_create_subscription_checkout_session_duplicate_prevention():
+    """Test that creating a subscription when user already has active subscription raises error."""
+    from datetime import UTC, datetime
+
+    from storage.subscription_access import SubscriptionAccess
+
+    # Mock active subscription
+    mock_subscription_access = SubscriptionAccess(
+        id=1,
+        status='ACTIVE',
+        user_id='test_user',
+        start_at=datetime.now(UTC),
+        end_at=datetime.now(UTC),
+        amount_paid=2000,
+        stripe_invoice_payment_id='pi_test',
+        stripe_subscription_id='sub_test123',
+        cancelled_at=None,
+    )
+
+    mock_request = Request(scope={'type': 'http'})
+    mock_request._base_url = URL('http://test.com/')
+
+    with (
+        patch('server.routes.billing.session_maker') as mock_session_maker,
+    ):
+        # Setup mock session to return existing active subscription
+        mock_session = MagicMock()
+        mock_session_maker.return_value.__enter__.return_value = mock_session
+        mock_session.query.return_value.filter.return_value.filter.return_value.filter.return_value.filter.return_value.filter.return_value.first.return_value = mock_subscription_access
+
+        # Call the function and expect HTTPException
+        with pytest.raises(HTTPException) as exc_info:
+            await create_subscription_checkout_session(
+                mock_request, user_id='test_user'
+            )
+
+        assert exc_info.value.status_code == 400
+        assert (
+            'user already has an active subscription'
+            in str(exc_info.value.detail).lower()
+        )
+
+
+@pytest.mark.asyncio
+async def test_create_subscription_checkout_session_allows_after_cancellation():
+    """Test that creating a subscription is allowed when previous subscription was cancelled."""
+    mock_request = Request(scope={'type': 'http'})
+    mock_request._base_url = URL('http://test.com/')
+
+    mock_session_obj = MagicMock()
+    mock_session_obj.url = 'https://checkout.stripe.com/test-session'
+    mock_session_obj.id = 'test_session_id'
+
+    with (
+        patch('server.routes.billing.session_maker') as mock_session_maker,
        patch(
            'integrations.stripe_service.find_or_create_customer',
-            AsyncMock(return_value=mock_customer),
+            AsyncMock(return_value='cus_test123'),
+        ),
+        patch(
+            'stripe.checkout.Session.create_async',
+            AsyncMock(return_value=mock_session_obj),
+        ),
+        patch(
+            'server.routes.billing.SUBSCRIPTION_PRICE_DATA',
+            {'MONTHLY_SUBSCRIPTION': {'unit_amount': 2000}},
        ),
-        patch('stripe.checkout.Session.create_async', mock_create),
    ):
-        result = await create_customer_setup_session(mock_request)
+        # Setup mock session - the query should return None because cancelled subscriptions are filtered out
+        mock_session = MagicMock()
+        mock_session_maker.return_value.__enter__.return_value = mock_session
+        mock_session.query.return_value.filter.return_value.filter.return_value.filter.return_value.filter.return_value.filter.return_value.first.return_value = None

-        assert isinstance(result, billing.CreateBillingSessionResponse)
+        # Should succeed
+        result = await create_subscription_checkout_session(
+            mock_request, user_id='test_user'
+        )
+
+        assert isinstance(result, CreateBillingSessionResponse)
+        assert result.redirect_url == 'https://checkout.stripe.com/test-session'
+
+
+@pytest.mark.asyncio
+async def test_create_subscription_checkout_session_success_no_existing():
+    """Test successful subscription creation when no existing subscription."""
+    mock_request = Request(scope={'type': 'http'})
+    mock_request._base_url = URL('http://test.com/')
+
+    mock_session_obj = MagicMock()
+    mock_session_obj.url = 'https://checkout.stripe.com/test-session'
+    mock_session_obj.id = 'test_session_id'
+
+    with (
+        patch('server.routes.billing.session_maker') as mock_session_maker,
+        patch(
+            'integrations.stripe_service.find_or_create_customer',
+            AsyncMock(return_value='cus_test123'),
+        ),
+        patch(
+            'stripe.checkout.Session.create_async',
+            AsyncMock(return_value=mock_session_obj),
+        ),
+        patch(
+            'server.routes.billing.SUBSCRIPTION_PRICE_DATA',
+            {'MONTHLY_SUBSCRIPTION': {'unit_amount': 2000}},
+        ),
+    ):
+        # Setup mock session to return no existing subscription
+        mock_session = MagicMock()
+        mock_session_maker.return_value.__enter__.return_value = mock_session
+        mock_session.query.return_value.filter.return_value.filter.return_value.filter.return_value.filter.return_value.filter.return_value.first.return_value = None
+
+        # Should succeed
+        result = await create_subscription_checkout_session(
+            mock_request, user_id='test_user'
+        )
+
+        assert isinstance(result, CreateBillingSessionResponse)
        assert result.redirect_url == 'https://checkout.stripe.com/test-session'
@@ -0,0 +1,152 @@
+<h1 align="center"> Training Software Engineering Agents and Verifiers with SWE-Gym </h1>
+
+A Multi-SWE-bench implementation of SWE-Gym.
+
+<p align="center">
+  <a href="https://www.jiayipan.com/" style="text-decoration: none;">Jiayi Pan<sup>*,1</sup></a>,
+  <a href="https://xwang.dev/" style="text-decoration: none;">Xingyao Wang<sup>*,2</sup></a>,
+  <a href="https://www.phontron.com/" style="text-decoration: none;">Graham Neubig<sup>3</sup></a>,
+  <a href="https://www.cs.toronto.edu/~ndjaitly/" style="text-decoration: none;">Navdeep Jaitly<sup>4</sup></a>,
+  <a href="https://blender.cs.illinois.edu/hengji.html" style="text-decoration: none;">Heng Ji<sup>2</sup></a>,
+  <a href="https://www.alanesuhr.com/" style="text-decoration: none;">Alane Suhr<sup>^,1</sup></a>,
+  <a href="https://dreasysnail.github.io/" style="text-decoration: none;">Yizhe Zhang<sup>^,4</sup></a>
+</p>
+
+<p align="center">
+  <sup>1</sup>UC Berkeley, <sup>2</sup>UIUC, <sup>3</sup>CMU, <sup>4</sup>Apple <br>
+  <sub><sup>*</sup>Equal contribution, <sup>^</sup>Equal supervision</sub>
+</p>
+
+<p align="center">
+<a href="https://arxiv.org/abs/2412.21139">📃 Paper</a>
+•
+<a href="https://huggingface.co/SWE-Gym" >🤗 Data & Models</a>
+</p>
+
+We present **SWE-Gym**, the first environment for training real-world software engineering agents.
+We use it to train strong LM agents that achieve state-of-the-art open results on SWE-Bench, with early, promising scaling characteristics as we increase training and inference-time compute.
+
+<p align="center">
+  <img src="https://github.com/SWE-Gym/SWE-Gym/blob/main/assets/images/teaser.jpg?raw=true" width="100%" alt="teaser">
+</p>
+
+---
+# Run SWE-Gym with OpenHands
+
+The process of running SWE-Gym is very similar to how you'd run SWE-Bench evaluation.
+
+
+1. First, clone the OpenHands repo: `git clone https://github.com/All-Hands-AI/OpenHands.git`
+2. Then set up the repo following [Development.md](https://github.com/All-Hands-AI/OpenHands/blob/main/Development.md)
+3. Serve your own model as an OpenAI-compatible endpoint and put its details in `config.toml`, following the instructions [here](../../README.md#setup).
+4. Run the following to sample with 16x parallelism:
+
+```bash
+export ALLHANDS_API_KEY=ah-yourkey  # Not needed when running in a local Docker container
+./evaluation/benchmarks/multi_swe_bench/scripts/rollout_swegym.sh llm.mymodel-temp05 'train-t05' 16
+```
+
+NOTE: SWE-Gym sampling with parallelism is currently only tested with the AllHands RemoteRuntime (limited beta). Fill out [this form](https://docs.google.com/forms/d/e/1FAIpQLSckVz_JFwg2_mOxNZjCtr7aoBFI2Mwdan3f75J_TrdMS1JV2g/viewform) to apply for access.
+
+
+5. When `rollout_swegym.sh` finishes, you will get a file called `output.with_completions.jsonl.gz`. You can then use [`./scripts/swegym/convert_data.ipynb`](./scripts/swegym/convert_data.ipynb) to convert it into the SFT data format.
+
+## Running the Jupyter Notebook
+
+To run the data conversion notebook, follow these steps:
+
+1. Navigate to the OpenHands repository root:
+```bash
+cd openhands_repo
+```
+
+2. Set the PYTHONPATH and start Jupyter notebook:
+```bash
+PYTHONPATH=$(pwd) jupyter notebook
+```
+
+3. In the Jupyter interface, navigate to `evaluation/benchmarks/swe_bench/scripts/swegym/convert_data.ipynb`
+
+4. Update the file paths in the notebook:
+   - Set `FILE_PATHS` to point to your `output.with_completions.jsonl.gz` files
+   - Set `YOUR_OUTPUT_FOLDER` to your desired output directory
+
+5. Run the notebook cells sequentially to process your data and generate the SFT training format.
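Before opening the notebook, it can help to sanity-check `output.with_completions.jsonl.gz`. The sketch below is not part of the repository. It assumes each line is a JSON object with an `instance_id` and a `raw_completions` field that is either `None` or a dict with `messages` and `tools`, which is what the `load_completions` helper elsewhere in this change writes. The sample records and the temporary path are synthetic.

```python
import gzip
import json
import os
import tempfile


def iter_records(path):
    """Yield one dict per non-empty line of a gzip-compressed JSONL file."""
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_messages(path):
    """Map instance_id -> number of messages in raw_completions (0 if missing)."""
    counts = {}
    for record in iter_records(path):
        completions = record.get('raw_completions') or {}
        counts[record['instance_id']] = len(completions.get('messages', []))
    return counts


# Synthetic two-record file so the sketch runs without a real rollout.
sample = [
    {
        'instance_id': 'org__repo-1',
        'raw_completions': {
            'messages': [{'role': 'user', 'content': 'hi'}],
            'tools': [],
        },
    },
    {'instance_id': 'org__repo-2', 'raw_completions': None},
]
tmp_path = os.path.join(tempfile.mkdtemp(), 'output.with_completions.jsonl.gz')
with gzip.open(tmp_path, 'wt', encoding='utf-8') as f:
    for rec in sample:
        f.write(json.dumps(rec) + '\n')

message_counts = count_messages(tmp_path)
print(message_counts)
```

Instances whose count is 0 had no completion files on disk and will likely contribute nothing to the SFT set.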
+
+---
+# More info about SWE-Gym
+
+Progress in agents for software engineering has been limited by the lack of training environments that both include rigorous verification for reinforcement learning and cover the expansive tasks encountered in real-world repository-level engineering.
+
+We introduce SWE-Gym: An Open Environment for Training Software Engineering Agents & Verifiers.
+Our baselines achieve a new open SOTA of 32%/26% on SWE-Bench Verified/Lite, with promising scaling trends.
+
+![SWE-Gym Scaling](https://github.com/SWE-Gym/SWE-Gym/blob/main/assets/images/scaling.jpg?raw=true)
+*SWE-Gym enables scalable improvements for software engineering agents at both training and inference time. Our current results are primarily bottlenecked by training and inference compute, rather than the size of our environment.*
+
+## SWE-Gym Environment
+
+We create SWE-Gym, the first environment for training SWE agents, with **2.4K real tasks from 11 Python repos** & a Lite split of 234 instances. SWE-Gym combines real-world Python tasks, repository context, executable environments, and test verification to train agents for solving software engineering problems.
+
+![SWE-Gym Repo Distribution](https://github.com/SWE-Gym/SWE-Gym/blob/main/assets/images/swe-gym.jpg?raw=true)
+
+
+## SWE-Gym trains LMs as agents
+
+When fine-tuned on fewer than 500 agent-environment interaction trajectories sampled from SWE-Gym with GPT-4o and Claude 3.5 Sonnet, a 32B LM-powered OpenHands agent achieves **+14%** absolute gains on SWE-Bench Verified.
+
+![OpenHands Performance diff before and after training](https://github.com/SWE-Gym/SWE-Gym/blob/main/assets/images/oh-agent.jpg?raw=true)
+
+
+## SWE-Gym enables self-improvement
+
+SWE-Gym is also effective across agent scaffolds. With rejection sampling fine-tuning and the MoatlessTools scaffold, our 32B and 7B models achieve 20% and 10%, respectively, on SWE-Bench Lite through self-improvement.
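As a rough illustration of the filtering step in rejection sampling fine-tuning, the sketch below keeps only sampled trajectories whose evaluation report marks them as resolved. This is a generic sketch, not the authors' training pipeline. The `report` / `resolved` field names follow the status dictionary used by the report script in this change, and the records themselves are synthetic.

```python
def select_for_sft(records):
    """Keep only records whose report marks the instance as resolved."""
    return [
        record
        for record in records
        if (record.get('report') or {}).get('resolved', False)
    ]


# Synthetic rollouts: one resolved, one unresolved, one never evaluated.
records = [
    {'instance_id': 'a__b-1', 'report': {'resolved': True}},
    {'instance_id': 'a__b-2', 'report': {'resolved': False}},
    {'instance_id': 'a__b-3'},
]

kept = select_for_sft(records)
kept_ids = [record['instance_id'] for record in kept]
print(kept_ids)
```

Records without a report are treated as unresolved, so they are rejected rather than silently accepted.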
+
+<p align="center">
+  <img src="https://github.com/SWE-Gym/SWE-Gym/blob/main/assets/images/ml-agent.jpg?raw=true" width="80%" alt="Moatless self-improvement">
+</p>
+
+
+
+## SWE-Gym enables inference-time scaling
+
+SWE-Gym enables inference-time scaling through verifiers trained on agent trajectories.
+These verifiers identify the most promising solutions via best-of-n selection. Together with our learned agents, they achieve 32%/26% on SWE-Bench Verified/Lite, a new open SoTA.
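Best-of-n selection itself is simple to state: score each of the n candidate solutions with the verifier and keep the highest-scoring one. The sketch below shows that step only. `toy_score` is a hypothetical stand-in for a learned verifier, and the candidate strings are made up.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate with the highest verifier score."""
    if not candidates:
        raise ValueError('need at least one candidate')
    return max(candidates, key=score)


def toy_score(patch: str) -> float:
    # Hypothetical stand-in for a learned verifier: prefers shorter patches.
    return -float(len(patch))


choice = best_of_n(['aaaa', 'bb', 'ccc'], toy_score)
print(choice)
```

In practice the scoring function would be a model call, so it is passed in as a parameter rather than hard-coded.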
+
+
+![Inference Time Scaling for Moatless Agent](https://github.com/SWE-Gym/SWE-Gym/blob/main/assets/images/inference-ml.jpg?raw=true)
+*Inference Time Scaling for Moatless Agent*
+
+![Inference Time Scaling for OpenHands Agent](https://github.com/SWE-Gym/SWE-Gym/blob/main/assets/images/inference-oh.jpg?raw=true)
+*Inference Time Scaling for OpenHands Agent*
+
+
+## Our baselines on SWE-Gym show strong scaling trends
+
+Lastly, our ablations reveal strong scaling trends: performance is now bottlenecked by training and inference compute, rather than the size of our dataset. Pushing and improving these scaling trends further is an exciting direction for future work.
+
+![](https://github.com/SWE-Gym/SWE-Gym/blob/main/assets/images/scaling.jpg?raw=true)
+
+## Reproducing Results
+**The Dataset**
+
+To access the SWE-Gym dataset, check out our Hugging Face Hub page [SWE-Gym](https://huggingface.co/SWE-Gym)
+
+The environment constants are currently saved at [SWE-Bench-Fork](https://github.com/SWE-Gym/SWE-Bench-Fork)
+
+We also have pre-built Docker images for each instance under the [xingyaoww/sweb.eval.x86_64](https://hub.docker.com/search?q=xingyaoww%2Fsweb.eval.x86_64.) prefix on Docker Hub.
+
+
+## 📚 Citation
+
+```bibtex
+@misc{pan2024trainingsoftwareengineeringagents,
+      title={Training Software Engineering Agents and Verifiers with SWE-Gym},
+      author={Jiayi Pan and Xingyao Wang and Graham Neubig and Navdeep Jaitly and Heng Ji and Alane Suhr and Yizhe Zhang},
+      year={2024},
+      eprint={2412.21139},
+      archivePrefix={arXiv},
+      primaryClass={cs.SE},
+      url={https://arxiv.org/abs/2412.21139},
+}
+```
@@ -51,8 +51,8 @@ RUN_WITH_BROWSING = os.environ.get('RUN_WITH_BROWSING', 'false').lower() == 'tru

 # TODO: migrate all swe-bench docker to ghcr.io/openhands
 # TODO: adapt to all languages
-DOCKER_IMAGE_PREFIX = os.environ.get('EVAL_DOCKER_IMAGE_PREFIX', '')
-LANGUAGE = os.environ.get('LANGUAGE', 'python')
+DOCKER_IMAGE_PREFIX = os.environ.get('EVAL_DOCKER_IMAGE_PREFIX', 'mswebench')
+LANGUAGE = os.environ.get('LANGUAGE', 'java')
 logger.info(f'Using docker image prefix: {DOCKER_IMAGE_PREFIX}')


@@ -305,31 +305,19 @@ def get_instance_docker_image(instance: pd.Series):
        instance_id = instance.get('instance_id', '')
        tag_suffix = instance_id.split('-')[-1] if instance_id else ''
        container_tag = f'pr-{tag_suffix}'
-        # pdb.set_trace()
-        return f'mswebench/{container_name}:{container_tag}'
-        # return "kong/insomnia:pr-8284"
-        # return "'sweb.eval.x86_64.local_insomnia"
-        # return "local_insomnia_why"
-        # return "local/kong-insomnia:pr-8117"
+        return f'{DOCKER_IMAGE_PREFIX}/{container_name}:{container_tag}'


 def get_config(
    instance: pd.Series,
    metadata: EvalMetadata,
 ) -> OpenHandsConfig:
-    SWE_BENCH_CONTAINER_IMAGE = 'ghcr.io/opendevin/eval-swe-bench:full-v1.2.1'
-    if USE_INSTANCE_IMAGE:
-        # We use a different instance image for the each instance of swe-bench eval
-        # base_container_image = get_instance_docker_image(instance['instance_id'])
-        base_container_image = get_instance_docker_image(instance)
-        logger.info(
-            f'Using instance container image: {base_container_image}. '
-            f'Please make sure this image exists. '
-            f'Submit an issue on https://github.com/All-Hands-AI/OpenHands if you run into any issues.'
-        )
-    else:
-        base_container_image = SWE_BENCH_CONTAINER_IMAGE
-        logger.info(f'Using swe-bench container image: {base_container_image}')
+    base_container_image = get_instance_docker_image(instance)
+    logger.info(
+        f'Using instance container image: {base_container_image}. '
+        f'Please make sure this image exists. '
+        f'Submit an issue on https://github.com/All-Hands-AI/OpenHands if you run into any issues.'
+    )

    sandbox_config = get_default_sandbox_config_for_eval()
    sandbox_config.base_container_image = base_container_image
@@ -772,7 +760,6 @@ if __name__ == '__main__':
    parser.add_argument(
        '--dataset',
        type=str,
-        default='princeton-nlp/SWE-bench',
        help='data set to evaluate on, either full-test or lite-test',
    )
    parser.add_argument(
@@ -787,6 +774,7 @@ if __name__ == '__main__':
    # so we don't need to manage file uploading to OpenHands's repo
    # dataset = load_dataset(args.dataset, split=args.split)
    # dataset = load_dataset(args.dataset)
+    logger.info(f'Loading dataset {args.dataset} with split {args.split}')
    dataset = load_dataset('json', data_files=args.dataset)
    dataset = dataset[args.split]
    swe_bench_tests = filter_dataset(dataset.to_pandas(), 'instance_id')
@@ -839,7 +827,7 @@ if __name__ == '__main__':
        args.eval_num_workers,
        process_instance,
        timeout_seconds=120 * 60,  # 2 hour PER instance should be more than enough
-        max_retries=5,
+        max_retries=3,
    )
    # Check if any instances reached maximum retries
    check_maximum_retries_exceeded(metadata.eval_output_dir)
@@ -1,37 +1,54 @@
+import argparse
 import json

-input_file = 'XXX.jsonl'
-output_file = 'YYY.jsonl'

-with (
-    open(input_file, 'r', encoding='utf-8') as fin,
-    open(output_file, 'w', encoding='utf-8') as fout,
-):
-    for line in fin:
-        line = line.strip()
-        if not line:
-            continue
+def main(input_file, output_file):
+    with (
+        open(input_file, 'r', encoding='utf-8') as fin,
+        open(output_file, 'w', encoding='utf-8') as fout,
+    ):
+        for line in fin:
+            line = line.strip()
+            if not line:
+                continue

-        data = json.loads(line)
-        item = data
+            data = json.loads(line)
+            item = data

-        # Extract the original fields
-        org = item.get('org', '')
-        repo = item.get('repo', '')
-        number = str(item.get('number', ''))
+            # Skip instances that don't have resolved_issues or have empty resolved_issues
+            if not item.get('resolved_issues'):
+                print(
+                    f'Skipping instance {item.get("org", "")}/{item.get("repo", "")}-{item.get("number", "")} - no resolved_issues'
+                )
+                continue

-        new_item = {}
-        new_item['repo'] = f'{org}/{repo}'
-        new_item['instance_id'] = f'{org}__{repo}-{number}'
-        new_item['problem_statement'] = (
-            item['resolved_issues'][0].get('title', '')
-            + '\n'
-            + item['resolved_issues'][0].get('body', '')
-        )
-        new_item['FAIL_TO_PASS'] = []
-        new_item['PASS_TO_PASS'] = []
-        new_item['base_commit'] = item['base'].get('sha', '')
-        new_item['version'] = '0.1'  # depends
+            # Extract the original fields
+            org = item.get('org', '')
+            repo = item.get('repo', '')
+            number = str(item.get('number', ''))

-        output_data = new_item
-        fout.write(json.dumps(output_data, ensure_ascii=False) + '\n')
+            new_item = {}
+            new_item['repo'] = f'{org}/{repo}'
+            new_item['instance_id'] = f'{org}__{repo}-{number}'
+
+            # Get the first resolved issue
+            resolved_issue = item['resolved_issues'][0]
+            title = resolved_issue.get('title') or ''
+            body = resolved_issue.get('body') or ''
+
+            new_item['problem_statement'] = title + '\n' + body
+            new_item['FAIL_TO_PASS'] = []
+            new_item['PASS_TO_PASS'] = []
+            new_item['base_commit'] = item['base'].get('sha', '')
+            new_item['version'] = '0.1'  # depends
+
+            output_data = new_item
+            fout.write(json.dumps(output_data, ensure_ascii=False) + '\n')
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--input', required=True, help='Input .jsonl file path')
+    parser.add_argument('--output', required=True, help='Output .jsonl file path')
+    args = parser.parse_args()
+    main(args.input, args.output)
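The per-record mapping in the script above can be exercised on a single in-memory record, which makes the `None` handling for `title` and `body` easier to check. The sketch below restates that mapping as a pure function. `convert_record` is a hypothetical helper name that does not exist in the repository, and the sample record is synthetic.

```python
def convert_record(item):
    """Map one Multi-SWE-bench record to the SWE-bench-style fields.

    Returns None when the record has no resolved issues, mirroring the
    skip branch in the script.
    """
    if not item.get('resolved_issues'):
        return None

    org = item.get('org', '')
    repo = item.get('repo', '')
    number = str(item.get('number', ''))

    # Use the first resolved issue; `or ''` guards against explicit nulls.
    issue = item['resolved_issues'][0]
    title = issue.get('title') or ''
    body = issue.get('body') or ''

    return {
        'repo': f'{org}/{repo}',
        'instance_id': f'{org}__{repo}-{number}',
        'problem_statement': title + '\n' + body,
        'FAIL_TO_PASS': [],
        'PASS_TO_PASS': [],
        'base_commit': item['base'].get('sha', ''),
        'version': '0.1',
    }


# Synthetic record with a null body, the case the `or ''` guard covers.
sample = {
    'org': 'kong',
    'repo': 'insomnia',
    'number': 8284,
    'resolved_issues': [{'title': 'Crash on start', 'body': None}],
    'base': {'sha': 'abc123'},
}
converted = convert_record(sample)
skipped = convert_record({'org': 'kong', 'repo': 'insomnia', 'number': 1})
print(converted['instance_id'], skipped)
```

With plain `.get('body', '')`, a JSON `null` body would come back as `None` and the string concatenation would raise a `TypeError`, which is presumably why the change switches to `or ''`.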
@@ -0,0 +1,69 @@
+import argparse
+import gzip
+import json
+import os
+from glob import glob
+
+from tqdm import tqdm
+
+tqdm.pandas()
+
+
+# Load trajectories for resolved instances
+def load_completions(output_dir: str, instance_id: str):
+    glob_path = os.path.join(output_dir, 'llm_completions', instance_id, '*.json')
+    files = sorted(glob(glob_path))  # this is ascending order
+    # pick the last file (last turn)
+    try:
+        file_path = files[-1]
+    except IndexError:
+        # print(f'No files found for instance {instance_id}: files={files}')
+        return None
+    with open(file_path, 'r') as f:
+        result = json.load(f)
+    # create messages
+    messages = result['messages']
+    messages.append(result['response']['choices'][0]['message'])
+    tools = result['kwargs'].get('tools', [])
+    return {
+        'messages': messages,
+        'tools': tools,
+    }
+
+
+parser = argparse.ArgumentParser()
+parser.add_argument('jsonl_path', type=str)
+args = parser.parse_args()
+
+output_dir = os.path.dirname(args.jsonl_path)
+output_path = os.path.join(output_dir, 'output.with_completions.jsonl.gz')
+
+# Check if output would be different from input
+needs_update = False
+with open(args.jsonl_path, 'r') as f_in:
+    for line in tqdm(f_in, desc='Checking for changes'):
+        data = json.loads(line)
+        new_completions = load_completions(output_dir, data['instance_id'])
+        current_completions = data.get('raw_completions')
+        if current_completions != new_completions:
+            needs_update = True
+            break
+
+if not needs_update:
+    print('No updates required. Skipping file update.')
+    exit(0)
+
+if os.path.exists(output_path):
+    print(f'Output file already exists at {output_path}, overwriting? (y/n)')
+    if input() != 'y':
+        print('Exiting...')
+        exit(0)
+
+# Process line by line
+with open(args.jsonl_path, 'r') as f_in, gzip.open(output_path, 'wt') as f_out:
+    for line in tqdm(f_in):
+        data = json.loads(line)
+        data['raw_completions'] = load_completions(output_dir, data['instance_id'])
+        f_out.write(json.dumps(data) + '\n')
+
+print(f'Saved compressed output to {output_path}')
@@ -1,13 +1,11 @@
+import argparse
 import json
 import re

-IN_FILE = 'output.jsonl'
-OUT_FILE = 'patch.jsonl'

-
-def main():
-    with open(IN_FILE, 'r') as fin:
-        with open(OUT_FILE, 'w') as fout:
+def main(input_file, output_file):
+    with open(input_file, 'r') as fin:
+        with open(output_file, 'w') as fout:
            for line in fin:
                data = json.loads(line)
                groups = re.match(r'(.*)__(.*)-(.*)', data['instance_id'])
@@ -15,10 +13,14 @@ def main():
                    'org': groups.group(1),
                    'repo': groups.group(2),
                    'number': groups.group(3),
-                    'fix_patch': data['test_result']['git_patch'],
+                    'fix_patch': data.get('test_result', {}).get('git_patch', '') or '',
                }
                fout.write(json.dumps(patch) + '\n')


 if __name__ == '__main__':
-    main()
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--input', required=True, help='Input .jsonl file path')
+    parser.add_argument('--output', required=True, help='Output .jsonl file path')
+    args = parser.parse_args()
+    main(args.input, args.output)
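One thing worth noting about this script: `re.match` returns `None` when an `instance_id` does not follow the `org__repo-number` pattern, and the subsequent `groups.group(1)` call would then raise an `AttributeError`. The sketch below uses the same regular expression but fails with an explicit message. `split_instance_id` is a hypothetical helper and the IDs are made up.

```python
import re

# Same pattern as the script: greedy groups split on the last '__' and last '-'.
INSTANCE_ID_RE = re.compile(r'(.*)__(.*)-(.*)')


def split_instance_id(instance_id):
    """Split 'org__repo-number' into its parts, or raise ValueError."""
    match = INSTANCE_ID_RE.match(instance_id)
    if match is None:
        raise ValueError(f'unexpected instance_id format: {instance_id!r}')
    org, repo, number = match.groups()
    return {'org': org, 'repo': repo, 'number': number}


parsed = split_instance_id('kong__insomnia-8284')
hyphenated = split_instance_id('some-org__some-repo-12')
print(parsed, hyphenated)
```

Because the groups are greedy, hyphens inside the org or repo name are preserved and only the final `-` separates the number.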
@@ -0,0 +1,70 @@
+import argparse
+import json
+import os
+import subprocess
+
+
+def update_multi_swe_config(output_jsonl_path, config_path, dataset):
+    path_to_parent = os.path.dirname(os.path.abspath(output_jsonl_path))
+    converted_path = os.path.join(path_to_parent, 'output_converted.jsonl')
+
+    # Run the conversion script
+    subprocess.run(
+        [
+            'python3',
+            './evaluation/benchmarks/multi_swe_bench/scripts/eval/convert.py',
+            '--input',
+            output_jsonl_path,
+            '--output',
+            converted_path,
+        ],
+        check=True,
+    )
+
+    # Create required directories
+    os.makedirs(os.path.join(path_to_parent, 'eval_files', 'dataset'), exist_ok=True)
+    os.makedirs(os.path.join(path_to_parent, 'eval_files', 'workdir'), exist_ok=True)
+    os.makedirs(os.path.join(path_to_parent, 'eval_files', 'repos'), exist_ok=True)
+    os.makedirs(os.path.join(path_to_parent, 'eval_files', 'logs'), exist_ok=True)
+
+    # Prepare config dict
+    config = {
+        'mode': 'evaluation',
+        'workdir': os.path.join(path_to_parent, 'eval_files', 'workdir'),
+        'patch_files': [converted_path],
+        'dataset_files': [dataset],
+        'force_build': True,
+        'output_dir': os.path.join(path_to_parent, 'eval_files', 'dataset'),
+        'specifics': [],
+        'skips': [],
+        'repo_dir': os.path.join(path_to_parent, 'eval_files', 'repos'),
+        'need_clone': True,
+        'global_env': [],
+        'clear_env': True,
+        'stop_on_error': False,
+        'max_workers': 5,
+        'max_workers_build_image': 5,
+        'max_workers_run_instance': 5,
+        'log_dir': os.path.join(path_to_parent, 'eval_files', 'logs'),
+        'log_level': 'DEBUG',
+        'fix_patch_run_cmd': (
+            'bash -c "apt update ; apt install -y patch ; '
+            "sed -i 's@git apply.*@patch --batch --fuzz=5 -p1 -i /home/test.patch;"
+            'patch --batch --fuzz=5 -p1 -i /home/fix.patch@g\' /home/fix-run.sh ; chmod +x /home/*.sh  ; /home/fix-run.sh"'
+        ),
+    }
+
+    # Save to multibench.config
+    os.makedirs(os.path.dirname(config_path), exist_ok=True)
+    with open(config_path, 'w') as f:
+        json.dump(config, f, indent=4)
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--input', required=True, help='Path to input file')
+    parser.add_argument('--output', required=True, help='Path to create config')
+    parser.add_argument('--dataset', required=True, help='Path to dataset')
+    args = parser.parse_args()
+
+    update_multi_swe_config(args.input, args.output, args.dataset)
@@ -0,0 +1,176 @@
+import argparse
+import json
+import os
+from collections import defaultdict
+
+from tqdm import tqdm
+
+parser = argparse.ArgumentParser()
+parser.add_argument('input_file', type=str)
+parser.add_argument(
+    '--force',
+    action='store_true',
+    help='Force update all reports even if no changes are detected',
+)
+parser.add_argument(
+    '--overwrite-backup',
+    action='store_true',
+    help='Automatically overwrite existing backup files without prompting',
+)
+args = parser.parse_args()
+
+dirname = os.path.dirname(args.input_file)
+
+# Initialize counters and data structures
+instance_id_to_status = defaultdict(
+    lambda: {
+        'empty_generation': False,
+        'resolved': False,
+        'failed_apply_patch': False,
+        'error_eval': False,
+        'test_timeout': False,
+    }
+)
+
+# Process official report if it exists
+swebench_official_report_json = os.path.join(
+    dirname, 'eval_files/dataset/final_report.json'
+)
+openhands_remote_report_jsonl = args.input_file.replace(
+    '.jsonl', '.swebench_eval.jsonl'
+)
+
+if os.path.exists(swebench_official_report_json):
+    output_md_filepath = os.path.join(dirname, 'README.md')
+    with open(swebench_official_report_json, 'r') as f:
+        report = json.load(f)
+
+    # Convert instance IDs from "repo/name:pr-123" format to "repo__name-123" format
+    def convert_instance_id(instance_id):
+        """Convert instance ID from slash/colon-pr format to double underscore/dash format."""
+        if '/' in instance_id and ':pr-' in instance_id:
+            # Split on '/' and ':pr-'
+            parts = instance_id.split('/')
+            if len(parts) == 2:
+                repo_part = parts[0]
+                name_and_pr = parts[1]
+                if ':pr-' in name_and_pr:
+                    name, pr_number = name_and_pr.split(':pr-')
+                    return f'{repo_part}__{name}-{pr_number}'
+        return instance_id
+
+    # Convert all instance ID lists in the report
+    for key in [
+        'resolved_ids',
+        'unresolved_ids',
+        'error_ids',
+        'empty_patch_ids',
+        'incomplete_ids',
+    ]:
+        if key in report:
+            report[key] = [
+                convert_instance_id(instance_id) for instance_id in report[key]
+            ]
+
+    output_md = (
+        '# Multi-SWE-bench Report\n'
+        'This folder contains the evaluation results of the SWE-bench using the [official evaluation docker containerization](https://github.com/princeton-nlp/SWE-bench/blob/main/docs/20240627_docker/README.md#choosing-the-right-cache_level).\n\n'
+        '## Summary\n'
+        f'- total instances: {report["total_instances"]}\n'
+        f'- submitted instances: {report["submitted_instances"]}\n'
+        f'- completed instances: {report["completed_instances"]}\n'
+        f'- empty patch instances: {report["empty_patch_instances"]}\n'
+        f'- resolved instances: {report["resolved_instances"]}\n'
+        f'- unresolved instances: {report["unresolved_instances"]}\n'
+        f'- error instances: {report["error_instances"]}\n'
+    )
+
+    output_md += '\n## Resolved Instances\n'
+    # instance_id to status
+    for instance_id in report['resolved_ids']:
+        instance_id_to_status[instance_id]['resolved'] = True
+        output_md += (
+            f'- [{instance_id}](./eval_outputs/{instance_id}/run_instance.log)\n'
+        )
+
+    output_md += '\n## Unresolved Instances\n'
+    for instance_id in report['unresolved_ids']:
+        output_md += (
+            f'- [{instance_id}](./eval_outputs/{instance_id}/run_instance.log)\n'
+        )
+
+    output_md += '\n## Error Instances\n'
+    for instance_id in report['error_ids']:
+        instance_id_to_status[instance_id]['error_eval'] = True
+        output_md += (
+            f'- [{instance_id}](./eval_outputs/{instance_id}/run_instance.log)\n'
+        )
+
+    output_md += '\n## Empty Patch Instances\n'
+    for instance_id in report['empty_patch_ids']:
+        instance_id_to_status[instance_id]['empty_generation'] = True
+        output_md += (
+            f'- [{instance_id}](./eval_outputs/{instance_id}/run_instance.log)\n'
+        )
+
+    output_md += '\n## Incomplete Instances\n'
+    for instance_id in report['incomplete_ids']:
+        output_md += (
+            f'- [{instance_id}](./eval_outputs/{instance_id}/run_instance.log)\n'
+        )
+
+    with open(output_md_filepath, 'w') as f:
+        f.write(output_md)
+
+else:
+    print(
+        f'No report file found: neither {swebench_official_report_json} nor {openhands_remote_report_jsonl} exists.'
+    )
+    exit()
+
+# Before backup and update, check if any changes would be made (unless --force is used)
+if not args.force:
+    needs_update = False
+    with open(args.input_file, 'r') as infile:
+        for line in tqdm(infile, desc='Checking for changes'):
+            data = json.loads(line)
+            instance_id = data['instance_id']
+            current_report = data.get('report', {})
+            new_report = instance_id_to_status[
+                instance_id
+            ]  # if no report, it's not resolved
+            if current_report != new_report:
+                needs_update = True
+                break
+
+    if not needs_update:
+        print('No updates detected. Skipping file update.')
+        exit()
+else:
+    print('Force flag enabled. Updating all reports regardless of changes.')
+
+# Backup and update the original file row by row
+if os.path.exists(args.input_file + '.bak'):
+    if args.overwrite_backup:
+        print(
+            'Existing backup file found. Overwriting automatically due to --overwrite-backup flag.'
+        )
+        os.remove(args.input_file + '.bak')
+    else:
+        conf = input('Existing backup file found. Do you want to overwrite it? (y/n) ')
+        if conf.strip().lower() != 'y':
+            exit()
+        os.remove(args.input_file + '.bak')
+
+os.rename(args.input_file, args.input_file + '.bak')
+
+# Process and write file row by row
+with (
+    open(args.input_file + '.bak', 'r') as infile,
+    open(args.input_file, 'w') as outfile,
+):
+    for line in tqdm(infile, desc='Updating output file'):
+        data = json.loads(line)
+        instance_id = data['instance_id']
+        data['report'] = instance_id_to_status[instance_id]
+        outfile.write(json.dumps(data) + '\n')
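The ID normalization used when building the report can be checked in isolation. The following is a minimal sketch, not the script's exact code: it uses `str.partition` instead of `split`, it leaves IDs with more than one `:pr-` marker unchanged rather than raising, and the sample IDs are made up for illustration.

```python
def convert_instance_id(instance_id: str) -> str:
    """Convert 'org/repo:pr-123' to 'org__repo-123'; leave other IDs unchanged."""
    org, slash, rest = instance_id.partition('/')
    name, marker, pr_number = rest.partition(':pr-')
    # Rewrite only IDs with exactly one '/' and one ':pr-' marker after it.
    if not slash or not marker or '/' in rest or ':pr-' in pr_number:
        return instance_id
    return f'{org}__{name}-{pr_number}'


# Hypothetical IDs, used only to exercise both branches.
converted = convert_instance_id('example-org/example-repo:pr-123')
unchanged = convert_instance_id('example-org__example-repo-123')
print(converted)
print(unchanged)
```

IDs that are already in the double-underscore format pass through untouched, so applying the conversion twice is harmless.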
@@ -0,0 +1,146 @@
+#!/bin/bash
+
+# NOTE: this script is for rolling out the Multi-SWE-Gym dataset for **TRAINING**
+# For more information, please refer to
+# 1. the Github Repo: https://github.com/SWE-Gym/SWE-Gym
+# 2. the paper: https://arxiv.org/abs/2412.21139
+
+MODEL=$1  # your LLM config name in config.toml (e.g. "llm.claude-3-5-sonnet-20241022-t05")
+EXP_NAME=$2  # e.g. "train-t05"
+EVAL_DATASET=$3  # path to original dataset (jsonl file)
+N_WORKERS=${4:-64}
+N_RUNS=${5:-1}
+
+export EXP_NAME=$EXP_NAME
+# use 2x resources for rollout since some codebases are pretty resource-intensive
+export DEFAULT_RUNTIME_RESOURCE_FACTOR=2
+echo "MODEL: $MODEL"
+echo "EXP_NAME: $EXP_NAME"
+echo "EVAL_DATASET: $EVAL_DATASET"
+# Generate DATASET path by adding _with_runtime_ before .jsonl extension
+DATASET="${EVAL_DATASET%.jsonl}_with_runtime_.jsonl"  # path to converted dataset
+
+# Create the converted dataset file
+echo "Creating converted dataset at: $DATASET"
+poetry run python ./evaluation/benchmarks/multi_swe_bench/scripts/data/data_change.py --input "$EVAL_DATASET" --output "$DATASET"
+
+SPLIT="train"
+export LANGUAGE=java
+
+if [ -z "$ALLHANDS_API_KEY" ] || [ "$RUNTIME" != "remote" ]; then
+    echo "ALLHANDS_API_KEY is not set or RUNTIME is not set to remote. Will rollout and evaluate locally using Docker. WARNING: A large value of N_WORKERS will result in a large number of Docker containers being spun up and may crash your machine."
+    export RUNTIME=docker
+else
+    echo "ALLHANDS_API_KEY is set and RUNTIME is set to remote. Continuing rollout and evaluation with remote runtime..."
+    export SANDBOX_REMOTE_RUNTIME_API_URL="https://runtime.eval.all-hands.dev"
+fi
+
+#EVAL_LIMIT=3000
+MAX_ITER=100
+
+
+# ===== Run inference =====
+source "evaluation/utils/version_control.sh"
+get_openhands_version
+
+echo "OPENHANDS_VERSION: $OPENHANDS_VERSION"
+echo "MODEL_CONFIG: $MODEL"
+echo "DATASET: $DATASET"
+echo "EVAL_DOCKER_IMAGE_PREFIX: $EVAL_DOCKER_IMAGE_PREFIX"
+
+# Default to NOT using hints
+export USE_INSTANCE_IMAGE=true
+export USE_HINT_TEXT=false
+export RUN_WITH_BROWSING=false
+echo "USE_HINT_TEXT: $USE_HINT_TEXT"
+EVAL_NOTE="$OPENHANDS_VERSION-no-hint-$EXP_NAME"
+
+function run_eval() {
+  local eval_note=$1
+  export LANGUAGE=java
+  echo "About to run command"
+  COMMAND="EVAL_DOCKER_IMAGE_PREFIX=$EVAL_DOCKER_IMAGE_PREFIX; LANGUAGE=java;
+    poetry run python evaluation/benchmarks/multi_swe_bench/run_infer.py \
+    --agent-cls CodeActAgent \
+    --llm-config $MODEL \
+    --max-iterations $MAX_ITER \
+    --eval-num-workers $N_WORKERS \
+    --eval-note $eval_note \
+    --dataset $DATASET \
+    --split $SPLIT"
+
+  if [ -n "$EVAL_LIMIT" ]; then
+    echo "EVAL_LIMIT: $EVAL_LIMIT"
+    COMMAND="$COMMAND --eval-n-limit $EVAL_LIMIT"
+  fi
+  echo "Running command: $COMMAND"
+
+  # Run the command
+  eval $COMMAND
+}
+
+for run_idx in $(seq 1 $N_RUNS); do
+
+    while true; do
+        echo "### Running inference... ###"
+        unset SANDBOX_ENV_GITHUB_TOKEN # prevent the agent from using the github token to push
+        current_eval_note="$EVAL_NOTE-run_$run_idx"
+        echo "EVAL_NOTE: $current_eval_note"
+        echo "DATASET command: $DATASET"
+        # Enable pipefail inside the subshell so $? reflects run_eval rather than tee
+        INFER_OUTPUT=$(set -o pipefail; run_eval $current_eval_note | tee /dev/stderr)
+        INFER_STATUS=$?  # Capture the exit status of run_eval
+        echo "INFER_STATUS: $INFER_STATUS"
+
+        echo "### Cleaning up remote runtime... ###"
+        ./evaluation/utils/scripts/cleanup_remote_runtime.sh
+
+        if [ $INFER_STATUS -eq 0 ]; then
+            echo "### Inference completed successfully. ###"
+            break
+        else
+            echo "### Inference failed with exit code $INFER_STATUS. Retrying... ###"
+        fi
+    done
+
+    # Extract the output directory using the special delimiters
+    OUTPUT_FILE=$(echo "$INFER_OUTPUT" | grep -o '### OUTPUT FILE:.* ###' | sed 's/### OUTPUT FILE: \(.*\) ###/\1/')
+    echo "Got OUTPUT_FILE: $OUTPUT_FILE"
+
+    while true; do
+        echo "### Evaluating on $OUTPUT_FILE ... ###"
+        OUTPUT_CONFIG_FILE="${OUTPUT_FILE%.jsonl}_config.json"
+        export EVAL_SKIP_BUILD_ERRORS=true
+        pip install multi-swe-bench --quiet --disable-pip-version-check > /dev/null 2>&1
+        COMMAND="poetry run python ./evaluation/benchmarks/multi_swe_bench/scripts/eval/update_multi_swe_bench_config.py --input $OUTPUT_FILE --output $OUTPUT_CONFIG_FILE --dataset $EVAL_DATASET;
+        python -m multi_swe_bench.harness.run_evaluation --config $OUTPUT_CONFIG_FILE
+        "
+
+        if [ -n "$EVAL_LIMIT" ]; then
+            echo "EVAL_LIMIT: $EVAL_LIMIT"
+            COMMAND="$COMMAND --eval-n-limit $EVAL_LIMIT"
+        fi
+        echo "Running command: $COMMAND"
+        # Run the command
+        eval $COMMAND
+        EVAL_STATUS=$?
+        if [ $EVAL_STATUS -eq 0 ]; then
+            echo "### Evaluation completed successfully. ###"
+            break
+        else
+            echo "### Evaluation failed with exit code $EVAL_STATUS. Retrying... ###"
+        fi
+
+        ./evaluation/utils/scripts/cleanup_remote_runtime.sh
+    done
+
+    # update the output with evaluation results
+    echo "### Updating the output with evaluation results... ###"
+    poetry run python evaluation/benchmarks/multi_swe_bench/scripts/eval/update_output_with_eval.py $OUTPUT_FILE
+
+    echo "### Combining the final completions... ###"
+    poetry run python evaluation/benchmarks/multi_swe_bench/scripts/eval/combine_final_completions.py $OUTPUT_FILE
+
+    echo "### DONE for run $run_idx! ###"
+    echo "You can find the final output in $(dirname "$OUTPUT_FILE")"
+done
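The rollout script recovers the output path from the inference log by matching the `### OUTPUT FILE: ... ###` delimiter with `grep` and `sed`. As a rough Python equivalent (a sketch only; the function name and the sample log lines are assumptions made for illustration):

```python
import re
from typing import Optional


def extract_output_file(log_text: str) -> Optional[str]:
    """Return the path between '### OUTPUT FILE: ' and ' ###', or None if absent."""
    match = re.search(r'### OUTPUT FILE: (.*) ###', log_text)
    return match.group(1) if match else None


# Hypothetical log excerpt; only the delimiter format comes from the script above.
sample_log = '\n'.join(
    [
        'Starting inference...',
        '### OUTPUT FILE: /tmp/example-run/output.jsonl ###',
        'Finished.',
    ]
)
output_file = extract_output_file(sample_log)
print(output_file)
```

Returning `None` when the delimiter is missing makes it easy to stop before the evaluation step runs against an empty path.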
@@ -47,8 +47,8 @@ if [ -z "$DATASET" ]; then
 fi

 if [ -z "$LANGUAGE" ]; then
-  echo "LANUGUAGE not specified, use default python"
-  LANGUAGE="python"
+  echo "LANGUAGE not specified, use default java"
+  LANGUAGE="java"
 fi

 if [ -z "$SPLIT" ]; then
@@ -69,10 +69,10 @@ fi

 if [ -z "$EVAL_DOCKER_IMAGE_PREFIX" ]; then
  if [ "$LANGUAGE" = "python" ]; then
-  echo "EVAL_DOCKER_IMAGE_PREFIX is docker.io/xingyaoww/ as default as LANUGUAGE is python"
+  echo "EVAL_DOCKER_IMAGE_PREFIX is docker.io/xingyaoww/ as default as LANGUAGE is python"
    EVAL_DOCKER_IMAGE_PREFIX="docker.io/xingyaoww/"
  elif [ "$LANGUAGE" = "java" ]; then
-  echo "EVAL_DOCKER_IMAGE_PREFIX is java_verified as LANUGUAGE is java"
+  echo "EVAL_DOCKER_IMAGE_PREFIX is empty as LANGUAGE is java"
    EVAL_DOCKER_IMAGE_PREFIX=""
  fi
 fi
@@ -0,0 +1,344 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "import pandas as pd\n",
+    "from tqdm import tqdm\n",
+    "\n",
+    "tqdm.pandas()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# 1. Load raw data and convert to training data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import gzip\n",
+    "import json\n",
+    "\n",
+    "from tqdm import tqdm\n",
+    "\n",
+    "FILE_PATHS = [\n",
+    "    'YOURPATH-no-hint-train-t05-run_1/output.with_completions.jsonl.gz',\n",
+    "    'YOURPATH-no-hint-train-t05-run_2/output.with_completions.jsonl.gz',\n",
+    "]\n",
+    "\n",
+    "# More memory efficient for large files\n",
+    "# Initialize lists to store the data\n",
+    "data = []\n",
+    "\n",
+    "\n",
+    "# Read file line by line\n",
+    "for FILE_PATH in FILE_PATHS:\n",
+    "    with gzip.open(FILE_PATH, 'rb') as f:  # Use 'rb' for gzipped files\n",
+    "        for i, line in tqdm(\n",
+    "            enumerate(f), desc=f'Processing {FILE_PATH.split(\"/\")[-1]}'\n",
+    "        ):\n",
+    "            # Parse only the fields we need\n",
+    "            raw_data = json.loads(line)\n",
+    "            data.append(\n",
+    "                {\n",
+    "                    'resolved': raw_data['report']['resolved'],\n",
+    "                    'messages': raw_data['raw_completions']['messages']\n",
+    "                    if raw_data['raw_completions'] is not None\n",
+    "                    else None,\n",
+    "                    'git_patch': raw_data['test_result'].get('git_patch', ''),\n",
+    "                    'tools': raw_data['raw_completions']['tools']\n",
+    "                    if raw_data['raw_completions'] is not None\n",
+    "                    and 'tools' in raw_data['raw_completions']\n",
+    "                    else None,\n",
+    "                }\n",
+    "            )\n",
+    "\n",
+    "# Convert to DataFrame after collecting all data\n",
+    "df = pd.DataFrame(data)\n",
+    "print(f'#total amount of data={len(df)}')\n",
+    "df = df[~df['messages'].isna()]\n",
+    "print(f'#total amount of data after removing nan={len(df)}')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Filter"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def _contains_multiple_tool_calls(messages: list[dict]) -> bool:\n",
+    "    return any(\n",
+    "        message.get('tool_calls') and len(message['tool_calls']) > 1\n",
+    "        for message in messages\n",
+    "    )\n",
+    "\n",
+    "\n",
+    "df['contains_multiple_tool_calls'] = df['messages'].apply(_contains_multiple_tool_calls)\n",
+    "display(df.groupby(['contains_multiple_tool_calls'])['resolved'].sum())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "import copy\n",
+    "\n",
+    "# Convert function calling messages to non-function calling messages\n",
+    "from openhands.llm.fn_call_converter import (\n",
+    "    FunctionCallConversionError,\n",
+    "    convert_fncall_messages_to_non_fncall_messages,\n",
+    "    convert_from_multiple_tool_calls_to_single_tool_call_messages,\n",
+    ")\n",
+    "\n",
+    "total_failed = 0\n",
+    "\n",
+    "\n",
+    "def _convert_messages(messages: list[dict], tools: list[dict]) -> list[dict]:\n",
+    "    global total_failed\n",
+    "    message_copy = copy.deepcopy(messages)\n",
+    "    for message in message_copy:\n",
+    "        if message['content'] is None:\n",
+    "            message['content'] = ''\n",
+    "    try:\n",
+    "        return convert_fncall_messages_to_non_fncall_messages(\n",
+    "            message_copy, tools, add_in_context_learning_example=False\n",
+    "        )\n",
+    "    except FunctionCallConversionError:\n",
+    "        total_failed += 1\n",
+    "        # print(f'Failed to convert messages: {messages}\\nTools: {tools}')\n",
+    "        # traceback.print_exc()\n",
+    "        return None\n",
+    "\n",
+    "\n",
+    "df['converted_messages'] = df.apply(\n",
+    "    lambda row: convert_from_multiple_tool_calls_to_single_tool_call_messages(\n",
+    "        row['messages'], ignore_final_tool_result=True\n",
+    "    ),\n",
+    "    axis=1,\n",
+    ")\n",
+    "df['nonfncall_messages'] = df.apply(\n",
+    "    lambda row: _convert_messages(row['converted_messages'], row['tools']), axis=1\n",
+    ")\n",
+    "print('total nan', df['nonfncall_messages'].isna().sum())\n",
+    "df = df[~df['nonfncall_messages'].isna()]\n",
+    "print(df['nonfncall_messages'].iloc[0])\n",
+    "\n",
+    "print(f'Total failed: {total_failed}')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Tokenization"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pandarallel import pandarallel\n",
+    "from transformers import AutoTokenizer\n",
+    "\n",
+    "os.environ['TOKENIZERS_PARALLELISM'] = 'false'\n",
+    "pandarallel.initialize(progress_bar=True, verbose=1, nb_workers=16)\n",
+    "tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen2.5-7B-Instruct')\n",
+    "\n",
+    "\n",
+    "def clean_messages(messages):\n",
+    "    clean = []\n",
+    "    for msg in messages:\n",
+    "        if not isinstance(msg, dict):\n",
+    "            continue\n",
+    "        role = msg.get('role')\n",
+    "        content = msg.get('content')\n",
+    "        if isinstance(content, str):\n",
+    "            text = content\n",
+    "        elif isinstance(content, dict):\n",
+    "            text = content.get('text')\n",
+    "        elif (\n",
+    "            isinstance(content, list)\n",
+    "            and len(content) == 1\n",
+    "            and isinstance(content[0], dict)\n",
+    "        ):\n",
+    "            text = content[0].get('text')\n",
+    "        else:\n",
+    "            print(f'Format not accepted {content}')\n",
+    "            text = ''  # avoid reusing a stale or undefined value\n",
+    "        clean.append({'role': role, 'content': text})\n",
+    "    return clean\n",
+    "\n",
+    "\n",
+    "# Step 1: Clean the messages\n",
+    "df['nonfncall_messages'] = df['nonfncall_messages'].apply(clean_messages)\n",
+    "\n",
+    "# Step 2: Compute token count\n",
+    "df['n_tokens'] = df['nonfncall_messages'].parallel_apply(\n",
+    "    lambda x: len(tokenizer.apply_chat_template(x))\n",
+    ")\n",
+    "\n",
+    "# print(df['nonfncall_messages'])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(f'BEFORE: #total={len(df)}')\n",
+    "df_selected = df[df['n_tokens'] < 131072]\n",
+    "print(f'AFTER(filtered to <128k tokens): #total={len(df_selected)}')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_selected['n_tokens'].describe()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# ecdf of n_tokens\n",
+    "import matplotlib.pyplot as plt\n",
+    "import seaborn as sns\n",
+    "\n",
+    "display(df.groupby(['resolved'])['n_tokens'].describe())\n",
+    "sns.ecdfplot(x='n_tokens', data=df, hue='resolved')\n",
+    "plt.show()\n",
+    "\n",
+    "print(f'#total={len(df)}')\n",
+    "df_selected = df[df['n_tokens'] < 131072]\n",
+    "print(f'#selected={len(df_selected)}')\n",
+    "display(df_selected.groupby(['resolved'])['n_tokens'].describe())\n",
+    "sns.ecdfplot(x='n_tokens', data=df_selected, hue='resolved')\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_selected[~df_selected['resolved']]['n_tokens'].describe()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_selected['resolved'].value_counts()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_selected.groupby(['resolved'])['n_tokens'].describe()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Save Resolved Messages for SFT"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Flatten messages and change format to {\"content\": \"\", \"role\": \"\"}\n",
+    "df_selected[df_selected['resolved']][['nonfncall_messages']].rename(\n",
+    "    columns={'nonfncall_messages': 'messages'}\n",
+    ").to_json(\n",
+    "    os.path.join(\n",
+    "        'PATH_TO_FILE',\n",
+    "        f'policy_traj_128k_swegym_{df_selected[\"resolved\"].value_counts()[True]}i.jsonl',\n",
+    "    ),\n",
+    "    lines=True,\n",
+    "    orient='records',\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
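The notebook's filter step flags trajectories in which any single message carries more than one tool call. A small self-contained sketch of that check follows; the message dictionaries are invented for illustration and only mimic the `tool_calls` field the notebook reads.

```python
def contains_multiple_tool_calls(messages: list) -> bool:
    """True if any message holds more than one entry in its 'tool_calls' list."""
    return any(len(message.get('tool_calls') or []) > 1 for message in messages)


# Hypothetical trajectories: one tool call per message vs. two in one message.
single_call = [
    {'role': 'user', 'content': 'Fix the bug.'},
    {'role': 'assistant', 'content': '', 'tool_calls': [{'id': 'call_1'}]},
]
double_call = [
    {'role': 'assistant', 'content': '', 'tool_calls': [{'id': 'call_1'}, {'id': 'call_2'}]},
]
print(contains_multiple_tool_calls(single_call))
print(contains_multiple_tool_calls(double_call))
```

Messages without a `tool_calls` key, or with it set to `None`, count as having no tool calls.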
@@ -0,0 +1,81 @@
+# SWE-Perf Evaluation
+
+This folder contains the OpenHands inference generation of the [SWE-Perf benchmark](https://swe-perf.github.io/) ([paper](https://arxiv.org/pdf/2507.12415v1)).
+
+The evaluation consists of three steps:
+
+1. Environment setup: [install python environment](../../README.md#development-environment) and [configure LLM config](../../README.md#configure-openhands-and-your-llm).
+2. [Run inference](#running-inference-locally-with-docker): Generate an edit patch for each GitHub issue
+3. [Evaluate patches](#evaluate-generated-patches)
+
+## Setup Environment and LLM Configuration
+
+Please follow the instructions [here](../../README.md#setup) to set up your local development environment and LLM.
+
+## Running Inference Locally with Docker
+
+Make sure your Docker daemon is running and that you have ample disk space (at least 200-500GB, depending on the SWE-Perf set you are running on) for the instance-level Docker images.
+
+When the `run_infer.sh` script is started, it will automatically pull the relevant SWE-Perf images.
+For example, for instance ID `scikit-learn_scikit-learn-11674`, it will try to pull our pre-built Docker image `betty1202/sweb.eval.x86_64.scikit-learn_s_scikit-learn-11674` from DockerHub.
+This image will be used to create an OpenHands runtime image in which the agent will operate.
+
+```bash
+./evaluation/benchmarks/swe_perf/scripts/run_infer.sh [model_config] [git-version] [agent] [eval_limit] [max_iter] [num_workers] [dataset] [dataset_split] [n_runs] [mode]
+
+# Example
+./evaluation/benchmarks/swe_perf/scripts/run_infer.sh llm.eval_gpt4_1106_preview HEAD CodeActAgent 500 100 1 SWE-Perf/SWE-Perf test
+```
+
+where `model_config` is mandatory, and the rest are optional.
+
+- `model_config`, e.g. `eval_gpt4_1106_preview`, is the config group name for your
+LLM settings, as defined in your `config.toml`.
+- `git-version`, e.g. `HEAD`, is the git commit hash of the OpenHands version you would
+like to evaluate. It could also be a release tag like `0.6.2`.
+- `agent`, e.g. `CodeActAgent`, is the name of the agent for benchmarks, defaulting
+to `CodeActAgent`.
+- `eval_limit`, e.g. `10`, limits the evaluation to the first `eval_limit` instances. By
+default, the script evaluates the entire SWE-Perf test set (140 issues). Note:
+in order to use `eval_limit`, you must also set `agent`.
+- `max_iter`, e.g. `20`, is the maximum number of iterations for the agent to run. By
+default, it is set to 100.
+- `num_workers`, e.g. `3`, is the number of parallel workers to run the evaluation. By
+default, it is set to 1.
+- `dataset`, a Hugging Face dataset name, e.g. `SWE-Perf/SWE-Perf`, specifies which dataset to evaluate on.
+- `dataset_split`, the split of the Hugging Face dataset, e.g. `test` or `dev`. Defaults to `test`.
+
+- `n_runs`, e.g. `3`, is the number of times to run the evaluation. Default is 1.
+- `mode`, e.g. `swt`, `swt-ci`, or `swe`, specifies the evaluation mode. Default is `swe`.
+
+> [!CAUTION]
+> Setting `num_workers` larger than 1 has not been officially tested; your mileage may vary.
+
+
+For example, to run 10 instances using `llm.eval_gpt4_1106_preview` and CodeActAgent, the command would be:
+
+```bash
+./evaluation/benchmarks/swe_perf/scripts/run_infer.sh llm.eval_gpt4_1106_preview HEAD CodeActAgent 10
+```
+
+## Evaluate Generated Patches
+
+
+To evaluate the generated patches, follow these steps:
+
+### 1. Convert output to the evaluation standard format
+Run the following command:
+```bash
+python -m evaluation.benchmarks.swe_perf.format_conversion \
+    --input_path [input_path] \
+    --output_path [output_path]
+```
+
+* `input_path`: Path to the raw inference output file (JSONL) containing the generated patches.
+* `output_path`: Directory where the converted file (`output.jsonl`) will be saved.
+
+### 2. Run the SWE-Perf benchmark official evaluation
+
+Once the output is converted, use the [official SWE-Perf benchmark evaluation](https://github.com/SWE-Perf/SWE-Perf/tree/main/evaluation) to evaluate it.
@@ -0,0 +1,52 @@
+"""
+Utilities for handling binary files and patch generation in SWE-Perf evaluation.
+"""
+
+
+def remove_binary_diffs(patch_text):
+    """
+    Remove binary file diffs from a git patch.
+
+    Args:
+        patch_text (str): The git patch text
+
+    Returns:
+        str: The cleaned patch text with binary diffs removed
+    """
+    lines = patch_text.splitlines()
+    cleaned_lines = []
+    block = []
+    is_binary_block = False
+
+    for line in lines:
+        if line.startswith('diff --git '):
+            if block and not is_binary_block:
+                cleaned_lines.extend(block)
+            block = [line]
+            is_binary_block = False
+        elif 'Binary files' in line:
+            is_binary_block = True
+            block.append(line)
+        else:
+            block.append(line)
+
+    if block and not is_binary_block:
+        cleaned_lines.extend(block)
+    return '\n'.join(cleaned_lines)
+
+
+def remove_binary_files_from_git():
+    """
+    Generate a bash command to remove binary files from git staging.
+
+    Returns:
+        str: A bash command that removes binary files from git staging
+    """
+    return """
+    for file in $(git status --porcelain | grep -E "^(M| M|\\?\\?|A| A)" | cut -c4-); do
+        if [ -f "$file" ] && (file "$file" | grep -q "executable" || git check-attr binary "$file" | grep -q "binary: set"); then
+            git rm -f "$file" 2>/dev/null || rm -f "$file"
+            echo "Removed: $file"
+        fi
+    done
+    """.strip()
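To illustrate what `remove_binary_diffs` keeps and drops, here is a small self-contained check. The function body is repeated from the file above so the snippet runs on its own; the sample patch is made up.

```python
def remove_binary_diffs(patch_text):
    """Drop every 'diff --git' block that contains a 'Binary files' line."""
    cleaned_lines, block, is_binary_block = [], [], False
    for line in patch_text.splitlines():
        if line.startswith('diff --git '):
            # A new file block starts: flush the previous one if it was textual.
            if block and not is_binary_block:
                cleaned_lines.extend(block)
            block, is_binary_block = [line], False
        else:
            if 'Binary files' in line:
                is_binary_block = True
            block.append(line)
    if block and not is_binary_block:
        cleaned_lines.extend(block)
    return '\n'.join(cleaned_lines)


# Hypothetical patch: one text change followed by one binary change.
sample_patch = '\n'.join(
    [
        'diff --git a/a.py b/a.py',
        '--- a/a.py',
        '+++ b/a.py',
        '@@ -1 +1 @@',
        '-x = 1',
        '+x = 2',
        'diff --git a/logo.png b/logo.png',
        'Binary files a/logo.png and b/logo.png differ',
    ]
)
cleaned = remove_binary_diffs(sample_patch)
print(cleaned)
```

Note that the result is joined without a trailing newline, so callers that write it to a patch file may want to append one.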
@@ -0,0 +1,45 @@
+import json
+import os
+from argparse import ArgumentParser
+
+parser = ArgumentParser()
+parser.add_argument('--input_path', type=str, help='Path to the input JSONL file.')
+parser.add_argument('--output_path', type=str, help='Directory for the output.jsonl file.')
+args = parser.parse_args()
+
+input_path = args.input_path
+output_path = args.output_path
+os.makedirs(output_path, exist_ok=True)
+
+
+def load_jsonl(file_path):
+    """Load JSONL file into a list of dictionaries."""
+    data = []
+    with open(file_path, 'r') as f:
+        for line in f:
+            data.append(json.loads(line))
+    return data
+
+
+dataset = load_jsonl(input_path)
+output_dataset = []
+for data in dataset:
+    instance_id = data['instance_id']
+    model_name_or_path = 'openhands'
+    model_patch = (
+        data['test_result']['git_patch']
+        if 'test_result' in data and 'git_patch' in data['test_result']
+        else None
+    )
+    output_dataset.append(
+        {
+            'instance_id': instance_id,
+            'model_name_or_path': model_name_or_path,
+            'model_patch': model_patch,
+        }
+    )
+
+with open(os.path.join(output_path, 'output.jsonl'), 'w') as f:
+    for item in output_dataset:
+        json_line = json.dumps(item, ensure_ascii=False)
+        f.write(json_line + '\n')
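For reference, the per-record transformation performed by the conversion script can be sketched as a pure function. This is an illustrative variant rather than the script itself: the helper name `to_eval_record` and the sample record are assumptions, and unlike the script it also tolerates a `test_result` that is `None`.

```python
import json


def to_eval_record(data: dict) -> dict:
    """Map one raw inference record to the three fields used for evaluation."""
    test_result = data.get('test_result') or {}
    return {
        'instance_id': data['instance_id'],
        'model_name_or_path': 'openhands',
        # None when no patch was produced, matching the script's fallback.
        'model_patch': test_result.get('git_patch'),
    }


# Hypothetical raw records, shaped like the fields the script reads.
with_patch = to_eval_record(
    {'instance_id': 'example__repo-1', 'test_result': {'git_patch': 'diff --git a/x b/x'}}
)
without_patch = to_eval_record({'instance_id': 'example__repo-2'})
print(json.dumps(with_patch, ensure_ascii=False))
print(json.dumps(without_patch, ensure_ascii=False))
```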
@@ -0,0 +1,39 @@
+"""Mapping instance_id to resource_factor.
+
+Different instances may have different resource requirements.
+e.g., some instances may require more memory/CPU to run inference.
+This file tracks the resource requirements of different instances.
+"""
+
+import json
+import os
+
+from openhands.core.logger import openhands_logger as logger
+
+CUR_DIR = os.path.dirname(os.path.abspath(__file__))
+DEFAULT_RUNTIME_RESOURCE_FACTOR = int(
+    os.environ.get('DEFAULT_RUNTIME_RESOURCE_FACTOR', 1)
+)
+
+# dataset to resource mapping
+_global_resource_mapping: dict[str, dict[str, float]] = {}
+
+
+def get_resource_mapping(dataset_name: str) -> dict[str, float] | None:
+    if dataset_name not in _global_resource_mapping:
+        file_path = os.path.join(CUR_DIR, f'{dataset_name}.json')
+        if not os.path.exists(file_path):
+            logger.info(f'Resource mapping for {dataset_name} not found.')
+            return None
+
+        with open(file_path, 'r') as f:
+            _global_resource_mapping[dataset_name] = json.load(f)
+        logger.debug(f'Loaded resource mapping for {dataset_name}')
+    return _global_resource_mapping[dataset_name]
+
+
+def get_instance_resource_factor(dataset_name: str, instance_id: str) -> int:
+    resource_mapping = get_resource_mapping(dataset_name)
+    if resource_mapping is None:
+        return DEFAULT_RUNTIME_RESOURCE_FACTOR
+    return int(resource_mapping.get(instance_id, DEFAULT_RUNTIME_RESOURCE_FACTOR))
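The lookup logic above reduces to "use the per-instance factor if a mapping exists and lists the instance, otherwise fall back to the default". A minimal standalone sketch, with the mapping passed in directly instead of being loaded from `{dataset_name}.json` (the function name, the explicit `default` parameter, and the sample IDs are assumptions for illustration):

```python
from typing import Optional


def resolve_resource_factor(
    mapping: Optional[dict], instance_id: str, default: int = 1
) -> int:
    """Return the instance's resource factor, or the default when unknown."""
    if mapping is None:
        # No mapping file for this dataset: every instance gets the default.
        return default
    return int(mapping.get(instance_id, default))


# Hypothetical mapping with one resource-hungry instance.
example_mapping = {'example__repo-1': 4.0}
print(resolve_resource_factor(example_mapping, 'example__repo-1'))
print(resolve_resource_factor(example_mapping, 'example__repo-2'))
print(resolve_resource_factor(None, 'example__repo-1', default=2))
```

In the module itself the default comes from the `DEFAULT_RUNTIME_RESOURCE_FACTOR` environment variable, which the rollout script earlier in this diff sets to 2.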
@@ -0,0 +1,842 @@
+# Based on https://github.com/logic-star-ai/swt-bench/blob/master/src/constants.py
+
+# Constants - Installation Specifications
+MAP_VERSION_TO_INSTALL_SKLEARN = {
+    k: {
+        'python': '3.6',
+        'packages': 'numpy scipy cython pytest pandas matplotlib',
+        'install': 'python -m pip install -v --no-use-pep517 --no-build-isolation -e .',
+        'pip_packages': [
+            'cython',
+            'numpy==1.19.2',
+            'setuptools',
+            'scipy==1.5.2',
+        ],
+    }
+    for k in ['0.20', '0.21', '0.22']
+}
+MAP_VERSION_TO_INSTALL_SKLEARN.update(
+    {
+        k: {
+            'python': '3.9',
+            'packages': "'numpy==1.19.2' 'scipy==1.5.2' 'cython==3.0.10' pytest 'pandas<2.0.0' 'matplotlib<3.9.0' setuptools pytest joblib threadpoolctl",
+            'install': 'python -m pip install -v --no-use-pep517 --no-build-isolation -e .',
+            'pip_packages': ['cython', 'setuptools', 'numpy', 'scipy'],
+        }
+        for k in ['1.3', '1.4']
+    }
+)
+MAP_VERSION_TO_INSTALL_FLASK = {
+    '2.0': {
+        'python': '3.9',
+        'packages': 'requirements.txt',
+        'install': 'python -m pip install -e .',
+        'pip_packages': [
+            'setuptools==70.0.0',
+            'Werkzeug==2.3.7',
+            'Jinja2==3.0.1',
+            'itsdangerous==2.1.2',
+            'click==8.0.1',
+            'MarkupSafe==2.1.3',
+        ],
+    },
+    '2.1': {
+        'python': '3.10',
+        'packages': 'requirements.txt',
+        'install': 'python -m pip install -e .',
+        'pip_packages': [
+            'click==8.1.3',
+            'itsdangerous==2.1.2',
+            'Jinja2==3.1.2',
+            'MarkupSafe==2.1.1',
+            'Werkzeug==2.3.7',
+        ],
+    },
+}
+MAP_VERSION_TO_INSTALL_FLASK.update(
+    {
+        k: {
+            'python': '3.11',
+            'packages': 'requirements.txt',
+            'install': 'python -m pip install -e .',
+            'pip_packages': [
+                'click==8.1.3',
+                'itsdangerous==2.1.2',
+                'Jinja2==3.1.2',
+                'MarkupSafe==2.1.1',
+                'Werkzeug==2.3.7',
+            ],
+        }
+        for k in ['2.2', '2.3']
+    }
+)
+MAP_VERSION_TO_INSTALL_DJANGO = {
+    k: {
+        'python': '3.5',
+        'packages': 'requirements.txt',
+        'pre_install': [
+            'apt-get update && apt-get install -y locales',
+            "echo 'en_US UTF-8' > /etc/locale.gen",
+            'locale-gen en_US.UTF-8',
+        ],
+        'install': 'python setup.py install',
+        'pip_packages': ['setuptools'],
+        'eval_commands': [
+            'export LANG=en_US.UTF-8',
+            'export LC_ALL=en_US.UTF-8',
+            'export PYTHONIOENCODING=utf8',
+            'export LANGUAGE=en_US:en',
+        ],
+    }
+    for k in ['1.7', '1.8', '1.9', '1.10', '1.11', '2.0', '2.1', '2.2']
+}
+MAP_VERSION_TO_INSTALL_DJANGO.update(
+    {
+        k: {'python': '3.5', 'install': 'python setup.py install'}
+        for k in ['1.4', '1.5', '1.6']
+    }
+)
+MAP_VERSION_TO_INSTALL_DJANGO.update(
+    {
+        k: {
+            'python': '3.6',
+            'packages': 'requirements.txt',
+            'install': 'python -m pip install -e .',
+            'eval_commands': [
+                "sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen",
+                'export LANG=en_US.UTF-8',
+                'export LANGUAGE=en_US:en',
+                'export LC_ALL=en_US.UTF-8',
+            ],
+        }
+        for k in ['3.0', '3.1', '3.2']
+    }
+)
+MAP_VERSION_TO_INSTALL_DJANGO.update(
+    {
+        k: {
+            'python': '3.8',
+            'packages': 'requirements.txt',
+            'install': 'python -m pip install -e .',
+        }
+        for k in ['4.0']
+    }
+)
+MAP_VERSION_TO_INSTALL_DJANGO.update(
+    {
+        k: {
+            'python': '3.9',
+            'packages': 'requirements.txt',
+            'install': 'python -m pip install -e .',
+        }
+        for k in ['4.1', '4.2']
+    }
+)
+MAP_VERSION_TO_INSTALL_DJANGO.update(
+    {
+        k: {
+            'python': '3.11',
+            'packages': 'requirements.txt',
+            'install': 'python -m pip install -e .',
+        }
+        for k in ['5.0']
+    }
+)
+MAP_VERSION_TO_INSTALL_REQUESTS = {
+    k: {'python': '3.9', 'packages': 'pytest', 'install': 'python -m pip install .'}
+    for k in ['0.7', '0.8', '0.9', '0.11', '0.13', '0.14', '1.1', '1.2', '2.0', '2.2']
+    + ['2.3', '2.4', '2.5', '2.7', '2.8', '2.9', '2.10', '2.11', '2.12', '2.17']
+    + ['2.18', '2.19', '2.22', '2.26', '2.25', '2.27', '3.0']
+}
+MAP_VERSION_TO_INSTALL_SEABORN = {
+    k: {
+        'python': '3.9',
+        'install': 'python -m pip install -e .',
+        'pip_packages': [
+            'contourpy==1.1.0',
+            'cycler==0.11.0',
+            'fonttools==4.42.1',
+            'importlib-resources==6.0.1',
+            'kiwisolver==1.4.5',
+            'matplotlib==3.7.2',
+            'numpy==1.25.2',
+            'packaging==23.1',
+            'pandas==1.3.5',  # 2.0.3
+            'pillow==10.0.0',
+            'pyparsing==3.0.9',
+            'pytest',
+            'python-dateutil==2.8.2',
+            'pytz==2023.3.post1',
+            'scipy==1.11.2',
+            'six==1.16.0',
+            'tzdata==2023.1',
+            'zipp==3.16.2',
+        ],
+    }
+    for k in ['0.11']
+}
+MAP_VERSION_TO_INSTALL_SEABORN.update(
+    {
+        k: {
+            'python': '3.9',
+            'install': 'python -m pip install -e .[dev]',
+            'pip_packages': [
+                'contourpy==1.1.0',
+                'cycler==0.11.0',
+                'fonttools==4.42.1',
+                'importlib-resources==6.0.1',
+                'kiwisolver==1.4.5',
+                'matplotlib==3.7.2',
+                'numpy==1.25.2',
+                'packaging==23.1',
+                'pandas==2.0.0',
+                'pillow==10.0.0',
+                'pyparsing==3.0.9',
+                'pytest',
+                'python-dateutil==2.8.2',
+                'pytz==2023.3.post1',
+                'scipy==1.11.2',
+                'six==1.16.0',
+                'tzdata==2023.1',
+                'zipp==3.16.2',
+            ],
+        }
+        for k in ['0.12', '0.13']
+    }
+)
+MAP_VERSION_TO_INSTALL_PYTEST = {
+    k: {'python': '3.9', 'install': 'python -m pip install -e .'}
+    for k in [
+        '4.4',
+        '4.5',
+        '4.6',
+        '5.0',
+        '5.1',
+        '5.2',
+        '5.3',
+        '5.4',
+        '6.0',
+        '6.2',
+        '6.3',
+        '7.0',
+        '7.1',
+        '7.2',
+        '7.4',
+        '8.0',
+    ]
+}
+MAP_VERSION_TO_INSTALL_PYTEST['4.4']['pip_packages'] = [
+    'atomicwrites==1.4.1',
+    'attrs==23.1.0',
+    'more-itertools==10.1.0',
+    'pluggy==0.13.1',
+    'py==1.11.0',
+    'setuptools==68.0.0',
+    'six==1.16.0',
+]
+MAP_VERSION_TO_INSTALL_PYTEST['4.5']['pip_packages'] = [
+    'atomicwrites==1.4.1',
+    'attrs==23.1.0',
+    'more-itertools==10.1.0',
+    'pluggy==0.11.0',
+    'py==1.11.0',
+    'setuptools==68.0.0',
+    'six==1.16.0',
+    'wcwidth==0.2.6',
+]
+MAP_VERSION_TO_INSTALL_PYTEST['4.6']['pip_packages'] = [
+    'atomicwrites==1.4.1',
+    'attrs==23.1.0',
+    'more-itertools==10.1.0',
+    'packaging==23.1',
+    'pluggy==0.13.1',
+    'py==1.11.0',
+    'six==1.16.0',
+    'wcwidth==0.2.6',
+]
+for k in ['5.0', '5.1', '5.2']:
+    MAP_VERSION_TO_INSTALL_PYTEST[k]['pip_packages'] = [
+        'atomicwrites==1.4.1',
+        'attrs==23.1.0',
+        'more-itertools==10.1.0',
+        'packaging==23.1',
+        'pluggy==0.13.1',
+        'py==1.11.0',
+        'wcwidth==0.2.6',
+    ]
+MAP_VERSION_TO_INSTALL_PYTEST['5.3']['pip_packages'] = [
+    'attrs==23.1.0',
+    'more-itertools==10.1.0',
+    'packaging==23.1',
+    'pluggy==0.13.1',
+    'py==1.11.0',
+    'wcwidth==0.2.6',
+]
+MAP_VERSION_TO_INSTALL_PYTEST['5.4']['pip_packages'] = [
+    'py==1.11.0',
+    'packaging==23.1',
+    'attrs==23.1.0',
+    'more-itertools==10.1.0',
+    'pluggy==0.13.1',
+]
+MAP_VERSION_TO_INSTALL_PYTEST['6.0']['pip_packages'] = [
+    'attrs==23.1.0',
+    'iniconfig==2.0.0',
+    'more-itertools==10.1.0',
+    'packaging==23.1',
+    'pluggy==0.13.1',
+    'py==1.11.0',
+    'toml==0.10.2',
+]
+for k in ['6.2', '6.3']:
+    MAP_VERSION_TO_INSTALL_PYTEST[k]['pip_packages'] = [
+        'attrs==23.1.0',
+        'iniconfig==2.0.0',
+        'packaging==23.1',
+        'pluggy==0.13.1',
+        'py==1.11.0',
+        'toml==0.10.2',
+    ]
+MAP_VERSION_TO_INSTALL_PYTEST['7.0']['pip_packages'] = [
+    'attrs==23.1.0',
+    'iniconfig==2.0.0',
+    'packaging==23.1',
+    'pluggy==0.13.1',
+    'py==1.11.0',
+]
+for k in ['7.1', '7.2']:
+    MAP_VERSION_TO_INSTALL_PYTEST[k]['pip_packages'] = [
+        'attrs==23.1.0',
+        'iniconfig==2.0.0',
+        'packaging==23.1',
+        'pluggy==0.13.1',
+        'py==1.11.0',
+        'tomli==2.0.1',
+    ]
+MAP_VERSION_TO_INSTALL_PYTEST['7.4']['pip_packages'] = [
+    'iniconfig==2.0.0',
+    'packaging==23.1',
+    'pluggy==1.3.0',
+    'exceptiongroup==1.1.3',
+    'tomli==2.0.1',
+]
+MAP_VERSION_TO_INSTALL_PYTEST['8.0']['pip_packages'] = [
+    'iniconfig==2.0.0',
+    'packaging==23.1',
+    'pluggy==1.3.0',
+    'exceptiongroup==1.1.3',
+    'tomli==2.0.1',
+]
+MAP_VERSION_TO_INSTALL_MATPLOTLIB = {
+    k: {
+        'python': '3.11',
+        'packages': 'environment.yml',
+        'install': 'python -m pip install -e .',
+        'pre_install': [
+            'apt-get -y update && apt-get -y upgrade && apt-get install -y imagemagick ffmpeg texlive texlive-latex-extra texlive-fonts-recommended texlive-xetex texlive-luatex cm-super dvipng'
+        ],
+        'pip_packages': [
+            'contourpy==1.1.0',
+            'cycler==0.11.0',
+            'fonttools==4.42.1',
+            'ghostscript',
+            'kiwisolver==1.4.5',
+            'numpy==1.25.2',
+            'packaging==23.1',
+            'pillow==10.0.0',
+            'pikepdf',
+            'pyparsing==3.0.9',
+            'python-dateutil==2.8.2',
+            'six==1.16.0',
+            'setuptools==68.1.2',
+            'setuptools-scm==7.1.0',
+            'typing-extensions==4.7.1',
+        ],
+    }
+    for k in ['3.5', '3.6', '3.7']
+}
+MAP_VERSION_TO_INSTALL_MATPLOTLIB.update(
+    {
+        k: {
+            'python': '3.8',
+            'packages': 'requirements.txt',
+            'install': 'python -m pip install -e .',
+            'pre_install': [
+                'apt-get -y update && apt-get -y upgrade && apt-get install -y imagemagick ffmpeg libfreetype6-dev pkg-config texlive texlive-latex-extra texlive-fonts-recommended texlive-xetex texlive-luatex cm-super'
+            ],
+            'pip_packages': ['pytest', 'ipython'],
+        }
+        for k in ['3.1', '3.2', '3.3', '3.4']
+    }
+)
+MAP_VERSION_TO_INSTALL_MATPLOTLIB.update(
+    {
+        k: {
+            'python': '3.7',
+            'packages': 'requirements.txt',
+            'install': 'python -m pip install -e .',
+            'pre_install': [
+                'apt-get -y update && apt-get -y upgrade && apt-get install -y imagemagick ffmpeg libfreetype6-dev pkg-config'
+            ],
+            'pip_packages': ['pytest'],
+        }
+        for k in ['3.0']
+    }
+)
+MAP_VERSION_TO_INSTALL_MATPLOTLIB.update(
+    {
+        k: {
+            'python': '3.5',
+            'install': 'python setup.py build; python setup.py install',
+            'pre_install': [
+                'apt-get -y update && apt-get -y upgrade && apt-get install -y imagemagick ffmpeg'
+            ],
+            'pip_packages': ['pytest'],
+            'execute_test_as_nonroot': True,
+        }
+        for k in ['2.0', '2.1', '2.2', '1.0', '1.1', '1.2', '1.3', '1.4', '1.5']
+    }
+)
+MAP_VERSION_TO_INSTALL_SPHINX = {
+    k: {
+        'python': '3.9',
+        'pip_packages': ['tox==4.16.0', 'tox-current-env==0.0.11'],
+        'install': 'python -m pip install -e .[test]',
+        'pre_install': ["sed -i 's/pytest/pytest -rA/' tox.ini"],
+    }
+    for k in ['1.5', '1.6', '1.7', '1.8', '2.0', '2.1', '2.2', '2.3', '2.4', '3.0']
+    + ['3.1', '3.2', '3.3', '3.4', '3.5', '4.0', '4.1', '4.2', '4.3', '4.4']
+    + ['4.5', '5.0', '5.1', '5.2', '5.3', '6.0', '6.2', '7.0', '7.1', '7.2']
+}
+for k in ['3.0', '3.1', '3.2', '3.3', '3.4', '3.5', '4.0', '4.1', '4.2', '4.3', '4.4']:
+    MAP_VERSION_TO_INSTALL_SPHINX[k]['pre_install'].extend(
+        [
+            "sed -i 's/Jinja2>=2.3/Jinja2<3.0/' setup.py",
+            "sed -i 's/sphinxcontrib-applehelp/sphinxcontrib-applehelp<=1.0.7/' setup.py",
+            "sed -i 's/sphinxcontrib-devhelp/sphinxcontrib-devhelp<=1.0.5/' setup.py",
+            "sed -i 's/sphinxcontrib-qthelp/sphinxcontrib-qthelp<=1.0.6/' setup.py",
+            "sed -i 's/alabaster>=0.7,<0.8/alabaster>=0.7,<0.7.12/' setup.py",
+            "sed -i \"s/'packaging',/'packaging', 'markupsafe<=2.0.1',/\" setup.py",
+        ]
+    )
+    if k in ['4.2', '4.3', '4.4']:
+        MAP_VERSION_TO_INSTALL_SPHINX[k]['pre_install'].extend(
+            [
+                "sed -i 's/sphinxcontrib-htmlhelp>=2.0.0/sphinxcontrib-htmlhelp>=2.0.0,<=2.0.4/' setup.py",
+                "sed -i 's/sphinxcontrib-serializinghtml>=1.1.5/sphinxcontrib-serializinghtml>=1.1.5,<=1.1.9/' setup.py",
+            ]
+        )
+    elif k == '4.1':
+        MAP_VERSION_TO_INSTALL_SPHINX[k]['pre_install'].extend(
+            [
+                (
+                    "grep -q 'sphinxcontrib-htmlhelp>=2.0.0' setup.py && "
+                    "sed -i 's/sphinxcontrib-htmlhelp>=2.0.0/sphinxcontrib-htmlhelp>=2.0.0,<=2.0.4/' setup.py || "
+                    "sed -i 's/sphinxcontrib-htmlhelp/sphinxcontrib-htmlhelp<=2.0.4/' setup.py"
+                ),
+                (
+                    "grep -q 'sphinxcontrib-serializinghtml>=1.1.5' setup.py && "
+                    "sed -i 's/sphinxcontrib-serializinghtml>=1.1.5/sphinxcontrib-serializinghtml>=1.1.5,<=1.1.9/' setup.py || "
+                    "sed -i 's/sphinxcontrib-serializinghtml/sphinxcontrib-serializinghtml<=1.1.9/' setup.py"
+                ),
+            ]
+        )
+    else:
+        MAP_VERSION_TO_INSTALL_SPHINX[k]['pre_install'].extend(
+            [
+                "sed -i 's/sphinxcontrib-htmlhelp/sphinxcontrib-htmlhelp<=2.0.4/' setup.py",
+                "sed -i 's/sphinxcontrib-serializinghtml/sphinxcontrib-serializinghtml<=1.1.9/' setup.py",
+            ]
+        )
+MAP_VERSION_TO_INSTALL_SPHINX['7.2']['pre_install'] += [
+    'apt-get update && apt-get install -y graphviz'
+]
+MAP_VERSION_TO_INSTALL_ASTROPY = {
+    k: {
+        'python': '3.9',
+        'install': 'python -m pip install -e .[test] --verbose',
+        'pip_packages': [
+            'attrs==23.1.0',
+            'exceptiongroup==1.1.3',
+            'execnet==2.0.2',
+            'hypothesis==6.82.6',
+            'iniconfig==2.0.0',
+            'numpy==1.25.2',
+            'packaging==23.1',
+            'pluggy==1.3.0',
+            'psutil==5.9.5',
+            'pyerfa==2.0.0.3',
+            'pytest-arraydiff==0.5.0',
+            'pytest-astropy-header==0.2.2',
+            'pytest-astropy==0.10.0',
+            'pytest-cov==4.1.0',
+            'pytest-doctestplus==1.0.0',
+            'pytest-filter-subpackage==0.1.2',
+            'pytest-mock==3.11.1',
+            'pytest-openfiles==0.5.0',
+            'pytest-remotedata==0.4.0',
+            'pytest-xdist==3.3.1',
+            'pytest==7.4.0',
+            'PyYAML==6.0.1',
+            'setuptools==68.0.0',
+            'sortedcontainers==2.4.0',
+            'tomli==2.0.1',
+        ],
+    }
+    for k in ['0.1', '0.2', '0.3', '0.4', '1.1', '1.2', '1.3', '3.0', '3.1', '3.2']
+    + ['4.1', '4.2', '4.3', '5.0', '5.1', '5.2']
+}
+for k in ['4.1', '4.2', '4.3', '5.0', '5.1', '5.2']:
+    MAP_VERSION_TO_INSTALL_ASTROPY[k]['pre_install'] = [
+        'sed -i \'s/requires = \\["setuptools",/requires = \\["setuptools==68.0.0",/\' pyproject.toml'
+    ]
+MAP_VERSION_TO_INSTALL_SYMPY = {
+    k: {
+        'python': '3.9',
+        'packages': 'mpmath flake8',
+        'pip_packages': ['mpmath==1.3.0', 'flake8-comprehensions'],
+        'install': 'python -m pip install -e .',
+    }
+    for k in ['0.7', '1.0', '1.1', '1.10', '1.11', '1.12', '1.2', '1.4', '1.5', '1.6']
+    + ['1.7', '1.8', '1.9']
+}
+MAP_VERSION_TO_INSTALL_SYMPY.update(
+    {
+        k: {
+            'python': '3.9',
+            'packages': 'requirements.txt',
+            'install': 'python -m pip install -e .',
+            'pip_packages': ['mpmath==1.3.0'],
+        }
+        for k in ['1.13']
+    }
+)
+MAP_VERSION_TO_INSTALL_PYLINT = {
+    k: {
+        'python': '3.9',
+        'packages': 'requirements.txt',
+        'install': 'python -m pip install -e .',
+    }
+    for k in [
+        '2.10',
+        '2.11',
+        '2.13',
+        '2.14',
+        '2.15',
+        '2.16',
+        '2.17',
+        '2.8',
+        '2.9',
+        '3.0',
+    ]
+}
+MAP_VERSION_TO_INSTALL_PYLINT['2.8']['pip_packages'] = ['pyenchant==3.2']
+MAP_VERSION_TO_INSTALL_PYLINT['2.8']['pre_install'] = [
+    'apt-get update && apt-get install -y libenchant-2-dev hunspell-en-us'
+]
+MAP_VERSION_TO_INSTALL_PYLINT.update(
+    {
+        k: {
+            **MAP_VERSION_TO_INSTALL_PYLINT[k],
+            'pip_packages': ['astroid==3.0.0a6', 'setuptools'],
+        }
+        for k in ['3.0']
+    }
+)
+
+MAP_VERSION_TO_INSTALL_XARRAY = {
+    k: {
+        'python': '3.10',
+        'packages': 'environment.yml',
+        'install': 'python -m pip install -e .',
+        'pip_packages': [
+            'numpy==1.23.0',
+            'packaging==23.1',
+            'pandas==1.5.3',
+            'pytest==7.4.0',
+            'python-dateutil==2.8.2',
+            'pytz==2023.3',
+            'six==1.16.0',
+            'scipy==1.11.1',
+            'setuptools==68.0.0',
+        ],
+        'no_use_env': True,
+    }
+    for k in ['0.12', '0.18', '0.19', '0.20', '2022.03', '2022.06', '2022.09']
+}
+
+MAP_VERSION_TO_INSTALL_SQLFLUFF = {
+    k: {
+        'python': '3.9',
+        'packages': 'requirements.txt',
+        'install': 'python -m pip install -e .',
+    }
+    for k in [
+        '0.10',
+        '0.11',
+        '0.12',
+        '0.13',
+        '0.4',
+        '0.5',
+        '0.6',
+        '0.8',
+        '0.9',
+        '1.0',
+        '1.1',
+        '1.2',
+        '1.3',
+        '1.4',
+        '2.0',
+        '2.1',
+        '2.2',
+    ]
+}
+MAP_VERSION_TO_INSTALL_DBT_CORE = {
+    k: {
+        'python': '3.9',
+        'packages': 'requirements.txt',
+        'install': 'python -m pip install -e .',
+    }
+    for k in [
+        '0.13',
+        '0.14',
+        '0.15',
+        '0.16',
+        '0.17',
+        '0.18',
+        '0.19',
+        '0.20',
+        '0.21',
+        '1.0',
+        '1.1',
+        '1.2',
+        '1.3',
+        '1.4',
+        '1.5',
+        '1.6',
+        '1.7',
+    ]
+}
+MAP_VERSION_TO_INSTALL_PYVISTA = {
+    k: {
+        'python': '3.9',
+        'install': 'python -m pip install -e .',
+        'pip_packages': ['pytest'],
+    }
+    for k in ['0.20', '0.21', '0.22', '0.23']
+}
+MAP_VERSION_TO_INSTALL_PYVISTA.update(
+    {
+        k: {
+            'python': '3.9',
+            'packages': 'requirements.txt',
+            'install': 'python -m pip install -e .',
+            'pip_packages': ['pytest'],
+        }
+        for k in [
+            '0.24',
+            '0.25',
+            '0.26',
+            '0.27',
+            '0.28',
+            '0.29',
+            '0.30',
+            '0.31',
+            '0.32',
+            '0.33',
+            '0.34',
+            '0.35',
+            '0.36',
+            '0.37',
+            '0.38',
+            '0.39',
+            '0.40',
+            '0.41',
+            '0.42',
+            '0.43',
+        ]
+    }
+)
+MAP_VERSION_TO_INSTALL_ASTROID = {
+    k: {
+        'python': '3.9',
+        'install': 'python -m pip install -e .',
+        'pip_packages': ['pytest'],
+    }
+    for k in [
+        '2.10',
+        '2.12',
+        '2.13',
+        '2.14',
+        '2.15',
+        '2.16',
+        '2.5',
+        '2.6',
+        '2.7',
+        '2.8',
+        '2.9',
+        '3.0',
+    ]
+}
+MAP_VERSION_TO_INSTALL_MARSHMALLOW = {
+    k: {
+        'python': '3.9',
+        'install': "python -m pip install -e '.[dev]'",
+    }
+    for k in [
+        '2.18',
+        '2.19',
+        '2.20',
+        '3.0',
+        '3.1',
+        '3.10',
+        '3.11',
+        '3.12',
+        '3.13',
+        '3.15',
+        '3.16',
+        '3.19',
+        '3.2',
+        '3.4',
+        '3.8',
+        '3.9',
+    ]
+}
+MAP_VERSION_TO_INSTALL_PVLIB = {
+    k: {
+        'python': '3.9',
+        'install': 'python -m pip install -e .[all]',
+        'packages': 'pandas scipy',
+        'pip_packages': ['jupyter', 'ipython', 'matplotlib', 'pytest', 'flake8'],
+    }
+    for k in ['0.1', '0.2', '0.3', '0.4', '0.5', '0.6', '0.7', '0.8', '0.9']
+}
+MAP_VERSION_TO_INSTALL_PYDICOM = {
+    k: {'python': '3.6', 'install': 'python -m pip install -e .', 'packages': 'numpy'}
+    for k in [
+        '1.0',
+        '1.1',
+        '1.2',
+        '1.3',
+        '1.4',
+        '2.0',
+        '2.1',
+        '2.2',
+        '2.3',
+        '2.4',
+        '3.0',
+    ]
+}
+MAP_VERSION_TO_INSTALL_PYDICOM.update(
+    {k: {**MAP_VERSION_TO_INSTALL_PYDICOM[k], 'python': '3.8'} for k in ['1.4', '2.0']}
+)
+MAP_VERSION_TO_INSTALL_PYDICOM.update(
+    {k: {**MAP_VERSION_TO_INSTALL_PYDICOM[k], 'python': '3.9'} for k in ['2.1', '2.2']}
+)
+MAP_VERSION_TO_INSTALL_PYDICOM.update(
+    {k: {**MAP_VERSION_TO_INSTALL_PYDICOM[k], 'python': '3.10'} for k in ['2.3']}
+)
+MAP_VERSION_TO_INSTALL_PYDICOM.update(
+    {k: {**MAP_VERSION_TO_INSTALL_PYDICOM[k], 'python': '3.11'} for k in ['2.4', '3.0']}
+)
+MAP_VERSION_TO_INSTALL_HUMANEVAL = {k: {'python': '3.9'} for k in ['1.0']}
+MAP_VERSION_TO_INSTALL_HUMANEVAL_FIX = {
+    k: {'python': '3.10', 'packages': 'pytest'} for k in ['0.0.1']
+}
+
+# Constants - Task Instance Installation Environment
+MAP_VERSION_TO_INSTALL = {
+    'astropy/astropy': MAP_VERSION_TO_INSTALL_ASTROPY,
+    'dbt-labs/dbt-core': MAP_VERSION_TO_INSTALL_DBT_CORE,
+    'django/django': MAP_VERSION_TO_INSTALL_DJANGO,
+    'matplotlib/matplotlib': MAP_VERSION_TO_INSTALL_MATPLOTLIB,
+    'marshmallow-code/marshmallow': MAP_VERSION_TO_INSTALL_MARSHMALLOW,
+    'mwaskom/seaborn': MAP_VERSION_TO_INSTALL_SEABORN,
+    'pallets/flask': MAP_VERSION_TO_INSTALL_FLASK,
+    'psf/requests': MAP_VERSION_TO_INSTALL_REQUESTS,
+    'pvlib/pvlib-python': MAP_VERSION_TO_INSTALL_PVLIB,
+    'pydata/xarray': MAP_VERSION_TO_INSTALL_XARRAY,
+    'pydicom/pydicom': MAP_VERSION_TO_INSTALL_PYDICOM,
+    'pylint-dev/astroid': MAP_VERSION_TO_INSTALL_ASTROID,
+    'pylint-dev/pylint': MAP_VERSION_TO_INSTALL_PYLINT,
+    'pytest-dev/pytest': MAP_VERSION_TO_INSTALL_PYTEST,
+    'pyvista/pyvista': MAP_VERSION_TO_INSTALL_PYVISTA,
+    'scikit-learn/scikit-learn': MAP_VERSION_TO_INSTALL_SKLEARN,
+    'sphinx-doc/sphinx': MAP_VERSION_TO_INSTALL_SPHINX,
+    'sqlfluff/sqlfluff': MAP_VERSION_TO_INSTALL_SQLFLUFF,
+    'swe-bench/humaneval': MAP_VERSION_TO_INSTALL_HUMANEVAL,
+    'nielstron/humaneval_fix': MAP_VERSION_TO_INSTALL_HUMANEVAL_FIX,
+    'sympy/sympy': MAP_VERSION_TO_INSTALL_SYMPY,
+}
+
+# Constants - Repository Specific Installation Instructions
+MAP_REPO_TO_INSTALL = {}
+
+# Constants - Task Instance Test Frameworks
+TEST_PYTEST_VERBOSE = 'pytest -rA --tb=long -p no:cacheprovider'
+MAP_REPO_TO_TEST_FRAMEWORK_VERBOSE = {
+    'astropy/astropy': {
+        k: TEST_PYTEST_VERBOSE for k in MAP_VERSION_TO_INSTALL_ASTROPY.keys()
+    },
+    'django/django': {
+        k: './tests/runtests.py --verbosity 2 --settings=test_sqlite --parallel 1'
+        for k in MAP_VERSION_TO_INSTALL_DJANGO.keys()
+    },
+    'marshmallow-code/marshmallow': {
+        k: TEST_PYTEST_VERBOSE for k in MAP_VERSION_TO_INSTALL_MARSHMALLOW.keys()
+    },
+    'matplotlib/matplotlib': {
+        k: TEST_PYTEST_VERBOSE for k in MAP_VERSION_TO_INSTALL_MATPLOTLIB.keys()
+    },
+    'mwaskom/seaborn': {
+        k: 'pytest -rA --tb=long' for k in MAP_VERSION_TO_INSTALL_SEABORN.keys()
+    },
+    'pallets/flask': {
+        k: TEST_PYTEST_VERBOSE for k in MAP_VERSION_TO_INSTALL_FLASK.keys()
+    },
+    'psf/requests': {
+        k: TEST_PYTEST_VERBOSE for k in MAP_VERSION_TO_INSTALL_REQUESTS.keys()
+    },
+    'pvlib/pvlib-python': {
+        k: TEST_PYTEST_VERBOSE for k in MAP_VERSION_TO_INSTALL_PVLIB.keys()
+    },
+    'pydata/xarray': {
+        k: TEST_PYTEST_VERBOSE for k in MAP_VERSION_TO_INSTALL_XARRAY.keys()
+    },
+    'pydicom/pydicom': {
+        k: TEST_PYTEST_VERBOSE for k in MAP_VERSION_TO_INSTALL_PYDICOM.keys()
+    },
+    'pylint-dev/astroid': {
+        k: TEST_PYTEST_VERBOSE for k in MAP_VERSION_TO_INSTALL_ASTROID.keys()
+    },
+    'pylint-dev/pylint': {
+        k: TEST_PYTEST_VERBOSE for k in MAP_VERSION_TO_INSTALL_PYLINT.keys()
+    },
+    'pytest-dev/pytest': {
+        k: 'pytest -rA --tb=long' for k in MAP_VERSION_TO_INSTALL_PYTEST.keys()
+    },
+    'pyvista/pyvista': {
+        k: TEST_PYTEST_VERBOSE for k in MAP_VERSION_TO_INSTALL_PYVISTA.keys()
+    },
+    'scikit-learn/scikit-learn': {
+        k: TEST_PYTEST_VERBOSE for k in MAP_VERSION_TO_INSTALL_SKLEARN.keys()
+    },
+    'sphinx-doc/sphinx': {
+        k: 'tox -epy39 -v --' for k in MAP_VERSION_TO_INSTALL_SPHINX.keys()
+    },
+    'sqlfluff/sqlfluff': {
+        k: TEST_PYTEST_VERBOSE for k in MAP_VERSION_TO_INSTALL_SQLFLUFF.keys()
+    },
+    'swe-bench/humaneval': {
+        k: 'python' for k in MAP_VERSION_TO_INSTALL_HUMANEVAL.keys()
+    },
+    'nielstron/humaneval_fix': {
+        k: TEST_PYTEST_VERBOSE for k in MAP_VERSION_TO_INSTALL_HUMANEVAL_FIX.keys()
+    },
+    'sympy/sympy': {
+        k: 'bin/test -C --verbose' for k in MAP_VERSION_TO_INSTALL_SYMPY.keys()
+    },
+}
+MAP_REPO_TO_TEST_FRAMEWORK_VERBOSE['django/django']['1.9'] = (
+    './tests/runtests.py --verbosity 2'
+)
@@ -0,0 +1,978 @@
+import asyncio
+import copy
+import json
+import os
+import tempfile
+from typing import Any, Literal
+
+import pandas as pd
+import toml
+from datasets import load_dataset
+
+import openhands.agenthub
+from evaluation.benchmarks.swe_perf.binary_patch_utils import (
+    remove_binary_diffs,
+    remove_binary_files_from_git,
+)
+from evaluation.benchmarks.swe_perf.resource.mapping import (
+    get_instance_resource_factor,
+)
+from evaluation.benchmarks.swe_perf.resource.swt_bench_constants import (
+    MAP_REPO_TO_INSTALL,
+    MAP_VERSION_TO_INSTALL,
+)
+from evaluation.utils.shared import (
+    EvalException,
+    EvalMetadata,
+    EvalOutput,
+    assert_and_raise,
+    check_maximum_retries_exceeded,
+    codeact_user_response,
+    get_default_sandbox_config_for_eval,
+    get_metrics,
+    is_fatal_evaluation_error,
+    make_metadata,
+    prepare_dataset,
+    reset_logger_for_multiprocessing,
+    run_evaluation,
+    update_llm_config_for_completions_logging,
+)
+from openhands.controller.state.state import State
+from openhands.core.config import (
+    AgentConfig,
+    OpenHandsConfig,
+    get_evaluation_parser,
+    get_llm_config_arg,
+)
+from openhands.core.config.condenser_config import NoOpCondenserConfig
+from openhands.core.config.utils import get_condenser_config_arg
+from openhands.core.logger import openhands_logger as logger
+from openhands.core.main import create_runtime, run_controller
+from openhands.critic import AgentFinishedCritic
+from openhands.events.action import CmdRunAction, FileReadAction, MessageAction
+from openhands.events.observation import (
+    CmdOutputObservation,
+    ErrorObservation,
+    FileReadObservation,
+)
+from openhands.events.serialization.event import event_from_dict, event_to_dict
+from openhands.runtime.base import Runtime
+from openhands.utils.async_utils import call_async_from_sync
+from openhands.utils.shutdown_listener import sleep_if_should_continue
+
+USE_HINT_TEXT = os.environ.get('USE_HINT_TEXT', 'false').lower() == 'true'
+RUN_WITH_BROWSING = os.environ.get('RUN_WITH_BROWSING', 'false').lower() == 'true'
+ENABLE_LLM_EDITOR = os.environ.get('ENABLE_LLM_EDITOR', 'false').lower() == 'true'
+BenchMode = Literal['swe', 'swt', 'swt-ci']
+
+# Global variable to track dataset type
+DATASET_TYPE = 'SWE-Perf'
+
+
+AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {
+    'CodeActAgent': codeact_user_response,
+}
+
+
+def _get_sweperf_workspace_dir_name(instance: pd.Series) -> str:
+    return f'{instance.repo}__{instance.version}'.replace('/', '__')
+
+
+def get_instruction(instance: pd.Series, metadata: EvalMetadata) -> MessageAction:
+    workspace_dir_name = _get_sweperf_workspace_dir_name(instance)
+
+    # The instruction
+    instruction = f"""
+<uploaded_files>
+/workspace/{workspace_dir_name}
+</uploaded_files>
+
+I've uploaded a Python code repository in the directory {workspace_dir_name}. Consider the following issue description:
+
+
+<issue_description>
+{instance.problem_statement_realistic}
+</issue_description>
+
+Can you help me implement the necessary changes to the repository so that the requirements specified in the <issue_description> are met?
+I've already taken care of all changes to any of the test files described in the <issue_description>. This means you DON'T have to modify the testing logic or any of the tests in any way!
+Also the development Python environment is already set up for you (i.e., all dependencies already installed), so you don't need to install other packages.
+Your task is to make the minimal changes to non-test files in the /workspace/{workspace_dir_name} directory to ensure the <issue_description> is satisfied.
+
+Follow these phases to resolve the issue:
+
+## ⚙️ Phase 1: Understand the Problem & Test Reuse
+
+**1.1. Install the package locally:**
+
+```bash
+python -m pip install pyinstrument
+python -m pip install -e .
+```
+
+> Only proceed to README-based install if the above fails.
+
+**1.2. Identify relevant modules and logic:**
+
+* Use test cases mentioned in `<issue_description>` to locate the functions and files involved.
+* Focus on potential performance bottlenecks: loops, I/O, locks, cache access, data structures, etc.
+
+**1.3. Run initial benchmark:**
+
+```bash
+pytest -rA --durations=0 --disable-warnings -p no:warnings --tb=no <test_case>
+```
+
+## 📊 Phase 2: Localization (Hierarchical Bottleneck Detection)
+
+**2.1. Global profiling using `pyinstrument`:**
+
+```bash
+pyinstrument -m pytest -rA --durations=0 --disable-warnings --tb=no --continue-on-collection-errors -p no:warnings <test_case>
+```
+
+**2.2. Analyze performance stack if necessary:**
+
+* 🔍 **Module level**: Identify hot files and methods.
+* 🔬 **Function level**: Focus on top-consuming classes/functions.
+* 🧬 **Line level**: Add fine-grained sampling/logging if needed.
+
+**2.3. Output a layered summary** showing where time is spent and why.
+
+
+## 🧠 Phase 3: Repair (Design Candidate Fixes)
+
+**3.1. Propose multiple optimization ideas:**
+
+* Algorithm refinement
+* Data structure improvement
+* Parallelism / async
+* Caching / batching
+
+**3.2. For each candidate:**
+
+* Describe the idea using pseudocode or `diff`
+* Evaluate expected gain vs implementation complexity
+
+---
+
+## 🔬 Phase 4: Patch Validation (Quantitative Benchmarking)
+
+**4.1. Apply each patch separately**
+
+**4.2. Re-run benchmark using the same test case**
+
+**4.3. Record improvements in pytest `call` phase**
+
+**4.4. Build a comparison table:**
+
+| Patch | Call Time (s) | % Improvement | Notes |
+| ----- | ------------- | ------------- | ----- |
+
+**4.5. Select the best-performing, low-risk patch**
+
+---
+
+## 📌 Phase 5: Fix Analysis (Explain the Fix)
+
+**5.1.** Clearly describe the **problem**
+
+**5.2.** Identify the **exact location** in the code
+
+**5.3.** Describe **how the test reveals** the problem
+
+**5.4.** State **best practices** relevant to the fix
+
+**5.5.** Explain **how the fix resolves** the issue
+
+---
+
+## 🧩 Phase 6: Fix Implementation
+
+**6.1.** Implement the selected fix in the non-test source files
+**6.2.** Ensure the changes are minimal and performance-focused
+
+---
+
+## ✅ Phase 7: Final Verification & Review
+
+**7.1.** Run **full regression tests** to ensure correctness
+
+**7.2.** Re-benchmark performance and compare with base commit `{instance['base_commit']}`
+
+**7.3.** Ensure `call` time is significantly improved in `pytest` output
+
+**7.4.** If gain is limited, consider deeper optimization
+
+**7.5.** Write a concise final report including:
+
+* Bottleneck path
+* Fix alternatives considered
+* Final solution and metrics
+* Code diff summary
+
+## Important ##
+1. Please do not optimize specifically for certain test cases, as this may be considered cheating.
+2. Delete all files unrelated to optimization, including temporary files, auxiliary test code, and so on.
+3. Avoid excessive optimization and unnecessary divergence; if the improvement is not significant, stop promptly to maintain efficiency and focus.
+
+Be thorough in your exploration, testing, and reasoning. It's fine if your thinking process is lengthy - quality and completeness are more important than brevity.
+"""
+
+    if RUN_WITH_BROWSING:
+        instruction += (
+            '<IMPORTANT!>\nYou SHOULD NEVER attempt to browse the web. </IMPORTANT!>\n'
+        )
+
+    if 'image_assets' in instance:
+        assets = json.loads(instance['image_assets'])
+        assert 'problem_statement' in assets, (
+            'problem_statement is required in image_assets'
+        )
+        image_urls = assets['problem_statement']
+        return MessageAction(content=instruction, image_urls=image_urls)
+    return MessageAction(content=instruction)
+
+
+def get_instance_docker_image(
+    instance_id: str,
+) -> str:
+    docker_image_prefix = 'docker.io/betty1202/'
+    image_name = 'sweb.eval.x86_64.' + instance_id
+    image_name = image_name.replace(
+        '__', '_s_'
+    )  # to comply with docker image naming convention
+    return (docker_image_prefix.rstrip('/') + '/' + image_name).lower()
+
+
+def get_config(
+    instance: pd.Series,
+    metadata: EvalMetadata,
+) -> OpenHandsConfig:
+    base_container_image = get_instance_docker_image(
+        instance['instance_id'],
+    )
+    logger.info(
+        f'Using instance container image: {base_container_image}. '
+        f'Please make sure this image exists. '
+        f'Submit an issue on https://github.com/All-Hands-AI/OpenHands if you run into any issues.'
+    )
+
+    sandbox_config = get_default_sandbox_config_for_eval()
+    sandbox_config.base_container_image = base_container_image
+    sandbox_config.enable_auto_lint = True
+    sandbox_config.use_host_network = False
+    # Add platform to the sandbox config to solve issue 4401
+    sandbox_config.platform = 'linux/amd64'
+    sandbox_config.remote_runtime_resource_factor = get_instance_resource_factor(
+        dataset_name=metadata.dataset,
+        instance_id=instance['instance_id'],
+    )
+
+    config = OpenHandsConfig(
+        default_agent=metadata.agent_class,
+        run_as_openhands=False,
+        max_iterations=metadata.max_iterations,
+        enable_browser=RUN_WITH_BROWSING,
+        runtime=os.environ.get('RUNTIME', 'docker'),
+        sandbox=sandbox_config,
+        # do not mount workspace
+        workspace_base=None,
+        workspace_mount_path=None,
+    )
+
+    config.set_llm_config(
+        update_llm_config_for_completions_logging(
+            metadata.llm_config, metadata.eval_output_dir, instance['instance_id']
+        )
+    )
+    # get 'draft_editor' config if exists
+    config.set_llm_config(get_llm_config_arg('draft_editor'), 'draft_editor')
+
+    agent_config = AgentConfig(
+        enable_jupyter=False,
+        enable_browsing=RUN_WITH_BROWSING,
+        enable_llm_editor=ENABLE_LLM_EDITOR,
+        enable_mcp=False,
+        condenser=metadata.condenser_config,
+        enable_prompt_extensions=False,
+    )
+    config.set_agent_config(agent_config)
+    return config
+
+
+def initialize_runtime(
+    runtime: Runtime,
+    instance: pd.Series,  # provides instance_id, repo, and version for setup
+    metadata: EvalMetadata,
+):
+    """Initialize the runtime for the agent.
+
+    This function is called before the runtime is used to run the agent.
+    """
+    logger.info('-' * 30)
+    logger.info('BEGIN Runtime Initialization Fn')
+    logger.info('-' * 30)
+    workspace_dir_name = _get_sweperf_workspace_dir_name(instance)
+    obs: CmdOutputObservation
+
+    # Set instance id and git configuration
+    action = CmdRunAction(
+        command=f"""echo 'export SWE_INSTANCE_ID={instance['instance_id']}' >> ~/.bashrc && echo 'export PIP_CACHE_DIR=~/.cache/pip' >> ~/.bashrc && echo "alias git='git --no-pager'" >> ~/.bashrc && git config --global core.pager "" && git config --global diff.binary false"""
+    )
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        obs.exit_code == 0,
+        f'Failed to export SWE_INSTANCE_ID and configure git: {str(obs)}',
+    )
+
+    action = CmdRunAction(command="""export USER=$(whoami); echo USER=${USER} """)
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(obs.exit_code == 0, f'Failed to export USER: {str(obs)}')
+
+    # directory of this file, used below to locate the entry script
+    script_dir = os.path.dirname(__file__)
+
+    # inject the instance info
+    action = CmdRunAction(command='mkdir -p /swe_util/eval_data/instances')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        obs.exit_code == 0,
+        f'Failed to create /swe_util/eval_data/instances: {str(obs)}',
+    )
+
+    swe_instance_json_name = 'swe-perf-instance.json'
+    with tempfile.TemporaryDirectory() as temp_dir:
+        # Construct the full path for the desired file name within the temporary directory
+        temp_file_path = os.path.join(temp_dir, swe_instance_json_name)
+        # Write to the file with the desired name within the temporary directory
+        with open(temp_file_path, 'w') as f:
+            if not isinstance(instance, dict):
+                json.dump([instance.to_dict()], f)
+            else:
+                json.dump([instance], f)
+
+        # Copy the file to the desired location
+        runtime.copy_to(temp_file_path, '/swe_util/eval_data/instances/')
+
+        # inject the instance swe entry
+        entry_script_path = 'instance_swe_entry.sh'
+        runtime.copy_to(
+            str(os.path.join(script_dir, f'scripts/setup/{entry_script_path}')),
+            '/swe_util/',
+        )
+
+    action = CmdRunAction(command='cat ~/.bashrc')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(obs.exit_code == 0, f'Failed to cat ~/.bashrc: {str(obs)}')
+
+    action = CmdRunAction(command='source ~/.bashrc')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    if isinstance(obs, ErrorObservation):
+        logger.error(f'Failed to source ~/.bashrc: {str(obs)}')
+    assert_and_raise(obs.exit_code == 0, f'Failed to source ~/.bashrc: {str(obs)}')
+
+    action = CmdRunAction(command=f'source /swe_util/{entry_script_path}')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        obs.exit_code == 0,
+        f'Failed to source /swe_util/{entry_script_path}: {str(obs)}',
+    )
+
+    action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        obs.exit_code == 0,
+        f'Failed to cd to /workspace/{workspace_dir_name}: {str(obs)}',
+    )
+
+    action = CmdRunAction(command='git reset --hard')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(obs.exit_code == 0, f'Failed to git reset --hard: {str(obs)}')
+
+    action = CmdRunAction(
+        command='for remote_name in $(git remote); do git remote remove "${remote_name}"; done'
+    )
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(obs.exit_code == 0, f'Failed to remove git remotes: {str(obs)}')
+
+    if metadata.details['mode'] == 'swt-ci':
+        # set up repo
+        setup_commands = []
+        if instance['repo'] in MAP_REPO_TO_INSTALL:
+            setup_commands.append(MAP_REPO_TO_INSTALL[instance['repo']])
+
+        # Run pre-install set up if provided
+        install = MAP_VERSION_TO_INSTALL.get(instance['repo'], {}).get(
+            instance['version'], []
+        )
+        if 'pre_install' in install:
+            for pre_install in install['pre_install']:
+                setup_commands.append(pre_install)
+
+        if 'install' in install:
+            setup_commands.append(install['install'])
+
+        for command in setup_commands:
+            action = CmdRunAction(command=command)
+            action.set_hard_timeout(600)
+            logger.info(action, extra={'msg_type': 'ACTION'})
+            obs = runtime.run_action(action)
+            logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+
+    action = CmdRunAction(command='which python')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        obs.exit_code == 0 and 'testbed' in obs.content,
+        f'Expected to find python interpreter from testbed, but got: {str(obs)}',
+    )
+
+    logger.info('-' * 30)
+    logger.info('END Runtime Initialization Fn')
+    logger.info('-' * 30)
+
+
+def complete_runtime(
+    runtime: Runtime,
+    instance: pd.Series,  # used to get the workspace_dir_name and base_commit
+) -> dict[str, Any]:
+    """Complete the runtime for the agent.
+
+    This function is called after the agent has finished running.
+    If you need to do something in the sandbox to get the correctness metric after
+    the agent has run, modify this function.
+    """
+    logger.info('-' * 30)
+    logger.info('BEGIN Runtime Completion Fn')
+    logger.info('-' * 30)
+    obs: CmdOutputObservation
+    workspace_dir_name = _get_sweperf_workspace_dir_name(instance)
+
+    action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+
+    if obs.exit_code == -1:
+        # The previous command is still running
+        # We need to kill previous command
+        logger.info('The previous command is still running, trying to kill it...')
+        action = CmdRunAction(command='C-c')
+        obs = runtime.run_action(action)
+        logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+
+        # Then run the command again
+        action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
+        action.set_hard_timeout(600)
+        logger.info(action, extra={'msg_type': 'ACTION'})
+        obs = runtime.run_action(action)
+        logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+
+    if obs.exit_code == -1:
+        # The previous command is still running after C-c
+        # Suspend it with C-z instead
+        logger.info('The previous command is still running, trying to ctrl+z it...')
+        action = CmdRunAction(command='C-z')
+        obs = runtime.run_action(action)
+        logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+
+        # Then run the command again
+        action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
+        action.set_hard_timeout(600)
+        logger.info(action, extra={'msg_type': 'ACTION'})
+        obs = runtime.run_action(action)
+        logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+
+    assert_and_raise(
+        isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
+        f'Failed to cd to /workspace/{workspace_dir_name}: {str(obs)}',
+    )
+
+    action = CmdRunAction(command='git config --global core.pager ""')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
+        f'Failed to git config --global core.pager "": {str(obs)}',
+    )
+
+    # First check for any git repositories in subdirectories
+    action = CmdRunAction(command='find . -type d -name .git -not -path "./.git"')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
+        f'Failed to find git repositories: {str(obs)}',
+    )
+
+    git_dirs = [p for p in obs.content.strip().split('\n') if p]
+    if git_dirs:
+        # Remove all .git directories in subdirectories
+        for git_dir in git_dirs:
+            action = CmdRunAction(command=f'rm -rf "{git_dir}"')
+            action.set_hard_timeout(600)
+            logger.info(action, extra={'msg_type': 'ACTION'})
+            obs = runtime.run_action(action)
+            logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+            assert_and_raise(
+                isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
+                f'Failed to remove git directory {git_dir}: {str(obs)}',
+            )
+
+    # add all files
+    action = CmdRunAction(command='git add -A')
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
+        f'Failed to git add -A: {str(obs)}',
+    )
+
+    # Remove binary files from git staging
+    action = CmdRunAction(command=remove_binary_files_from_git())
+    action.set_hard_timeout(600)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert_and_raise(
+        isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
+        f'Failed to remove binary files: {str(obs)}',
+    )
+
+    n_retries = 0
+    git_patch = None
+    while n_retries < 5:
+        action = CmdRunAction(
+            command=f'git diff --no-color --cached {instance["base_commit"]} > patch.diff'
+        )
+        action.set_hard_timeout(max(300 + 100 * n_retries, 600))
+        logger.info(action, extra={'msg_type': 'ACTION'})
+        obs = runtime.run_action(action)
+        logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+        n_retries += 1
+        if isinstance(obs, CmdOutputObservation):
+            if obs.exit_code == 0:
+                # Read the patch file
+                action = FileReadAction(path='patch.diff')
+                action.set_hard_timeout(max(300 + 100 * n_retries, 600))
+                logger.info(action, extra={'msg_type': 'ACTION'})
+                obs = runtime.run_action(action)
+                logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+                if isinstance(obs, FileReadObservation):
+                    git_patch = obs.content
+                    break
+                elif isinstance(obs, ErrorObservation):
+                    # Fall back to cat "patch.diff" to get the patch
+                    assert 'File could not be decoded as utf-8' in obs.content
+                    action = CmdRunAction(command='cat patch.diff')
+                    action.set_hard_timeout(max(300 + 100 * n_retries, 600))
+                    logger.info(action, extra={'msg_type': 'ACTION'})
+                    obs = runtime.run_action(action)
+                    assert isinstance(obs, CmdOutputObservation) and obs.exit_code == 0
+                    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+                    git_patch = obs.content
+                    break
+                else:
+                    assert_and_raise(False, f'Unexpected observation type: {str(obs)}')
+            else:
+                logger.info('Failed to get git diff, retrying...')
+                sleep_if_should_continue(10)
+        elif isinstance(obs, ErrorObservation):
+            logger.error(f'Error occurred: {obs.content}. Retrying...')
+            sleep_if_should_continue(10)
+        else:
+            assert_and_raise(False, f'Unexpected observation type: {str(obs)}')
+
+    assert_and_raise(git_patch is not None, 'Failed to get git diff (None)')
+
+    # Remove binary diffs from the patch
+    git_patch = remove_binary_diffs(git_patch)
+
+    logger.info('-' * 30)
+    logger.info('END Runtime Completion Fn')
+    logger.info('-' * 30)
+    return {'git_patch': git_patch}
+
+
+def process_instance(
+    instance: pd.Series,
+    metadata: EvalMetadata,
+    reset_logger: bool = True,
+    runtime_failure_count: int = 0,
+) -> EvalOutput:
+    config = get_config(instance, metadata)
+
+    # Setup the logger properly, so you can run multi-processing to parallelize the evaluation
+    if reset_logger:
+        log_dir = os.path.join(metadata.eval_output_dir, 'infer_logs')
+        reset_logger_for_multiprocessing(logger, instance.instance_id, log_dir)
+    else:
+        logger.info(f'Starting evaluation for instance {instance.instance_id}.')
+
+    # Increase resource_factor with increasing attempt_id
+    if runtime_failure_count > 0:
+        config.sandbox.remote_runtime_resource_factor = min(
+            config.sandbox.remote_runtime_resource_factor * (2**runtime_failure_count),
+            8,
+        )
+        logger.warning(
+            f'This is attempt {runtime_failure_count + 1} for instance {instance.instance_id}, setting resource factor to {config.sandbox.remote_runtime_resource_factor}'
+        )
+
+    metadata = copy.deepcopy(metadata)
+    metadata.details['runtime_failure_count'] = runtime_failure_count
+    metadata.details['remote_runtime_resource_factor'] = (
+        config.sandbox.remote_runtime_resource_factor
+    )
+
+    runtime = create_runtime(config)
+    call_async_from_sync(runtime.connect)
+
+    try:
+        initialize_runtime(runtime, instance, metadata)
+
+        message_action = get_instruction(instance, metadata)
+
+        # Here's how you can run the agent (similar to the `main` function) and get the final task state
+        state: State | None = asyncio.run(
+            run_controller(
+                config=config,
+                initial_user_action=message_action,
+                runtime=runtime,
+                fake_user_response_fn=AGENT_CLS_TO_FAKE_USER_RESPONSE_FN[
+                    metadata.agent_class
+                ],
+            )
+        )
+
+        # if fatal error, raise EvalException to trigger re-run
+        if state is not None and is_fatal_evaluation_error(state.last_error):
+            raise EvalException('Fatal error detected: ' + state.last_error)
+
+        # Get git patch
+        complete_runtime_fn = complete_runtime
+        return_val = complete_runtime_fn(runtime, instance)
+        git_patch = return_val['git_patch']
+        logger.info(
+            f'Got git diff for instance {instance.instance_id}:\n--------\n{git_patch}\n--------'
+        )
+    finally:
+        runtime.close()
+    # ==========================================
+
+    # ======= Attempt to evaluate the agent's edits =======
+    # we use eval_infer.sh to evaluate the agent's edits, not here
+    # because the agent may alter the environment / testcases
+    test_result = {
+        'git_patch': git_patch,
+    }
+
+    # If you are working on some simpler benchmark that only evaluates the final model output (e.g., in a MessageAction)
+    # You can simply get the LAST `MessageAction` from the returned `state.history` and parse it for evaluation.
+    if state is None:
+        raise ValueError('State should not be None.')
+
+    # NOTE: this is NO LONGER the event stream, but an agent history that includes delegate agent's events
+    histories = [event_to_dict(event) for event in state.history]
+    metrics = get_metrics(state)
+
+    # Save the output
+    instruction = message_action.content
+    if message_action.image_urls:
+        instruction += (
+            '\n\n<image_urls>' + '\n'.join(message_action.image_urls) + '</image_urls>'
+        )
+    output = EvalOutput(
+        instance_id=instance.instance_id,
+        instruction=instruction,
+        instance=instance.to_dict(),  # SWE Bench specific
+        test_result=test_result,
+        metadata=metadata,
+        history=histories,
+        metrics=metrics,
+        error=state.last_error if state and state.last_error else None,
+    )
+    return output
+
+
+def filter_dataset(dataset: pd.DataFrame, filter_column: str) -> pd.DataFrame:
+    file_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'config.toml')
+    if os.path.exists(file_path):
+        with open(file_path, 'r') as file:
+            data = toml.load(file)
+            if 'selected_ids' in data:
+                selected_ids = data['selected_ids']
+                logger.info(
+                    f'Filtering {len(selected_ids)} tasks from "selected_ids"...'
+                )
+                subset = dataset[dataset[filter_column].isin(selected_ids)]
+                logger.info(f'Retained {subset.shape[0]} tasks after filtering')
+                return subset
+            if 'selected_repos' in data:
+                selected_repos = data['selected_repos']
+                if isinstance(selected_repos, str):
+                    selected_repos = [selected_repos]
+                assert isinstance(selected_repos, list)
+                logger.info(
+                    f'Filtering {selected_repos} tasks from "selected_repos"...'
+                )
+                subset = dataset[dataset['repo'].isin(selected_repos)]
+                logger.info(f'Retained {subset.shape[0]} tasks after filtering')
+                return subset
+
+    skip_ids = [s for s in os.environ.get('SKIP_IDS', '').split(',') if s]
+    if len(skip_ids) > 0:
+        logger.info(f'Filtering {len(skip_ids)} tasks from "SKIP_IDS"...')
+        return dataset[~dataset[filter_column].isin(skip_ids)]
+    return dataset
+
+
+if __name__ == '__main__':
+    parser = get_evaluation_parser()
+    parser.add_argument(
+        '--dataset',
+        type=str,
+        default='SWE-Perf/SWE-Perf',
+        help='dataset to evaluate on, e.g. SWE-Perf/SWE-Perf',
+    )
+    parser.add_argument(
+        '--split',
+        type=str,
+        default='test',
+        help='split to evaluate on',
+    )
+    parser.add_argument(
+        '--mode',
+        type=str,
+        default='swe',
+        choices=['swe', 'swt', 'swt-ci'],
+        help="mode to run the evaluation, either 'swe', 'swt', or 'swt-ci'",
+    )
+
+    args, _ = parser.parse_known_args()
+
+    # NOTE: It is preferable to load datasets from huggingface datasets and perform post-processing
+    # so we don't need to manage file uploading to OpenHands's repo
+    dataset = load_dataset(args.dataset, split=args.split)
+
+    swe_perf_tests = filter_dataset(dataset.to_pandas(), 'instance_id')
+    logger.info(
+        f'Loaded dataset {args.dataset} with split {args.split}: {len(swe_perf_tests)} tasks'
+    )
+
+    llm_config = None
+    if args.llm_config:
+        llm_config = get_llm_config_arg(args.llm_config)
+        llm_config.log_completions = True
+        # modify_params must be False for evaluation, for reproducibility and accuracy of results
+        llm_config.modify_params = False
+
+    if llm_config is None:
+        raise ValueError(f'Could not find LLM config: --llm_config {args.llm_config}')
+
+    # Get condenser config from environment variable
+    condenser_name = os.environ.get('EVAL_CONDENSER')
+    if condenser_name:
+        condenser_config = get_condenser_config_arg(condenser_name)
+        if condenser_config is None:
+            raise ValueError(
+                f'Could not find Condenser config: EVAL_CONDENSER={condenser_name}'
+            )
+    else:
+        # If no specific condenser config is provided via env var, default to NoOpCondenser
+        condenser_config = NoOpCondenserConfig()
+        logger.debug(
+            'No Condenser config provided via EVAL_CONDENSER, using NoOpCondenser.'
+        )
+
+    details = {'mode': args.mode}
+    _agent_cls = openhands.agenthub.Agent.get_cls(args.agent_cls)
+
+    dataset_description = (
+        args.dataset.replace('/', '__') + '-' + args.split.replace('/', '__')
+    )
+    metadata = make_metadata(
+        llm_config,
+        dataset_description,
+        args.agent_cls,
+        args.max_iterations,
+        args.eval_note,
+        args.eval_output_dir,
+        details=details,
+        condenser_config=condenser_config,
+    )
+
+    output_file = os.path.join(metadata.eval_output_dir, 'output.jsonl')
+    print(f'### OUTPUT FILE: {output_file} ###')
+
+    # Run evaluation in iterative mode:
+    # If a rollout fails to output AgentFinishAction, we try again until it succeeds or ITERATIVE_EVAL_MODE_MAX_ATTEMPTS (default 3) attempts have been made.
+    ITERATIVE_EVAL_MODE = (
+        os.environ.get('ITERATIVE_EVAL_MODE', 'false').lower() == 'true'
+    )
+    ITERATIVE_EVAL_MODE_MAX_ATTEMPTS = int(
+        os.environ.get('ITERATIVE_EVAL_MODE_MAX_ATTEMPTS', '3')
+    )
+
+    if not ITERATIVE_EVAL_MODE:
+        # load the dataset
+        instances = prepare_dataset(swe_perf_tests, output_file, args.eval_n_limit)
+
+        run_evaluation(
+            instances,
+            metadata,
+            output_file,
+            args.eval_num_workers,
+            process_instance,
+            timeout_seconds=8
+            * 60
+            * 60,  # 8 hours per instance should be more than enough
+            max_retries=5,
+        )
+    else:
+        critic = AgentFinishedCritic()
+
+        def get_cur_output_file_path(attempt: int) -> str:
+            return (
+                f'{output_file.removesuffix(".jsonl")}.critic_attempt_{attempt}.jsonl'
+            )
+
+        eval_ids = None
+        for attempt in range(1, ITERATIVE_EVAL_MODE_MAX_ATTEMPTS + 1):
+            cur_output_file = get_cur_output_file_path(attempt)
+            logger.info(
+                f'Running evaluation with critic {critic.__class__.__name__} for attempt {attempt} of {ITERATIVE_EVAL_MODE_MAX_ATTEMPTS}.'
+            )
+
+            # If temperature is 0 (deterministic), raise it to 0.1 on retries
+            # so that later attempts can produce different results
+            if attempt > 1 and metadata.llm_config.temperature == 0:
+                logger.info(
+                    f'Detected temperature is 0 for (>1) attempt {attempt}. Setting temperature to 0.1...'
+                )
+                metadata.llm_config.temperature = 0.1
+
+            # Load instances - at first attempt, we evaluate all instances
+            # On subsequent attempts, we only evaluate the instances that failed the previous attempt determined by critic
+            instances = prepare_dataset(
+                swe_perf_tests, cur_output_file, args.eval_n_limit, eval_ids=eval_ids
+            )
+
+            # Run evaluation - but save them to cur_output_file
+            logger.info(
+                f'Evaluating {len(instances)} instances for attempt {attempt}...'
+            )
+            run_evaluation(
+                instances,
+                metadata,
+                cur_output_file,
+                args.eval_num_workers,
+                process_instance,
+                timeout_seconds=8
+                * 60
+                * 60,  # 8 hours per instance should be more than enough
+                max_retries=5,
+            )
+
+            # When eval is done, we update eval_ids to the instances that failed the current attempt
+            instances_failed = []
+            logger.info(
+                f'Use critic {critic.__class__.__name__} to check {len(instances)} instances for attempt {attempt}...'
+            )
+            with open(cur_output_file, 'r') as f:
+                for line in f:
+                    instance = json.loads(line)
+                    try:
+                        history = [
+                            event_from_dict(event) for event in instance['history']
+                        ]
+                        critic_result = critic.evaluate(
+                            history, instance['test_result'].get('git_patch', '')
+                        )
+                        if not critic_result.success:
+                            instances_failed.append(instance['instance_id'])
+                    except Exception as e:
+                        logger.error(
+                            f'Error loading history for instance {instance["instance_id"]}: {e}'
+                        )
+                        instances_failed.append(instance['instance_id'])
+            logger.info(
+                f'{len(instances_failed)} instances failed the current attempt {attempt}: {instances_failed}'
+            )
+            eval_ids = instances_failed
+
+            # If no instances failed, we break
+            if len(instances_failed) == 0:
+                break
+
+        # Then we should aggregate the results from all attempts into the original output file
+        # and remove the intermediate files
+        logger.info(
+            'Aggregating results from all attempts into the original output file...'
+        )
+        fout = open(output_file, 'w')
+        added_instance_ids = set()
+        for attempt in reversed(range(1, ITERATIVE_EVAL_MODE_MAX_ATTEMPTS + 1)):
+            cur_output_file = get_cur_output_file_path(attempt)
+            if not os.path.exists(cur_output_file):
+                logger.warning(
+                    f'Intermediate output file {cur_output_file} does not exist. Skipping...'
+                )
+                continue
+
+            with open(cur_output_file, 'r') as f:
+                for line in f:
+                    instance = json.loads(line)
+                    # Also make sure git_patch is not empty - otherwise we fall back to previous attempt (empty patch is worse than anything else)
+                    if (
+                        instance['instance_id'] not in added_instance_ids
+                        and instance['test_result'].get('git_patch', '').strip()
+                    ):
+                        fout.write(line)
+                        added_instance_ids.add(instance['instance_id'])
+            logger.info(
+                f'Aggregated instances from {cur_output_file}. Total instances added so far: {len(added_instance_ids)}'
+            )
+        fout.close()
+        logger.info(
+            f'Done! Total {len(added_instance_ids)} instances added to {output_file}'
+        )
+        # Check if any instances reached maximum retries
+        check_maximum_retries_exceeded(metadata.eval_output_dir)
@@ -0,0 +1,146 @@
+#!/usr/bin/env bash
+set -eo pipefail
+
+source "evaluation/utils/version_control.sh"
+
+MODEL_CONFIG=$1
+COMMIT_HASH=$2
+AGENT=$3
+EVAL_LIMIT=$4
+MAX_ITER=$5
+NUM_WORKERS=$6
+DATASET=$7
+SPLIT=$8
+N_RUNS=$9
+MODE=${10}
+
+
+if [ -z "$NUM_WORKERS" ]; then
+  NUM_WORKERS=1
+  echo "Number of workers not specified, use default $NUM_WORKERS"
+fi
+checkout_eval_branch
+
+if [ -z "$AGENT" ]; then
+  echo "Agent not specified, use default CodeActAgent"
+  AGENT="CodeActAgent"
+fi
+
+if [ -z "$MAX_ITER" ]; then
+  echo "MAX_ITER not specified, use default 100"
+  MAX_ITER=100
+fi
+
+if [ -z "$RUN_WITH_BROWSING" ]; then
+  echo "RUN_WITH_BROWSING not specified, use default false"
+  RUN_WITH_BROWSING=false
+fi
+
+
+if [ -z "$DATASET" ]; then
+  echo "DATASET not specified, use default SWE-Perf/SWE-Perf"
+  DATASET="SWE-Perf/SWE-Perf"
+fi
+
+if [ -z "$SPLIT" ]; then
+  echo "SPLIT not specified, use default test"
+  SPLIT="test"
+fi
+
+if [ -z "$MODE" ]; then
+  MODE="swe"
+  echo "MODE not specified, use default $MODE"
+fi
+
+if [ -n "$EVAL_CONDENSER" ]; then
+  echo "Using Condenser Config: $EVAL_CONDENSER"
+else
+  echo "No Condenser Config provided via EVAL_CONDENSER, use default (NoOpCondenser)."
+fi
+
+export RUN_WITH_BROWSING=$RUN_WITH_BROWSING
+echo "RUN_WITH_BROWSING: $RUN_WITH_BROWSING"
+
+get_openhands_version
+
+echo "AGENT: $AGENT"
+echo "OPENHANDS_VERSION: $OPENHANDS_VERSION"
+echo "MODEL_CONFIG: $MODEL_CONFIG"
+echo "DATASET: $DATASET"
+echo "SPLIT: $SPLIT"
+echo "MAX_ITER: $MAX_ITER"
+echo "NUM_WORKERS: $NUM_WORKERS"
+echo "COMMIT_HASH: $COMMIT_HASH"
+echo "MODE: $MODE"
+echo "EVAL_CONDENSER: $EVAL_CONDENSER"
+
+# Default to NOT use Hint
+if [ -z "$USE_HINT_TEXT" ]; then
+  export USE_HINT_TEXT=false
+fi
+echo "USE_HINT_TEXT: $USE_HINT_TEXT"
+EVAL_NOTE="$OPENHANDS_VERSION"
+# if not using Hint, add -no-hint to the eval note
+if [ "$USE_HINT_TEXT" = false ]; then
+  EVAL_NOTE="$EVAL_NOTE-no-hint"
+fi
+
+if [ "$RUN_WITH_BROWSING" = true ]; then
+  EVAL_NOTE="$EVAL_NOTE-with-browsing"
+fi
+
+if [ -n "$EXP_NAME" ]; then
+  EVAL_NOTE="$EVAL_NOTE-$EXP_NAME"
+fi
+# if mode != swe, add mode to the eval note
+if [ "$MODE" != "swe" ]; then
+  EVAL_NOTE="${EVAL_NOTE}-${MODE}"
+fi
+# Add condenser config to eval note if provided
+if [ -n "$EVAL_CONDENSER" ]; then
+  EVAL_NOTE="${EVAL_NOTE}-${EVAL_CONDENSER}"
+fi
+
+function run_eval() {
+  local eval_note="${1}"
+  COMMAND="poetry run python evaluation/benchmarks/swe_perf/run_infer.py \
+    --agent-cls $AGENT \
+    --llm-config $MODEL_CONFIG \
+    --max-iterations $MAX_ITER \
+    --eval-num-workers $NUM_WORKERS \
+    --eval-note $eval_note \
+    --dataset $DATASET \
+    --split $SPLIT \
+    --mode $MODE"
+
+
+
+  if [ -n "$EVAL_LIMIT" ]; then
+    echo "EVAL_LIMIT: $EVAL_LIMIT"
+    COMMAND="$COMMAND --eval-n-limit $EVAL_LIMIT"
+  fi
+
+  # Run the command
+  eval $COMMAND
+}
+
+unset SANDBOX_ENV_GITHUB_TOKEN # prevent the agent from using the github token to push
+if [ -z "$N_RUNS" ]; then
+  N_RUNS=1
+  echo "N_RUNS not specified, use default $N_RUNS"
+fi
+
+# Skip runs if the run number is in the SKIP_RUNS list
+# read from env variable SKIP_RUNS as a comma separated list of run numbers
+SKIP_RUNS=(${SKIP_RUNS//,/ })
+for i in $(seq 1 $N_RUNS); do
+  if [[ " ${SKIP_RUNS[*]} " =~ " $i " ]]; then
+    echo "Skipping run $i"
+    continue
+  fi
+  current_eval_note="$EVAL_NOTE-run_$i"
+  echo "EVAL_NOTE: $current_eval_note"
+  run_eval "$current_eval_note"
+done
+
+checkout_original_branch
@@ -0,0 +1,54 @@
+"""This script compares gold patches with OpenHands-generated patches and checks whether
+OpenHands found the right (set of) files to modify.
+"""
+
+import argparse
+import json
+import re
+
+
+def extract_modified_files(patch):
+    modified_files = set()
+    file_pattern = re.compile(r'^diff --git a/(.*?) b/')
+
+    for line in patch.split('\n'):
+        match = file_pattern.match(line)
+        if match:
+            modified_files.add(match.group(1))
+
+    return modified_files
+
+
+def process_report(oh_output_file):
+    succ = 0
+    fail = 0
+    for line in open(oh_output_file):
+        line = json.loads(line)
+        instance_id = line['instance_id']
+        gold_patch = line['swe_instance']['patch']
+        generated_patch = line['git_patch']
+        gold_modified_files = extract_modified_files(gold_patch)
+        # SWE-bench Lite only: a gold patch always modifies exactly one file
+        assert len(gold_modified_files) == 1
+        generated_modified_files = extract_modified_files(generated_patch)
+
+        # Check if all files in gold_patch are also in generated_patch
+        all_files_in_generated = gold_modified_files.issubset(generated_modified_files)
+        if all_files_in_generated:
+            succ += 1
+        else:
+            fail += 1
+            print(
+                f'{instance_id}: file mismatch, gold = {gold_modified_files}, generated = {generated_modified_files}'
+            )
+    print(
+        f'\nSUMMARY: {succ} out of {succ + fail} instances found correct files to edit, success rate = {succ / float(succ + fail)}'
+    )
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--oh_output_file', help='Path to the OH output file')
+    args = parser.parse_args()
+
+    process_report(args.oh_output_file)
@@ -0,0 +1,43 @@
+#!/usr/bin/env bash
+
+source ~/.bashrc
+SWEUTIL_DIR=/swe_util
+
+# FIXME: Cannot read SWE_INSTANCE_ID from the environment variable
+# SWE_INSTANCE_ID=django__django-11099
+if [ -z "$SWE_INSTANCE_ID" ]; then
+    echo "Error: SWE_INSTANCE_ID is not set." >&2
+    exit 1
+fi
+
+# Read swe-bench-instance.json and extract the item whose instance_id matches SWE_INSTANCE_ID
+item=$(jq --arg INSTANCE_ID "$SWE_INSTANCE_ID" '.[] | select(.instance_id == $INSTANCE_ID)' "$SWEUTIL_DIR/eval_data/instances/swe-bench-instance.json")
+
+if [[ -z "$item" ]]; then
+  echo "No item found for the provided instance ID."
+  exit 1
+fi
+
+
+WORKSPACE_NAME=$(echo "$item" | jq -r '(.repo | tostring) + "__" + (.version | tostring) | gsub("/"; "__")')
+
+echo "WORKSPACE_NAME: $WORKSPACE_NAME"
+
+# Clear the workspace
+if [ -d /workspace ]; then
+    rm -rf /workspace/*
+else
+    mkdir /workspace
+fi
+# Copy repo to workspace
+if [ -d /workspace/$WORKSPACE_NAME ]; then
+    rm -rf /workspace/$WORKSPACE_NAME
+fi
+mkdir -p /workspace
+cp -r /testbed /workspace/$WORKSPACE_NAME
+
+# Activate instance-specific environment
+if [ -d /opt/miniconda3 ]; then
+    . /opt/miniconda3/etc/profile.d/conda.sh
+    conda activate testbed
+fi
@@ -0,0 +1 @@
+test
@@ -1,8 +1,6 @@
 # Run frontend checks
 echo "Running frontend checks..."
 cd frontend
-npm run lint
-npm run check-translation-completeness
 npx lint-staged

 # Run backend pre-commit
@@ -1,5 +1,5 @@
 import { describe, expect, it } from "vitest";
-import OpenHands from "#/api/open-hands";
+import ConversationService from "#/api/conversation-service/conversation-service.api";
 import {
  FILE_VARIANTS_1,
  FILE_VARIANTS_2,
@@ -10,20 +10,20 @@ import {
 * You can find the mock handlers in `frontend/src/mocks/file-service-handlers.ts`.
 */

-describe("OpenHands File API", () => {
+describe("ConversationService File API", () => {
  it("should get a list of files", async () => {
-    await expect(OpenHands.getFiles("test-conversation-id")).resolves.toEqual(
-      FILE_VARIANTS_1,
-    );
+    await expect(
+      ConversationService.getFiles("test-conversation-id"),
+    ).resolves.toEqual(FILE_VARIANTS_1);

    await expect(
-      OpenHands.getFiles("test-conversation-id-2"),
+      ConversationService.getFiles("test-conversation-id-2"),
    ).resolves.toEqual(FILE_VARIANTS_2);
  });

  it("should get content of a file", async () => {
    await expect(
-      OpenHands.getFile("test-conversation-id", "file1.txt"),
+      ConversationService.getFile("test-conversation-id", "file1.txt"),
    ).resolves.toEqual("Content of file1.txt");
  });
 });
@@ -3,7 +3,7 @@ import { screen } from "@testing-library/react";
 import { renderWithProviders } from "test-utils";
 import { createRoutesStub } from "react-router";
 import { ExpandableMessage } from "#/components/features/chat/expandable-message";
-import OpenHands from "#/api/open-hands";
+import OptionService from "#/api/option-service/option-service.api";

 vi.mock("react-i18next", async () => {
  const actual = await vi.importActual("react-i18next");
@@ -113,7 +113,7 @@ describe("ExpandableMessage", () => {
  });

  it("should render the out of credits message when the user is out of credits", async () => {
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
+    const getConfigSpy = vi.spyOn(OptionService, "getConfig");
    // @ts-expect-error - We only care about the APP_MODE and FEATURE_FLAGS fields
    getConfigSpy.mockResolvedValue({
      APP_MODE: "saas",
@@ -3,13 +3,13 @@ import { describe, expect, it, vi } from "vitest";
 import { render, screen, waitFor } from "@testing-library/react";
 import { QueryClient, QueryClientProvider } from "@tanstack/react-query";
 import { AnalyticsConsentFormModal } from "#/components/features/analytics/analytics-consent-form-modal";
-import OpenHands from "#/api/open-hands";
+import SettingsService from "#/settings-service/settings-service.api";

 describe("AnalyticsConsentFormModal", () => {
  it("should call saveUserSettings with consent", async () => {
    const user = userEvent.setup();
    const onCloseMock = vi.fn();
-    const saveUserSettingsSpy = vi.spyOn(OpenHands, "saveSettings");
+    const saveUserSettingsSpy = vi.spyOn(SettingsService, "saveSettings");

    render(<AnalyticsConsentFormModal onClose={onCloseMock} />, {
      wrapper: ({ children }) => (
@@ -8,7 +8,7 @@ import {
  UserMessageAction,
 } from "#/types/core/actions";
 import { OpenHandsObservation } from "#/types/core/observations";
-import OpenHands from "#/api/open-hands";
+import ConversationService from "#/api/conversation-service/conversation-service.api";
 import { Conversation } from "#/api/open-hands.types";

 vi.mock("react-router", () => ({
@@ -80,7 +80,7 @@ describe("Messages", () => {
  });

  it("should render a launch to microagent action button on chat messages only if it is a user message", () => {
-    const getConversationSpy = vi.spyOn(OpenHands, "getConversation");
+    const getConversationSpy = vi.spyOn(ConversationService, "getConversation");
    const mockConversation: Conversation = {
      conversation_id: "123",
      title: "Test Conversation",
@@ -357,69 +357,6 @@ describe("ConversationCard", () => {
    expect(onClick).not.toHaveBeenCalled();
  });

-  it("should show display cost button only when showOptions is true", async () => {
-    const onContextMenuToggle = vi.fn();
-    const { rerender } = renderWithProviders(
-      <ConversationCard
-        onDelete={onDelete}
-        onChangeTitle={onChangeTitle}
-        title="Conversation 1"
-        selectedRepository={null}
-        lastUpdatedAt="2021-10-01T12:00:00Z"
-        contextMenuOpen
-        onContextMenuToggle={onContextMenuToggle}
-      />,
-    );
-
-    // Wait for context menu to appear
-    const menu = await screen.findByTestId("context-menu");
-    expect(
-      within(menu).queryByTestId("display-cost-button"),
-    ).not.toBeInTheDocument();
-
-    rerender(
-      <ConversationCard
-        onDelete={onDelete}
-        onChangeTitle={onChangeTitle}
-        showOptions
-        title="Conversation 1"
-        selectedRepository={null}
-        lastUpdatedAt="2021-10-01T12:00:00Z"
-        contextMenuOpen
-        onContextMenuToggle={onContextMenuToggle}
-      />,
-    );
-
-    // Wait for context menu to appear and check for display cost button
-    const newMenu = await screen.findByTestId("context-menu");
-    within(newMenu).getByTestId("display-cost-button");
-  });
-
-  it("should show metrics modal when clicking the display cost button", async () => {
-    const user = userEvent.setup();
-    const onContextMenuToggle = vi.fn();
-    renderWithProviders(
-      <ConversationCard
-        onDelete={onDelete}
-        onChangeTitle={onChangeTitle}
-        title="Conversation 1"
-        selectedRepository={null}
-        lastUpdatedAt="2021-10-01T12:00:00Z"
-        showOptions
-        contextMenuOpen
-        onContextMenuToggle={onContextMenuToggle}
-      />,
-    );
-
-    const menu = screen.getByTestId("context-menu");
-    const displayCostButton = within(menu).getByTestId("display-cost-button");
-
-    await user.click(displayCostButton);
-
-    // Verify if metrics modal is displayed by checking for the modal content
-    expect(screen.getByTestId("metrics-modal")).toBeInTheDocument();
-  });
-
  it("should not display the edit or delete options if the handler is not provided", async () => {
    const onContextMenuToggle = vi.fn();
    const { rerender } = renderWithProviders(
@@ -1,12 +1,11 @@
 import { screen, waitFor, within } from "@testing-library/react";
 import { beforeAll, beforeEach, describe, expect, it, vi } from "vitest";
-import { QueryClientConfig } from "@tanstack/react-query";
 import userEvent from "@testing-library/user-event";
 import { createRoutesStub } from "react-router";
 import React from "react";
-import { renderWithProviders } from "test-utils";
+import { renderWithQueryAndI18n } from "test-utils";
 import { ConversationPanel } from "#/components/features/conversation-panel/conversation-panel";
-import OpenHands from "#/api/open-hands";
+import ConversationService from "#/api/conversation-service/conversation-service.api";
 import { Conversation } from "#/api/open-hands.types";

 describe("ConversationPanel", () => {
@@ -18,16 +17,7 @@ describe("ConversationPanel", () => {
    },
  ]);

-  const renderConversationPanel = (config?: QueryClientConfig) =>
-    renderWithProviders(<RouterStub />, {
-      preloadedState: {
-        metrics: {
-          cost: null,
-          max_budget_per_task: null,
-          usage: null,
-        },
-      },
-    });
+  const renderConversationPanel = () => renderWithQueryAndI18n(<RouterStub />);

  beforeAll(() => {
    vi.mock("react-router", async (importOriginal) => ({
@@ -85,7 +75,7 @@ describe("ConversationPanel", () => {
    vi.clearAllMocks();
    vi.restoreAllMocks();
    // Setup default mock for getUserConversations
-    vi.spyOn(OpenHands, "getUserConversations").mockResolvedValue({
+    vi.spyOn(ConversationService, "getUserConversations").mockResolvedValue({
      results: [...mockConversations],
      next_page_id: null,
    });
@@ -101,7 +91,10 @@ describe("ConversationPanel", () => {
  });

  it("should display an empty state when there are no conversations", async () => {
-    const getUserConversationsSpy = vi.spyOn(OpenHands, "getUserConversations");
+    const getUserConversationsSpy = vi.spyOn(
+      ConversationService,
+      "getUserConversations",
+    );
    getUserConversationsSpy.mockResolvedValue({
      results: [],
      next_page_id: null,
@@ -114,7 +107,10 @@ describe("ConversationPanel", () => {
  });

  it("should handle an error when fetching conversations", async () => {
-    const getUserConversationsSpy = vi.spyOn(OpenHands, "getUserConversations");
+    const getUserConversationsSpy = vi.spyOn(
+      ConversationService,
+      "getUserConversations",
+    );
    getUserConversationsSpy.mockRejectedValue(
      new Error("Failed to fetch conversations"),
    );
@@ -203,14 +199,17 @@ describe("ConversationPanel", () => {
      },
    ];

-    const getUserConversationsSpy = vi.spyOn(OpenHands, "getUserConversations");
+    const getUserConversationsSpy = vi.spyOn(
+      ConversationService,
+      "getUserConversations",
+    );
    getUserConversationsSpy.mockImplementation(async () => ({
      results: mockData,
      next_page_id: null,
    }));

    const deleteUserConversationSpy = vi.spyOn(
-      OpenHands,
+      ConversationService,
      "deleteUserConversation",
    );
    deleteUserConversationSpy.mockImplementation(async (id: string) => {
@@ -260,7 +259,10 @@ describe("ConversationPanel", () => {

  it("should refetch data on rerenders", async () => {
    const user = userEvent.setup();
-    const getUserConversationsSpy = vi.spyOn(OpenHands, "getUserConversations");
+    const getUserConversationsSpy = vi.spyOn(
+      ConversationService,
+      "getUserConversations",
+    );
    getUserConversationsSpy.mockResolvedValue({
      results: [...mockConversations],
      next_page_id: null,
@@ -285,15 +287,7 @@ describe("ConversationPanel", () => {
      },
    ]);

-    renderWithProviders(<MyRouterStub />, {
-      preloadedState: {
-        metrics: {
-          cost: null,
-          max_budget_per_task: null,
-          usage: null,
-        },
-      },
-    });
+    renderWithQueryAndI18n(<MyRouterStub />);

    const toggleButton = screen.getByText("Toggle");

@@ -357,7 +351,10 @@ describe("ConversationPanel", () => {
      },
    ];

-    const getUserConversationsSpy = vi.spyOn(OpenHands, "getUserConversations");
+    const getUserConversationsSpy = vi.spyOn(
+      ConversationService,
+      "getUserConversations",
+    );
    getUserConversationsSpy.mockResolvedValue({
      results: mockRunningConversations,
      next_page_id: null,
@@ -424,13 +421,19 @@ describe("ConversationPanel", () => {
      },
    ];

-    const getUserConversationsSpy = vi.spyOn(OpenHands, "getUserConversations");
+    const getUserConversationsSpy = vi.spyOn(
+      ConversationService,
+      "getUserConversations",
+    );
    getUserConversationsSpy.mockImplementation(async () => ({
      results: mockData,
      next_page_id: null,
    }));

-    const stopConversationSpy = vi.spyOn(OpenHands, "stopConversation");
+    const stopConversationSpy = vi.spyOn(
+      ConversationService,
+      "stopConversation",
+    );
    stopConversationSpy.mockImplementation(async (id: string) => {
      const conversation = mockData.find((conv) => conv.conversation_id === id);
      if (conversation) {
@@ -512,7 +515,10 @@ describe("ConversationPanel", () => {
      },
    ];

-    const getUserConversationsSpy = vi.spyOn(OpenHands, "getUserConversations");
+    const getUserConversationsSpy = vi.spyOn(
+      ConversationService,
+      "getUserConversations",
+    );
    getUserConversationsSpy.mockResolvedValue({
      results: mockMixedStatusConversations,
      next_page_id: null,
@@ -619,7 +625,10 @@ describe("ConversationPanel", () => {
    const user = userEvent.setup();

    // Mock the updateConversation API call
-    const updateConversationSpy = vi.spyOn(OpenHands, "updateConversation");
+    const updateConversationSpy = vi.spyOn(
+      ConversationService,
+      "updateConversation",
+    );
    updateConversationSpy.mockResolvedValue(true);

    // Mock the toast function
@@ -656,7 +665,10 @@ describe("ConversationPanel", () => {
  it("should save title when Enter key is pressed", async () => {
    const user = userEvent.setup();

-    const updateConversationSpy = vi.spyOn(OpenHands, "updateConversation");
+    const updateConversationSpy = vi.spyOn(
+      ConversationService,
+      "updateConversation",
+    );
    updateConversationSpy.mockResolvedValue(true);

    renderConversationPanel();
@@ -685,7 +697,10 @@ describe("ConversationPanel", () => {
  it("should trim whitespace from title", async () => {
    const user = userEvent.setup();

-    const updateConversationSpy = vi.spyOn(OpenHands, "updateConversation");
+    const updateConversationSpy = vi.spyOn(
+      ConversationService,
+      "updateConversation",
+    );
    updateConversationSpy.mockResolvedValue(true);

    renderConversationPanel();
@@ -714,7 +729,10 @@ describe("ConversationPanel", () => {
  it("should revert to original title when empty", async () => {
    const user = userEvent.setup();

-    const updateConversationSpy = vi.spyOn(OpenHands, "updateConversation");
+    const updateConversationSpy = vi.spyOn(
+      ConversationService,
+      "updateConversation",
+    );
    updateConversationSpy.mockResolvedValue(true);

    renderConversationPanel();
@@ -740,7 +758,10 @@ describe("ConversationPanel", () => {
  it("should handle API error when updating title", async () => {
    const user = userEvent.setup();

-    const updateConversationSpy = vi.spyOn(OpenHands, "updateConversation");
+    const updateConversationSpy = vi.spyOn(
+      ConversationService,
+      "updateConversation",
+    );
    updateConversationSpy.mockRejectedValue(new Error("API Error"));

    vi.mock("#/utils/custom-toast-handlers", () => ({
@@ -807,7 +828,10 @@ describe("ConversationPanel", () => {
  it("should not call API when title is unchanged", async () => {
    const user = userEvent.setup();

-    const updateConversationSpy = vi.spyOn(OpenHands, "updateConversation");
+    const updateConversationSpy = vi.spyOn(
+      ConversationService,
+      "updateConversation",
+    );
    updateConversationSpy.mockResolvedValue(true);

    renderConversationPanel();
@@ -833,7 +857,10 @@ describe("ConversationPanel", () => {
  it("should handle special characters in title", async () => {
    const user = userEvent.setup();

-    const updateConversationSpy = vi.spyOn(OpenHands, "updateConversation");
+    const updateConversationSpy = vi.spyOn(
+      ConversationService,
+      "updateConversation",
+    );
    updateConversationSpy.mockResolvedValue(true);

    renderConversationPanel();
@@ -3,7 +3,7 @@ import { render, screen } from "@testing-library/react";
 import { Provider } from "react-redux";
 import { setupStore } from "test-utils";
 import { describe, expect, it, vi } from "vitest";
-import { HomeHeader } from "#/components/features/home/home-header";
+import { HomeHeader } from "#/components/features/home/home-header/home-header";

 // Mock the translation function
 vi.mock("react-i18next", async () => {
@@ -5,8 +5,8 @@ import { createRoutesStub } from "react-router";
 import { setupStore } from "test-utils";
 import { describe, expect, it, vi } from "vitest";
 import userEvent from "@testing-library/user-event";
-import { NewConversation } from "#/components/features/home/new-conversation";
-import OpenHands from "#/api/open-hands";
+import ConversationService from "#/api/conversation-service/conversation-service.api";
+import { NewConversation } from "#/components/features/home/new-conversation/new-conversation";

 // Mock the translation function
 vi.mock("react-i18next", async () => {
@@ -54,7 +54,10 @@ const renderNewConversation = () => {

 describe("NewConversation", () => {
  it("should create an empty conversation and redirect when pressing the launch from scratch button", async () => {
-    const createConversationSpy = vi.spyOn(OpenHands, "createConversation");
+    const createConversationSpy = vi.spyOn(
+      ConversationService,
+      "createConversation",
+    );

    renderNewConversation();

@@ -5,7 +5,10 @@ import { QueryClientProvider, QueryClient } from "@tanstack/react-query";
 import { setupStore } from "test-utils";
 import { Provider } from "react-redux";
 import { createRoutesStub, Outlet } from "react-router";
-import OpenHands from "#/api/open-hands";
+import SettingsService from "#/settings-service/settings-service.api";
+import ConversationService from "#/api/conversation-service/conversation-service.api";
+import GitService from "#/api/git-service/git-service.api";
+import OptionService from "#/api/option-service/option-service.api";
 import { GitRepository } from "#/types/git";
 import { RepoConnector } from "#/components/features/home/repo-connector";
 import { MOCK_DEFAULT_USER_SETTINGS } from "#/mocks/handlers";
@@ -66,7 +69,7 @@ const MOCK_RESPOSITORIES: GitRepository[] = [
 ];

 beforeEach(() => {
-  const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+  const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
  getSettingsSpy.mockResolvedValue({
    ...MOCK_DEFAULT_USER_SETTINGS,
    provider_tokens_set: {
@@ -84,7 +87,7 @@ describe("RepoConnector", () => {

  it("should render the available repositories in the dropdown", async () => {
    const retrieveUserGitRepositoriesSpy = vi.spyOn(
-      OpenHands,
+      GitService,
      "retrieveUserGitRepositories",
    );
    retrieveUserGitRepositoriesSpy.mockResolvedValue({
@@ -93,7 +96,7 @@ describe("RepoConnector", () => {
    });

    // Mock the search function that's used by the dropdown
-    vi.spyOn(OpenHands, "searchGitRepositories").mockResolvedValue(
+    vi.spyOn(GitService, "searchGitRepositories").mockResolvedValue(
      MOCK_RESPOSITORIES,
    );

@@ -121,7 +124,7 @@ describe("RepoConnector", () => {

  it("should only enable the launch button if a repo is selected", async () => {
    const retrieveUserGitRepositoriesSpy = vi.spyOn(
-      OpenHands,
+      GitService,
      "retrieveUserGitRepositories",
    );
    retrieveUserGitRepositoriesSpy.mockResolvedValue({
@@ -135,10 +138,16 @@ describe("RepoConnector", () => {
    expect(launchButton).toBeDisabled();

    // Mock the repository branches API call
-    vi.spyOn(OpenHands, "getRepositoryBranches").mockResolvedValue({ branches: [
-      { name: "main", commit_sha: "123", protected: false },
-      { name: "develop", commit_sha: "456", protected: false },
-    ], has_next_page: false, current_page: 1, per_page: 30, total_count: 2 });
+    vi.spyOn(GitService, "getRepositoryBranches").mockResolvedValue({
+      branches: [
+        { name: "main", commit_sha: "123", protected: false },
+        { name: "develop", commit_sha: "456", protected: false },
+      ],
+      has_next_page: false,
+      current_page: 1,
+      per_page: 30,
+      total_count: 2,
+    });

    // First select the provider
    const providerDropdown = await waitFor(() =>
@@ -170,14 +179,14 @@ describe("RepoConnector", () => {
  });

  it("should render the 'add github repos' link in dropdown if saas mode and github provider is set", async () => {
-    const getConfiSpy = vi.spyOn(OpenHands, "getConfig");
+    const getConfiSpy = vi.spyOn(OptionService, "getConfig");
    // @ts-expect-error - only return the APP_MODE and APP_SLUG
    getConfiSpy.mockResolvedValue({
      APP_MODE: "saas",
      APP_SLUG: "openhands",
    });

-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
    getSettingsSpy.mockResolvedValue({
      ...MOCK_DEFAULT_USER_SETTINGS,
      provider_tokens_set: {
@@ -187,7 +196,7 @@ describe("RepoConnector", () => {
    });

    const retrieveUserGitRepositoriesSpy = vi.spyOn(
-      OpenHands,
+      GitService,
      "retrieveUserGitRepositories",
    );
    retrieveUserGitRepositoriesSpy.mockResolvedValue({
@@ -217,14 +226,14 @@ describe("RepoConnector", () => {
  });

  it("should not render the 'add github repos' link if github provider is not set", async () => {
-    const getConfiSpy = vi.spyOn(OpenHands, "getConfig");
+    const getConfiSpy = vi.spyOn(OptionService, "getConfig");
    // @ts-expect-error - only return the APP_MODE and APP_SLUG
    getConfiSpy.mockResolvedValue({
      APP_MODE: "saas",
      APP_SLUG: "openhands",
    });

-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
    getSettingsSpy.mockResolvedValue({
      ...MOCK_DEFAULT_USER_SETTINGS,
      provider_tokens_set: {
@@ -234,7 +243,7 @@ describe("RepoConnector", () => {
    });

    const retrieveUserGitRepositoriesSpy = vi.spyOn(
-      OpenHands,
+      GitService,
      "retrieveUserGitRepositories",
    );
    retrieveUserGitRepositoriesSpy.mockResolvedValue({
@@ -262,13 +271,13 @@ describe("RepoConnector", () => {
  });

  it("should not render the 'add github repos' link in dropdown if oss mode", async () => {
-    const getConfiSpy = vi.spyOn(OpenHands, "getConfig");
+    const getConfiSpy = vi.spyOn(OptionService, "getConfig");
    // @ts-expect-error - only return the APP_MODE
    getConfiSpy.mockResolvedValue({
      APP_MODE: "oss",
    });

-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
    getSettingsSpy.mockResolvedValue({
      ...MOCK_DEFAULT_USER_SETTINGS,
      provider_tokens_set: {
@@ -278,7 +287,7 @@ describe("RepoConnector", () => {
    });

    const retrieveUserGitRepositoriesSpy = vi.spyOn(
-      OpenHands,
+      GitService,
      "retrieveUserGitRepositories",
    );
    retrieveUserGitRepositoriesSpy.mockResolvedValue({
@@ -306,7 +315,10 @@ describe("RepoConnector", () => {
  });

  it("should create a conversation and redirect with the selected repo when pressing the launch button", async () => {
-    const createConversationSpy = vi.spyOn(OpenHands, "createConversation");
+    const createConversationSpy = vi.spyOn(
+      ConversationService,
+      "createConversation",
+    );
    createConversationSpy.mockResolvedValue({
      conversation_id: "mock-conversation-id",
      title: "Test Conversation",
@@ -321,7 +333,7 @@ describe("RepoConnector", () => {
      session_api_key: null,
    });
    const retrieveUserGitRepositoriesSpy = vi.spyOn(
-      OpenHands,
+      GitService,
      "retrieveUserGitRepositories",
    );
    retrieveUserGitRepositoriesSpy.mockResolvedValue({
@@ -340,10 +352,16 @@ describe("RepoConnector", () => {
    expect(createConversationSpy).not.toHaveBeenCalled();

    // Mock the repository branches API call
-    vi.spyOn(OpenHands, "getRepositoryBranches").mockResolvedValue({ branches: [
-      { name: "main", commit_sha: "123", protected: false },
-      { name: "develop", commit_sha: "456", protected: false },
-    ], has_next_page: false, current_page: 1, per_page: 30, total_count: 2 });
+    vi.spyOn(GitService, "getRepositoryBranches").mockResolvedValue({
+      branches: [
+        { name: "main", commit_sha: "123", protected: false },
+        { name: "develop", commit_sha: "456", protected: false },
+      ],
+      has_next_page: false,
+      current_page: 1,
+      per_page: 30,
+      total_count: 2,
+    });

    // First select the provider
    const providerDropdown = await waitFor(() =>
@@ -385,10 +403,13 @@ describe("RepoConnector", () => {
  });

  it("should change the launch button text to 'Loading...' when creating a conversation", async () => {
-    const createConversationSpy = vi.spyOn(OpenHands, "createConversation");
+    const createConversationSpy = vi.spyOn(
+      ConversationService,
+      "createConversation",
+    );
    createConversationSpy.mockImplementation(() => new Promise(() => {})); // Never resolves to keep loading state
    const retrieveUserGitRepositoriesSpy = vi.spyOn(
-      OpenHands,
+      GitService,
      "retrieveUserGitRepositories",
    );
    retrieveUserGitRepositoriesSpy.mockResolvedValue({
@@ -397,10 +418,16 @@ describe("RepoConnector", () => {
    });

    // Mock the repository branches API call
-    vi.spyOn(OpenHands, "getRepositoryBranches").mockResolvedValue({ branches: [
-      { name: "main", commit_sha: "123", protected: false },
-      { name: "develop", commit_sha: "456", protected: false },
-    ], has_next_page: false, current_page: 1, per_page: 30, total_count: 2 });
+    vi.spyOn(GitService, "getRepositoryBranches").mockResolvedValue({
+      branches: [
+        { name: "main", commit_sha: "123", protected: false },
+        { name: "develop", commit_sha: "456", protected: false },
+      ],
+      has_next_page: false,
+      current_page: 1,
+      per_page: 30,
+      total_count: 2,
+    });

    renderRepoConnector();

@@ -448,7 +475,7 @@ describe("RepoConnector", () => {
  });

  it("should display a button to settings if the user needs to sign in with their git provider", async () => {
-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
    getSettingsSpy.mockResolvedValue({
      ...MOCK_DEFAULT_USER_SETTINGS,
      provider_tokens_set: {},
@@ -2,7 +2,8 @@ import { render, screen } from "@testing-library/react";
 import { describe, expect, vi, beforeEach, it } from "vitest";
 import { QueryClient, QueryClientProvider } from "@tanstack/react-query";
 import { RepositorySelectionForm } from "../../../../src/components/features/home/repo-selection-form";
-import OpenHands from "#/api/open-hands";
+import UserService from "#/api/user-service/user-service.api";
+import GitService from "#/api/git-service/git-service.api";
 import { GitRepository } from "#/types/git";

 // Create mock functions
@@ -204,7 +205,7 @@ describe("RepositorySelectionForm", () => {
    ];

    // Create a spy on the API call
-    const searchGitReposSpy = vi.spyOn(OpenHands, "searchGitRepositories");
+    const searchGitReposSpy = vi.spyOn(GitService, "searchGitRepositories");
    searchGitReposSpy.mockResolvedValue(MOCK_SEARCH_REPOS);

    mockUseGitRepositories.mockReturnValue({
@@ -5,7 +5,9 @@ import userEvent from "@testing-library/user-event";
 import { Provider } from "react-redux";
 import { createRoutesStub } from "react-router";
 import { setupStore } from "test-utils";
-import OpenHands from "#/api/open-hands";
+import ConversationService from "#/api/conversation-service/conversation-service.api";
+import UserService from "#/api/user-service/user-service.api";
+import GitService from "#/api/git-service/git-service.api";
 import { TaskCard } from "#/components/features/home/tasks/task-card";
 import { GitRepository } from "#/types/git";
 import { SuggestedTask } from "#/utils/types";
@@ -57,7 +59,10 @@ describe("TaskCard", () => {
  });

  it("should call createConversation when clicking the launch button", async () => {
-    const createConversationSpy = vi.spyOn(OpenHands, "createConversation");
+    const createConversationSpy = vi.spyOn(
+      ConversationService,
+      "createConversation",
+    );

    renderTaskCard();

@@ -70,7 +75,7 @@ describe("TaskCard", () => {
  describe("creating suggested task conversation", () => {
    beforeEach(() => {
      const retrieveUserGitRepositoriesSpy = vi.spyOn(
-        OpenHands,
+        GitService,
        "retrieveUserGitRepositories",
      );
      retrieveUserGitRepositoriesSpy.mockResolvedValue({
@@ -80,7 +85,10 @@ describe("TaskCard", () => {
    });

    it("should call create conversation with suggest task trigger and selected suggested task", async () => {
-      const createConversationSpy = vi.spyOn(OpenHands, "createConversation");
+      const createConversationSpy = vi.spyOn(
+        ConversationService,
+        "createConversation",
+      );

      renderTaskCard(MOCK_TASK_1);

@@ -106,7 +114,10 @@ describe("TaskCard", () => {
  });

  it("should navigate to the conversation page after creating a conversation", async () => {
-    const createConversationSpy = vi.spyOn(OpenHands, "createConversation");
+    const createConversationSpy = vi.spyOn(
+      ConversationService,
+      "createConversation",
+    );
    createConversationSpy.mockResolvedValue({
      conversation_id: "test-conversation-id",
      title: "Test Conversation",
@@ -7,7 +7,8 @@ import React from "react";
 import { renderWithProviders } from "test-utils";
 import MicroagentManagement from "#/routes/microagent-management";
 import { MicroagentManagementMain } from "#/components/features/microagent-management/microagent-management-main";
-import OpenHands from "#/api/open-hands";
+import ConversationService from "#/api/conversation-service/conversation-service.api";
+import GitService from "#/api/git-service/git-service.api";
 import { GitRepository } from "#/types/git";
 import { RepositoryMicroagent } from "#/types/microagent-management";
 import { Conversation } from "#/api/open-hands.types";
@@ -56,11 +57,6 @@ describe("MicroagentManagement", () => {
  const renderMicroagentManagement = (config?: QueryClientConfig) =>
    renderWithProviders(<RouterStub />, {
      preloadedState: {
-        metrics: {
-          cost: null,
-          max_budget_per_task: null,
-          usage: null,
-        },
        microagentManagement: {
          addMicroagentModalVisible: false,
          updateMicroagentModalVisible: false,
@@ -231,20 +227,20 @@ describe("MicroagentManagement", () => {
    });

    // Setup default mock for retrieveUserGitRepositories
-    vi.spyOn(OpenHands, "retrieveUserGitRepositories").mockResolvedValue({
+    vi.spyOn(GitService, "retrieveUserGitRepositories").mockResolvedValue({
      data: [...mockRepositories],
      nextPage: null,
    });
    // Setup default mock for getRepositoryMicroagents
-    vi.spyOn(OpenHands, "getRepositoryMicroagents").mockResolvedValue([
+    vi.spyOn(GitService, "getRepositoryMicroagents").mockResolvedValue([
      ...mockMicroagents,
    ]);
    // Setup default mock for searchConversations
-    vi.spyOn(OpenHands, "searchConversations").mockResolvedValue([
+    vi.spyOn(ConversationService, "searchConversations").mockResolvedValue([
      ...mockConversations,
    ]);
    // Setup default mock for getRepositoryMicroagentContent
-    vi.spyOn(OpenHands, "getRepositoryMicroagentContent").mockResolvedValue({
+    vi.spyOn(GitService, "getRepositoryMicroagentContent").mockResolvedValue({
      content: "Original microagent content for testing updates",
      path: ".openhands/microagents/update-test-microagent",
      git_provider: "github",
@@ -1290,7 +1286,7 @@ describe("MicroagentManagement", () => {
  // Add microagent integration tests
  describe("Add microagent functionality", () => {
    beforeEach(() => {
-      vi.spyOn(OpenHands, "getRepositoryBranches").mockResolvedValue({
+      vi.spyOn(GitService, "getRepositoryBranches").mockResolvedValue({
        branches: [{ name: "main", commit_sha: "abc123", protected: false }],
        has_next_page: false,
        current_page: 1,
@@ -1350,11 +1346,6 @@ describe("MicroagentManagement", () => {
      // Render with modal already visible in Redux state
      renderWithProviders(<RouterStub />, {
        preloadedState: {
-          metrics: {
-            cost: null,
-            max_budget_per_task: null,
-            usage: null,
-          },
          microagentManagement: {
            selectedMicroagentItem: null,
            addMicroagentModalVisible: true, // Start with modal visible
@@ -1645,11 +1636,6 @@ describe("MicroagentManagement", () => {
    const renderMicroagentManagementMain = (selectedMicroagentItem: any) =>
      renderWithProviders(<MicroagentManagementMain />, {
        preloadedState: {
-          metrics: {
-            cost: null,
-            max_budget_per_task: null,
-            usage: null,
-          },
          microagentManagement: {
            addMicroagentModalVisible: false,
            selectedRepository: {
@@ -1983,7 +1969,7 @@ describe("MicroagentManagement", () => {
    };

    beforeEach(() => {
-      vi.spyOn(OpenHands, "getRepositoryBranches").mockResolvedValue({
+      vi.spyOn(GitService, "getRepositoryBranches").mockResolvedValue({
        branches: [{ name: "main", commit_sha: "abc123", protected: false }],
        has_next_page: false,
        current_page: 1,
@@ -1997,11 +1983,6 @@ describe("MicroagentManagement", () => {
      // Render with update modal visible in Redux state
      renderWithProviders(<RouterStub />, {
        preloadedState: {
-          metrics: {
-            cost: null,
-            max_budget_per_task: null,
-            usage: null,
-          },
          microagentManagement: {
            selectedMicroagentItem: {
              microagent: mockMicroagentForUpdate,
@@ -2036,11 +2017,6 @@ describe("MicroagentManagement", () => {
      // Render with update modal visible and selected microagent
      renderWithProviders(<RouterStub />, {
        preloadedState: {
-          metrics: {
-            cost: null,
-            max_budget_per_task: null,
-            usage: null,
-          },
          microagentManagement: {
            selectedMicroagentItem: {
              microagent: mockMicroagentForUpdate,
@@ -2074,11 +2050,6 @@ describe("MicroagentManagement", () => {
      // Render with update modal visible and selected microagent
      renderWithProviders(<RouterStub />, {
        preloadedState: {
-          metrics: {
-            cost: null,
-            max_budget_per_task: null,
-            usage: null,
-          },
          microagentManagement: {
            selectedMicroagentItem: {
              microagent: mockMicroagentForUpdate,
@@ -2117,11 +2088,6 @@ describe("MicroagentManagement", () => {
      // Render with update modal visible and selected microagent
      renderWithProviders(<RouterStub />, {
        preloadedState: {
-          metrics: {
-            cost: null,
-            max_budget_per_task: null,
-            usage: null,
-          },
          microagentManagement: {
            selectedMicroagentItem: {
              microagent: mockMicroagentForUpdate,
@@ -2173,11 +2139,6 @@ describe("MicroagentManagement", () => {
      // Render with update modal visible
      renderWithProviders(<RouterStub />, {
        preloadedState: {
-          metrics: {
-            cost: null,
-            max_budget_per_task: null,
-            usage: null,
-          },
          microagentManagement: {
            selectedMicroagentItem: {
              microagent: mockMicroagentForUpdate,
@@ -2224,11 +2185,6 @@ describe("MicroagentManagement", () => {
      // Render with update modal visible
      renderWithProviders(<RouterStub />, {
        preloadedState: {
-          metrics: {
-            cost: null,
-            max_budget_per_task: null,
-            usage: null,
-          },
          microagentManagement: {
            selectedMicroagentItem: {
              microagent: mockMicroagentForUpdate,
@@ -2278,11 +2234,6 @@ describe("MicroagentManagement", () => {
      // Render with update modal visible but no microagent data
      renderWithProviders(<RouterStub />, {
        preloadedState: {
-          metrics: {
-            cost: null,
-            max_budget_per_task: null,
-            usage: null,
-          },
          microagentManagement: {
            selectedMicroagentItem: null,
            addMicroagentModalVisible: false,
@@ -2314,7 +2265,7 @@ describe("MicroagentManagement", () => {
      const user = userEvent.setup();

      // Mock the content API to return empty content for this test
-      vi.spyOn(OpenHands, "getRepositoryMicroagentContent").mockResolvedValue({
+      vi.spyOn(GitService, "getRepositoryMicroagentContent").mockResolvedValue({
        content: "",
        path: ".openhands/microagents/update-test-microagent",
        git_provider: "github",
@@ -2324,11 +2275,6 @@ describe("MicroagentManagement", () => {
      // Render with update modal visible and microagent
      renderWithProviders(<RouterStub />, {
        preloadedState: {
-          metrics: {
-            cost: null,
-            max_budget_per_task: null,
-            usage: null,
-          },
          microagentManagement: {
            selectedMicroagentItem: {
              microagent: mockMicroagentForUpdate,
@@ -2363,7 +2309,7 @@ describe("MicroagentManagement", () => {
      const user = userEvent.setup();

      // Mock the content API to return content without triggers for this test
-      vi.spyOn(OpenHands, "getRepositoryMicroagentContent").mockResolvedValue({
+      vi.spyOn(GitService, "getRepositoryMicroagentContent").mockResolvedValue({
        content: "Original microagent content for testing updates",
        path: ".openhands/microagents/update-test-microagent",
        git_provider: "github",
@@ -2373,11 +2319,6 @@ describe("MicroagentManagement", () => {
      // Render with update modal visible and microagent
      renderWithProviders(<RouterStub />, {
        preloadedState: {
-          metrics: {
-            cost: null,
-            max_budget_per_task: null,
-            usage: null,
-          },
          microagentManagement: {
            selectedMicroagentItem: {
              microagent: mockMicroagentForUpdate,
@@ -2560,11 +2501,6 @@ describe("MicroagentManagement", () => {
      // Render with selected microagent
      renderWithProviders(<RouterStub />, {
        preloadedState: {
-          metrics: {
-            cost: null,
-            max_budget_per_task: null,
-            usage: null,
-          },
          microagentManagement: {
            selectedMicroagentItem: {
              microagent: mockMicroagentForLearn,
@@ -2600,11 +2536,6 @@ describe("MicroagentManagement", () => {
      // Render with selected microagent
      renderWithProviders(<RouterStub />, {
        preloadedState: {
-          metrics: {
-            cost: null,
-            max_budget_per_task: null,
-            usage: null,
-          },
          microagentManagement: {
            selectedMicroagentItem: {
              microagent: mockMicroagentForLearn,
@@ -2647,7 +2578,7 @@ describe("MicroagentManagement", () => {
      const user = userEvent.setup();

      // Mock the content API to return the expected content for this test
-      vi.spyOn(OpenHands, "getRepositoryMicroagentContent").mockResolvedValue({
+      vi.spyOn(GitService, "getRepositoryMicroagentContent").mockResolvedValue({
        content: "Test microagent content for learn functionality",
        path: ".openhands/microagents/learn-test-microagent",
        git_provider: "github",
@@ -2657,11 +2588,6 @@ describe("MicroagentManagement", () => {
      // Render with selected microagent
      renderWithProviders(<RouterStub />, {
        preloadedState: {
-          metrics: {
-            cost: null,
-            max_budget_per_task: null,
-            usage: null,
-          },
          microagentManagement: {
            selectedMicroagentItem: {
              microagent: mockMicroagentForLearn,
@@ -2707,7 +2633,7 @@ describe("MicroagentManagement", () => {
      const user = userEvent.setup();

      // Mock the content API to return empty content for this test
-      vi.spyOn(OpenHands, "getRepositoryMicroagentContent").mockResolvedValue({
+      vi.spyOn(GitService, "getRepositoryMicroagentContent").mockResolvedValue({
        content: "",
        path: ".openhands/microagents/learn-test-microagent",
        git_provider: "github",
@@ -2717,11 +2643,6 @@ describe("MicroagentManagement", () => {
      // Render with selected microagent
      renderWithProviders(<RouterStub />, {
        preloadedState: {
-          metrics: {
-            cost: null,
-            max_budget_per_task: null,
-            usage: null,
-          },
          microagentManagement: {
            selectedMicroagentItem: {
              microagent: mockMicroagentForLearn,
@@ -2765,7 +2686,7 @@ describe("MicroagentManagement", () => {
      const user = userEvent.setup();

      // Mock the content API to return content without triggers for this test
-      vi.spyOn(OpenHands, "getRepositoryMicroagentContent").mockResolvedValue({
+      vi.spyOn(GitService, "getRepositoryMicroagentContent").mockResolvedValue({
        content: "Test microagent content for learn functionality",
        path: ".openhands/microagents/learn-test-microagent",
        git_provider: "github",
@@ -2775,11 +2696,6 @@ describe("MicroagentManagement", () => {
      // Render with selected microagent
      renderWithProviders(<RouterStub />, {
        preloadedState: {
-          metrics: {
-            cost: null,
-            max_budget_per_task: null,
-            usage: null,
-          },
          microagentManagement: {
            selectedMicroagentItem: {
              microagent: mockMicroagentForLearn,
@@ -1,23 +1,30 @@
-import { QueryClient, QueryClientProvider } from "@tanstack/react-query";
-import { render, screen, waitFor } from "@testing-library/react";
+import { screen, waitFor } from "@testing-library/react";
 import userEvent from "@testing-library/user-event";
 import { afterEach, beforeEach, describe, expect, it, test, vi } from "vitest";
-import OpenHands from "#/api/open-hands";
+import BillingService from "#/api/billing-service/billing-service.api";
+import OptionService from "#/api/option-service/option-service.api";
 import { PaymentForm } from "#/components/features/payment/payment-form";
+import { renderWithProviders } from "../../../../test-utils";
+
+// Mock the Stripe checkout hook to avoid JSDOM navigation issues
+const mockMutate = vi.fn().mockResolvedValue(undefined);
+vi.mock("#/hooks/mutation/stripe/use-create-stripe-checkout-session", () => ({
+  useCreateStripeCheckoutSession: () => ({
+    mutate: mockMutate,
+    mutateAsync: vi.fn().mockResolvedValue(undefined),
+    isPending: false,
+  }),
+}));

 describe("PaymentForm", () => {
-  const getBalanceSpy = vi.spyOn(OpenHands, "getBalance");
-  const createCheckoutSessionSpy = vi.spyOn(OpenHands, "createCheckoutSession");
-  const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
+  const getBalanceSpy = vi.spyOn(BillingService, "getBalance");
+  const createCheckoutSessionSpy = vi.spyOn(
+    BillingService,
+    "createCheckoutSession",
+  );
+  const getConfigSpy = vi.spyOn(OptionService, "getConfig");

-  const renderPaymentForm = () =>
-    render(<PaymentForm />, {
-      wrapper: ({ children }) => (
-        <QueryClientProvider client={new QueryClient()}>
-          {children}
-        </QueryClientProvider>
-      ),
-    });
+  const renderPaymentForm = () => renderWithProviders(<PaymentForm />);

  beforeEach(() => {
    // useBalance hook will return the balance only if the APP_MODE is "saas" and the billing feature is enabled
@@ -37,6 +44,7 @@ describe("PaymentForm", () => {

  afterEach(() => {
    vi.clearAllMocks();
+    mockMutate.mockClear();
  });

  it("should render the users current balance", async () => {
@@ -69,7 +77,7 @@ describe("PaymentForm", () => {
    const topUpButton = screen.getByText("PAYMENT$ADD_CREDIT");
    await user.click(topUpButton);

-    expect(createCheckoutSessionSpy).toHaveBeenCalledWith(50);
+    expect(mockMutate).toHaveBeenCalledWith({ amount: 50 });
  });

  it("should only accept integer values", async () => {
@@ -82,7 +90,7 @@ describe("PaymentForm", () => {
    const topUpButton = screen.getByText("PAYMENT$ADD_CREDIT");
    await user.click(topUpButton);

-    expect(createCheckoutSessionSpy).toHaveBeenCalledWith(50);
+    expect(mockMutate).toHaveBeenCalledWith({ amount: 50 });
  });

  it("should disable the top-up button if the user enters an invalid amount", async () => {
@@ -122,7 +130,7 @@ describe("PaymentForm", () => {
      const topUpButton = screen.getByText("PAYMENT$ADD_CREDIT");
      await user.click(topUpButton);

-      expect(createCheckoutSessionSpy).not.toHaveBeenCalled();
+      expect(mockMutate).not.toHaveBeenCalled();
    });

    test("user enters an empty string", async () => {
@@ -135,7 +143,7 @@ describe("PaymentForm", () => {
      const topUpButton = screen.getByText("PAYMENT$ADD_CREDIT");
      await user.click(topUpButton);

-      expect(createCheckoutSessionSpy).not.toHaveBeenCalled();
+      expect(mockMutate).not.toHaveBeenCalled();
    });

    test("user enters a non-numeric value", async () => {
@@ -150,7 +158,7 @@ describe("PaymentForm", () => {
      const topUpButton = screen.getByText("PAYMENT$ADD_CREDIT");
      await user.click(topUpButton);

-      expect(createCheckoutSessionSpy).not.toHaveBeenCalled();
+      expect(mockMutate).not.toHaveBeenCalled();
    });

    test("user enters less than the minimum amount", async () => {
@@ -163,7 +171,7 @@ describe("PaymentForm", () => {
      const topUpButton = screen.getByText("PAYMENT$ADD_CREDIT");
      await user.click(topUpButton);

-      expect(createCheckoutSessionSpy).not.toHaveBeenCalled();
+      expect(mockMutate).not.toHaveBeenCalled();
    });

    test("user enters a decimal value", async () => {
@@ -177,7 +185,175 @@ describe("PaymentForm", () => {
      const topUpButton = screen.getByText("PAYMENT$ADD_CREDIT");
      await user.click(topUpButton);

-      expect(createCheckoutSessionSpy).not.toHaveBeenCalled();
+      expect(mockMutate).not.toHaveBeenCalled();
+    });
+  });
+
+  describe("Cancel Subscription", () => {
+    const getSubscriptionAccessSpy = vi.spyOn(
+      BillingService,
+      "getSubscriptionAccess",
+    );
+    const cancelSubscriptionSpy = vi.spyOn(
+      BillingService,
+      "cancelSubscription",
+    );
+
+    beforeEach(() => {
+      // Mock active subscription
+      getSubscriptionAccessSpy.mockResolvedValue({
+        start_at: "2024-01-01T00:00:00Z",
+        end_at: "2024-12-31T23:59:59Z",
+        created_at: "2024-01-01T00:00:00Z",
+      });
+    });
+
+    it("should render cancel subscription button when user has active subscription", async () => {
+      renderPaymentForm();
+
+      await waitFor(() => {
+        const cancelButton = screen.getByTestId("cancel-subscription-button");
+        expect(cancelButton).toBeInTheDocument();
+        expect(cancelButton).toHaveTextContent("PAYMENT$CANCEL_SUBSCRIPTION");
+      });
+    });
+
+    it("should not render cancel subscription button when user has no subscription", async () => {
+      getSubscriptionAccessSpy.mockResolvedValue(null);
+      renderPaymentForm();
+
+      await waitFor(() => {
+        const cancelButton = screen.queryByTestId("cancel-subscription-button");
+        expect(cancelButton).not.toBeInTheDocument();
+      });
+    });
+
+    it("should show confirmation modal when cancel subscription button is clicked", async () => {
+      const user = userEvent.setup();
+      renderPaymentForm();
+
+      const cancelButton = await screen.findByTestId(
+        "cancel-subscription-button",
+      );
+      await user.click(cancelButton);
+
+      // Should show confirmation modal
+      expect(
+        screen.getByTestId("cancel-subscription-modal"),
+      ).toBeInTheDocument();
+      expect(
+        screen.getByText("PAYMENT$CANCEL_SUBSCRIPTION_TITLE"),
+      ).toBeInTheDocument();
+      // The message text is not asserted (it may render via Trans or plain text); only check the modal is present
+      const modalContent = screen.getByTestId("cancel-subscription-modal");
+      expect(modalContent).toBeInTheDocument();
+      expect(screen.getByTestId("confirm-cancel-button")).toBeInTheDocument();
+      expect(screen.getByTestId("modal-cancel-button")).toBeInTheDocument();
+    });
+
+    it("should close modal when cancel button in modal is clicked", async () => {
+      const user = userEvent.setup();
+      renderPaymentForm();
+
+      const cancelButton = await screen.findByTestId(
+        "cancel-subscription-button",
+      );
+      await user.click(cancelButton);
+
+      // Modal should be visible
+      expect(
+        screen.getByTestId("cancel-subscription-modal"),
+      ).toBeInTheDocument();
+
+      // Click cancel in modal
+      const modalCancelButton = screen.getByTestId("modal-cancel-button");
+      await user.click(modalCancelButton);
+
+      // Modal should be closed
+      expect(
+        screen.queryByTestId("cancel-subscription-modal"),
+      ).not.toBeInTheDocument();
+    });
+
+    it("should call cancel subscription API when confirm button is clicked", async () => {
+      const user = userEvent.setup();
+      renderPaymentForm();
+
+      const cancelButton = await screen.findByTestId(
+        "cancel-subscription-button",
+      );
+      await user.click(cancelButton);
+
+      // Click confirm in modal
+      const confirmButton = screen.getByTestId("confirm-cancel-button");
+      await user.click(confirmButton);
+
+      // Should call the cancel subscription API
+      expect(cancelSubscriptionSpy).toHaveBeenCalled();
+    });
+
+    it("should close modal after successful cancellation", async () => {
+      const user = userEvent.setup();
+      cancelSubscriptionSpy.mockResolvedValue({
+        status: "success",
+        message: "Subscription cancelled successfully",
+      });
+      renderPaymentForm();
+
+      const cancelButton = await screen.findByTestId(
+        "cancel-subscription-button",
+      );
+      await user.click(cancelButton);
+
+      const confirmButton = screen.getByTestId("confirm-cancel-button");
+      await user.click(confirmButton);
+
+      // Wait for API call to complete and modal to close
+      await waitFor(() => {
+        expect(
+          screen.queryByTestId("cancel-subscription-modal"),
+        ).not.toBeInTheDocument();
+      });
+    });
+
+    it("should show next billing date for active subscription", async () => {
+      // Mock active subscription with end_at as next billing date
+      getSubscriptionAccessSpy.mockResolvedValue({
+        start_at: "2024-01-01T00:00:00Z",
+        end_at: "2025-01-01T00:00:00Z",
+        created_at: "2024-01-01T00:00:00Z",
+        cancelled_at: null,
+        stripe_subscription_id: "sub_123",
+      });
+
+      renderPaymentForm();
+
+      await waitFor(() => {
+        const nextBillingInfo = screen.getByTestId("next-billing-date");
+        expect(nextBillingInfo).toBeInTheDocument();
+        // Check that it contains some date-related content (translation key or actual date)
+        expect(nextBillingInfo).toHaveTextContent(
+          /2025|PAYMENT.*BILLING.*DATE/,
+        );
+      });
+    });
+
+    it("should not show next billing date when subscription is cancelled", async () => {
+      // Mock cancelled subscription
+      getSubscriptionAccessSpy.mockResolvedValue({
+        start_at: "2024-01-01T00:00:00Z",
+        end_at: "2025-01-01T00:00:00Z",
+        created_at: "2024-01-01T00:00:00Z",
+        cancelled_at: "2024-06-15T10:30:00Z",
+        stripe_subscription_id: "sub_123",
+      });
+
+      renderPaymentForm();
+
+      await waitFor(() => {
+        const nextBillingInfo = screen.queryByTestId("next-billing-date");
+        expect(nextBillingInfo).not.toBeInTheDocument();
+      });
    });
  });
 });
@@ -3,7 +3,7 @@ import { renderWithProviders } from "test-utils";
 import { createRoutesStub } from "react-router";
 import { waitFor } from "@testing-library/react";
 import { Sidebar } from "#/components/features/sidebar/sidebar";
-import OpenHands from "#/api/open-hands";
+import SettingsService from "#/settings-service/settings-service.api";

 // These tests will now fail because the conversation panel is rendered through a portal
 // and technically not a child of the Sidebar component.
@@ -19,7 +19,7 @@ const renderSidebar = () =>
  renderWithProviders(<RouterStub initialEntries={["/conversation/123"]} />);

 describe("Sidebar", () => {
-  const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+  const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");

  afterEach(() => {
    vi.clearAllMocks();
@@ -1,11 +0,0 @@
-import { describe, it } from "vitest";
-
-describe("File Operations Messages", () => {
-  it.todo("should show success indicator for successful file read operation");
-
-  it.todo("should show failure indicator for failed file read operation");
-
-  it.todo("should show success indicator for successful file edit operation");
-
-  it.todo("should show failure indicator for failed file edit operation");
-});
@@ -3,7 +3,7 @@ import userEvent from "@testing-library/user-event";
 import { describe, it, expect, vi, beforeEach, afterEach } from "vitest";
 import { renderWithProviders } from "test-utils";
 import { MicroagentsModal } from "#/components/features/conversation-panel/microagents-modal";
-import OpenHands from "#/api/open-hands";
+import ConversationService from "#/api/conversation-service/conversation-service.api";
 import { AgentState } from "#/types/agent-state";

 vi.mock("react-redux", async () => {
@@ -48,7 +48,7 @@ describe("MicroagentsModal - Refresh Button", () => {
    vi.clearAllMocks();

    // Setup default mock for getUserConversations
-    vi.spyOn(OpenHands, "getMicroagents").mockResolvedValue({
+    vi.spyOn(ConversationService, "getMicroagents").mockResolvedValue({
      microagents: mockMicroagents,
    });
  });
@@ -73,7 +73,7 @@ describe("MicroagentsModal - Refresh Button", () => {

      renderWithProviders(<MicroagentsModal {...defaultProps} />);

-      const refreshSpy = vi.spyOn(OpenHands, "getMicroagents");
+      const refreshSpy = vi.spyOn(ConversationService, "getMicroagents");

      const refreshButton = screen.getByTestId("refresh-microagents");
      await user.click(refreshButton);
@@ -3,13 +3,13 @@ import { describe, expect, it, vi } from "vitest";
 import { renderWithProviders } from "test-utils";
 import { createRoutesStub } from "react-router";
 import { screen } from "@testing-library/react";
-import OpenHands from "#/api/open-hands";
+import SettingsService from "#/settings-service/settings-service.api";
 import { SettingsForm } from "#/components/shared/modals/settings/settings-form";
 import { DEFAULT_SETTINGS } from "#/services/settings";

 describe("SettingsForm", () => {
  const onCloseMock = vi.fn();
-  const saveSettingsSpy = vi.spyOn(OpenHands, "saveSettings");
+  const saveSettingsSpy = vi.spyOn(SettingsService, "saveSettings");

  const RouteStub = createRoutesStub([
    {
@@ -1,17 +1,14 @@
 import { act, screen } from "@testing-library/react";
 import { renderWithProviders } from "test-utils";
 import { vi, describe, afterEach, it, expect } from "vitest";
-import { Command, appendInput, appendOutput } from "#/state/command-slice";
+import { Command, useCommandStore } from "#/state/command-store";
 import Terminal from "#/components/features/terminal/terminal";

-const renderTerminal = (commands: Command[] = []) =>
-  renderWithProviders(<Terminal />, {
-    preloadedState: {
-      cmd: {
-        commands,
-      },
-    },
-  });
+const renderTerminal = (commands: Command[] = []) => {
+  // Seed the Zustand store with the initial commands
+  useCommandStore.setState({ commands });
+  return renderWithProviders(<Terminal />);
+};

 describe.skip("Terminal", () => {
  global.ResizeObserver = vi.fn().mockImplementation(() => ({
@@ -58,25 +55,25 @@ describe.skip("Terminal", () => {
  });

  it("should write commands to the terminal", () => {
-    const { store } = renderTerminal();
+    renderTerminal();

    act(() => {
-      store.dispatch(appendInput("echo Hello"));
-      store.dispatch(appendOutput("Hello"));
+      useCommandStore.getState().appendInput("echo Hello");
+      useCommandStore.getState().appendOutput("Hello");
    });

    expect(mockTerminal.writeln).toHaveBeenNthCalledWith(1, "echo Hello");
    expect(mockTerminal.writeln).toHaveBeenNthCalledWith(2, "Hello");

    act(() => {
-      store.dispatch(appendInput("echo World"));
+      useCommandStore.getState().appendInput("echo World");
    });

    expect(mockTerminal.writeln).toHaveBeenNthCalledWith(3, "echo World");
  });

  it("should load and write commands to the terminal", () => {
-    const { store } = renderTerminal([
+    renderTerminal([
      { type: "input", content: "echo Hello" },
      { type: "output", content: "Hello" },
    ]);
@@ -85,17 +82,17 @@ describe.skip("Terminal", () => {
    expect(mockTerminal.writeln).toHaveBeenNthCalledWith(2, "Hello");

    act(() => {
-      store.dispatch(appendInput("echo Hello"));
+      useCommandStore.getState().appendInput("echo Hello");
    });

    expect(mockTerminal.writeln).toHaveBeenNthCalledWith(3, "echo Hello");
  });

  it("should end the line with a dollar sign after writing a command", () => {
-    const { store } = renderTerminal();
+    renderTerminal();

    act(() => {
-      store.dispatch(appendInput("echo Hello"));
+      useCommandStore.getState().appendInput("echo Hello");
    });

    expect(mockTerminal.writeln).toHaveBeenCalledWith("echo Hello");
@@ -1,58 +0,0 @@
-import { render, screen } from "@testing-library/react";
-import userEvent from "@testing-library/user-event";
-import { afterEach, describe, expect, it, vi } from "vitest";
-import { UploadImageInput } from "#/components/features/images/upload-image-input";
-
-describe("UploadImageInput", () => {
-  const user = userEvent.setup();
-  const onUploadMock = vi.fn();
-
-  afterEach(() => {
-    vi.clearAllMocks();
-  });
-
-  it("should render an input", () => {
-    render(<UploadImageInput onUpload={onUploadMock} />);
-    expect(screen.getByTestId("upload-image-input")).toBeInTheDocument();
-  });
-
-  it("should call onUpload when a file is selected", async () => {
-    render(<UploadImageInput onUpload={onUploadMock} />);
-
-    const file = new File(["(⌐□_□)"], "chucknorris.png", { type: "image/png" });
-    const input = screen.getByTestId("upload-image-input");
-
-    await user.upload(input, file);
-
-    expect(onUploadMock).toHaveBeenNthCalledWith(1, [file]);
-  });
-
-  it("should call onUpload when multiple files are selected", async () => {
-    render(<UploadImageInput onUpload={onUploadMock} />);
-
-    const files = [
-      new File(["(⌐□_□)"], "chucknorris.png", { type: "image/png" }),
-      new File(["(⌐□_□)"], "chucknorris2.png", { type: "image/png" }),
-    ];
-    const input = screen.getByTestId("upload-image-input");
-
-    await user.upload(input, files);
-
-    expect(onUploadMock).toHaveBeenNthCalledWith(1, files);
-  });
-
-  it("should render custom labels", () => {
-    const { rerender } = render(<UploadImageInput onUpload={onUploadMock} />);
-    expect(screen.getByTestId("default-label")).toBeInTheDocument();
-
-    function CustomLabel() {
-      return <span>Custom label</span>;
-    }
-    rerender(
-      <UploadImageInput onUpload={onUploadMock} label={<CustomLabel />} />,
-    );
-
-    expect(screen.getByText("Custom label")).toBeInTheDocument();
-    expect(screen.queryByTestId("default-label")).not.toBeInTheDocument();
-  });
-});
@@ -1,12 +1,12 @@
 import { renderHook, waitFor } from "@testing-library/react";
 import { describe, expect, it, vi } from "vitest";
 import { QueryClient, QueryClientProvider } from "@tanstack/react-query";
-import OpenHands from "#/api/open-hands";
+import SettingsService from "#/settings-service/settings-service.api";
 import { useSaveSettings } from "#/hooks/mutation/use-save-settings";

 describe("useSaveSettings", () => {
  it("should send an empty string for llm_api_key if an empty string is passed, otherwise undefined", async () => {
-    const saveSettingsSpy = vi.spyOn(OpenHands, "saveSettings");
+    const saveSettingsSpy = vi.spyOn(SettingsService, "saveSettings");
    const { result } = renderHook(() => useSaveSettings(), {
      wrapper: ({ children }) => (
        <QueryClientProvider client={new QueryClient()}>
@@ -1,7 +1,7 @@
 import { beforeAll, describe, expect, it, vi } from "vitest";
 import { afterEach } from "node:test";
 import { useTerminal } from "#/hooks/use-terminal";
-import { Command } from "#/state/command-slice";
+import { Command, useCommandStore } from "#/state/command-store";
 import { AgentState } from "#/types/agent-state";
 import { renderWithProviders } from "../../test-utils";

@@ -19,10 +19,10 @@ interface TestTerminalComponentProps {
  commands: Command[];
 }

-function TestTerminalComponent({
-  commands,
-}: TestTerminalComponentProps) {
-  const ref = useTerminal({ commands });
+function TestTerminalComponent({ commands }: TestTerminalComponentProps) {
+  // Seed the Zustand command store, since useTerminal no longer takes commands as an argument
+  useCommandStore.setState({ commands });
+  const ref = useTerminal();
  return <div ref={ref} />;
 }

@@ -60,7 +60,6 @@ describe("useTerminal", () => {
    renderWithProviders(<TestTerminalComponent commands={[]} />, {
      preloadedState: {
        agent: { curAgentState: AgentState.RUNNING },
-        cmd: { commands: [] },
      },
    });
  });
@@ -74,7 +73,6 @@ describe("useTerminal", () => {
    renderWithProviders(<TestTerminalComponent commands={commands} />, {
      preloadedState: {
        agent: { curAgentState: AgentState.RUNNING },
-        cmd: { commands },
      },
    });

@@ -94,17 +92,11 @@ describe("useTerminal", () => {
      { content: secret, type: "output" },
    ];

-    renderWithProviders(
-      <TestTerminalComponent
-        commands={commands}
-      />,
-      {
-        preloadedState: {
-          agent: { curAgentState: AgentState.RUNNING },
-          cmd: { commands },
-        },
+    renderWithProviders(<TestTerminalComponent commands={commands} />, {
+      preloadedState: {
+        agent: { curAgentState: AgentState.RUNNING },
      },
-    );
+    });

    // This test is no longer relevant as secrets filtering has been removed
  });
@@ -1,20 +1,24 @@
-import { describe, it, expect } from "vitest";
-import store from "../src/store";
-import {
-  setInitialPrompt,
-  clearInitialPrompt,
-} from "../src/state/initial-query-slice";
+import { describe, it, expect, beforeEach } from "vitest";
+import { useInitialQueryStore } from "../src/stores/initial-query-store";

 describe("Initial Query Behavior", () => {
-  it("should clear initial query when clearInitialPrompt is dispatched", () => {
+  beforeEach(() => {
+    // Reset the store before each test
+    useInitialQueryStore.getState().reset();
+  });
+
+  it("should clear initial query when clearInitialPrompt is called", () => {
+    const { setInitialPrompt, clearInitialPrompt } =
+      useInitialQueryStore.getState();
+
    // Set up initial query in the store
-    store.dispatch(setInitialPrompt("test query"));
-    expect(store.getState().initialQuery.initialPrompt).toBe("test query");
+    setInitialPrompt("test query");
+    expect(useInitialQueryStore.getState().initialPrompt).toBe("test query");

    // Clear the initial query
-    store.dispatch(clearInitialPrompt());
+    clearInitialPrompt();

    // Verify initial query is cleared
-    expect(store.getState().initialQuery.initialPrompt).toBeNull();
+    expect(useInitialQueryStore.getState().initialPrompt).toBeNull();
  });
 });
@@ -8,8 +8,9 @@ import {
 import userEvent from "@testing-library/user-event";
 import MainApp from "#/routes/root-layout";
 import i18n from "#/i18n";
+import OptionService from "#/api/option-service/option-service.api";
 import * as CaptureConsent from "#/utils/handle-capture-consent";
-import OpenHands from "#/api/open-hands";
+import SettingsService from "#/settings-service/settings-service.api";
 import * as ToastHandlers from "#/utils/custom-toast-handlers";

 describe("frontend/routes/_oh", () => {
@@ -62,8 +63,8 @@ describe("frontend/routes/_oh", () => {
  // FIXME: This test fails when it shouldn't be, please investigate
  it.skip("should render and capture the user's consent if oss mode", async () => {
    const user = userEvent.setup();
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const getConfigSpy = vi.spyOn(OptionService, "getConfig");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
    const handleCaptureConsentSpy = vi.spyOn(
      CaptureConsent,
      "handleCaptureConsent",
@@ -106,7 +107,7 @@ describe("frontend/routes/_oh", () => {
  });

  it("should not render the user consent form if saas mode", async () => {
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
+    const getConfigSpy = vi.spyOn(OptionService, "getConfig");
    getConfigSpy.mockResolvedValue({
      APP_MODE: "saas",
      GITHUB_CLIENT_ID: "test-id",
@@ -184,8 +185,8 @@ describe("frontend/routes/_oh", () => {
  });

  it("should render a you're in toast if it is a new user and in saas mode", async () => {
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const getConfigSpy = vi.spyOn(OptionService, "getConfig");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
    const displaySuccessToastSpy = vi.spyOn(
      ToastHandlers,
      "displaySuccessToast",
@@ -3,7 +3,7 @@ import { afterEach, describe, expect, it, vi } from "vitest";
 import { QueryClient, QueryClientProvider } from "@tanstack/react-query";
 import userEvent from "@testing-library/user-event";
 import AppSettingsScreen from "#/routes/app-settings";
-import OpenHands from "#/api/open-hands";
+import SettingsService from "#/settings-service/settings-service.api";
 import { MOCK_DEFAULT_USER_SETTINGS } from "#/mocks/handlers";
 import { AvailableLanguages } from "#/i18n";
 import * as CaptureConsent from "#/utils/handle-capture-consent";
@@ -25,7 +25,7 @@ describe("Content", () => {
  });

  it("should render the correct default values", async () => {
-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
    getSettingsSpy.mockResolvedValue({
      ...MOCK_DEFAULT_USER_SETTINGS,
      language: "no",
@@ -65,8 +65,8 @@ describe("Form submission", () => {
  });

  it("should submit the form with the correct values", async () => {
-    const saveSettingsSpy = vi.spyOn(OpenHands, "saveSettings");
-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const saveSettingsSpy = vi.spyOn(SettingsService, "saveSettings");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
    getSettingsSpy.mockResolvedValue(MOCK_DEFAULT_USER_SETTINGS);

    renderAppSettingsScreen();
@@ -106,7 +106,7 @@ describe("Form submission", () => {
  });

  it("should only enable the submit button when there are changes", async () => {
-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
    getSettingsSpy.mockResolvedValue(MOCK_DEFAULT_USER_SETTINGS);

    renderAppSettingsScreen();
@@ -146,7 +146,7 @@ describe("Form submission", () => {
  });

  it("should call handleCaptureConsents with true when the analytics switch is toggled", async () => {
-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
    getSettingsSpy.mockResolvedValue(MOCK_DEFAULT_USER_SETTINGS);

    const handleCaptureConsentsSpy = vi.spyOn(
@@ -168,7 +168,7 @@ describe("Form submission", () => {
  });

  it("should call handleCaptureConsents with false when the analytics switch is toggled", async () => {
-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
    getSettingsSpy.mockResolvedValue({
      ...MOCK_DEFAULT_USER_SETTINGS,
      user_consents_to_analytics: true,
@@ -215,8 +215,8 @@ describe("Form submission", () => {
  });

  it("should disable the button after submitting changes", async () => {
-    const saveSettingsSpy = vi.spyOn(OpenHands, "saveSettings");
-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const saveSettingsSpy = vi.spyOn(SettingsService, "saveSettings");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
    getSettingsSpy.mockResolvedValue(MOCK_DEFAULT_USER_SETTINGS);

    renderAppSettingsScreen();
@@ -240,8 +240,8 @@ describe("Form submission", () => {

 describe("Status toasts", () => {
  it("should call displaySuccessToast when the settings are saved", async () => {
-    const saveSettingsSpy = vi.spyOn(OpenHands, "saveSettings");
-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const saveSettingsSpy = vi.spyOn(SettingsService, "saveSettings");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
    getSettingsSpy.mockResolvedValue(MOCK_DEFAULT_USER_SETTINGS);

    const displaySuccessToastSpy = vi.spyOn(
@@ -265,8 +265,8 @@ describe("Status toasts", () => {
  });

  it("should call displayErrorToast when the settings fail to save", async () => {
-    const saveSettingsSpy = vi.spyOn(OpenHands, "saveSettings");
-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const saveSettingsSpy = vi.spyOn(SettingsService, "saveSettings");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
    getSettingsSpy.mockResolvedValue(MOCK_DEFAULT_USER_SETTINGS);

    const displayErrorToastSpy = vi.spyOn(ToastHandlers, "displayErrorToast");
@@ -6,9 +6,11 @@ import userEvent from "@testing-library/user-event";
 import i18next from "i18next";
 import { I18nextProvider } from "react-i18next";
 import GitSettingsScreen from "#/routes/git-settings";
-import OpenHands from "#/api/open-hands";
+import SettingsService from "#/settings-service/settings-service.api";
+import OptionService from "#/api/option-service/option-service.api";
+import AuthService from "#/api/auth-service/auth-service.api";
 import { MOCK_DEFAULT_USER_SETTINGS } from "#/mocks/handlers";
-import { GetConfigResponse } from "#/api/open-hands.types";
+import { GetConfigResponse } from "#/api/option-service/option.types";
 import * as ToastHandlers from "#/utils/custom-toast-handlers";
 import { SecretsService } from "#/api/secrets-service";

@@ -108,7 +110,7 @@ describe("Content", () => {
  });

  it("should render the inputs if OSS mode", async () => {
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
+    const getConfigSpy = vi.spyOn(OptionService, "getConfig");
    getConfigSpy.mockResolvedValue(VALID_OSS_CONFIG);

    const { rerender } = renderGitSettingsScreen();
@@ -151,8 +153,8 @@ describe("Content", () => {
  });

  it("should set '<hidden>' placeholder and indicator if the GitHub token is set", async () => {
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const getConfigSpy = vi.spyOn(OptionService, "getConfig");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");

    getConfigSpy.mockResolvedValue(VALID_OSS_CONFIG);
    getSettingsSpy.mockResolvedValue({
@@ -226,7 +228,7 @@ describe("Content", () => {
  });

  it("should render the 'Configure GitHub Repositories' button if SaaS mode and app slug exists", async () => {
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
+    const getConfigSpy = vi.spyOn(OptionService, "getConfig");
    getConfigSpy.mockResolvedValue(VALID_OSS_CONFIG);

    const { rerender } = renderGitSettingsScreen();
@@ -270,7 +272,7 @@ describe("Form submission", () => {
  it("should save the GitHub token", async () => {
    const saveProvidersSpy = vi.spyOn(SecretsService, "addGitProvider");
    saveProvidersSpy.mockImplementation(() => Promise.resolve(true));
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
+    const getConfigSpy = vi.spyOn(OptionService, "getConfig");
    getConfigSpy.mockResolvedValue(VALID_OSS_CONFIG);

    renderGitSettingsScreen();
@@ -291,7 +293,7 @@ describe("Form submission", () => {
  it("should save GitLab tokens", async () => {
    const saveProvidersSpy = vi.spyOn(SecretsService, "addGitProvider");
    saveProvidersSpy.mockImplementation(() => Promise.resolve(true));
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
+    const getConfigSpy = vi.spyOn(OptionService, "getConfig");
    getConfigSpy.mockResolvedValue(VALID_OSS_CONFIG);

    renderGitSettingsScreen();
@@ -312,7 +314,7 @@ describe("Form submission", () => {
  it("should save the Bitbucket token", async () => {
    const saveProvidersSpy = vi.spyOn(SecretsService, "addGitProvider");
    saveProvidersSpy.mockImplementation(() => Promise.resolve(true));
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
+    const getConfigSpy = vi.spyOn(OptionService, "getConfig");
    getConfigSpy.mockResolvedValue(VALID_OSS_CONFIG);

    renderGitSettingsScreen();
@@ -331,7 +333,7 @@ describe("Form submission", () => {
  });

  it("should disable the button if there is no input", async () => {
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
+    const getConfigSpy = vi.spyOn(OptionService, "getConfig");
    getConfigSpy.mockResolvedValue(VALID_OSS_CONFIG);

    renderGitSettingsScreen();
@@ -357,8 +359,8 @@ describe("Form submission", () => {
  });

  it("should enable a disconnect tokens button if there is at least one token set", async () => {
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const getConfigSpy = vi.spyOn(OptionService, "getConfig");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");

    getConfigSpy.mockResolvedValue(VALID_OSS_CONFIG);
    getSettingsSpy.mockResolvedValue({
@@ -391,9 +393,9 @@ describe("Form submission", () => {
  });

  it("should call logout when pressing the disconnect tokens button", async () => {
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
-    const logoutSpy = vi.spyOn(OpenHands, "logout");
-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const getConfigSpy = vi.spyOn(OptionService, "getConfig");
+    const logoutSpy = vi.spyOn(AuthService, "logout");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");

    getConfigSpy.mockResolvedValue(VALID_OSS_CONFIG);
    getSettingsSpy.mockResolvedValue({
@@ -418,7 +420,7 @@ describe("Form submission", () => {
  // flaky test
  it.skip("should disable the button when submitting changes", async () => {
    const saveSettingsSpy = vi.spyOn(SecretsService, "addGitProvider");
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
+    const getConfigSpy = vi.spyOn(OptionService, "getConfig");
    getConfigSpy.mockResolvedValue(VALID_OSS_CONFIG);

    renderGitSettingsScreen();
@@ -442,7 +444,7 @@ describe("Form submission", () => {

  it("should disable the button after submitting changes", async () => {
    const saveProvidersSpy = vi.spyOn(SecretsService, "addGitProvider");
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
+    const getConfigSpy = vi.spyOn(OptionService, "getConfig");
    getConfigSpy.mockResolvedValue(VALID_OSS_CONFIG);

    renderGitSettingsScreen();
@@ -476,7 +478,7 @@ describe("Form submission", () => {
 describe("Status toasts", () => {
  it("should call displaySuccessToast when the settings are saved", async () => {
    const saveProvidersSpy = vi.spyOn(SecretsService, "addGitProvider");
-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
    getSettingsSpy.mockResolvedValue(MOCK_DEFAULT_USER_SETTINGS);

    const displaySuccessToastSpy = vi.spyOn(
@@ -499,7 +501,7 @@ describe("Status toasts", () => {

  it("should call displayErrorToast when the settings fail to save", async () => {
    const saveProvidersSpy = vi.spyOn(SecretsService, "addGitProvider");
-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
    getSettingsSpy.mockResolvedValue(MOCK_DEFAULT_USER_SETTINGS);

    const displayErrorToastSpy = vi.spyOn(ToastHandlers, "displayErrorToast");
@@ -7,7 +7,9 @@ import { Provider } from "react-redux";
 import { createAxiosNotFoundErrorObject, setupStore } from "test-utils";
 import HomeScreen from "#/routes/home";
 import { GitRepository } from "#/types/git";
-import OpenHands from "#/api/open-hands";
+import SettingsService from "#/settings-service/settings-service.api";
+import GitService from "#/api/git-service/git-service.api";
+import OptionService from "#/api/option-service/option-service.api";
 import MainApp from "#/routes/root-layout";
 import { MOCK_DEFAULT_USER_SETTINGS } from "#/mocks/handlers";

@@ -91,7 +93,7 @@ const MOCK_RESPOSITORIES: GitRepository[] = [

 describe("HomeScreen", () => {
  beforeEach(() => {
-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
    getSettingsSpy.mockResolvedValue({
      ...MOCK_DEFAULT_USER_SETTINGS,
      provider_tokens_set: {
@@ -139,7 +141,7 @@ describe("HomeScreen", () => {

  it("should filter the suggested tasks based on the selected repository", async () => {
    const retrieveUserGitRepositoriesSpy = vi.spyOn(
-      OpenHands,
+      GitService,
      "retrieveUserGitRepositories",
    );
    retrieveUserGitRepositoriesSpy.mockResolvedValue({
@@ -148,7 +150,7 @@ describe("HomeScreen", () => {
    });

    // Mock the repository branches API call
-    vi.spyOn(OpenHands, "getRepositoryBranches").mockResolvedValue({
+    vi.spyOn(GitService, "getRepositoryBranches").mockResolvedValue({
      branches: [
        { name: "main", commit_sha: "123", protected: false },
        { name: "develop", commit_sha: "456", protected: false },
@@ -183,7 +185,7 @@ describe("HomeScreen", () => {

  it("should filter tasks when different repositories are selected", async () => {
    const retrieveUserGitRepositoriesSpy = vi.spyOn(
-      OpenHands,
+      GitService,
      "retrieveUserGitRepositories",
    );
    retrieveUserGitRepositoriesSpy.mockResolvedValue({
@@ -192,7 +194,7 @@ describe("HomeScreen", () => {
    });

    // Mock the repository branches API call
-    vi.spyOn(OpenHands, "getRepositoryBranches").mockResolvedValue({
+    vi.spyOn(GitService, "getRepositoryBranches").mockResolvedValue({
      branches: [
        { name: "main", commit_sha: "123", protected: false },
        { name: "develop", commit_sha: "456", protected: false },
@@ -246,7 +248,7 @@ describe("HomeScreen", () => {
        await screen.findAllByTestId("task-launch-button");

      // Mock the repository branches API call
-      vi.spyOn(OpenHands, "getRepositoryBranches").mockResolvedValue({
+      vi.spyOn(GitService, "getRepositoryBranches").mockResolvedValue({
        branches: [
          { name: "main", commit_sha: "123", protected: false },
          { name: "develop", commit_sha: "456", protected: false },
@@ -282,7 +284,7 @@ describe("HomeScreen", () => {

    beforeEach(() => {
      const retrieveUserGitRepositoriesSpy = vi.spyOn(
-        OpenHands,
+        GitService,
        "retrieveUserGitRepositories",
      );
      retrieveUserGitRepositoriesSpy.mockResolvedValue({
@@ -358,8 +360,8 @@ describe("Settings 404", () => {
    vi.resetAllMocks();
  });

-  const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
-  const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+  const getConfigSpy = vi.spyOn(OptionService, "getConfig");
+  const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");

  it("should open the settings modal if GET /settings fails with a 404", async () => {
    const error = createAxiosNotFoundErrorObject();
@@ -417,8 +419,8 @@ describe("Settings 404", () => {
 });

 describe("Setup Payment modal", () => {
-  const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
-  const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+  const getConfigSpy = vi.spyOn(OptionService, "getConfig");
+  const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");

  it("should only render if SaaS mode and is new user", async () => {
    // @ts-expect-error - we only need the APP_MODE for this test
@@ -3,13 +3,27 @@ import userEvent from "@testing-library/user-event";
 import { beforeEach, describe, expect, it, vi } from "vitest";
 import { QueryClientProvider, QueryClient } from "@tanstack/react-query";
 import LlmSettingsScreen from "#/routes/llm-settings";
-import OpenHands from "#/api/open-hands";
+import SettingsService from "#/settings-service/settings-service.api";
+import OptionService from "#/api/option-service/option-service.api";
 import {
  MOCK_DEFAULT_USER_SETTINGS,
  resetTestHandlersMockSettings,
 } from "#/mocks/handlers";
 import * as AdvancedSettingsUtlls from "#/utils/has-advanced-settings-set";
 import * as ToastHandlers from "#/utils/custom-toast-handlers";
+import BillingService from "#/api/billing-service/billing-service.api";
+
+// Mock react-router hooks
+const mockUseSearchParams = vi.fn();
+vi.mock("react-router", () => ({
+  useSearchParams: () => mockUseSearchParams(),
+}));
+
+// Mock useIsAuthed hook
+const mockUseIsAuthed = vi.fn();
+vi.mock("#/hooks/query/use-is-authed", () => ({
+  useIsAuthed: () => mockUseIsAuthed(),
+}));

 const renderLlmSettingsScreen = () =>
  render(<LlmSettingsScreen />, {
@@ -23,6 +37,17 @@ const renderLlmSettingsScreen = () =>
 beforeEach(() => {
  vi.resetAllMocks();
  resetTestHandlersMockSettings();
+
+  // Default mock for useSearchParams - returns empty params
+  mockUseSearchParams.mockReturnValue([
+    {
+      get: () => null,
+    },
+    vi.fn(),
+  ]);
+
+  // Default mock for useIsAuthed - returns authenticated by default
+  mockUseIsAuthed.mockReturnValue({ data: true, isLoading: false });
 });

 describe("Content", () => {
@@ -56,7 +81,7 @@ describe("Content", () => {
    });

    it("should render the existing settings values", async () => {
-      const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+      const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
      getSettingsSpy.mockResolvedValue({
        ...MOCK_DEFAULT_USER_SETTINGS,
        llm_model: "openai/gpt-4o",
@@ -84,7 +109,9 @@ describe("Content", () => {
      renderLlmSettingsScreen();
      await screen.findByTestId("llm-settings-screen");

-      const confirmation = screen.getByTestId("enable-confirmation-mode-switch");
+      const confirmation = screen.getByTestId(
+        "enable-confirmation-mode-switch",
+      );

      // Initially confirmation mode is false, so security analyzer should not be visible
      expect(confirmation).not.toBeChecked();
@@ -185,7 +212,7 @@ describe("Content", () => {
    });

    it("should render existing advanced settings correctly", async () => {
-      const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+      const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
      getSettingsSpy.mockResolvedValue({
        ...MOCK_DEFAULT_USER_SETTINGS,
        llm_model: "openai/gpt-4o",
@@ -230,7 +257,7 @@ describe("Content", () => {

 describe("Form submission", () => {
  it("should submit the basic form with the correct values", async () => {
-    const saveSettingsSpy = vi.spyOn(OpenHands, "saveSettings");
+    const saveSettingsSpy = vi.spyOn(SettingsService, "saveSettings");

    renderLlmSettingsScreen();
    await screen.findByTestId("llm-settings-screen");
@@ -266,7 +293,7 @@ describe("Form submission", () => {
  });

  it("should submit the advanced form with the correct values", async () => {
-    const saveSettingsSpy = vi.spyOn(OpenHands, "saveSettings");
+    const saveSettingsSpy = vi.spyOn(SettingsService, "saveSettings");

    renderLlmSettingsScreen();
    await screen.findByTestId("llm-settings-screen");
@@ -310,7 +337,9 @@ describe("Form submission", () => {
    // select security analyzer
    const securityAnalyzer = screen.getByTestId("security-analyzer-input");
    await userEvent.click(securityAnalyzer);
-    const securityAnalyzerOption = screen.getByText("SETTINGS$SECURITY_ANALYZER_NONE");
+    const securityAnalyzerOption = screen.getByText(
+      "SETTINGS$SECURITY_ANALYZER_NONE",
+    );
    await userEvent.click(securityAnalyzerOption);

    const submitButton = screen.getByTestId("submit-button");
@@ -329,7 +358,7 @@ describe("Form submission", () => {
  });

  it("should disable the button if there are no changes in the basic form", async () => {
-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
    getSettingsSpy.mockResolvedValue({
      ...MOCK_DEFAULT_USER_SETTINGS,
      llm_model: "openai/gpt-4o",
@@ -372,7 +401,7 @@ describe("Form submission", () => {
  });

  it("should disable the button if there are no changes in the advanced form", async () => {
-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
    getSettingsSpy.mockResolvedValue({
      ...MOCK_DEFAULT_USER_SETTINGS,
      llm_model: "openai/gpt-4o",
@@ -392,10 +421,14 @@ describe("Form submission", () => {
    const baseUrl = await screen.findByTestId("base-url-input");
    const apiKey = await screen.findByTestId("llm-api-key-input");
    const agent = await screen.findByTestId("agent-input");
-    const condensor = await screen.findByTestId("enable-memory-condenser-switch");
+    const condensor = await screen.findByTestId(
+      "enable-memory-condenser-switch",
+    );

    // Confirmation mode switch is now in basic settings, always visible
-    const confirmation = await screen.findByTestId("enable-confirmation-mode-switch");
+    const confirmation = await screen.findByTestId(
+      "enable-confirmation-mode-switch",
+    );

    // enter custom model
    await userEvent.type(model, "-mini");
@@ -468,9 +501,13 @@ describe("Form submission", () => {
    expect(submitButton).toBeDisabled();

    // select security analyzer
-    const securityAnalyzer = await screen.findByTestId("security-analyzer-input");
+    const securityAnalyzer = await screen.findByTestId(
+      "security-analyzer-input",
+    );
    await userEvent.click(securityAnalyzer);
-    const securityAnalyzerOption = screen.getByText("SETTINGS$SECURITY_ANALYZER_NONE");
+    const securityAnalyzerOption = screen.getByText(
+      "SETTINGS$SECURITY_ANALYZER_NONE",
+    );
    await userEvent.click(securityAnalyzerOption);
    expect(securityAnalyzer).toHaveValue("SETTINGS$SECURITY_ANALYZER_NONE");

@@ -478,9 +515,13 @@ describe("Form submission", () => {

    // revert back to original value
    await userEvent.click(securityAnalyzer);
-    const originalSecurityAnalyzerOption = screen.getByText("SETTINGS$SECURITY_ANALYZER_LLM_DEFAULT");
+    const originalSecurityAnalyzerOption = screen.getByText(
+      "SETTINGS$SECURITY_ANALYZER_LLM_DEFAULT",
+    );
    await userEvent.click(originalSecurityAnalyzerOption);
-    expect(securityAnalyzer).toHaveValue("SETTINGS$SECURITY_ANALYZER_LLM_DEFAULT");
+    expect(securityAnalyzer).toHaveValue(
+      "SETTINGS$SECURITY_ANALYZER_LLM_DEFAULT",
+    );
    expect(submitButton).toBeDisabled();
  });

@@ -512,7 +553,7 @@ describe("Form submission", () => {

  // flaky test
  it.skip("should disable the button when submitting changes", async () => {
-    const saveSettingsSpy = vi.spyOn(OpenHands, "saveSettings");
+    const saveSettingsSpy = vi.spyOn(SettingsService, "saveSettings");

    renderLlmSettingsScreen();
    await screen.findByTestId("llm-settings-screen");
@@ -539,7 +580,7 @@ describe("Form submission", () => {
  });

  it("should clear advanced settings when saving basic settings", async () => {
-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
    getSettingsSpy.mockResolvedValue({
      ...MOCK_DEFAULT_USER_SETTINGS,
      llm_model: "openai/gpt-4o",
@@ -547,7 +588,7 @@ describe("Form submission", () => {
      llm_api_key_set: true,
      confirmation_mode: true,
    });
-    const saveSettingsSpy = vi.spyOn(OpenHands, "saveSettings");
+    const saveSettingsSpy = vi.spyOn(SettingsService, "saveSettings");
    renderLlmSettingsScreen();

    await screen.findByTestId("llm-settings-screen");
@@ -583,7 +624,7 @@ describe("Form submission", () => {
 describe("Status toasts", () => {
  describe("Basic form", () => {
    it("should call displaySuccessToast when the settings are saved", async () => {
-      const saveSettingsSpy = vi.spyOn(OpenHands, "saveSettings");
+      const saveSettingsSpy = vi.spyOn(SettingsService, "saveSettings");

      const displaySuccessToastSpy = vi.spyOn(
        ToastHandlers,
@@ -604,7 +645,7 @@ describe("Status toasts", () => {
    });

    it("should call displayErrorToast when the settings fail to save", async () => {
-      const saveSettingsSpy = vi.spyOn(OpenHands, "saveSettings");
+      const saveSettingsSpy = vi.spyOn(SettingsService, "saveSettings");

      const displayErrorToastSpy = vi.spyOn(ToastHandlers, "displayErrorToast");

@@ -626,7 +667,7 @@ describe("Status toasts", () => {

  describe("Advanced form", () => {
    it("should call displaySuccessToast when the settings are saved", async () => {
-      const saveSettingsSpy = vi.spyOn(OpenHands, "saveSettings");
+      const saveSettingsSpy = vi.spyOn(SettingsService, "saveSettings");

      const displaySuccessToastSpy = vi.spyOn(
        ToastHandlers,
@@ -652,7 +693,7 @@ describe("Status toasts", () => {
    });

    it("should call displayErrorToast when the settings fail to save", async () => {
-      const saveSettingsSpy = vi.spyOn(OpenHands, "saveSettings");
+      const saveSettingsSpy = vi.spyOn(SettingsService, "saveSettings");

      const displayErrorToastSpy = vi.spyOn(ToastHandlers, "displayErrorToast");

@@ -679,58 +720,401 @@ describe("Status toasts", () => {
 });

 describe("SaaS mode", () => {
-  it("should not render the runtime settings input in oss mode", async () => {
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
-    // @ts-expect-error - only return mode
-    getConfigSpy.mockResolvedValue({
-      APP_MODE: "oss",
+  describe("SaaS subscription", () => {
+    // Common mock configurations
+    const MOCK_SAAS_CONFIG = {
+      APP_MODE: "saas" as const,
+      GITHUB_CLIENT_ID: "fake-github-client-id",
+      POSTHOG_CLIENT_KEY: "fake-posthog-client-key",
+      FEATURE_FLAGS: {
+        ENABLE_BILLING: true,
+        HIDE_LLM_SETTINGS: false,
+        ENABLE_JIRA: false,
+        ENABLE_JIRA_DC: false,
+        ENABLE_LINEAR: false,
+      },
+    };
+
+    const MOCK_ACTIVE_SUBSCRIPTION = {
+      start_at: "2024-01-01",
+      end_at: "2024-12-31",
+      created_at: "2024-01-01",
+    };
+
+    it("should show upgrade banner and prevent all interactions for unsubscribed SaaS users", async () => {
+      // Mock SaaS mode without subscription
+      const getConfigSpy = vi.spyOn(OptionService, "getConfig");
+      getConfigSpy.mockResolvedValue(MOCK_SAAS_CONFIG);
+
+      // Mock subscription access to return null (no subscription)
+      const getSubscriptionAccessSpy = vi.spyOn(
+        BillingService,
+        "getSubscriptionAccess",
+      );
+      getSubscriptionAccessSpy.mockResolvedValue(null);
+
+      // Mock saveSettings to ensure it's not called
+      const saveSettingsSpy = vi.spyOn(SettingsService, "saveSettings");
+
+      renderLlmSettingsScreen();
+      await screen.findByTestId("llm-settings-screen");
+
+      // Should show upgrade banner
+      expect(screen.getByTestId("upgrade-banner")).toBeInTheDocument();
+
+      // Should have a clickable upgrade button
+      const upgradeButton = screen.getByRole("button", { name: /upgrade/i });
+      expect(upgradeButton).toBeInTheDocument();
+      expect(upgradeButton).not.toBeDisabled();
+
+      // Form should be disabled
+      const form = screen.getByTestId("llm-settings-form-basic");
+      expect(form).toHaveAttribute("aria-disabled", "true");
+
+      // All form inputs should be disabled or non-interactive
+      const providerInput = screen.getByTestId("llm-provider-input");
+      const modelInput = screen.getByTestId("llm-model-input");
+      const apiKeyInput = screen.getByTestId("llm-api-key-input");
+      const advancedSwitch = screen.getByTestId("advanced-settings-switch");
+      const confirmationModeSwitch = screen.getByTestId(
+        "enable-confirmation-mode-switch",
+      );
+      const submitButton = screen.getByTestId("submit-button");
+
+      // Inputs should be disabled
+      expect(providerInput).toBeDisabled();
+      expect(modelInput).toBeDisabled();
+      expect(apiKeyInput).toBeDisabled();
+      expect(advancedSwitch).toBeDisabled();
+      expect(confirmationModeSwitch).toBeDisabled();
+      expect(submitButton).toBeDisabled();
+
+      // Try to interact with inputs - they should not respond
+      await userEvent.click(providerInput);
+      await userEvent.type(apiKeyInput, "test-key");
+
+      // Values should not change
+      expect(apiKeyInput).toHaveValue("");
+
+      // Try to submit form - should not call API
+      await userEvent.click(submitButton);
+      expect(saveSettingsSpy).not.toHaveBeenCalled();
    });

-    renderLlmSettingsScreen();
-    await screen.findByTestId("llm-settings-screen");
+    it("should call subscription checkout API when upgrade button is clicked", async () => {
+      // Mock SaaS mode without subscription
+      const getConfigSpy = vi.spyOn(OptionService, "getConfig");
+      getConfigSpy.mockResolvedValue(MOCK_SAAS_CONFIG);

-    const advancedSwitch = screen.getByTestId("advanced-settings-switch");
-    await userEvent.click(advancedSwitch);
-    await screen.findByTestId("llm-settings-form-advanced");
+      // Mock subscription access to return null (no subscription)
+      const getSubscriptionAccessSpy = vi.spyOn(
+        BillingService,
+        "getSubscriptionAccess",
+      );
+      getSubscriptionAccessSpy.mockResolvedValue(null);

-    const runtimeSettingsInput = screen.queryByTestId("runtime-settings-input");
-    expect(runtimeSettingsInput).not.toBeInTheDocument();
-  });
+      // Mock the subscription checkout API call
+      const createSubscriptionCheckoutSessionSpy = vi.spyOn(
+        BillingService,
+        "createSubscriptionCheckoutSession",
+      );
+      createSubscriptionCheckoutSessionSpy.mockResolvedValue({});

-  it("should render the runtime settings input in saas mode", async () => {
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
-    // @ts-expect-error - only return mode
-    getConfigSpy.mockResolvedValue({
-      APP_MODE: "saas",
+      renderLlmSettingsScreen();
+      await screen.findByTestId("llm-settings-screen");
+
+      // Click the upgrade button
+      const upgradeButton = screen.getByRole("button", { name: /upgrade/i });
+      await userEvent.click(upgradeButton);
+
+      // Should call the subscription checkout API
+      expect(createSubscriptionCheckoutSessionSpy).toHaveBeenCalled();
    });

-    renderLlmSettingsScreen();
-    await screen.findByTestId("llm-settings-screen");
+    it("should disable upgrade button for unauthenticated users in SaaS mode", async () => {
+      // Mock SaaS mode without subscription
+      const getConfigSpy = vi.spyOn(OptionService, "getConfig");
+      getConfigSpy.mockResolvedValue(MOCK_SAAS_CONFIG);

-    const advancedSwitch = screen.getByTestId("advanced-settings-switch");
-    await userEvent.click(advancedSwitch);
-    await screen.findByTestId("llm-settings-form-advanced");
+      // Mock subscription access to return null (no subscription)
+      const getSubscriptionAccessSpy = vi.spyOn(
+        BillingService,
+        "getSubscriptionAccess",
+      );
+      getSubscriptionAccessSpy.mockResolvedValue(null);

-    const runtimeSettingsInput = screen.queryByTestId("runtime-settings-input");
-    expect(runtimeSettingsInput).toBeInTheDocument();
-  });
+      // Spy on the subscription checkout API so calls can be asserted
+      const createSubscriptionCheckoutSessionSpy = vi.spyOn(
+        BillingService,
+        "createSubscriptionCheckoutSession",
+      );

-  it("should always render the runtime settings input as disabled", async () => {
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
-    // @ts-expect-error - only return mode
-    getConfigSpy.mockResolvedValue({
-      APP_MODE: "saas",
+      // Mock authentication to return false (unauthenticated) from the start
+      mockUseIsAuthed.mockReturnValue({ data: false, isLoading: false });
+
+      // Mock settings to return default settings even when unauthenticated
+      // This is necessary because the useSettings hook is disabled when the user is not authenticated
+      const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
+      getSettingsSpy.mockResolvedValue(MOCK_DEFAULT_USER_SETTINGS);
+
+      renderLlmSettingsScreen();
+
+      // Wait for either the settings screen or skeleton to appear
+      await waitFor(() => {
+        const settingsScreen = screen.queryByTestId("llm-settings-screen");
+        const skeleton = screen.queryByTestId("app-settings-skeleton");
+        expect(settingsScreen || skeleton).toBeInTheDocument();
+      });
+
+      // If the skeleton is shown, settings never loaded: assert there is no banner and end the test early
+      if (screen.queryByTestId("app-settings-skeleton")) {
+        // For unauthenticated users, the settings don't load, so no upgrade banner is shown
+        // This is the expected behavior - unauthenticated users see a skeleton loading state
+        expect(screen.queryByTestId("upgrade-banner")).not.toBeInTheDocument();
+        return;
+      }
+
+      await screen.findByTestId("llm-settings-screen");
+
+      // Should show upgrade banner
+      expect(screen.getByTestId("upgrade-banner")).toBeInTheDocument();
+
+      // Upgrade button should be disabled for unauthenticated users
+      const upgradeButton = screen.getByRole("button", { name: /upgrade/i });
+      expect(upgradeButton).toBeInTheDocument();
+      expect(upgradeButton).toBeDisabled();
+
+      // Clicking disabled button should not call the API
+      await userEvent.click(upgradeButton);
+      expect(createSubscriptionCheckoutSessionSpy).not.toHaveBeenCalled();
    });

-    renderLlmSettingsScreen();
-    await screen.findByTestId("llm-settings-screen");
+    it("should not show upgrade banner and allow form interaction for subscribed SaaS users", async () => {
+      // Mock SaaS mode with subscription
+      const getConfigSpy = vi.spyOn(OptionService, "getConfig");
+      getConfigSpy.mockResolvedValue(MOCK_SAAS_CONFIG);

-    const advancedSwitch = screen.getByTestId("advanced-settings-switch");
-    await userEvent.click(advancedSwitch);
-    await screen.findByTestId("llm-settings-form-advanced");
+      // Mock subscription access to return active subscription
+      const getSubscriptionAccessSpy = vi.spyOn(
+        BillingService,
+        "getSubscriptionAccess",
+      );
+      getSubscriptionAccessSpy.mockResolvedValue(MOCK_ACTIVE_SUBSCRIPTION);

-    const runtimeSettingsInput = screen.queryByTestId("runtime-settings-input");
-    expect(runtimeSettingsInput).toBeInTheDocument();
-    expect(runtimeSettingsInput).toBeDisabled();
+      renderLlmSettingsScreen();
+      await screen.findByTestId("llm-settings-screen");
+
+      // Wait for subscription data to load
+      await waitFor(() => {
+        expect(getSubscriptionAccessSpy).toHaveBeenCalled();
+      });
+
+      // Should NOT show upgrade banner
+      expect(screen.queryByTestId("upgrade-banner")).not.toBeInTheDocument();
+
+      // Form should NOT be disabled
+      const form = screen.getByTestId("llm-settings-form-basic");
+      expect(form).not.toHaveAttribute("aria-disabled", "true");
+    });
+
+    it("should not call save settings API when making changes in disabled form for unsubscribed users", async () => {
+      // Mock SaaS mode without subscription
+      const getConfigSpy = vi.spyOn(OptionService, "getConfig");
+      getConfigSpy.mockResolvedValue(MOCK_SAAS_CONFIG);
+
+      // Mock subscription access to return null (no subscription)
+      const getSubscriptionAccessSpy = vi.spyOn(
+        BillingService,
+        "getSubscriptionAccess",
+      );
+      getSubscriptionAccessSpy.mockResolvedValue(null);
+
+      // Spy on saveSettings to track calls
+      const saveSettingsSpy = vi.spyOn(SettingsService, "saveSettings");
+
+      renderLlmSettingsScreen();
+      await screen.findByTestId("llm-settings-screen");
+
+      // Verify that form elements are disabled for unsubscribed users
+      const confirmationModeSwitch = screen.getByTestId(
+        "enable-confirmation-mode-switch",
+      );
+      const submitButton = screen.getByTestId("submit-button");
+
+      expect(confirmationModeSwitch).not.toBeChecked();
+      expect(confirmationModeSwitch).toBeDisabled();
+      expect(submitButton).toBeDisabled();
+
+      // Try to click the disabled confirmation mode switch - it should not change state
+      await userEvent.click(confirmationModeSwitch);
+      expect(confirmationModeSwitch).not.toBeChecked(); // Should remain unchecked
+
+      // Try to submit the form - button should remain disabled
+      await userEvent.click(submitButton);
+
+      // Should NOT call save settings API for unsubscribed users
+      expect(saveSettingsSpy).not.toHaveBeenCalled();
+    });
+
+    it("should show backdrop overlay for unsubscribed users", async () => {
+      // Mock SaaS mode without subscription
+      const getConfigSpy = vi.spyOn(OptionService, "getConfig");
+      getConfigSpy.mockResolvedValue(MOCK_SAAS_CONFIG);
+
+      // Mock subscription access to return null (no subscription)
+      const getSubscriptionAccessSpy = vi.spyOn(
+        BillingService,
+        "getSubscriptionAccess",
+      );
+      getSubscriptionAccessSpy.mockResolvedValue(null);
+
+      renderLlmSettingsScreen();
+      await screen.findByTestId("llm-settings-screen");
+
+      // Wait for subscription data to load
+      await waitFor(() => {
+        expect(getSubscriptionAccessSpy).toHaveBeenCalled();
+      });
+
+      // Should show upgrade banner
+      expect(screen.getByTestId("upgrade-banner")).toBeInTheDocument();
+
+      // Should show backdrop overlay
+      const backdrop = screen.getByTestId("settings-backdrop");
+      expect(backdrop).toBeInTheDocument();
+    });
+
+    it("should not show backdrop overlay for subscribed users", async () => {
+      // Mock SaaS mode with subscription
+      const getConfigSpy = vi.spyOn(OptionService, "getConfig");
+      getConfigSpy.mockResolvedValue(MOCK_SAAS_CONFIG);
+
+      // Mock subscription access to return active subscription
+      const getSubscriptionAccessSpy = vi.spyOn(
+        BillingService,
+        "getSubscriptionAccess",
+      );
+      getSubscriptionAccessSpy.mockResolvedValue(MOCK_ACTIVE_SUBSCRIPTION);
+
+      renderLlmSettingsScreen();
+      await screen.findByTestId("llm-settings-screen");
+
+      // Wait for subscription data to load
+      await waitFor(() => {
+        expect(getSubscriptionAccessSpy).toHaveBeenCalled();
+      });
+
+      // Should NOT show backdrop overlay
+      expect(screen.queryByTestId("settings-backdrop")).not.toBeInTheDocument();
+    });
+
+    it("should display success toast when redirected back with ?checkout=success parameter", async () => {
+      // Mock SaaS mode
+      const getConfigSpy = vi.spyOn(OptionService, "getConfig");
+      getConfigSpy.mockResolvedValue(MOCK_SAAS_CONFIG);
+
+      // Mock subscription access
+      const getSubscriptionAccessSpy = vi.spyOn(
+        BillingService,
+        "getSubscriptionAccess",
+      );
+      getSubscriptionAccessSpy.mockResolvedValue(MOCK_ACTIVE_SUBSCRIPTION);
+
+      // Spy on the success toast handler
+      const displaySuccessToastSpy = vi.spyOn(
+        ToastHandlers,
+        "displaySuccessToast",
+      );
+
+      // Mock URL search params with ?checkout=success
+      mockUseSearchParams.mockReturnValue([
+        {
+          get: (param: string) => (param === "checkout" ? "success" : null),
+        },
+        vi.fn(),
+      ]);
+
+      // Render component with checkout=success parameter
+      renderLlmSettingsScreen();
+      await screen.findByTestId("llm-settings-screen");
+
+      // Verify success toast is displayed with correct message
+      expect(displaySuccessToastSpy).toHaveBeenCalledWith(
+        "SUBSCRIPTION$SUCCESS",
+      );
+    });
+
+    it("should display error toast when redirected back with ?checkout=cancel parameter", async () => {
+      // Mock SaaS mode
+      const getConfigSpy = vi.spyOn(OptionService, "getConfig");
+      getConfigSpy.mockResolvedValue(MOCK_SAAS_CONFIG);
+
+      // Mock subscription access
+      const getSubscriptionAccessSpy = vi.spyOn(
+        BillingService,
+        "getSubscriptionAccess",
+      );
+      getSubscriptionAccessSpy.mockResolvedValue(MOCK_ACTIVE_SUBSCRIPTION);
+
+      // Spy on the error toast handler
+      const displayErrorToastSpy = vi.spyOn(ToastHandlers, "displayErrorToast");
+
+      // Mock URL search params with ?checkout=cancel
+      mockUseSearchParams.mockReturnValue([
+        {
+          get: (param: string) => (param === "checkout" ? "cancel" : null),
+        },
+        vi.fn(),
+      ]);
+
+      // Render component with checkout=cancel parameter
+      renderLlmSettingsScreen();
+      await screen.findByTestId("llm-settings-screen");
+
+      // Verify error toast is displayed with correct message
+      expect(displayErrorToastSpy).toHaveBeenCalledWith("SUBSCRIPTION$FAILURE");
+    });
+
+    it("should show upgrade banner when subscription is expired or disabled", async () => {
+      // Mock SaaS mode
+      const getConfigSpy = vi.spyOn(OptionService, "getConfig");
+      getConfigSpy.mockResolvedValue(MOCK_SAAS_CONFIG);
+
+      // Mock subscription access to return null (expired/disabled subscriptions return null from backend)
+      // The backend only returns active subscriptions within their validity period
+      const getSubscriptionAccessSpy = vi.spyOn(
+        BillingService,
+        "getSubscriptionAccess",
+      );
+      getSubscriptionAccessSpy.mockResolvedValue(null);
+
+      renderLlmSettingsScreen();
+      await screen.findByTestId("llm-settings-screen");
+
+      // Wait for subscription data to load
+      await waitFor(() => {
+        expect(getSubscriptionAccessSpy).toHaveBeenCalled();
+      });
+
+      // Should show upgrade banner for expired/disabled subscriptions (when API returns null)
+      expect(screen.getByTestId("upgrade-banner")).toBeInTheDocument();
+
+      // Form should be disabled
+      const form = screen.getByTestId("llm-settings-form-basic");
+      expect(form).toHaveAttribute("aria-disabled", "true");
+
+      // All form inputs should be disabled
+      const providerInput = screen.getByTestId("llm-provider-input");
+      const modelInput = screen.getByTestId("llm-model-input");
+      const apiKeyInput = screen.getByTestId("llm-api-key-input");
+      const confirmationModeSwitch = screen.getByTestId(
+        "enable-confirmation-mode-switch",
+      );
+
+      expect(providerInput).toBeDisabled();
+      expect(modelInput).toBeDisabled();
+      expect(apiKeyInput).toBeDisabled();
+      expect(confirmationModeSwitch).toBeDisabled();
+    });
  });
 });
@@ -6,7 +6,8 @@ import { createRoutesStub, Outlet } from "react-router";
 import SecretsSettingsScreen from "#/routes/secrets-settings";
 import { SecretsService } from "#/api/secrets-service";
 import { GetSecretsResponse } from "#/api/secrets-service.types";
-import OpenHands from "#/api/open-hands";
+import SettingsService from "#/settings-service/settings-service.api";
+import OptionService from "#/api/option-service/option-service.api";
 import { MOCK_DEFAULT_USER_SETTINGS } from "#/mocks/handlers";

 const MOCK_GET_SECRETS_RESPONSE: GetSecretsResponse["custom_secrets"] = [
@@ -53,7 +54,7 @@ const renderSecretsSettings = () =>
  });

 beforeEach(() => {
-  const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
+  const getConfigSpy = vi.spyOn(OptionService, "getConfig");
  // @ts-expect-error - only return the config we need
  getConfigSpy.mockResolvedValue({
    APP_MODE: "oss",
@@ -67,8 +68,8 @@ describe("Content", () => {
  });

  it("should NOT render a button to connect with git if they havent already in oss", async () => {
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
-    const getSettingsSpy = vi.spyOn(OpenHands, "getSettings");
+    const getConfigSpy = vi.spyOn(OptionService, "getConfig");
+    const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
    const getSecretsSpy = vi.spyOn(SecretsService, "getSecrets");
    // @ts-expect-error - only return the config we need
    getConfigSpy.mockResolvedValue({
@@ -87,7 +88,7 @@ describe("Content", () => {
  });

  it("should render add secret button in saas mode", async () => {
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
+    const getConfigSpy = vi.spyOn(OptionService, "getConfig");
    const getSecretsSpy = vi.spyOn(SecretsService, "getSecrets");
    // @ts-expect-error - only return the config we need
    getConfigSpy.mockResolvedValue({
@@ -476,7 +477,9 @@ describe("Secret actions", () => {

    // make POST request
    expect(createSecretSpy).not.toHaveBeenCalled();
-    expect(screen.queryByText("SECRETS$SECRET_ALREADY_EXISTS")).toBeInTheDocument();
+    expect(
+      screen.queryByText("SECRETS$SECRET_ALREADY_EXISTS"),
+    ).toBeInTheDocument();

    await userEvent.clear(nameInput);
    await userEvent.type(nameInput, "My_Custom_Secret");
@@ -560,7 +563,9 @@ describe("Secret actions", () => {

    // make POST request
    expect(createSecretSpy).not.toHaveBeenCalled();
-    expect(screen.queryByText("SECRETS$SECRET_ALREADY_EXISTS")).toBeInTheDocument();
+    expect(
+      screen.queryByText("SECRETS$SECRET_ALREADY_EXISTS"),
+    ).toBeInTheDocument();

    expect(nameInput).toHaveValue(MOCK_GET_SECRETS_RESPONSE[0].name);
    expect(valueInput).toHaveValue("my-custom-secret-value");
@@ -3,14 +3,14 @@ import userEvent from "@testing-library/user-event";
 import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
 import { createRoutesStub } from "react-router";
 import { renderWithProviders } from "test-utils";
-import OpenHands from "#/api/open-hands";
 import SettingsScreen from "#/routes/settings";
 import { PaymentForm } from "#/components/features/payment/payment-form";
-import * as useSettingsModule from "#/hooks/query/use-settings";

 // Mock the useSettings hook
 vi.mock("#/hooks/query/use-settings", async () => {
-  const actual = await vi.importActual<typeof import("#/hooks/query/use-settings")>("#/hooks/query/use-settings");
+  const actual = await vi.importActual<
+    typeof import("#/hooks/query/use-settings")
+  >("#/hooks/query/use-settings");
  return {
    ...actual,
    useSettings: vi.fn().mockReturnValue({
@@ -24,21 +24,23 @@ vi.mock("#/hooks/query/use-settings", async () => {

 // Mock the i18next hook
 vi.mock("react-i18next", async () => {
-  const actual = await vi.importActual<typeof import("react-i18next")>("react-i18next");
+  const actual =
+    await vi.importActual<typeof import("react-i18next")>("react-i18next");
  return {
    ...actual,
    useTranslation: () => ({
      t: (key: string) => {
        const translations: Record<string, string> = {
-          "SETTINGS$NAV_INTEGRATIONS": "Integrations",
-          "SETTINGS$NAV_APPLICATION": "Application",
-          "SETTINGS$NAV_CREDITS": "Credits",
-          "SETTINGS$NAV_API_KEYS": "API Keys",
-          "SETTINGS$NAV_LLM": "LLM",
-          "SETTINGS$NAV_USER": "User",
-          "SETTINGS$NAV_SECRETS": "Secrets",
-          "SETTINGS$NAV_MCP": "MCP",
-          "SETTINGS$TITLE": "Settings"
+          SETTINGS$NAV_INTEGRATIONS: "Integrations",
+          SETTINGS$NAV_APPLICATION: "Application",
+          SETTINGS$NAV_CREDITS: "Credits",
+          SETTINGS$NAV_BILLING: "Billing",
+          SETTINGS$NAV_API_KEYS: "API Keys",
+          SETTINGS$NAV_LLM: "LLM",
+          SETTINGS$NAV_USER: "User",
+          SETTINGS$NAV_SECRETS: "Secrets",
+          SETTINGS$NAV_MCP: "MCP",
+          SETTINGS$TITLE: "Settings",
        };
        return translations[key] || key;
      },
@@ -105,16 +107,16 @@ describe("Settings Billing", () => {
    vi.clearAllMocks();
  });

-  it("should not render the credits tab if OSS mode", async () => {
+  it("should not render the billing tab if OSS mode", async () => {
    // OSS mode is set by default in beforeEach
    renderSettingsScreen();

    const navbar = await screen.findByTestId("settings-navbar");
-    const credits = within(navbar).queryByText("Credits");
+    const credits = within(navbar).queryByText("Billing");
    expect(credits).not.toBeInTheDocument();
  });

-  it("should render the credits tab if SaaS mode and billing is enabled", async () => {
+  it("should render the billing tab if SaaS mode and billing is enabled", async () => {
    mockUseConfig.mockReturnValue({
      data: {
        APP_MODE: "saas",
@@ -134,10 +136,10 @@ describe("Settings Billing", () => {
    renderSettingsScreen();

    const navbar = await screen.findByTestId("settings-navbar");
-    within(navbar).getByText("Credits");
+    within(navbar).getByText("Billing");
  });

-  it("should render the billing settings if clicking the credits item", async () => {
+  it("should render the billing settings if clicking the billing item", async () => {
    const user = userEvent.setup();
    mockUseConfig.mockReturnValue({
      data: {
@@ -158,7 +160,7 @@ describe("Settings Billing", () => {
    renderSettingsScreen();

    const navbar = await screen.findByTestId("settings-navbar");
-    const credits = within(navbar).getByText("Credits");
+    const credits = within(navbar).getByText("Billing");
    await user.click(credits);

    const billingSection = await screen.findByTestId("billing-settings");
@@ -3,7 +3,7 @@ import { createRoutesStub } from "react-router";
 import { describe, expect, it, vi } from "vitest";
 import { QueryClientProvider } from "@tanstack/react-query";
 import SettingsScreen, { clientLoader } from "#/routes/settings";
-import OpenHands from "#/api/open-hands";
+import OptionService from "#/api/option-service/option-service.api";

 // Mock the i18next hook
 vi.mock("react-i18next", async () => {
@@ -93,7 +93,7 @@ describe("Settings Screen", () => {
  it("should render the navbar", async () => {
    const sectionsToInclude = ["llm", "integrations", "application", "secrets"];
    const sectionsToExclude = ["api keys", "credits", "billing"];
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
+    const getConfigSpy = vi.spyOn(OptionService, "getConfig");
    // @ts-expect-error - only return app mode
    getConfigSpy.mockResolvedValue({
      APP_MODE: "oss",
@@ -129,14 +129,15 @@ describe("Settings Screen", () => {
    mockQueryClient.setQueryData(["config"], saasConfig);

    const sectionsToInclude = [
+      "llm", // LLM settings are now always shown in SaaS mode
      "user",
      "integrations",
      "application",
-      "credits", // The nav item shows "credits" text but routes to /billing
+      "billing", // The nav item shows "billing" text and routes to /billing
      "secrets",
      "api keys",
    ];
-    const sectionsToExclude = ["llm"];
+    const sectionsToExclude: string[] = []; // No sections are excluded in SaaS mode now

    renderSettingsScreen();

@@ -156,7 +157,7 @@ describe("Settings Screen", () => {
  });

  it("should not be able to access saas-only routes in oss mode", async () => {
-    const getConfigSpy = vi.spyOn(OpenHands, "getConfig");
+    const getConfigSpy = vi.spyOn(OptionService, "getConfig");
    // @ts-expect-error - only return app mode
    getConfigSpy.mockResolvedValue({
      APP_MODE: "oss",
@@ -13,14 +13,26 @@ vi.mock("#/store", () => ({
  },
 }));

-vi.mock("#/state/command-slice", () => ({
-  appendInput: mockAppendInput,
+vi.mock("#/state/command-store", () => ({
+  useCommandStore: {
+    getState: () => ({
+      appendInput: mockAppendInput,
+    }),
+  },
 }));

 vi.mock("#/state/jupyter-slice", () => ({
  appendJupyterInput: mockAppendJupyterInput,
 }));

+vi.mock("#/state/metrics-slice", () => ({
+  setMetrics: vi.fn(),
+}));
+
+vi.mock("#/state/security-analyzer-slice", () => ({
+  appendSecurityAnalyzerInput: vi.fn(),
+}));
+
 describe("handleActionMessage", () => {
  beforeEach(() => {
    // Clear all mocks before each test
@@ -45,7 +57,8 @@ describe("handleActionMessage", () => {
    handleActionMessage(runAction);

    // Check that appendInput was called with the command
-    expect(mockDispatch).toHaveBeenCalledWith(mockAppendInput("ls -la"));
+    expect(mockAppendInput).toHaveBeenCalledWith("ls -la");
+    expect(mockDispatch).not.toHaveBeenCalled();
    expect(mockAppendJupyterInput).not.toHaveBeenCalled();
  });

@@ -59,7 +72,8 @@ describe("handleActionMessage", () => {
      args: {
        code: "print('Hello from Jupyter!')",
      },
-      message: "Running Python code interactively: print('Hello from Jupyter!')",
+      message:
+        "Running Python code interactively: print('Hello from Jupyter!')",
      timestamp: "2023-01-01T00:00:00Z",
    };

@@ -67,7 +81,9 @@ describe("handleActionMessage", () => {
    handleActionMessage(ipythonAction);

    // Check that appendJupyterInput was called with the code
-    expect(mockDispatch).toHaveBeenCalledWith(mockAppendJupyterInput("print('Hello from Jupyter!')"));
+    expect(mockDispatch).toHaveBeenCalledWith(
+      mockAppendJupyterInput("print('Hello from Jupyter!')"),
+    );
    expect(mockAppendInput).not.toHaveBeenCalled();
  });

@@ -89,7 +105,9 @@ describe("handleActionMessage", () => {
    // Handle the action
    handleActionMessage(hiddenAction);

-    // Check that nothing was dispatched
+    // Check that nothing was dispatched or called
    expect(mockDispatch).not.toHaveBeenCalled();
+    expect(mockAppendInput).not.toHaveBeenCalled();
+    expect(mockAppendJupyterInput).not.toHaveBeenCalled();
  });
 });
@@ -1,51 +0,0 @@
-import { describe, it, expect, beforeEach, afterEach, vi } from "vitest";
-
-import { browserTab } from "#/utils/browser-tab";
-
-// These tests exercise the browser-tab notification flasher behavior.
-// Specifically we verify that when the document title changes externally
-// while a notification is active, the flasher updates its internal
-// baseline so it restores/toggles to the new title instead of an old one.
-
-describe("browserTab notifications", () => {
-  const MESSAGE = "Agent ready";
-  const INITIAL = "Conversation 123 | OpenHands";
-  const RENAMED = "My renamed title | OpenHands";
-
-  beforeEach(() => {
-    vi.useFakeTimers();
-    // reset title for each test
-    document.title = INITIAL;
-  });
-
-  afterEach(() => {
-    browserTab.stopNotification();
-    vi.runOnlyPendingTimers();
-    vi.useRealTimers();
-  });
-
-  it("updates baseline when title changes during an active notification and restores to the new title", () => {
-    // Start flashing
-    browserTab.startNotification(MESSAGE);
-
-    // Tick once: should switch to the message
-    vi.advanceTimersByTime(1000);
-    expect(document.title).toBe(MESSAGE);
-
-    // Simulate an external rename while flashing (e.g., user edits title)
-    document.title = RENAMED;
-
-    // Next tick: flasher observes the external change and updates baseline
-    vi.advanceTimersByTime(1000);
-    // On this tick, we toggle back to the message
-    expect(document.title).toBe(MESSAGE);
-
-    // Next tick should toggle to the updated baseline (renamed title)
-    vi.advanceTimersByTime(1000);
-    expect(document.title).toBe(RENAMED);
-
-    // Stop flashing: title should remain the updated baseline
-    browserTab.stopNotification();
-    expect(document.title).toBe(RENAMED);
-  });
-});
@@ -1,9 +0,0 @@
-import { test, expect } from "vitest";
-import { formatMs } from "../../src/utils/format-ms";
-
-test("formatMs", () => {
-  expect(formatMs(1000)).toBe("00:01");
-  expect(formatMs(1000 * 60)).toBe("01:00");
-  expect(formatMs(1000 * 60 * 2.5)).toBe("02:30");
-  expect(formatMs(1000 * 60 * 12)).toBe("12:00");
-});
@@ -1,29 +0,0 @@
-import { ReactNode } from "react";
-import { I18nextProvider } from "react-i18next";
-
-const mockI18n = {
-  language: "ja",
-  t: (key: string) => {
-    const translations: Record<string, string> = {
-      "SUGGESTIONS$TODO_APP": "ToDoリストアプリを開発する",
-      "LANDING$BUILD_APP_BUTTON": "プルリクエストを表示するアプリを開発する",
-      "SUGGESTIONS$HACKER_NEWS": "Hacker Newsのトップ記事を表示するbashスクリプトを作成する",
-      "LANDING$TITLE": "一緒に開発を始めましょう！",
-      "OPEN_IN_VSCODE": "VS Codeで開く",
-      "INCREASE_TEST_COVERAGE": "テストカバレッジを向上",
-      "AUTO_MERGE_PRS": "PRを自動マージ",
-      "FIX_README": "READMEを修正",
-      "CLEAN_DEPENDENCIES": "依存関係を整理"
-    };
-    return translations[key] || key;
-  },
-  exists: () => true,
-  changeLanguage: () => new Promise(() => {}),
-  use: () => mockI18n,
-};
-
-export function I18nTestProvider({ children }: { children: ReactNode }) {
-  return (
-    <I18nextProvider i18n={mockI18n as any}>{children}</I18nextProvider>
-  );
-}
@@ -1,20 +0,0 @@
-import { expect, test } from "vitest";
-import { parseGithubUrl } from "../../src/utils/parse-github-url";
-
-test("parseGithubUrl", () => {
-  expect(
-    parseGithubUrl("https://github.com/alexreardon/tiny-invariant"),
-  ).toEqual(["alexreardon", "tiny-invariant"]);
-
-  expect(parseGithubUrl("https://github.com/All-Hands-AI/OpenHands")).toEqual([
-    "All-Hands-AI",
-    "OpenHands",
-  ]);
-
-  expect(parseGithubUrl("https://github.com/All-Hands-AI/")).toEqual([
-    "All-Hands-AI",
-    "",
-  ]);
-
-  expect(parseGithubUrl("https://github.com/")).toEqual([]);
-});
@@ -1,12 +1,12 @@
 {
  "name": "openhands-frontend",
-  "version": "0.56.0",
+  "version": "0.57.0",
  "lockfileVersion": 3,
  "requires": true,
  "packages": {
    "": {
      "name": "openhands-frontend",
-      "version": "0.56.0",
+      "version": "0.57.0",
      "dependencies": {
        "@heroui/react": "^2.8.3",
        "@heroui/use-infinite-scroll": "^2.2.11",
@@ -59,7 +59,8 @@
        "tailwind-scrollbar": "^4.0.2",
        "vite": "^7.1.4",
        "web-vitals": "^5.1.0",
-        "ws": "^8.18.2"
+        "ws": "^8.18.2",
+        "zustand": "^5.0.8"
      },
      "devDependencies": {
        "@babel/parser": "^7.28.3",
@@ -18326,6 +18327,35 @@
      "dev": true,
      "license": "MIT"
    },
+    "node_modules/zustand": {
+      "version": "5.0.8",
+      "resolved": "https://registry.npmjs.org/zustand/-/zustand-5.0.8.tgz",
+      "integrity": "sha512-gyPKpIaxY9XcO2vSMrLbiER7QMAMGOQZVRdJ6Zi782jkbzZygq5GI9nG8g+sMgitRtndwaBSl7uiqC49o1SSiw==",
+      "license": "MIT",
+      "engines": {
+        "node": ">=12.20.0"
+      },
+      "peerDependencies": {
+        "@types/react": ">=18.0.0",
+        "immer": ">=9.0.6",
+        "react": ">=18.0.0",
+        "use-sync-external-store": ">=1.2.0"
+      },
+      "peerDependenciesMeta": {
+        "@types/react": {
+          "optional": true
+        },
+        "immer": {
+          "optional": true
+        },
+        "react": {
+          "optional": true
+        },
+        "use-sync-external-store": {
+          "optional": true
+        }
+      }
+    },
    "node_modules/zwitch": {
      "version": "2.0.4",
      "resolved": "https://registry.npmjs.org/zwitch/-/zwitch-2.0.4.tgz",
@@ -1,6 +1,6 @@
 {
  "name": "openhands-frontend",
-  "version": "0.56.0",
+  "version": "0.57.0",
  "private": true,
  "type": "module",
  "engines": {
@@ -58,7 +58,8 @@
    "tailwind-scrollbar": "^4.0.2",
    "vite": "^7.1.4",
    "web-vitals": "^5.1.0",
-    "ws": "^8.18.2"
+    "ws": "^8.18.2",
+    "zustand": "^5.0.8"
  },
  "scripts": {
    "dev": "npm run make-i18n && cross-env VITE_MOCK_API=false react-router dev",
@@ -77,12 +78,19 @@
    "lint:fix": "eslint src --ext .ts,.tsx,.js --fix && prettier --write src/**/*.{ts,tsx}",
    "prepare": "cd .. && husky frontend/.husky",
    "typecheck": "react-router typegen && tsc",
+    "typecheck:staged": "react-router typegen && npx tsc --noEmit --skipLibCheck",
    "check-translation-completeness": "node scripts/check-translation-completeness.cjs"
  },
  "lint-staged": {
    "src/**/*.{ts,tsx,js}": [
      "eslint --fix",
      "prettier --write"
+    ],
+    "src/**/*.{ts,tsx}": [
+      "bash -c 'npm run typecheck:staged'"
+    ],
+    "src/**/*": [
+      "npm run check-translation-completeness"
    ]
  },
  "devDependencies": {
@@ -0,0 +1,52 @@
+import { openHands } from "../open-hands-axios";
+import { AuthenticateResponse, GitHubAccessTokenResponse } from "./auth.types";
+import { GetConfigResponse } from "../option-service/option.types";
+
+/**
+ * Authentication service for handling all authentication-related API calls
+ */
+class AuthService {
+  /**
+   * Authenticate with GitHub token
+   * @param appMode The application mode (saas or oss)
+   * @returns true when the request succeeds (always true in oss mode); a failed request throws
+   */
+  static async authenticate(
+    appMode: GetConfigResponse["APP_MODE"],
+  ): Promise<boolean> {
+    if (appMode === "oss") return true;
+
+    // Just make the request, if it succeeds (no exception thrown), return true
+    await openHands.post<AuthenticateResponse>("/api/authenticate");
+    return true;
+  }
+
+  /**
+   * Get GitHub access token from Keycloak callback
+   * @param code Code provided by GitHub
+   * @returns GitHub access token
+   */
+  static async getGitHubAccessToken(
+    code: string,
+  ): Promise<GitHubAccessTokenResponse> {
+    const { data } = await openHands.post<GitHubAccessTokenResponse>(
+      "/api/keycloak/callback",
+      {
+        code,
+      },
+    );
+    return data;
+  }
+
+  /**
+   * Logout user from the application
+   * @param appMode The application mode (saas or oss)
+   */
+  static async logout(appMode: GetConfigResponse["APP_MODE"]): Promise<void> {
+    const endpoint =
+      appMode === "saas" ? "/api/logout" : "/api/unset-provider-tokens";
+    await openHands.post(endpoint);
+  }
+}
+
+export default AuthService;
@@ -0,0 +1,8 @@
+export interface AuthenticateResponse {
+  message?: string;
+  error?: string;
+}
+
+export interface GitHubAccessTokenResponse {
+  access_token: string;
+}
@@ -0,0 +1,84 @@
+import { openHands } from "../open-hands-axios";
+import {
+  CancelSubscriptionResponse,
+  SubscriptionAccess,
+} from "./billing.types";
+
+/**
+ * Billing Service API - Handles all billing-related API endpoints
+ */
+class BillingService {
+  /**
+   * Create a Stripe checkout session for credit purchase
+   * @param amount The amount to charge in dollars
+   * @returns The redirect URL for the checkout session
+   */
+  static async createCheckoutSession(amount: number): Promise<string> {
+    const { data } = await openHands.post(
+      "/api/billing/create-checkout-session",
+      {
+        amount,
+      },
+    );
+    return data.redirect_url;
+  }
+
+  /**
+   * Create a customer setup session for payment method management
+   * @returns The redirect URL for the customer setup session
+   */
+  static async createBillingSessionResponse(): Promise<string> {
+    const { data } = await openHands.post(
+      "/api/billing/create-customer-setup-session",
+    );
+    return data.redirect_url;
+  }
+
+  /**
+   * Get the user's current credit balance
+   * @returns The user's credit balance as a string
+   */
+  static async getBalance(): Promise<string> {
+    const { data } = await openHands.get<{ credits: string }>(
+      "/api/billing/credits",
+    );
+    return data.credits;
+  }
+
+  /**
+   * Get the user's subscription access information
+   * @returns The user's subscription access details or null if not available
+   */
+  static async getSubscriptionAccess(): Promise<SubscriptionAccess | null> {
+    const { data } = await openHands.get<SubscriptionAccess | null>(
+      "/api/billing/subscription-access",
+    );
+    return data;
+  }
+
+  /**
+   * Create a subscription checkout session for subscribing to a plan
+   * @returns An object holding the redirect URL for the subscription checkout session, if one was issued
+   */
+  static async createSubscriptionCheckoutSession(): Promise<{
+    redirect_url?: string;
+  }> {
+    const { data } = await openHands.post(
+      "/api/billing/subscription-checkout-session",
+    );
+    return data;
+  }
+
+  /**
+   * Cancel the user's subscription
+   * @returns The response indicating the result of the cancellation request
+   */
+  static async cancelSubscription(): Promise<CancelSubscriptionResponse> {
+    const { data } = await openHands.post<CancelSubscriptionResponse>(
+      "/api/billing/cancel-subscription",
+    );
+    return data;
+  }
+}
+
+export default BillingService;
@@ -0,0 +1,12 @@
+export type SubscriptionAccess = {
+  start_at: string;
+  end_at: string;
+  created_at: string;
+  cancelled_at?: string | null;
+  stripe_subscription_id?: string | null;
+};
+
+export interface CancelSubscriptionResponse {
+  status: string;
+  message: string;
+}
@@ -2,38 +2,23 @@ import { AxiosHeaders } from "axios";
 import {
  Feedback,
  FeedbackResponse,
-  GitHubAccessTokenResponse,
-  GetConfigResponse,
  GetVSCodeUrlResponse,
-  AuthenticateResponse,
  Conversation,
  ResultSet,
  GetTrajectoryResponse,
-  GitChangeDiff,
-  GitChange,
  GetMicroagentsResponse,
  GetMicroagentPromptResponse,
  CreateMicroagent,
-  MicroagentContentResponse,
  FileUploadSuccessResponse,
  GetFilesResponse,
  GetFileResponse,
-} from "./open-hands.types";
-import { openHands } from "./open-hands-axios";
-import { ApiSettings, PostApiSettings, Provider } from "#/types/settings";
+} from "../open-hands.types";
+import { openHands } from "../open-hands-axios";
+import { Provider } from "#/types/settings";
 import { SuggestedTask } from "#/utils/types";
-import {
-  GitUser,
-  GitRepository,
-  PaginatedBranchesResponse,
-  Branch,
-} from "#/types/git";
-import { extractNextPageFromLink } from "#/utils/extract-next-page-from-link";
-import { RepositoryMicroagent } from "#/types/microagent-management";
 import { BatchFeedbackData } from "#/hooks/query/use-batch-feedback";
-import { SubscriptionAccess } from "#/types/billing";

-class OpenHands {
+class ConversationService {
  private static currentConversation: Conversation | null = null;

  /**
@@ -66,42 +51,6 @@ class OpenHands {
    return `/api/conversations/${conversationId}`;
  }

-  /**
-   * Retrieve the list of models available
-   * @returns List of models available
-   */
-  static async getModels(): Promise<string[]> {
-    const { data } = await openHands.get<string[]>("/api/options/models");
-    return data;
-  }
-
-  /**
-   * Retrieve the list of agents available
-   * @returns List of agents available
-   */
-  static async getAgents(): Promise<string[]> {
-    const { data } = await openHands.get<string[]>("/api/options/agents");
-    return data;
-  }
-
-  /**
-   * Retrieve the list of security analyzers available
-   * @returns List of security analyzers available
-   */
-  static async getSecurityAnalyzers(): Promise<string[]> {
-    const { data } = await openHands.get<string[]>(
-      "/api/options/security-analyzers",
-    );
-    return data;
-  }
-
-  static async getConfig(): Promise<GetConfigResponse> {
-    const { data } = await openHands.get<GetConfigResponse>(
-      "/api/options/config",
-    );
-    return data;
-  }
-
  static getConversationHeaders(): AxiosHeaders {
    const headers = new AxiosHeaders();
    const sessionApiKey = this.currentConversation?.session_api_key;
@@ -210,20 +159,6 @@ class OpenHands {
    return data;
  }

-  /**
-   * Authenticate with GitHub token
-   * @returns Response with authentication status and user info if successful
-   */
-  static async authenticate(
-    appMode: GetConfigResponse["APP_MODE"],
-  ): Promise<boolean> {
-    if (appMode === "oss") return true;
-
-    // Just make the request, if it succeeds (no exception thrown), return true
-    await openHands.post<AuthenticateResponse>("/api/authenticate");
-    return true;
-  }
-
  /**
   * Get the blob of the workspace zip
   * @returns Blob of the workspace zip
@@ -249,22 +184,6 @@ class OpenHands {
    return Object.keys(response.data.hosts);
  }

-  /**
-   * @param code Code provided by GitHub
-   * @returns GitHub access token
-   */
-  static async getGitHubAccessToken(
-    code: string,
-  ): Promise<GitHubAccessTokenResponse> {
-    const { data } = await openHands.post<GitHubAccessTokenResponse>(
-      "/api/keycloak/callback",
-      {
-        code,
-      },
-    );
-    return data;
-  }
-
  /**
   * Get the VSCode URL
   * @returns VSCode URL
@@ -391,92 +310,6 @@ class OpenHands {
    return data;
  }

-  /**
-   * Get the settings from the server or use the default settings if not found
-   */
-  static async getSettings(): Promise<ApiSettings> {
-    const { data } = await openHands.get<ApiSettings>("/api/settings");
-    return data;
-  }
-
-  /**
-   * Save the settings to the server. Only valid settings are saved.
-   * @param settings - the settings to save
-   */
-  static async saveSettings(
-    settings: Partial<PostApiSettings>,
-  ): Promise<boolean> {
-    const data = await openHands.post("/api/settings", settings);
-    return data.status === 200;
-  }
-
-  static async createCheckoutSession(amount: number): Promise<string> {
-    const { data } = await openHands.post(
-      "/api/billing/create-checkout-session",
-      {
-        amount,
-      },
-    );
-    return data.redirect_url;
-  }
-
-  static async createBillingSessionResponse(): Promise<string> {
-    const { data } = await openHands.post(
-      "/api/billing/create-customer-setup-session",
-    );
-    return data.redirect_url;
-  }
-
-  static async getBalance(): Promise<string> {
-    const { data } = await openHands.get<{ credits: string }>(
-      "/api/billing/credits",
-    );
-    return data.credits;
-  }
-
-  static async getSubscriptionAccess(): Promise<SubscriptionAccess | null> {
-    const { data } = await openHands.get<SubscriptionAccess | null>(
-      "/api/billing/subscription-access",
-    );
-    return data;
-  }
-
-  static async getGitUser(): Promise<GitUser> {
-    const response = await openHands.get<GitUser>("/api/user/info");
-
-    const { data } = response;
-
-    const user: GitUser = {
-      id: data.id,
-      login: data.login,
-      avatar_url: data.avatar_url,
-      company: data.company,
-      name: data.name,
-      email: data.email,
-    };
-
-    return user;
-  }
-
-  static async searchGitRepositories(
-    query: string,
-    per_page = 5,
-    selected_provider?: Provider,
-  ): Promise<GitRepository[]> {
-    const response = await openHands.get<GitRepository[]>(
-      "/api/user/search/repositories",
-      {
-        params: {
-          query,
-          per_page,
-          selected_provider,
-        },
-      },
-    );
-
-    return response.data;
-  }
-
  static async getTrajectory(
    conversationId: string,
  ): Promise<GetTrajectoryResponse> {
@@ -487,131 +320,6 @@ class OpenHands {
    return data;
  }

-  static async logout(appMode: GetConfigResponse["APP_MODE"]): Promise<void> {
-    const endpoint =
-      appMode === "saas" ? "/api/logout" : "/api/unset-provider-tokens";
-    await openHands.post(endpoint);
-  }
-
-  static async getGitChanges(conversationId: string): Promise<GitChange[]> {
-    const url = `${this.getConversationUrl(conversationId)}/git/changes`;
-    const { data } = await openHands.get<GitChange[]>(url, {
-      headers: this.getConversationHeaders(),
-    });
-    return data;
-  }
-
-  static async getGitChangeDiff(
-    conversationId: string,
-    path: string,
-  ): Promise<GitChangeDiff> {
-    const url = `${this.getConversationUrl(conversationId)}/git/diff`;
-    const { data } = await openHands.get<GitChangeDiff>(url, {
-      params: { path },
-      headers: this.getConversationHeaders(),
-    });
-    return data;
-  }
-
-  /**
-   * @returns A list of repositories
-   */
-  static async retrieveUserGitRepositories(
-    selected_provider: Provider,
-    page = 1,
-    per_page = 30,
-  ) {
-    const { data } = await openHands.get<GitRepository[]>(
-      "/api/user/repositories",
-      {
-        params: {
-          selected_provider,
-          sort: "pushed",
-          page,
-          per_page,
-        },
-      },
-    );
-
-    const link =
-      data.length > 0 && data[0].link_header ? data[0].link_header : "";
-    const nextPage = extractNextPageFromLink(link);
-
-    return { data, nextPage };
-  }
-
-  static async retrieveInstallationRepositories(
-    selected_provider: Provider,
-    installationIndex: number,
-    installations: string[],
-    page = 1,
-    per_page = 30,
-  ) {
-    const installationId = installations[installationIndex];
-    const response = await openHands.get<GitRepository[]>(
-      "/api/user/repositories",
-      {
-        params: {
-          selected_provider,
-          sort: "pushed",
-          page,
-          per_page,
-          installation_id: installationId,
-        },
-      },
-    );
-    const link =
-      response.data.length > 0 && response.data[0].link_header
-        ? response.data[0].link_header
-        : "";
-    const nextPage = extractNextPageFromLink(link);
-    let nextInstallation: number | null;
-    if (nextPage) {
-      nextInstallation = installationIndex;
-    } else if (installationIndex + 1 < installations.length) {
-      nextInstallation = installationIndex + 1;
-    } else {
-      nextInstallation = null;
-    }
-    return {
-      data: response.data,
-      nextPage,
-      installationIndex: nextInstallation,
-    };
-  }
-
-  static async getRepositoryBranches(
-    repository: string,
-    page: number = 1,
-    perPage: number = 30,
-  ): Promise<PaginatedBranchesResponse> {
-    const { data } = await openHands.get<PaginatedBranchesResponse>(
-      `/api/user/repository/branches?repository=${encodeURIComponent(repository)}&page=${page}&per_page=${perPage}`,
-    );
-
-    return data;
-  }
-
-  static async searchRepositoryBranches(
-    repository: string,
-    query: string,
-    perPage: number = 30,
-    selectedProvider?: Provider,
-  ): Promise<Branch[]> {
-    const { data } = await openHands.get<Branch[]>(
-      `/api/user/search/branches`,
-      {
-        params: {
-          repository,
-          query,
-          per_page: perPage,
-          selected_provider: selectedProvider,
-        },
-      },
-    );
-    return data;
-  }
-
  /**
   * Get the available microagents associated with a conversation
   * @param conversationId The ID of the conversation
@@ -627,43 +335,6 @@ class OpenHands {
    return data;
  }

-  /**
-   * Get the available microagents for a repository
-   * @param owner The repository owner
-   * @param repo The repository name
-   * @returns The available microagents for the repository
-   */
-  static async getRepositoryMicroagents(
-    owner: string,
-    repo: string,
-  ): Promise<RepositoryMicroagent[]> {
-    const { data } = await openHands.get<RepositoryMicroagent[]>(
-      `/api/user/repository/${owner}/${repo}/microagents`,
-    );
-    return data;
-  }
-
-  /**
-   * Get the content of a specific microagent from a repository
-   * @param owner The repository owner
-   * @param repo The repository name
-   * @param filePath The path to the microagent file within the repository
-   * @returns The microagent content and metadata
-   */
-  static async getRepositoryMicroagentContent(
-    owner: string,
-    repo: string,
-    filePath: string,
-  ): Promise<MicroagentContentResponse> {
-    const { data } = await openHands.get<MicroagentContentResponse>(
-      `/api/user/repository/${owner}/${repo}/microagents/content`,
-      {
-        params: { file_path: filePath },
-      },
-    );
-    return data;
-  }
-
  static async getMicroagentPrompt(
    conversationId: string,
    eventId: number,
@@ -751,39 +422,6 @@ class OpenHands {
    );
    return response.data;
  }
-
-  /**
-   * Get the user installation IDs
-   * @param provider The provider to get installation IDs for (github, bitbucket, etc.)
-   * @returns List of installation IDs
-   */
-  static async getUserInstallationIds(provider: Provider): Promise<string[]> {
-    const { data } = await openHands.get<string[]>(
-      `/api/user/installations?provider=${provider}`,
-    );
-    return data;
-  }
-
-  static async getMicroagentManagementConversations(
-    selectedRepository: string,
-    pageId?: string,
-    limit: number = 100,
-  ): Promise<Conversation[]> {
-    const params: Record<string, string | number> = {
-      limit,
-      selected_repository: selectedRepository,
-    };
-
-    if (pageId) {
-      params.page_id = pageId;
-    }
-
-    const { data } = await openHands.get<ResultSet<Conversation>>(
-      "/api/microagent-management/conversations",
-      { params },
-    );
-    return data.results;
-  }
 }

-export default OpenHands;
+export default ConversationService;
@@ -1,4 +1,4 @@
-import OpenHands from "#/api/open-hands";
+import ConversationService from "#/api/conversation-service/conversation-service.api";

 /**
 * Returns a URL compatible for the file service
@@ -6,4 +6,4 @@ import OpenHands from "#/api/open-hands";
 * @returns URL of the conversation
 */
 export const getConversationUrl = (conversationId: string) =>
-  OpenHands.getConversationUrl(conversationId);
+  ConversationService.getConversationUrl(conversationId);
@@ -0,0 +1,251 @@
+import { openHands } from "../open-hands-axios";
+import { Provider } from "#/types/settings";
+import { GitRepository, PaginatedBranchesResponse, Branch } from "#/types/git";
+import { extractNextPageFromLink } from "#/utils/extract-next-page-from-link";
+import { RepositoryMicroagent } from "#/types/microagent-management";
+import {
+  MicroagentContentResponse,
+  GitChange,
+  GitChangeDiff,
+} from "../open-hands.types";
+import ConversationService from "../conversation-service/conversation-service.api";
+
+/**
+ * Git Service API - Handles all Git-related API endpoints
+ */
+class GitService {
+  /**
+   * Search for Git repositories
+   * @param query Search query
+   * @param per_page Number of results per page
+   * @param selected_provider Git provider to search in
+   * @returns List of matching repositories
+   */
+  static async searchGitRepositories(
+    query: string,
+    per_page = 5,
+    selected_provider?: Provider,
+  ): Promise<GitRepository[]> {
+    const response = await openHands.get<GitRepository[]>(
+      "/api/user/search/repositories",
+      {
+        params: {
+          query,
+          per_page,
+          selected_provider,
+        },
+      },
+    );
+
+    return response.data;
+  }
+
+  /**
+   * Retrieve user's Git repositories
+   * @param selected_provider Git provider
+   * @param page Page number
+   * @param per_page Number of results per page
+   * @returns User's repositories with pagination info
+   */
+  static async retrieveUserGitRepositories(
+    selected_provider: Provider,
+    page = 1,
+    per_page = 30,
+  ) {
+    const { data } = await openHands.get<GitRepository[]>(
+      "/api/user/repositories",
+      {
+        params: {
+          selected_provider,
+          sort: "pushed",
+          page,
+          per_page,
+        },
+      },
+    );
+
+    const link =
+      data.length > 0 && data[0].link_header ? data[0].link_header : "";
+    const nextPage = extractNextPageFromLink(link);
+
+    return { data, nextPage };
+  }
+
+  /**
+   * Retrieve repositories from a specific installation
+   * @param selected_provider Git provider
+   * @param installationIndex Current installation index
+   * @param installations List of installation IDs
+   * @param page Page number
+   * @param per_page Number of results per page
+   * @returns Installation repositories with pagination info
+   */
+  static async retrieveInstallationRepositories(
+    selected_provider: Provider,
+    installationIndex: number,
+    installations: string[],
+    page = 1,
+    per_page = 30,
+  ) {
+    const installationId = installations[installationIndex];
+    const response = await openHands.get<GitRepository[]>(
+      "/api/user/repositories",
+      {
+        params: {
+          selected_provider,
+          sort: "pushed",
+          page,
+          per_page,
+          installation_id: installationId,
+        },
+      },
+    );
+    const link =
+      response.data.length > 0 && response.data[0].link_header
+        ? response.data[0].link_header
+        : "";
+    const nextPage = extractNextPageFromLink(link);
+    let nextInstallation: number | null;
+    if (nextPage) {
+      nextInstallation = installationIndex;
+    } else if (installationIndex + 1 < installations.length) {
+      nextInstallation = installationIndex + 1;
+    } else {
+      nextInstallation = null;
+    }
+    return {
+      data: response.data,
+      nextPage,
+      installationIndex: nextInstallation,
+    };
+  }
+
+  /**
+   * Get repository branches
+   * @param repository Repository name
+   * @param page Page number
+   * @param perPage Number of results per page
+   * @returns Paginated branches response
+   */
+  static async getRepositoryBranches(
+    repository: string,
+    page: number = 1,
+    perPage: number = 30,
+  ): Promise<PaginatedBranchesResponse> {
+    const { data } = await openHands.get<PaginatedBranchesResponse>(
+      `/api/user/repository/branches?repository=${encodeURIComponent(repository)}&page=${page}&per_page=${perPage}`,
+    );
+
+    return data;
+  }
+
+  /**
+   * Search repository branches
+   * @param repository Repository name
+   * @param query Search query
+   * @param perPage Number of results per page
+   * @param selectedProvider Git provider
+   * @returns List of matching branches
+   */
+  static async searchRepositoryBranches(
+    repository: string,
+    query: string,
+    perPage: number = 30,
+    selectedProvider?: Provider,
+  ): Promise<Branch[]> {
+    const { data } = await openHands.get<Branch[]>(
+      `/api/user/search/branches`,
+      {
+        params: {
+          repository,
+          query,
+          per_page: perPage,
+          selected_provider: selectedProvider,
+        },
+      },
+    );
+    return data;
+  }
+
+  /**
+   * Get the available microagents for a repository
+   * @param owner The repository owner
+   * @param repo The repository name
+   * @returns The available microagents for the repository
+   */
+  static async getRepositoryMicroagents(
+    owner: string,
+    repo: string,
+  ): Promise<RepositoryMicroagent[]> {
+    const { data } = await openHands.get<RepositoryMicroagent[]>(
+      `/api/user/repository/${owner}/${repo}/microagents`,
+    );
+    return data;
+  }
+
+  /**
+   * Get the content of a specific microagent from a repository
+   * @param owner The repository owner
+   * @param repo The repository name
+   * @param filePath The path to the microagent file within the repository
+   * @returns The microagent content and metadata
+   */
+  static async getRepositoryMicroagentContent(
+    owner: string,
+    repo: string,
+    filePath: string,
+  ): Promise<MicroagentContentResponse> {
+    const { data } = await openHands.get<MicroagentContentResponse>(
+      `/api/user/repository/${owner}/${repo}/microagents/content`,
+      {
+        params: { file_path: filePath },
+      },
+    );
+    return data;
+  }
+
+  /**
+   * Get the user installation IDs
+   * @param provider The provider to get installation IDs for (github, bitbucket, etc.)
+   * @returns List of installation IDs
+   */
+  static async getUserInstallationIds(provider: Provider): Promise<string[]> {
+    const { data } = await openHands.get<string[]>(
+      `/api/user/installations?provider=${provider}`,
+    );
+    return data;
+  }
+
+  /**
+   * Get git changes for a conversation
+   * @param conversationId The conversation ID
+   * @returns List of git changes
+   */
+  static async getGitChanges(conversationId: string): Promise<GitChange[]> {
+    const url = `${ConversationService.getConversationUrl(conversationId)}/git/changes`;
+    const { data } = await openHands.get<GitChange[]>(url, {
+      headers: ConversationService.getConversationHeaders(),
+    });
+    return data;
+  }
+
+  /**
+   * Get git change diff for a specific file
+   * @param conversationId The conversation ID
+   * @param path The file path
+   * @returns Git change diff
+   */
+  static async getGitChangeDiff(
+    conversationId: string,
+    path: string,
+  ): Promise<GitChangeDiff> {
+    const url = `${ConversationService.getConversationUrl(conversationId)}/git/diff`;
+    const { data } = await openHands.get<GitChangeDiff>(url, {
+      params: { path },
+      headers: ConversationService.getConversationHeaders(),
+    });
+    return data;
+  }
+}
+
+export default GitService;
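
The installation-paging rule inside `retrieveInstallationRepositories` above can be read in isolation as a pure function, which may make it easier to unit test without mocking HTTP. The following is a minimal sketch, not part of this change: the helper name `nextInstallationIndex` is hypothetical, and the `number | null` type for `nextPage` is an assumption, since `extractNextPageFromLink` is not shown in this diff.

```typescript
// Hypothetical helper (not in this PR): mirrors the branch logic that
// GitService.retrieveInstallationRepositories uses to pick the next cursor.
function nextInstallationIndex(
  nextPage: number | null, // assumed return type of extractNextPageFromLink
  installationIndex: number,
  installationCount: number,
): number | null {
  // More pages remain for the current installation: stay on it.
  if (nextPage) return installationIndex;
  // Current installation is exhausted: advance if another one exists.
  if (installationIndex + 1 < installationCount) return installationIndex + 1;
  // No pages and no installations left to fetch.
  return null;
}
```

A caller would keep requesting pages while the returned index is not `null`, resetting `page` to 1 whenever the index changes.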
@@ -26,10 +26,6 @@ export interface FeedbackResponse {
  body: FeedbackBodyResponse;
 }

-export interface GitHubAccessTokenResponse {
-  access_token: string;
-}
-
 export interface AuthenticationResponse {
  message: string;
  login?: string; // Only present when allow list is enabled
@@ -44,25 +40,6 @@ export interface Feedback {
  trajectory: unknown[];
 }

-export interface GetConfigResponse {
-  APP_MODE: "saas" | "oss";
-  APP_SLUG?: string;
-  GITHUB_CLIENT_ID: string;
-  POSTHOG_CLIENT_KEY: string;
-  PROVIDERS_CONFIGURED?: Provider[];
-  AUTH_URL?: string;
-  FEATURE_FLAGS: {
-    ENABLE_BILLING: boolean;
-    HIDE_LLM_SETTINGS: boolean;
-    ENABLE_JIRA: boolean;
-    ENABLE_JIRA_DC: boolean;
-    ENABLE_LINEAR: boolean;
-  };
-  MAINTENANCE?: {
-    startTime: string;
-  };
-}
-
 export interface GetVSCodeUrlResponse {
  vscode_url: string | null;
  error?: string;
@@ -73,11 +50,6 @@ export interface GetTrajectoryResponse {
  error?: string;
 }

-export interface AuthenticateResponse {
-  message?: string;
-  error?: string;
-}
-
 export interface RepositorySelection {
  selected_repository: string | null;
  selected_branch: string | null;
@@ -0,0 +1,49 @@
+import { openHands } from "../open-hands-axios";
+import { GetConfigResponse } from "./option.types";
+
+/**
+ * Service for handling API options endpoints
+ */
+class OptionService {
+  /**
+   * Retrieve the list of models available
+   * @returns List of models available
+   */
+  static async getModels(): Promise<string[]> {
+    const { data } = await openHands.get<string[]>("/api/options/models");
+    return data;
+  }
+
+  /**
+   * Retrieve the list of agents available
+   * @returns List of agents available
+   */
+  static async getAgents(): Promise<string[]> {
+    const { data } = await openHands.get<string[]>("/api/options/agents");
+    return data;
+  }
+
+  /**
+   * Retrieve the list of security analyzers available
+   * @returns List of security analyzers available
+   */
+  static async getSecurityAnalyzers(): Promise<string[]> {
+    const { data } = await openHands.get<string[]>(
+      "/api/options/security-analyzers",
+    );
+    return data;
+  }
+
+  /**
+   * Get the configuration from the server
+   * @returns Configuration response
+   */
+  static async getConfig(): Promise<GetConfigResponse> {
+    const { data } = await openHands.get<GetConfigResponse>(
+      "/api/options/config",
+    );
+    return data;
+  }
+}
+
+export default OptionService;
@@ -0,0 +1,20 @@
+import { Provider } from "#/types/settings";
+
+export interface GetConfigResponse {
+  APP_MODE: "saas" | "oss";
+  APP_SLUG?: string;
+  GITHUB_CLIENT_ID: string;
+  POSTHOG_CLIENT_KEY: string;
+  PROVIDERS_CONFIGURED?: Provider[];
+  AUTH_URL?: string;
+  FEATURE_FLAGS: {
+    ENABLE_BILLING: boolean;
+    HIDE_LLM_SETTINGS: boolean;
+    ENABLE_JIRA: boolean;
+    ENABLE_JIRA_DC: boolean;
+    ENABLE_LINEAR: boolean;
+  };
+  MAINTENANCE?: {
+    startTime: string;
+  };
+}
@@ -0,0 +1,30 @@
+import { openHands } from "../open-hands-axios";
+import { GitUser } from "#/types/git";
+
+/**
+ * User Service API - Handles all user-related API endpoints
+ */
+class UserService {
+  /**
+   * Get the current user's Git information
+   * @returns Git user information
+   */
+  static async getUser(): Promise<GitUser> {
+    const response = await openHands.get<GitUser>("/api/user/info");
+
+    const { data } = response;
+
+    const user: GitUser = {
+      id: data.id,
+      login: data.login,
+      avatar_url: data.avatar_url,
+      company: data.company,
+      name: data.name,
+      email: data.email,
+    };
+
+    return user;
+  }
+}
+
+export default UserService;
@@ -1,22 +0,0 @@
-import React from "react";
-
-function ArrowIcon() {
-  return (
-    <svg
-      xmlns="http://www.w3.org/2000/svg"
-      fill="none"
-      viewBox="0 0 24 24"
-      strokeWidth={1.5}
-      stroke="currentColor"
-      className="w-5 h-5"
-    >
-      <path
-        strokeLinecap="round"
-        strokeLinejoin="round"
-        d="M16.023 9.348h4.992v-.001M2.985 19.644v-4.992m0 0h4.992m-4.993 0 3.181 3.183a8.25 8.25 0 0 0 13.803-3.7M4.031 9.865a8.25 8.25 0 0 1 13.803-3.7l3.181 3.182m0-4.991v4.99"
-      />
-    </svg>
-  );
-}
-
-export default ArrowIcon;
@@ -1,35 +0,0 @@
-<svg width="70" height="46" viewBox="0 0 70 46" fill="none" xmlns="http://www.w3.org/2000/svg">
-  <g clip-path="url(#clip0_8467_33285)">
-    <g clip-path="url(#clip1_8467_33285)">
-      <path
-        d="M66.7813 13.7968C64.5776 12.4773 63.1054 14.4995 63.286 17.2452L63.2677 17.2659C63.2738 14.3987 62.8759 11.232 61.5537 8.67021C61.0854 7.7629 60.1366 6.27147 58.2604 6.97419C57.4371 7.28256 56.6903 8.21062 57.0759 10.6064C57.0759 10.6064 57.5044 13.1208 57.4248 16.2815V16.326C56.8892 7.60872 54.8692 4.94905 51.9799 5.12103C51.0555 5.28114 49.7915 5.66956 50.2169 8.34998C50.2169 8.34998 50.6791 11.146 50.8291 13.3728L50.8382 13.4855H50.8291C49.4701 8.6791 47.6398 8.61387 46.3146 8.80067C45.1117 8.96968 43.7987 10.1854 44.4628 12.5811C46.5472 20.0976 46.1401 29.1499 45.984 30.4456C45.5586 29.5591 45.427 28.8564 44.8362 27.8838C42.4612 23.9788 41.3318 23.6912 39.9453 23.6319C38.568 23.5726 37.0805 24.3999 37.1784 25.9743C37.2794 27.5488 38.1027 27.8097 39.2719 30.0038C40.184 31.7117 40.4442 33.9503 42.2806 38.0184C43.8017 41.3867 47.7776 45.0812 55.0191 44.6424C60.8865 44.4526 69.6492 42.4541 68.125 29.3308C67.7454 27.0506 68.0301 25.1411 68.229 23.1842C68.5382 20.148 68.9911 15.1163 66.7844 13.7938L66.7813 13.7968Z"
-        fill="#FFE165" />
-      <path
-        d="M30.1451 23.724C28.7586 23.81 27.6384 24.1154 25.3368 28.0619C24.7644 29.0433 24.6481 29.749 24.238 30.6415C24.0574 29.3487 23.479 20.3053 25.4194 12.7533C26.0377 10.3486 24.7032 9.15665 23.4973 9.0084C22.169 8.84532 20.3356 8.94317 19.0685 13.797H19.0532L19.0716 13.6576C19.1787 11.4279 19.5888 8.62591 19.5888 8.62591C19.9592 5.93659 18.6921 5.57189 17.7647 5.4266C14.8815 5.308 12.9165 7.97952 12.537 16.6197H12.5309C12.3993 13.4916 12.7758 11.0009 12.7758 11.0009C13.1155 8.59626 12.3503 7.68302 11.5209 7.38948C9.63244 6.71937 8.71117 8.22859 8.26125 9.14479C6.98801 11.7303 6.64827 14.9029 6.70949 17.7702L6.69112 17.7494C6.81661 15.0008 5.30769 13.0053 3.12849 14.3633C0.949283 15.7243 1.49715 20.7471 1.86443 23.7774C2.10316 25.7314 2.42147 27.6349 2.0848 29.921C0.811553 43.0681 9.61101 44.9094 15.4814 44.9954C22.7291 45.3067 26.6345 41.5381 28.0914 38.1431C29.8482 34.0454 30.0686 31.8008 30.947 30.0781C32.0734 27.8632 32.8936 27.5875 32.964 26.013C33.0344 24.4386 31.5316 23.638 30.1543 23.721L30.1451 23.724Z"
-        fill="#FFE165" />
-      <path
-        d="M33.0474 23.7441C32.3129 23.0473 31.208 22.6766 30.0847 22.7419C28.285 22.8516 27.0087 23.4891 25.0468 26.6024C24.9948 23.0413 25.1998 17.6953 26.4057 12.9927C26.8587 11.2256 26.4027 10.0722 25.9375 9.41688C25.3957 8.6519 24.554 8.14783 23.6236 8.03516C22.7758 7.93139 21.6678 7.92545 20.5874 8.78532C20.5874 8.77346 20.5905 8.75864 20.5905 8.75864C20.9394 6.2324 20.0395 4.78545 17.9185 4.4593L17.8022 4.44743C16.4953 4.3911 15.3751 4.80621 14.4722 5.67794C13.9886 6.14345 13.5692 6.74536 13.205 7.48959C12.798 6.9114 12.2746 6.61786 11.8614 6.46961C9.33633 5.57119 7.94066 7.49552 7.33464 8.72306C6.63068 10.1522 6.19301 11.7534 5.94815 13.3871C5.89612 13.3545 5.84715 13.3219 5.79512 13.2922C5.23807 12.9809 4.07808 12.6014 2.57222 13.5413C0.0196132 15.1365 0.344046 19.7205 0.852119 23.8953C0.879665 24.1177 0.907211 24.3371 0.934757 24.5595C1.15207 26.2703 1.35713 27.8863 1.07555 29.7869L1.06943 29.8343C0.558293 35.1092 1.58362 39.1683 4.11787 41.9051C6.55417 44.5381 10.3708 45.9079 15.4301 45.9821C15.8005 45.9969 16.1616 46.0028 16.5136 45.9998C25.157 45.9227 28.2575 40.3039 29.0196 38.5249C30.0051 36.224 30.5162 34.5073 30.9233 33.1255C31.2386 32.0581 31.4895 31.216 31.8446 30.5163C32.2486 29.7216 32.6036 29.2057 32.9158 28.7491C33.4484 27.9722 33.9075 27.3021 33.9626 26.0568C34.0024 25.1554 33.684 24.3549 33.0382 23.7441H33.0474ZM15.9076 7.07152C16.3943 6.60304 16.9544 6.39548 17.6644 6.41031C18.2796 6.50815 18.8367 6.65344 18.5826 8.49178C18.5643 8.60742 18.1664 11.3679 18.0593 13.6154C18.0593 13.6302 18.0593 13.6451 18.0593 13.6599C17.4992 15.854 17.0309 19.0148 16.7799 23.6344C15.6934 23.6996 14.6099 23.8123 13.557 23.9605C13.2173 14.588 14.0039 8.90393 15.9045 7.07152H15.9076ZM9.17105 9.57106C9.93316 8.02923 10.5728 8.10632 11.1666 8.31684C11.9899 8.61038 11.8614 10.2026 11.7665 10.8638C11.7512 10.9706 11.3839 13.4523 11.5125 16.6101C11.4176 18.825 11.4298 21.3779 11.5431 24.2985C10.5361 24.4913 9.58118 24.7136 8.70889 24.9538C8.2957 23.6077 6.46847 
15.0564 9.17105 9.57403V9.57106ZM31.2324 27.6609C30.9019 28.1413 30.4918 28.7402 30.0296 29.6475C29.5919 30.5044 29.3226 31.4236 28.9767 32.5829C28.5819 33.9142 28.0891 35.5717 27.1495 37.7688C26.4823 39.3225 23.6787 44.3691 15.4914 44.0132C10.9279 43.948 7.70192 42.8272 5.62984 40.5886C3.49349 38.2818 2.63956 34.7296 3.08948 30.0359C3.40167 27.8892 3.17212 26.0716 2.94869 24.3163C2.92114 24.0969 2.89359 23.8805 2.86605 23.661C2.62119 21.6359 1.96621 16.2543 3.67101 15.1899C4.13011 14.9022 4.50657 14.837 4.78509 14.9912C5.2595 15.2551 5.73697 16.2158 5.66963 17.7042C5.66657 17.7843 5.67575 17.8614 5.69106 17.9385C5.77675 21.4076 6.41644 24.426 6.78066 25.5587C6.18383 25.7751 5.65739 26.0005 5.21665 26.2258C4.72082 26.4808 4.53412 27.0738 4.79734 27.5542C4.98098 27.8892 5.33602 28.079 5.7033 28.076C5.85939 28.076 6.01855 28.0375 6.16852 27.9604C8.64461 26.6884 14.5181 25.3956 19.6937 25.535C20.2568 25.5439 20.719 25.1228 20.7343 24.5802C20.7496 24.0376 20.3089 23.5869 19.7488 23.5721C19.4396 23.5632 19.1274 23.5632 18.8153 23.5632C19.3325 14.247 20.7557 11.2078 21.8637 10.3094C22.3228 9.93873 22.7605 9.90908 23.3665 9.98321C23.5348 10.004 23.9572 10.0988 24.2602 10.5258C24.5877 10.9913 24.6489 11.6792 24.4347 12.5154C22.5615 19.8124 22.9839 28.3933 23.1982 30.4748C23.1614 30.5459 23.1278 30.6171 23.088 30.6912C22.6503 31.4888 21.8361 32.319 20.8659 32.2627C20.3119 32.236 19.8253 32.6452 19.7916 33.1848C19.7579 33.7274 20.1834 34.193 20.7435 34.2256C22.384 34.3205 23.9297 33.3449 24.8785 31.6163C24.9795 31.4325 25.0652 31.2575 25.1417 31.0885C25.1478 31.0767 25.1539 31.0618 25.16 31.05C25.3376 30.6645 25.4661 30.3117 25.5794 29.9944C25.7569 29.5022 25.9099 29.0753 26.216 28.5475C28.3891 24.8174 29.2705 24.7641 30.2041 24.7077C30.7519 24.6751 31.2967 24.8441 31.6181 25.1525C31.8476 25.3689 31.9486 25.6387 31.9333 25.9768C31.9027 26.6736 31.7038 26.9641 31.2233 27.6639L31.2324 27.6609Z"
-        fill="black" />
-      <path
-        d="M69.1207 29.176C68.8055 27.2813 68.9799 25.6624 69.1636 23.9485C69.188 23.7262 69.2125 23.5068 69.234 23.2844C69.6625 19.1036 69.9012 14.5107 67.3149 12.963C65.7907 12.0497 64.6368 12.45 64.0859 12.7703C64.0339 12.7999 63.9849 12.8355 63.9329 12.8681C63.6543 11.2403 63.1891 9.64804 62.4576 8.23074C61.8302 7.01506 60.4008 5.11446 57.8911 6.05735C57.4809 6.21153 56.9667 6.51397 56.5689 7.10105C56.1893 6.36275 55.7578 5.76974 55.265 5.31312C54.3468 4.45918 53.2174 4.06186 51.9136 4.14192L51.7973 4.15378C49.6823 4.51848 48.81 5.98026 49.2079 8.50649C49.2079 8.50649 49.2079 8.51835 49.211 8.52725C48.1152 7.68517 47.0073 7.71185 46.1625 7.83046C45.2352 7.96092 44.4026 8.47981 43.8762 9.25369C43.4263 9.91786 42.9886 11.0772 43.4753 12.8355C44.773 17.5173 45.0791 22.8604 45.0944 26.4214C43.0743 23.3437 41.7858 22.7299 39.9861 22.6528C38.8659 22.6054 37.761 22.9997 37.0417 23.7084C36.4081 24.331 36.1051 25.1375 36.1633 26.036C36.2429 27.2783 36.7142 27.9425 37.2621 28.7075C37.5834 29.1582 37.9477 29.6682 38.367 30.4539C38.7373 31.1477 39.0036 31.9839 39.3403 33.0454C39.7749 34.4182 40.3166 36.1261 41.3481 38.4092C42.1439 40.1734 45.3515 45.7388 53.9703 45.6587C54.3193 45.6558 54.6804 45.6439 55.0477 45.6202C60.1345 45.4571 63.9237 44.0161 66.311 41.3416C68.7902 38.5604 69.739 34.4834 69.1299 29.2175L69.1238 29.17L69.1207 29.176ZM58.0747 10.4575C57.9676 9.7874 57.8054 8.19813 58.6256 7.89272C59.2133 7.67034 59.856 7.58436 60.6457 9.11136C63.4523 14.5463 61.7904 23.1302 61.4017 24.4823C60.5233 24.2569 59.5653 24.0523 58.5552 23.8774C58.6103 20.9568 58.5736 18.4009 58.4389 16.189C58.5063 13.0312 58.0931 10.5554 58.0747 10.4575ZM52.0911 6.10182C52.8042 6.07217 53.3674 6.27083 53.8601 6.73338C55.7945 8.53318 56.6913 14.1994 56.5291 23.5779C55.4731 23.4474 54.3896 23.3555 53.3 23.3081C52.9634 18.6915 52.4339 15.5426 51.8309 13.3573C51.8309 13.3425 51.8309 13.3277 51.8309 13.3129C51.6809 11.0653 51.228 8.31376 51.2096 8.20702C50.9188 6.36572 51.4728 6.21153 52.088 
6.10182H52.0911ZM67.116 29.4665C67.6546 34.1513 66.868 37.7183 64.7776 40.0637C62.7484 42.3379 59.5438 43.518 54.9528 43.6662C46.8114 44.1644 43.9038 39.1712 43.209 37.6294C42.2234 35.4471 41.7001 33.8015 41.2808 32.4761C40.9135 31.3197 40.6258 30.4094 40.1728 29.5584C39.6953 28.66 39.2729 28.07 38.9332 27.5956C38.4404 26.9047 38.2354 26.6171 38.1895 25.9203C38.168 25.5823 38.266 25.3095 38.4894 25.0901C38.8077 24.7758 39.3464 24.5949 39.8973 24.6216C40.8308 24.6631 41.7123 24.6987 43.9588 28.3902C44.2772 28.9121 44.4363 29.3361 44.623 29.8253C44.7454 30.1426 44.8801 30.4954 45.0668 30.8779C45.0729 30.8898 45.076 30.9016 45.0821 30.9105C45.1648 31.0795 45.2535 31.2515 45.3576 31.4353C46.3401 33.1462 47.9041 34.095 49.5415 33.9705C50.0986 33.929 50.5179 33.4545 50.475 32.9149C50.4322 32.3753 49.9455 31.975 49.3854 32.0106C48.4152 32.0817 47.5858 31.2663 47.1328 30.4776C47.0899 30.4035 47.0563 30.3353 47.0195 30.2641C47.194 28.1827 47.4541 19.5929 45.4402 12.3314C45.2076 11.4982 45.2566 10.8103 45.5749 10.3389C45.8718 9.906 46.2911 9.80222 46.4594 9.7785C47.0624 9.69252 47.5031 9.71624 47.9683 10.078C49.0947 10.9586 50.576 13.9711 51.2678 23.2755C50.9556 23.2784 50.6434 23.2873 50.3373 23.3022C49.7772 23.3259 49.3456 23.7855 49.3701 24.3281C49.3946 24.8707 49.8598 25.2799 50.4291 25.265C55.5986 25.0338 61.4996 26.2198 63.9971 27.4503C64.1471 27.5244 64.3063 27.557 64.4654 27.557C64.8327 27.5541 65.1847 27.3584 65.3622 27.0174C65.6162 26.5341 65.4173 25.9411 64.9153 25.695C64.4715 25.4756 63.939 25.2621 63.3391 25.0545C63.6819 23.9159 64.2665 20.8856 64.2848 17.4165C64.3001 17.3394 64.3063 17.2623 64.3001 17.1823C64.2022 15.6968 64.6644 14.7272 65.1326 14.4544C65.4081 14.2943 65.7846 14.3536 66.2498 14.6323C67.976 15.6671 67.4251 21.0576 67.217 23.0887C67.1955 23.3081 67.1711 23.5245 67.1466 23.744C66.9568 25.5052 66.7609 27.3228 67.116 29.4665Z"
-        fill="black" />
-      <path
-        d="M38.7381 10.5084C38.5759 10.5084 38.4106 10.4788 38.2545 10.4076C37.6821 10.1526 37.4312 9.49736 37.6944 8.94289C38.5453 7.1431 39.791 5.48266 41.2938 4.14245C41.7559 3.73031 42.4782 3.75699 42.9037 4.20768C43.3291 4.65541 43.3016 5.35516 42.8363 5.76731C41.5539 6.91182 40.4919 8.32912 39.7634 9.86502C39.5737 10.2653 39.1666 10.5055 38.7381 10.5084Z"
-        fill="black" />
-      <path
-        d="M34.898 9.87074C34.3073 9.87667 33.8023 9.43784 33.7533 8.85669C33.536 6.25633 33.5268 3.62039 33.7319 1.02003C33.7808 0.412188 34.3287 -0.0414663 34.9531 0.00300963C35.5805 0.0504507 36.0488 0.578232 36.0029 1.18607C35.807 3.67079 35.8162 6.1911 36.0243 8.67582C36.0763 9.28366 35.6081 9.81737 34.9806 9.86481C34.9531 9.86481 34.9255 9.86778 34.898 9.86778V9.87074Z"
-        fill="black" />
-      <path
-        d="M30.976 10.5558C30.4649 10.5618 29.9935 10.2267 29.8619 9.7256C29.3783 7.88726 28.4632 6.14084 27.2175 4.67906C26.8165 4.20762 26.8869 3.51379 27.3705 3.12537C27.8572 2.73695 28.5734 2.80514 28.9743 3.27362C30.4312 4.98743 31.5024 7.03036 32.0656 9.18003C32.2217 9.77008 31.8514 10.372 31.2423 10.5232C31.1505 10.5469 31.0617 10.5558 30.9699 10.5588L30.976 10.5558Z"
-        fill="black" />
-    </g>
-  </g>
-  <defs>
-    <clipPath id="clip0_8467_33285">
-      <rect width="69" height="46" fill="white" transform="translate(0.5)" />
-    </clipPath>
-    <clipPath id="clip1_8467_33285">
-      <rect width="69" height="46" fill="white" transform="translate(0.5)" />
-    </clipPath>
-  </defs>
-</svg>
Author	SHA1	Message	Date
chuckbutkus	7c556d6396	Merge branch 'main' into chuck-build	2025-09-23 14:25:16 -04:00
openhands	8bb5aa21b9	test	2025-09-23 14:19:20 -04:00
BenYao21	d3d70fcc60	issue #9388, this will fix the issue (#10450) Co-authored-by: mamoodi <mamoodiha@gmail.com> Co-authored-by: Graham Neubig <neubig@gmail.com>	2025-09-22 16:56:53 -04:00
Xinyi He	7906eab6b1	Add inference generation of SWE-Perf Benchmark (#10246) Co-authored-by: mamoodi <mamoodiha@gmail.com> Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: openhands <openhands@all-hands.dev>	2025-09-22 20:35:30 +00:00
juanmichelini	547e1049f1	Multi swe gym (#10605) Co-authored-by: openhands <openhands@all-hands.dev>	2025-09-22 15:56:26 -04:00
mamoodi	818cc60b52	New label for not going stale (#11069)	2025-09-22 11:53:47 -04:00
Robert Brennan	431d2c1f43	security: upgrade setuptools to >=78.1.1 to address CVE-2025-47273 and CVE-2024-6345 (#11038) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: enyst <engel.nyst@gmail.com>	2025-09-22 04:05:45 +00:00
Engel Nyst	07f23641a3	build(deps): pin litellm to avoid build failure (#11054) Co-authored-by: openhands <openhands@all-hands.dev>	2025-09-22 03:54:37 +02:00
Hiep Le	de84af5586	feat(frontend): display lock icon when confirmation mode is enabled (#11030)	2025-09-20 10:55:19 +07:00
Hiep Le	b7765ba3f7	refactor(frontend): fix typecheck (#11037)	2025-09-19 13:43:00 -04:00
Hiep Le	b89f2e51e4	refactor(frontend): migration of metrics-slice.ts to zustand (#11018)	2025-09-19 23:52:21 +07:00
mamoodi	e09f93aa75	Release 0.57.0 (#10981) Co-authored-by: Ray Myers <ray.myers@gmail.com> Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com> Co-authored-by: Rohit Malhotra <rohitvinodmalhotra@gmail.com>	2025-09-19 12:40:56 -04:00
Hiep Le	9f529b105a	refactor(frontend): migration of command-slice.ts to zustand (#11003)	2025-09-19 23:33:59 +07:00
Graham Neubig	89e3d2a867	Improve OpenHands provider pricing documentation (#10974) Co-authored-by: openhands <openhands@all-hands.dev>	2025-09-20 00:22:44 +08:00
Hiep Le	a7b9a4f291	refactor(frontend): migration of status-slice.ts to zustand (#11017)	2025-09-19 22:27:55 +07:00
Hiep Le	88cd16ae21	refactor(frontend): migration of initial-query-slice.ts to zustand (#11020)	2025-09-19 22:27:20 +07:00
Hiep Le	a8a3e9e604	refactor(frontend): remove the code-slice.ts file (#11021)	2025-09-19 21:22:29 +07:00
Hiep Le	0061bcc0b0	refactor(frontend): custom chat input (#10984)	2025-09-19 21:06:18 +07:00
Hiep Le	9c9fa780b0	refactor(frontend): task tracking observation content (#11002)	2025-09-19 20:03:05 +07:00
Alona	569ac16163	Improve token refresh error logging (#11026)	2025-09-19 14:18:38 +07:00
openhands	08096db29f	test	2025-09-18 22:50:21 -04:00
openhands	b2b6ddf90c	test	2025-09-18 22:24:35 -04:00
openhands	87fe36d811	test	2025-09-18 21:44:34 -04:00
openhands	39d255d313	test	2025-09-18 21:27:03 -04:00
openhands	e334b67f21	Add logging	2025-09-18 20:48:24 -04:00
Robert Brennan	46f7738f41	Update Python packages to latest versions (#11023) Co-authored-by: openhands <openhands@all-hands.dev>	2025-09-18 19:52:46 +00:00
Rohit Malhotra	3f3669dd34	Hotfix: rm model choice override (#11022)	2025-09-18 14:40:06 -04:00
sp.wack	cd65645eea	Hide Tavily search API key help text in SaaS mode (#11014) Co-authored-by: openhands <openhands@all-hands.dev>	2025-09-18 16:40:29 +00:00
Robert Brennan	8e88a7a277	fix: resolve critical and high CVEs in enterprise Docker image (#10987) Co-authored-by: openhands <openhands@all-hands.dev>	2025-09-18 11:25:33 -04:00
Hiep Le	b393d52439	refactor(frontend): conversation main (#10985)	2025-09-18 20:23:13 +07:00
Hiep Le	faeec48365	refactor(frontend): conversation card (#10986)	2025-09-18 20:22:59 +07:00
chuckbutkus	d5c02bf87b	Merge branch 'main' into allow-custom-user	2025-09-17 22:43:30 -04:00
openhands	14a4664fe8	Make su commands optional	2025-09-17 22:40:21 -04:00
sp.wack	774caf0607	feat: refactor status indicators (#10983) Co-authored-by: openhands <openhands@all-hands.dev>	2025-09-17 22:32:55 +04:00
chuckbutkus	3a7df33acf	Merge branch 'main' into test-user	2025-09-17 14:02:52 -04:00
sp.wack	7222730df0	Fix SaaS callback URLs and pro pill positioning (#10998) Co-authored-by: openhands <openhands@all-hands.dev>	2025-09-17 16:56:02 +00:00
Hiep Le	910177fc57	refactor(frontend): system message modal (#10969)	2025-09-17 21:56:14 +07:00
Hiep Le	ac9badbd20	refactor(frontend): metrics modal (#10968)	2025-09-17 21:55:25 +07:00
Ray Myers	02c299d88f	Fix Slack resolver failing on AWAITING_USER_INPUT state (#10992) Co-authored-by: openhands <openhands@all-hands.dev>	2025-09-17 09:20:12 -05:00
mamoodi	f65fbef649	Remove runtime settings (#10996)	2025-09-17 13:59:29 +00:00
Hiep Le	3c2acad28d	refactor(frontend): microagents modal (#10970)	2025-09-16 22:32:23 +07:00
Boxuan Li	0f1780728e	Update str_replace_editor tool to use dynamic workspace path from SANDBOX_VOLUMES (#10965) Co-authored-by: openhands <openhands@all-hands.dev>	2025-09-15 17:46:54 -07:00
sp.wack	d3f3378a4c	feat: Upgrade banner for unsubscribed SaaS users (#10890) Co-authored-by: openhands <openhands@all-hands.dev>	2025-09-15 23:04:44 +00:00
Engel Nyst	65f4164749	[Docs] Add environment variables reference table (#10926) Co-authored-by: openhands <openhands@all-hands.dev>	2025-09-15 18:31:44 +00:00
Hiep Le	3f984d878b	refactor(frontend): move conversation APIs to a dedicated service handler (#10957)	2025-09-16 00:57:15 +07:00
Eliot Jones	10b871f4ab	feat: Add Cygnal integration (#10898)	2025-09-15 09:57:03 -04:00
Hiep Le	d664f516db	refactor(frontend): conversation tab content component (#10956)	2025-09-15 20:56:38 +07:00
Hiep Le	e74bbd81d1	fix(frontend): suppressing event display in the absence of user messages (#10955)	2025-09-15 20:56:16 +07:00
Hiep Le	ab893f93f0	refactor(frontend): use-auto-resize hook (#10959)	2025-09-15 20:49:15 +07:00
Hiep Le	5aba498e77	refactor(frontend): move billing APIs to a dedicated service handler (#10958)	2025-09-15 20:37:07 +07:00
Hiep Le	1523555eea	refactor(frontend): remove dead code (#10839)	2025-09-15 20:35:56 +07:00
Kaushik Ashodiya	30604c40fc	fix: improve CLI help and version command performance (#10908) Co-authored-by: openhands <openhands@all-hands.dev>	2025-09-12 14:23:01 -04:00
Hiep Le	8dc46b7206	refactor(frontend): optimize pre-commit lint script (#10870) Co-authored-by: amanape <83104063+amanape@users.noreply.github.com>	2025-09-12 15:23:29 +00:00
Hiep Le	69498bebb4	refactor(frontend): new conversation component (#10937) Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>	2025-09-12 22:15:26 +07:00
tksrmz	77ee9e25d9	fix(frontend): highlight preceding stars on hover in LikertScale (#10948)	2025-09-12 18:01:40 +04:00
Hiep Le	74753036bb	refactor(frontend): move user APIs to a dedicated service handler (#10943)	2025-09-12 09:08:15 +07:00
Hiep Le	95d7c10608	refactor(frontend): move option APIs to a dedicated service handler (#10933)	2025-09-12 00:43:15 +07:00
Hiep Le	c142cc27ff	refactor(frontend): home header component (#10930)	2025-09-12 00:10:58 +07:00
Hiep Le	0e20fc206b	refactor(frontend): move settings APIs to a dedicated service handler (#10941)	2025-09-11 23:39:23 +07:00
Hiep Le	e21475a88e	feat(frontend): persist drawer open/close state on page refresh (#10935) Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>	2025-09-11 15:58:00 +00:00
Hiep Le	921fec0019	refactor(frontend): expand repository pill to full available width (#10936)	2025-09-11 22:37:44 +07:00
Hiep Le	049f839a62	refactor(frontend): move auth APIs to a dedicated service handler (#10932)	2025-09-11 22:31:41 +07:00
Hiep Le	0dde758e13	refactor(frontend): move microagent management API to a dedicated service handler (#10934)	2025-09-11 22:27:56 +07:00
Tim O'Farrell	8257ae70cc	Additional logs to debug container working directories (#10902) Co-authored-by: Chuck Butkus <chuck@all-hands.dev>	2025-09-11 11:06:19 -04:00
Ray Myers	4513bcc622	chore - MyPy check Enterprise with OpenHands (#10858) Co-authored-by: Tim O'Farrell <tofarr@gmail.com>	2025-09-11 11:05:50 -04:00
Hiep Le	b5b9a3f40b	refactor(frontend): create waiting for runtime component (#10931)	2025-09-11 21:30:05 +07:00
chuckbutkus	69fddecc7f	Merge branch 'main' into test-user	2025-09-07 21:55:39 -04:00
Chuck Butkus	3afe5ccee5	Add Logging	2025-09-05 20:52:48 -04:00
chuckbutkus	3d5a8dcf5a	Merge branch 'main' into test-user	2025-09-05 14:20:10 -04:00
Chuck Butkus	2ee1abe22c	Lint fix	2025-09-05 13:16:03 -04:00
Chuck Butkus	148940f553	Added logging around alive checks	2025-09-05 11:10:57 -04:00
Chuck Butkus	1f09296136	Fix username checks	2025-09-03 21:40:13 -04:00
Chuck Butkus	70e5d12ba9	Revert "Change to a non-login shell" This reverts commit `bcb3160d95`.	2025-08-29 01:48:47 -04:00
Chuck Butkus	bcb3160d95	Change to a non-login shell	2025-08-29 01:37:02 -04:00
Chuck Butkus	174c691744	Update	2025-08-28 02:25:05 -04:00
Chuck Butkus	af34d446e9	Remove vscode username restriction	2025-08-28 02:22:27 -04:00
Chuck Butkus	6604924f76	Fix bash username	2025-08-28 02:21:41 -04:00
chuckbutkus	b2def1e438	Merge branch 'main' into test-user	2025-08-27 23:33:45 -04:00
Chuck Butkus	2b8e47aca9	Add runtime user env vars	2025-08-27 23:02:39 -04:00
Chuck Butkus	dba8b28824	Logging	2025-08-27 21:30:47 -04:00