Add missing imports to ActionExecutionClient

Move all run_action logic to ActionExecutionClient
Refactor runtime action execution
2026-04-29 03:00:45 -04:00 · 2024-12-25 15:54:31 +00:00 · 2024-12-25 15:52:08 +00:00 · 2024-12-25 15:47:02 +00:00 · 2024-12-24 15:28:27 -05:00 · 2024-12-24 18:08:33 +00:00
168 changed files with 2780 additions and 2260 deletions
--- a/.github/workflows/openhands-resolver.yml
+++ b/.github/workflows/openhands-resolver.yml
@@ -116,7 +116,7 @@ jobs:
          PAT_USERNAME: ${{ secrets.PAT_USERNAME }}
          GITHUB_TOKEN: ${{ github.token }}
        run: |
-          required_vars=("LLM_MODEL" "LLM_API_KEY")
+          required_vars=("LLM_API_KEY")
          for var in "${required_vars[@]}"; do
            if [ -z "${!var}" ]; then
              echo "Error: Required environment variable $var is not set."
@@ -125,14 +125,14 @@ jobs:
          done

          # Check optional variables and warn about fallbacks
-          if [ -z "$PAT_TOKEN" ]; then
-            echo "Warning: PAT_TOKEN is not set, falling back to GITHUB_TOKEN"
-          fi
-
          if [ -z "$LLM_BASE_URL" ]; then
            echo "Warning: LLM_BASE_URL is not set, will use default API endpoint"
          fi

+          if [ -z "$PAT_TOKEN" ]; then
+            echo "Warning: PAT_TOKEN is not set, falling back to GITHUB_TOKEN"
+          fi
+
          if [ -z "$PAT_USERNAME" ]; then
            echo "Warning: PAT_USERNAME is not set, will use openhands-agent"
          fi
@@ -313,11 +313,13 @@ jobs:
          github-token: ${{ secrets.PAT_TOKEN || github.token }}
          script: |
            const fs = require('fs');
+            const path = require('path');
            const issueNumber = ${{ env.ISSUE_NUMBER }};
            const success = ${{ steps.check_result.outputs.RESOLUTION_SUCCESS }};

            let prNumber = '';
            let branchName = '';
+            let resultExplanation = '';

            try {
              if (success) {
@@ -330,6 +332,25 @@ jobs:
            }


+            try {
+              if (!success){
+                // Read result_explanation from JSON file for failed resolution
+                const outputFilePath = path.resolve('/tmp/output/output.jsonl');
+                if (fs.existsSync(outputFilePath)) {
+                  const outputContent = fs.readFileSync(outputFilePath, 'utf8');
+                  const jsonLines = outputContent.split('\n').filter(line => line.trim() !== '');
+
+                  if (jsonLines.length > 0) {
+                    // First entry in JSON lines has the key 'result_explanation'
+                    const firstEntry = JSON.parse(jsonLines[0]);
+                    resultExplanation = firstEntry.result_explanation || '';
+                  }
+                }
+              }
+            } catch (error){
+              console.error('Error reading file:', error);
+            }
+
            // Check "success" log from resolver output
            if (success && prNumber) {
              github.rest.issues.createComment({
@@ -340,11 +361,17 @@ jobs:
              });
              process.env.AGENT_RESPONDED = 'true';
            } else if (!success && branchName) {
+              let commentBody = `An attempt was made to automatically fix this issue, but it was unsuccessful. A branch named '${branchName}' has been created with the attempted changes. You can view the branch [here](https://github.com/${context.repo.owner}/${context.repo.repo}/tree/${branchName}). Manual intervention may be required.`;
+
+              if (resultExplanation) {
+                commentBody += `\n\nAdditional details about the failure:\n${resultExplanation}`;
+              }
+
              github.rest.issues.createComment({
                issue_number: issueNumber,
                owner: context.repo.owner,
                repo: context.repo.repo,
-                body: `An attempt was made to automatically fix this issue, but it was unsuccessful. A branch named '${branchName}' has been created with the attempted changes. You can view the branch [here](https://github.com/${context.repo.owner}/${context.repo.repo}/tree/${branchName}). Manual intervention may be required.`
+                body: commentBody
              });
              process.env.AGENT_RESPONDED = 'true';
            }
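For review, the failure-comment logic added to the workflow can be restated as two pure functions with the file I/O factored out. This is a sketch for illustration only. The names `extractResultExplanation` and `buildFailureComment` are hypothetical and do not exist in the workflow. The sketch mirrors the diff's behaviour: it takes `result_explanation` from the first non-empty line of the JSONL output and appends it only when it is non-empty. One difference is assumed: the workflow wraps the whole read in a single `try`/`catch`, whereas here only a parse failure (or a non-object first line) falls back to an empty string.

```javascript
// Hypothetical helper (not part of the workflow): return the
// result_explanation field from the first non-empty JSONL line,
// or '' when there is no line, the line is not valid JSON, or the key is missing.
function extractResultExplanation(content) {
  const jsonLines = content.split('\n').filter((line) => line.trim() !== '');
  if (jsonLines.length === 0) {
    return '';
  }
  try {
    const firstEntry = JSON.parse(jsonLines[0]);
    return firstEntry.result_explanation || '';
  } catch (error) {
    // Malformed JSON (or a first line of `null`) is treated as "no explanation".
    return '';
  }
}

// Hypothetical helper: build the same comment text the workflow posts on a
// failed resolution, appending the explanation only when it is non-empty.
function buildFailureComment(owner, repo, branchName, resultExplanation) {
  let body = `An attempt was made to automatically fix this issue, but it was unsuccessful. A branch named '${branchName}' has been created with the attempted changes. You can view the branch [here](https://github.com/${owner}/${repo}/tree/${branchName}). Manual intervention may be required.`;
  if (resultExplanation) {
    body += `\n\nAdditional details about the failure:\n${resultExplanation}`;
  }
  return body;
}

// Usage with made-up sample data: only the first entry's explanation is used.
const sample = '{"result_explanation": "Tests still fail"}\n{"other": 1}\n';
console.log(
  buildFailureComment('octo', 'demo', 'example-branch', extractResultExplanation(sample))
);
```

Keeping the parsing separate from `github.rest.issues.createComment` makes it easy to check the edge cases (an empty file, blank lines, malformed JSON) without running the whole action.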
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@
  <a href="https://codecov.io/github/All-Hands-AI/OpenHands?branch=main"><img alt="CodeCov" src="https://img.shields.io/codecov/c/github/All-Hands-AI/OpenHands?style=for-the-badge&color=blue"></a>
  <a href="https://github.com/All-Hands-AI/OpenHands/blob/main/LICENSE"><img src="https://img.shields.io/github/license/All-Hands-AI/OpenHands?style=for-the-badge&color=blue" alt="MIT License"></a>
  <br/>
-  <a href="https://join.slack.com/t/openhands-ai/shared_invite/zt-2vbfigwev-G03twSpXaErwzYVD4CFiBg"><img src="https://img.shields.io/badge/Slack-Join%20Us-red?logo=slack&logoColor=white&style=for-the-badge" alt="Join our Slack community"></a>
+  <a href="https://join.slack.com/t/openhands-ai/shared_invite/zt-2wkh4pklz-w~h_DVDtEe9H5kyQlcNxVw"><img src="https://img.shields.io/badge/Slack-Join%20Us-red?logo=slack&logoColor=white&style=for-the-badge" alt="Join our Slack community"></a>
  <a href="https://discord.gg/ESHStjSjD4"><img src="https://img.shields.io/badge/Discord-Join%20Us-purple?logo=discord&logoColor=white&style=for-the-badge" alt="Join our Discord community"></a>
  <a href="https://github.com/All-Hands-AI/OpenHands/blob/main/CREDITS.md"><img src="https://img.shields.io/badge/Project-Credits-blue?style=for-the-badge&color=FFE165&logo=github&logoColor=white" alt="Credits"></a>
  <br/>
@@ -71,6 +71,14 @@ or run it on tagged issues with [a github action](https://github.com/All-Hands-A

 Visit [Installation](https://docs.all-hands.dev/modules/usage/installation) for more information and setup instructions.

+> [!CAUTION]
+> OpenHands is meant to be run by a single user on their local workstation.
+> It is not appropriate for multi-tenant deployments, where multiple users share the same instance--there is no built-in isolation or scalability.
+>
+> If you're interested in running OpenHands in a multi-tenant environment, please
+> [get in touch with us](https://docs.google.com/forms/d/e/1FAIpQLSet3VbGaz8z32gW9Wm-Grl4jpt5WgMXPgJ4EDPVmCETCBpJtQ/viewform)
+> for advanced deployment options.
+
 If you want to modify the OpenHands source code, check out [Development.md](https://github.com/All-Hands-AI/OpenHands/blob/main/Development.md).

 Having issues? The [Troubleshooting Guide](https://docs.all-hands.dev/modules/usage/troubleshooting) can help.
@@ -88,7 +96,7 @@ troubleshooting resources, and advanced configuration options.
 OpenHands is a community-driven project, and we welcome contributions from everyone. We do most of our communication
 through Slack, so this is the best place to start, but we also are happy to have you contact us on Discord or Github:

- [Join our Slack workspace](https://join.slack.com/t/openhands-ai/shared_invite/zt-2vbfigwev-G03twSpXaErwzYVD4CFiBg) - Here we talk about research, architecture, and future development.
+- [Join our Slack workspace](https://join.slack.com/t/openhands-ai/shared_invite/zt-2wkh4pklz-w~h_DVDtEe9H5kyQlcNxVw) - Here we talk about research, architecture, and future development.
 - [Join our Discord server](https://discord.gg/ESHStjSjD4) - This is a community-run server for general discussion, questions, and feedback.
 - [Read or post Github Issues](https://github.com/All-Hands-AI/OpenHands/issues) - Check out the issues we're working on, or add your own ideas.

--- a/docs/i18n/fr/docusaurus-plugin-content-docs/current/usage/about.md
+++ b/docs/i18n/fr/docusaurus-plugin-content-docs/current/usage/about.md
@@ -27,7 +27,7 @@ Pour plus de détails, veuillez consulter [ce document](https://github.com/All-H

 Nous avons à la fois un espace de travail Slack pour la collaboration sur la construction d'OpenHands et un serveur Discord pour discuter de tout ce qui est lié, par exemple, à ce projet, LLM, agent, etc.

- [Espace de travail Slack](https://join.slack.com/t/openhands-ai/shared_invite/zt-2vbfigwev-G03twSpXaErwzYVD4CFiBg)
+- [Espace de travail Slack](https://join.slack.com/t/openhands-ai/shared_invite/zt-2wkh4pklz-w~h_DVDtEe9H5kyQlcNxVw)
 - [Serveur Discord](https://discord.gg/ESHStjSjD4)

 Si vous souhaitez contribuer, n'hésitez pas à rejoindre notre communauté. Simplifions ensemble l'ingénierie logicielle !
--- a/docs/i18n/fr/docusaurus-plugin-content-docs/current/usage/custom_sandbox_guide.md
+++ b/docs/i18n/fr/docusaurus-plugin-content-docs/current/usage/custom_sandbox_guide.md
@@ -98,4 +98,4 @@ Si vous voyez un message d'erreur indiquant que le port est utilisé ou indispon

 ## Discuter

-Pour d'autres problèmes ou questions rejoignez le [Slack](https://join.slack.com/t/opendevin/shared_invite/zt-2oikve2hu-UDxHeo8nsE69y6T7yFX_BA) ou le [Discord](https://discord.gg/ESHStjSjD4) et demandez!
+Pour d'autres problèmes ou questions rejoignez le [Slack](https://join.slack.com/t/openhands-ai/shared_invite/zt-2wkh4pklz-w~h_DVDtEe9H5kyQlcNxVw) ou le [Discord](https://discord.gg/ESHStjSjD4) et demandez!
--- a/docs/i18n/fr/docusaurus-plugin-content-docs/current/usage/how-to/custom-sandbox-guide.md
+++ b/docs/i18n/fr/docusaurus-plugin-content-docs/current/usage/how-to/custom-sandbox-guide.md
@@ -80,4 +80,4 @@ Si vous voyez une erreur concernant un port déjà utilisé ou indisponible, ess

 ## Discuter

-Pour d'autres problèmes ou questions, rejoignez le [Slack](https://join.slack.com/t/opendevin/shared_invite/zt-2oikve2hu-UDxHeo8nsE69y6T7yFX_BA) ou le [Discord](https://discord.gg/ESHStjSjD4) et demandez !
+Pour d'autres problèmes ou questions, rejoignez le [Slack](https://join.slack.com/t/openhands-ai/shared_invite/zt-2wkh4pklz-w~h_DVDtEe9H5kyQlcNxVw) ou le [Discord](https://discord.gg/ESHStjSjD4) et demandez !
--- a/docs/i18n/fr/docusaurus-plugin-content-docs/current/usage/how-to/openshift-example.md
+++ b/docs/i18n/fr/docusaurus-plugin-content-docs/current/usage/how-to/openshift-example.md
@@ -1,338 +0,0 @@
-
-
-# Kubernetes
-
-Il existe différentes façons d'exécuter OpenHands sur Kubernetes ou OpenShift. Ce guide présente une façon possible :
-1. Créer un PV "en tant qu'administrateur du cluster" pour mapper les données workspace_base et le répertoire docker au pod via le nœud worker
-2. Créer un PVC pour pouvoir monter ces PV sur le pod
-3. Créer un pod qui contient deux conteneurs : les conteneurs OpenHands et Sandbox
-
-## Étapes détaillées pour l'exemple ci-dessus
-
-> Remarque : Assurez-vous d'être connecté au cluster avec le compte approprié pour chaque étape. La création de PV nécessite un administrateur de cluster !
-
-> Assurez-vous d'avoir les autorisations de lecture/écriture sur le hostPath utilisé ci-dessous (c'est-à-dire /tmp/workspace)
-
-1. Créer le PV :
-Le fichier yaml d'exemple ci-dessous peut être utilisé par un administrateur de cluster pour créer le PV.
- workspace-pv.yaml
-
-```yamlfile
-apiVersion: v1
-kind: PersistentVolume
-metadata:
-  name: workspace-pv
-spec:
-  capacity:
-    storage: 2Gi
-  accessModes:
-    - ReadWriteOnce
-  persistentVolumeReclaimPolicy: Retain
-  hostPath:
-    path: /tmp/workspace
-```
-
-```bash
-# appliquer le fichier yaml
-$ oc create -f workspace-pv.yaml
-persistentvolume/workspace-pv created
-
-# vérifier :
-$ oc get pv
-NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                STORAGECLASS     REASON   AGE
-workspace-pv                               2Gi        RWO            Retain           Available                                                  7m23s
-```
-
- docker-pv.yaml
-
-```yamlfile
-apiVersion: v1
-kind: PersistentVolume
-metadata:
-  name: docker-pv
-spec:
-  capacity:
-    storage: 2Gi
-  accessModes:
-    - ReadWriteOnce
-  persistentVolumeReclaimPolicy: Retain
-  hostPath:
-    path: /var/run/docker.sock
-```
-
-```bash
-# appliquer le fichier yaml
-$ oc create -f docker-pv.yaml
-persistentvolume/docker-pv created
-
-# vérifier :
-oc get pv
-NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                STORAGECLASS     REASON   AGE
-docker-pv                                  2Gi        RWO            Retain           Available                                                  6m55s
-workspace-pv                               2Gi        RWO            Retain           Available                                                  7m23s
-```
-
-2. Créer le PVC :
-Exemple de fichier yaml PVC ci-dessous :
-
- workspace-pvc.yaml
-
-```yamlfile
-apiVersion: v1
-kind: PersistentVolumeClaim
-metadata:
-  name: workspace-pvc
-spec:
-  accessModes:
-    - ReadWriteOnce
-  resources:
-    requests:
-      storage: 1Gi
-```
-
-```bash
-# créer le pvc
-$ oc create -f workspace-pvc.yaml
-persistentvolumeclaim/workspace-pvc created
-
-# vérifier
-$ oc get pvc
-NAME            STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS     AGE
-workspace-pvc   Pending                                      hcloud-volumes   4s
-
-$ oc get events
-LAST SEEN   TYPE     REASON                 OBJECT                                MESSAGE
-8s          Normal   WaitForFirstConsumer   persistentvolumeclaim/workspace-pvc   waiting for first consumer to be created before binding
-```
-
- docker-pvc.yaml
-
-```yamlfile
-apiVersion: v1
-kind: PersistentVolumeClaim
-metadata:
-  name: docker-pvc
-spec:
-  accessModes:
-    - ReadWriteOnce
-  resources:
-    requests:
-      storage: 1Gi
-```
-
-```bash
-# créer le pvc
-$ oc create -f docker-pvc.yaml
-persistentvolumeclaim/docker-pvc created
-
-# vérifier
-$ oc get pvc
-NAME            STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS     AGE
-docker-pvc      Pending                                      hcloud-volumes   4s
-workspace-pvc   Pending                                      hcloud-volumes   2m53s
-
-$ oc get events
-LAST SEEN   TYPE     REASON                 OBJECT                                MESSAGE
-10s         Normal   WaitForFirstConsumer   persistentvolumeclaim/docker-pvc      waiting for first consumer to be created before binding
-10s         Normal   WaitForFirstConsumer   persistentvolumeclaim/workspace-pvc   waiting for first consumer to be created before binding
-```
-
-3. Créer le fichier yaml du pod :
-Exemple de fichier yaml de pod ci-dessous :
-
- pod.yaml
-
-```yamlfile
-apiVersion: v1
-kind: Pod
-metadata:
-  name: openhands-app-2024
-  labels:
-    app: openhands-app-2024
-spec:
-  containers:
-  - name: openhands-app-2024
-    image: ghcr.io/all-hands-ai/openhands:main
-    env:
-    - name: SANDBOX_USER_ID
-      value: "1000"
-    - name: WORKSPACE_MOUNT_PATH
-      value: "/opt/workspace_base"
-    volumeMounts:
-    - name: workspace-volume
-      mountPath: /opt/workspace_base
-    - name: docker-sock
-      mountPath: /var/run/docker.sock
-    ports:
-    - containerPort: 3000
-  - name: openhands-sandbox-2024
-    image: ghcr.io/all-hands-ai/sandbox:main
-    ports:
-    - containerPort: 51963
-    command: ["/usr/sbin/sshd", "-D", "-p 51963", "-o", "PermitRootLogin=yes"]
-  volumes:
-  - name: workspace-volume
-    persistentVolumeClaim:
-      claimName: workspace-pvc
-  - name: docker-sock
-    persistentVolumeClaim:
-      claimName: docker-pvc
-```
-
-
-```bash
-# créer le pod
-$ oc create -f pod.yaml
-W0716 11:22:07.776271  107626 warnings.go:70] would violate PodSecurity "restricted:v1.24": allowPrivilegeEscalation != false (containers "openhands-app-2024", "openhands-sandbox-2024" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "openhands-app-2024", "openhands-sandbox-2024" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or containers "openhands-app-2024", "openhands-sandbox-2024" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "openhands-app-2024", "openhands-sandbox-2024" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
-pod/openhands-app-2024 created
-
-# L'avertissement ci-dessus peut être ignoré pour l'instant car nous ne modifierons pas les restrictions SCC.
-
-# vérifier
-$ oc get pods
-NAME                 READY   STATUS    RESTARTS   AGE
-openhands-app-2024   0/2     Pending   0          5s
-
-$ oc get pods
-NAME                 READY   STATUS              RESTARTS   AGE
-openhands-app-2024   0/2     ContainerCreating   0          15s
-
-$ oc get events
-LAST SEEN   TYPE     REASON                   OBJECT                                MESSAGE
-38s         Normal   WaitForFirstConsumer     persistentvolumeclaim/docker-pvc      waiting for first consumer to be created before binding
-23s         Normal   ExternalProvisioning     persistentvolumeclaim/docker-pvc      waiting for a volume to be created, either by external provisioner "csi.hetzner.cloud" or manually created by system administrator
-27s         Normal   Provisioning             persistentvolumeclaim/docker-pvc      External provisioner is provisioning volume for claim "openhands/docker-pvc"
-17s         Normal   ProvisioningSucceeded    persistentvolumeclaim/docker-pvc      Successfully provisioned volume pvc-2b1d223a-1c8f-4990-8e3d-68061a9ae252
-16s         Normal   Scheduled                pod/openhands-app-2024                Successfully assigned All-Hands-AI/OpenHands-app-2024 to worker1.hub.internal.blakane.com
-9s          Normal   SuccessfulAttachVolume   pod/openhands-app-2024                AttachVolume.Attach succeeded for volume "pvc-2b1d223a-1c8f-4990-8e3d-68061a9ae252"
-9s          Normal   SuccessfulAttachVolume   pod/openhands-app-2024                AttachVolume.Attach succeeded for volume "pvc-31f15b25-faad-4665-a25f-201a530379af"
-6s          Normal   AddedInterface           pod/openhands-app-2024                Add eth0 [10.128.2.48/23] from openshift-sdn
-6s          Normal   Pulled                   pod/openhands-app-2024                Container image "ghcr.io/all-hands-ai/openhands:main" already present on machine
-6s          Normal   Created                  pod/openhands-app-2024                Created container openhands-app-2024
-6s          Normal   Started                  pod/openhands-app-2024                Started container openhands-app-2024
-6s          Normal   Pulled                   pod/openhands-app-2024                Container image "ghcr.io/all-hands-ai/sandbox:main" already present on machine
-5s          Normal   Created                  pod/openhands-app-2024                Created container openhands-sandbox-2024
-5s          Normal   Started                  pod/openhands-app-2024                Started container openhands-sandbox-2024
-83s         Normal   WaitForFirstConsumer     persistentvolumeclaim/workspace-pvc   waiting for first consumer to be created before binding
-27s         Normal   Provisioning             persistentvolumeclaim/workspace-pvc   External provisioner is provisioning volume for claim "openhands/workspace-pvc"
-17s         Normal   ProvisioningSucceeded    persistentvolumeclaim/workspace-pvc   Successfully provisioned volume pvc-31f15b25-faad-4665-a25f-201a530379af
-
-$ oc get pods
-NAME                 READY   STATUS    RESTARTS   AGE
-openhands-app-2024   2/2     Running   0          23s
-
-$ oc get pvc
-NAME            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS     AGE
-docker-pvc      Bound    pvc-2b1d223a-1c8f-4990-8e3d-68061a9ae252   10Gi       RWO            hcloud-volumes   10m
-workspace-pvc   Bound    pvc-31f15b25-faad-4665-a25f-201a530379af   10Gi       RWO            hcloud-volumes   13m
-
-```
-
-4. Créer un service NodePort.
-Exemple de commande de création de service ci-dessous :
-
-```bash
-# créer le service de type NodePort
-$ oc create svc nodeport  openhands-app-2024  --tcp=3000:3000
-service/openhands-app-2024 created
-
-# vérifier
-
-$ oc get svc
-NAME                 TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
-openhands-app-2024   NodePort   172.30.225.42   <none>        3000:30495/TCP   4s
-
-$ oc describe svc openhands-app-2024
-Name:                     openhands-app-2024
-Namespace:                openhands
-Labels:                   app=openhands-app-2024
-Annotations:              <none>
-Selector:                 app=openhands-app-2024
-Type:                     NodePort
-IP Family Policy:         SingleStack
-IP Families:              IPv4
-IP:                       172.30.225.42
-IPs:                      172.30.225.42
-Port:                     3000-3000  3000/TCP
-TargetPort:               3000/TCP
-NodePort:                 3000-3000  30495/TCP
-Endpoints:                10.128.2.48:3000
-Session Affinity:         None
-External Traffic Policy:  Cluster
-Events:                   <none>
-```
-
-6. Se connecter à l'interface utilisateur d'OpenHands, configurer l'Agent, puis tester :
-
-![image](https://github.com/user-attachments/assets/12f94804-a0c7-4744-b873-e003c9caf40e)
-
-
-
-## Déploiement d'Openhands sur GCP GKE
-
-**Avertissement** : ce déploiement accorde à l'application OpenHands l'accès au socket docker de Kubernetes, ce qui crée un risque de sécurité. Utilisez à vos propres risques.
-1- Créer une politique pour l'accès privilégié
-2- Créer des informations d'identification gke (facultatif)
-3- Créer le déploiement openhands
-4- Commandes de vérification et d'accès à l'interface utilisateur
-5- Dépanner le pod pour vérifier le conteneur interne
-
-1. créer une politique pour l'accès privilégié
-```bash
-apiVersion: rbac.authorization.k8s.io/v1
-kind: ClusterRole
-metadata:
-  name: privileged-role
-rules:
- apiGroups: [""]
-  resources: ["pods"]
-  verbs: ["create", "get", "list", "watch", "delete"]
- apiGroups: ["apps"]
-  resources: ["deployments"]
-  verbs: ["create", "get", "list", "watch", "delete"]
- apiGroups: [""]
-  resources: ["pods/exec"]
-  verbs: ["create"]
- apiGroups: [""]
-  resources: ["pods/log"]
-  verbs: ["get"]
---
-apiVersion: rbac.authorization.k8s.io/v1
-kind: ClusterRoleBinding
-metadata:
-  name: privileged-role-binding
-roleRef:
-  apiGroup: rbac.authorization.k8s.io
-  kind: ClusterRole
-  name: privileged-role
-subjects:
- kind: ServiceAccount
-  name: default  # Remplacez par le nom de votre compte de service
-  namespace: default
-```
-2. créer des informations d'identification gke (facultatif)
-```bash
-kubectl create secret generic google-cloud-key \
-  --from-file=key.json=/path/to/your/google-cloud-key.json
-  ```
-3. créer le déploiement openhands
-## comme cela est testé pour le nœud worker unique, si vous en avez plusieurs, spécifiez l'indicateur pour le worker unique
-
-```bash
-kind: Deployment
-metadata:
-  name: openhands-app-2024
-  labels:
-    app: openhands-app-2024
-spec:
-  replicas: 1  # Vous pouvez augmenter ce nombre pour plusieurs réplicas
-  selector:
-    matchLabels:
-      app: openhands-app-2024
-  template:
-    metadata:
-      labels:
-        app: openhands-app-2024
-    spec:
-      containers:
-      -
--- a/docs/i18n/fr/docusaurus-plugin-content-docs/current/usage/intro.mdx
+++ b/docs/i18n/fr/docusaurus-plugin-content-docs/current/usage/intro.mdx
@@ -42,7 +42,7 @@ Explorez le code source d'OpenHands sur [GitHub](https://github.com/All-Hands-AI
  />
 </a>
 <br></br>
-<a href="https://join.slack.com/t/opendevin/shared_invite/zt-2oikve2hu-UDxHeo8nsE69y6T7yFX_BA">
+<a href="https://join.slack.com/t/openhands-ai/shared_invite/zt-2wkh4pklz-w~h_DVDtEe9H5kyQlcNxVw">
  <img
    src="https://img.shields.io/badge/Slack-Join%20Us-red?logo=slack&logoColor=white&style=for-the-badge"
    alt="Join our Slack community"
--- a/docs/i18n/zh-Hans/docusaurus-plugin-content-docs/current/usage/about.md
+++ b/docs/i18n/zh-Hans/docusaurus-plugin-content-docs/current/usage/about.md
@@ -27,7 +27,7 @@ OpenHands 是一个社区驱动的项目，我们欢迎每个人的贡献。无

 我们有 Slack 工作区用于协作构建 OpenHands，也有 Discord 服务器用于讨论任何相关的内容，例如此项目、大语言模型、代理等。

- [Slack 工作区](https://join.slack.com/t/openhands-ai/shared_invite/zt-2vbfigwev-G03twSpXaErwzYVD4CFiBg)
+- [Slack 工作区](https://join.slack.com/t/openhands-ai/shared_invite/zt-2wkh4pklz-w~h_DVDtEe9H5kyQlcNxVw)
 - [Discord 服务器](https://discord.gg/ESHStjSjD4)

 如果你想做出贡献，欢迎加入我们的社区。让我们一起简化软件工程！
--- a/docs/i18n/zh-Hans/docusaurus-plugin-content-docs/current/usage/custom_sandbox_guide.md
+++ b/docs/i18n/zh-Hans/docusaurus-plugin-content-docs/current/usage/custom_sandbox_guide.md
@@ -99,4 +99,4 @@ sandbox_user_id="1001"

 ## 讨论

-对于其他问题或疑问，请加入 [Slack](https://join.slack.com/t/opendevin/shared_invite/zt-2oikve2hu-UDxHeo8nsE69y6T7yFX_BA) 或 [Discord](https://discord.gg/ESHStjSjD4) 提问！
+对于其他问题或疑问，请加入 [Slack](https://join.slack.com/t/openhands-ai/shared_invite/zt-2wkh4pklz-w~h_DVDtEe9H5kyQlcNxVw) 或 [Discord](https://discord.gg/ESHStjSjD4) 提问！
--- a/docs/i18n/zh-Hans/docusaurus-plugin-content-docs/current/usage/how-to/custom-sandbox-guide.md
+++ b/docs/i18n/zh-Hans/docusaurus-plugin-content-docs/current/usage/how-to/custom-sandbox-guide.md
@@ -78,4 +78,4 @@ sandbox_user_id="1001"

 ## 讨论

-对于其他问题或疑问，请加入 [Slack](https://join.slack.com/t/opendevin/shared_invite/zt-2oikve2hu-UDxHeo8nsE69y6T7yFX_BA) 或 [Discord](https://discord.gg/ESHStjSjD4) 并提问！
+对于其他问题或疑问，请加入 [Slack](https://join.slack.com/t/openhands-ai/shared_invite/zt-2wkh4pklz-w~h_DVDtEe9H5kyQlcNxVw) 或 [Discord](https://discord.gg/ESHStjSjD4) 并提问！
--- a/docs/i18n/zh-Hans/docusaurus-plugin-content-docs/current/usage/how-to/openshift-example.md
+++ b/docs/i18n/zh-Hans/docusaurus-plugin-content-docs/current/usage/how-to/openshift-example.md
@@ -1,343 +0,0 @@
-以下是翻译后的内容:
-
-# Kubernetes
-
-在 Kubernetes 或 OpenShift 上运行 OpenHands 有不同的方式。本指南介绍了一种可能的方式:
-1. 作为集群管理员,创建一个 PV 将 workspace_base 数据和 docker 目录映射到 worker 节点上的 pod
-2. 创建一个 PVC 以便将这些 PV 挂载到 pod
-3. 创建一个包含两个容器的 pod:OpenHands 和 Sandbox 容器
-
-## 上述示例的详细步骤
-
-> 注意:确保首先使用适当的帐户登录到集群以执行每个步骤。创建 PV 需要集群管理员权限!
-
-> 确保你对下面使用的 hostPath(即 /tmp/workspace)有读写权限
-
-1. 创建 PV:
-集群管理员可以使用下面的示例 yaml 文件创建 PV。
- workspace-pv.yaml
-
-```yamlfile
-apiVersion: v1
-kind: PersistentVolume
-metadata:
-  name: workspace-pv
-spec:
-  capacity:
-    storage: 2Gi
-  accessModes:
-    - ReadWriteOnce
-  persistentVolumeReclaimPolicy: Retain
-  hostPath:
-    path: /tmp/workspace
-```
-
-```bash
-# 应用 yaml 文件
-$ oc create -f workspace-pv.yaml
-persistentvolume/workspace-pv created
-
-# 查看:
-$ oc get pv
-NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                STORAGECLASS     REASON   AGE
-workspace-pv                               2Gi        RWO            Retain           Available                                                  7m23s
-```
-
- docker-pv.yaml
-
-```yamlfile
-apiVersion: v1
-kind: PersistentVolume
-metadata:
-  name: docker-pv
-spec:
-  capacity:
-    storage: 2Gi
-  accessModes:
-    - ReadWriteOnce
-  persistentVolumeReclaimPolicy: Retain
-  hostPath:
-    path: /var/run/docker.sock
-```
-
-```bash
-# 应用 yaml 文件
-$ oc create -f docker-pv.yaml
-persistentvolume/docker-pv created
-
-# 查看:
-oc get pv
-NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                STORAGECLASS     REASON   AGE
-docker-pv                                  2Gi        RWO            Retain           Available                                                  6m55s
-workspace-pv                               2Gi        RWO            Retain           Available                                                  7m23s
-```
-
-2. 创建 PVC:
-下面是示例 PVC yaml 文件:
-
- workspace-pvc.yaml
-
-```yamlfile
-apiVersion: v1
-kind: PersistentVolumeClaim
-metadata:
-  name: workspace-pvc
-spec:
-  accessModes:
-    - ReadWriteOnce
-  resources:
-    requests:
-      storage: 1Gi
-```
-
-```bash
-# 创建 pvc
-$ oc create -f workspace-pvc.yaml
-persistentvolumeclaim/workspace-pvc created
-
-# 查看
-$ oc get pvc
-NAME            STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS     AGE
-workspace-pvc   Pending                                      hcloud-volumes   4s
-
-$ oc get events
-LAST SEEN   TYPE     REASON                 OBJECT                                MESSAGE
-8s          Normal   WaitForFirstConsumer   persistentvolumeclaim/workspace-pvc   waiting for first consumer to be created before binding
-```
-
- docker-pvc.yaml
-
-```yamlfile
-apiVersion: v1
-kind: PersistentVolumeClaim
-metadata:
-  name: docker-pvc
-spec:
-  accessModes:
-    - ReadWriteOnce
-  resources:
-    requests:
-      storage: 1Gi
-```
-
-```bash
-# 创建 pvc
-$ oc create -f docker-pvc.yaml
-persistentvolumeclaim/docker-pvc created
-
-# 查看
-$ oc get pvc
-NAME            STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS     AGE
-docker-pvc      Pending                                      hcloud-volumes   4s
-workspace-pvc   Pending                                      hcloud-volumes   2m53s
-
-$ oc get events
-LAST SEEN   TYPE     REASON                 OBJECT                                MESSAGE
-10s         Normal   WaitForFirstConsumer   persistentvolumeclaim/docker-pvc      waiting for first consumer to be created before binding
-10s         Normal   WaitForFirstConsumer   persistentvolumeclaim/workspace-pvc   waiting for first consumer to be created before binding
-```
-
-3. 创建 pod yaml 文件:
-下面是示例 pod yaml 文件:
-
- pod.yaml
-
-```yamlfile
-apiVersion: v1
-kind: Pod
-metadata:
-  name: openhands-app-2024
-  labels:
-    app: openhands-app-2024
-spec:
-  containers:
-  - name: openhands-app-2024
-    image: ghcr.io/all-hands-ai/openhands:main
-    env:
-    - name: SANDBOX_USER_ID
-      value: "1000"
-    - name: WORKSPACE_MOUNT_PATH
-      value: "/opt/workspace_base"
-    volumeMounts:
-    - name: workspace-volume
-      mountPath: /opt/workspace_base
-    - name: docker-sock
-      mountPath: /var/run/docker.sock
-    ports:
-    - containerPort: 3000
-  - name: openhands-sandbox-2024
-    image: ghcr.io/all-hands-ai/sandbox:main
-    ports:
-    - containerPort: 51963
-    command: ["/usr/sbin/sshd", "-D", "-p 51963", "-o", "PermitRootLogin=yes"]
-  volumes:
-  - name: workspace-volume
-    persistentVolumeClaim:
-      claimName: workspace-pvc
-  - name: docker-sock
-    persistentVolumeClaim:
-      claimName: docker-pvc
-```
-
-
-```bash
-# 创建 pod
-$ oc create -f pod.yaml
-W0716 11:22:07.776271  107626 warnings.go:70] would violate PodSecurity "restricted:v1.24": allowPrivilegeEscalation != false (containers "openhands-app-2024", "openhands-sandbox-2024" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "openhands-app-2024", "openhands-sandbox-2024" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or containers "openhands-app-2024", "openhands-sandbox-2024" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "openhands-app-2024", "openhands-sandbox-2024" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
-pod/openhands-app-2024 created
-
-# 上面的警告可以暂时忽略,因为我们不会修改 SCC 限制。
-
-# 查看
-$ oc get pods
-NAME                 READY   STATUS    RESTARTS   AGE
-openhands-app-2024   0/2     Pending   0          5s
-
-$ oc get pods
-NAME                 READY   STATUS              RESTARTS   AGE
-openhands-app-2024   0/2     ContainerCreating   0          15s
-
-$ oc get events
-LAST SEEN   TYPE     REASON                   OBJECT                                MESSAGE
-38s         Normal   WaitForFirstConsumer     persistentvolumeclaim/docker-pvc      waiting for first consumer to be created before binding
-23s         Normal   ExternalProvisioning     persistentvolumeclaim/docker-pvc      waiting for a volume to be created, either by external provisioner "csi.hetzner.cloud" or manually created by system administrator
-27s         Normal   Provisioning             persistentvolumeclaim/docker-pvc      External provisioner is provisioning volume for claim "openhands/docker-pvc"
-17s         Normal   ProvisioningSucceeded    persistentvolumeclaim/docker-pvc      Successfully provisioned volume pvc-2b1d223a-1c8f-4990-8e3d-68061a9ae252
-16s         Normal   Scheduled                pod/openhands-app-2024                Successfully assigned All-Hands-AI/OpenHands-app-2024 to worker1.hub.internal.blakane.com
-9s          Normal   SuccessfulAttachVolume   pod/openhands-app-2024                AttachVolume.Attach succeeded for volume "pvc-2b1d223a-1c8f-4990-8e3d-68061a9ae252"
-9s          Normal   SuccessfulAttachVolume   pod/openhands-app-2024                AttachVolume.Attach succeeded for volume "pvc-31f15b25-faad-4665-a25f-201a530379af"
-6s          Normal   AddedInterface           pod/openhands-app-2024                Add eth0 [10.128.2.48/23] from openshift-sdn
-6s          Normal   Pulled                   pod/openhands-app-2024                Container image "ghcr.io/all-hands-ai/openhands:main" already present on machine
-6s          Normal   Created                  pod/openhands-app-2024                Created container openhands-app-2024
-6s          Normal   Started                  pod/openhands-app-2024                Started container openhands-app-2024
-6s          Normal   Pulled                   pod/openhands-app-2024                Container image "ghcr.io/all-hands-ai/sandbox:main" already present on machine
-5s          Normal   Created                  pod/openhands-app-2024                Created container openhands-sandbox-2024
-5s          Normal   Started                  pod/openhands-app-2024                Started container openhands-sandbox-2024
-83s         Normal   WaitForFirstConsumer     persistentvolumeclaim/workspace-pvc   waiting for first consumer to be created before binding
-27s         Normal   Provisioning             persistentvolumeclaim/workspace-pvc   External provisioner is provisioning volume for claim "openhands/workspace-pvc"
-17s         Normal   ProvisioningSucceeded    persistentvolumeclaim/workspace-pvc   Successfully provisioned volume pvc-31f15b25-faad-4665-a25f-201a530379af
-
-$ oc get pods
-NAME                 READY   STATUS    RESTARTS   AGE
-openhands-app-2024   2/2     Running   0          23s
-
-$ oc get pvc
-NAME            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS     AGE
-docker-pvc      Bound    pvc-2b1d223a-1c8f-4990-8e3d-68061a9ae252   10Gi       RWO            hcloud-volumes   10m
-workspace-pvc   Bound    pvc-31f15b25-faad-4665-a25f-201a530379af   10Gi       RWO            hcloud-volumes   13m
-
-```
-
-4. Create a NodePort service.
-Sample service creation command below:
-
-```bash
-# create the service of type NodePort
-$ oc create svc nodeport  openhands-app-2024  --tcp=3000:3000
-service/openhands-app-2024 created
-
-# review
-
-$ oc get svc
-NAME                 TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
-openhands-app-2024   NodePort   172.30.225.42   <none>        3000:30495/TCP   4s
-
-$ oc describe svc openhands-app-2024
-Name:                     openhands-app-2024
-Namespace:                openhands
-Labels:                   app=openhands-app-2024
-Annotations:              <none>
-Selector:                 app=openhands-app-2024
-Type:                     NodePort
-IP Family Policy:         SingleStack
-IP Families:              IPv4
-IP:                       172.30.225.42
-IPs:                      172.30.225.42
-Port:                     3000-3000  3000/TCP
-TargetPort:               3000/TCP
-NodePort:                 3000-3000  30495/TCP
-Endpoints:                10.128.2.48:3000
-Session Affinity:         None
-External Traffic Policy:  Cluster
-Events:                   <none>
-```
-
-5. Connect to the OpenHands UI, configure the agent, then test:
-
-![image](https://github.com/user-attachments/assets/12f94804-a0c7-4744-b873-e003c9caf40e)
-
-
-
-## GCP GKE OpenHands deployment
-
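-**Warning**: this deployment grants the OpenHands application access to the worker node's Docker socket, which creates a security risk. Use at your own discretion.
-1- Create a policy for privileged access
-2- Create GKE credentials (optional)
-3- Create the OpenHands deployment
-4- Verification and UI access commands
-5- Deploy a troubleshooting pod to verify the internal container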
-
-1. Create a policy for privileged access
-```yaml
-apiVersion: rbac.authorization.k8s.io/v1
-kind: ClusterRole
-metadata:
-  name: privileged-role
-rules:
-- apiGroups: [""]
-  resources: ["pods"]
-  verbs: ["create", "get", "list", "watch", "delete"]
-- apiGroups: ["apps"]
-  resources: ["deployments"]
-  verbs: ["create", "get", "list", "watch", "delete"]
-- apiGroups: [""]
-  resources: ["pods/exec"]
-  verbs: ["create"]
-- apiGroups: [""]
-  resources: ["pods/log"]
-  verbs: ["get"]
----
-apiVersion: rbac.authorization.k8s.io/v1
-kind: ClusterRoleBinding
-metadata:
-  name: privileged-role-binding
-roleRef:
-  apiGroup: rbac.authorization.k8s.io
-  kind: ClusterRole
-  name: privileged-role
-subjects:
-- kind: ServiceAccount
-  name: default  # Change to your service account name
-  namespace: default
-```
-2. Create GKE credentials (optional)
-```bash
-kubectl create secret generic google-cloud-key \
-  --from-file=key.json=/path/to/your/google-cloud-key.json
-  ```
-3. Create the OpenHands deployment
-## This was tested on a single worker node; if you have multiple nodes, pin the deployment to a single worker node
-
-```yaml
-kind: Deployment
-metadata:
-  name: openhands-app-2024
-  labels:
-    app: openhands-app-2024
-spec:
-  replicas: 1  # You can increase this number for multiple replicas
-  selector:
-    matchLabels:
-      app: openhands-app-2024
-  template:
-    metadata:
-      labels:
-        app: openhands-app-2024
-    spec:
-      containers:
-      - name: openhands-app-2024
-        image: ghcr.io/all-hands-ai/openhands:main
-        env:
-        - name: SANDBOX_USER_ID
-          value: "1000"
-        - name: SANDBOX_API
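The NodePort step in the OpenShift example exposes the UI on a port opened on every node: in the `oc get svc` output, the `PORT(S)` value `3000:30495/TCP` means service port 3000 is reachable as node port 30495. A minimal Python sketch that turns that column into a browser URL follows; the helper names are hypothetical, and `192.0.2.10` is a documentation-only address standing in for a worker node's IP (which `oc get nodes -o wide` lists).

```python
from __future__ import annotations

import re

# Hypothetical helpers: they only parse the PORT(S) text printed by
# `oc get svc`; they do not talk to the cluster.
PORT_ENTRY = re.compile(r"(\d+)(?::(\d+))?/([A-Z]+)")


def parse_service_ports(ports_column: str) -> list[tuple[int, int | None, str]]:
    """Split a PORT(S) value such as "3000:30495/TCP" into
    (service port, node port or None, protocol) tuples."""
    entries = []
    for raw in ports_column.split(","):
        match = PORT_ENTRY.fullmatch(raw.strip())
        if match is None:
            raise ValueError(f"unrecognized port entry: {raw!r}")
        port, node_port, protocol = match.groups()
        entries.append((int(port), int(node_port) if node_port else None, protocol))
    return entries


def ui_url(node_address: str, ports_column: str, service_port: int = 3000) -> str:
    """Return http://<node>:<node port> for the service port the UI listens on."""
    for port, node_port, _protocol in parse_service_ports(ports_column):
        if port == service_port and node_port is not None:
            return f"http://{node_address}:{node_port}"
    raise LookupError(f"service port {service_port} has no node port")


# 192.0.2.10 is a documentation-only address standing in for a worker node.
print(ui_url("192.0.2.10", "3000:30495/TCP"))  # http://192.0.2.10:30495
```

A service without a node port (for example a plain `ClusterIP` service showing `3000/TCP`) raises `LookupError` instead of producing an unreachable URL.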
--- a/docs/i18n/zh-Hans/docusaurus-plugin-content-docs/current/usage/intro.mdx
+++ b/docs/i18n/zh-Hans/docusaurus-plugin-content-docs/current/usage/intro.mdx
@@ -42,7 +42,7 @@ OpenHands is an **autonomous AI software engineer** capable of performing complex engineering
  />
 </a>
 <br></br>
-<a href="https://join.slack.com/t/opendevin/shared_invite/zt-2oikve2hu-UDxHeo8nsE69y6T7yFX_BA">
+<a href="https://join.slack.com/t/openhands-ai/shared_invite/zt-2wkh4pklz-w~h_DVDtEe9H5kyQlcNxVw">
  <img
    src="https://img.shields.io/badge/Slack-Join%20Us-red?logo=slack&logoColor=white&style=for-the-badge"
    alt="Join our Slack community"
--- a/docs/modules/usage/how-to/github-action.md
+++ b/docs/modules/usage/how-to/github-action.md
@@ -39,23 +39,28 @@ You can provide custom directions for OpenHands by following the [README for the

 ### Custom configurations

-Github resolver will automatically check for valid [repository secrets](https://docs.github.com/en/actions/security-for-github-actions/security-guides/using-secrets-in-github-actions?tool=webui#creating-secrets-for-a-repository) or [repository variables](https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/store-information-in-variables#creating-configuration-variables-for-a-repository) to customize its behavior. The customization options you can set are:
+The GitHub resolver automatically checks for valid [repository secrets](https://docs.github.com/en/actions/security-for-github-actions/security-guides/using-secrets-in-github-actions?tool=webui#creating-secrets-for-a-repository) or [repository variables](https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/store-information-in-variables#creating-configuration-variables-for-a-repository) to customize its behavior.
+The customization options you can set are:

-| **Attribute name**               | **Type** | **Purpose**                                                                                         | **Example**                                     |
-| -------------------------------- | -------- | --------------------------------------------------------------------------------------------------- | ----------------------------------------------- |
-| `OPENHANDS_MAX_ITER`             | Variable | Set max limit for agent iterations                                                                  | `OPENHANDS_MAX_ITER=10`                         |
-| `OPENHANDS_MACRO`                | Variable | Customize default macro for invoking the resolver                                                   | `OPENHANDS_MACRO=@resolveit`                    |
-| `OPENHANDS_BASE_CONTAINER_IMAGE` | Variable | Custom Sandbox ([learn more](https://docs.all-hands.dev/modules/usage/how-to/custom-sandbox-guide)) | `OPENHANDS_BASE_CONTAINER_IMAGE="custom_image"` |
+| **Attribute name**               | **Type** | **Purpose**                                                                                                 | **Example**                                          |
+|----------------------------------| -------- |-------------------------------------------------------------------------------------------------------------|------------------------------------------------------|
+| `LLM_MODEL`                      | Variable | Set the LLM to use with OpenHands                                                                           | `LLM_MODEL="anthropic/claude-3-5-sonnet-20241022"`   |
+| `OPENHANDS_MAX_ITER`             | Variable | Set max limit for agent iterations                                                                          | `OPENHANDS_MAX_ITER=10`                              |
+| `OPENHANDS_MACRO`                | Variable | Customize default macro for invoking the resolver                                                           | `OPENHANDS_MACRO=@resolveit`                         |
+| `OPENHANDS_BASE_CONTAINER_IMAGE` | Variable | Custom Sandbox ([learn more](https://docs.all-hands.dev/modules/usage/how-to/custom-sandbox-guide))         | `OPENHANDS_BASE_CONTAINER_IMAGE="custom_image"`      |

 ## Writing Effective .openhands_instructions Files

-The `.openhands_instructions` file is a file that you can put in the root directory of your repository to guide OpenHands in understanding and working with your repository effectively. Here are key tips for writing high-quality instructions:
+You can place the `.openhands_instructions` file in the root directory of your repository to guide OpenHands
+in understanding and working with your repository effectively. Here are key tips for writing high-quality instructions:

 ### Core Principles

-1. **Concise but Informative**: Provide a clear, focused overview of the repository that emphasizes the most common actions OpenHands will need to perform.
+1. **Concise but Informative**: Provide a clear, focused overview of the repository that emphasizes the most common
+     actions OpenHands will need to perform.

-2. **Repository Structure**: Explain the key directories and their purposes, especially highlighting where different types of code (e.g., frontend, backend) are located.
+2. **Repository Structure**: Explain the key directories and their purposes, especially highlighting where different
+     types of code (e.g., frontend, backend) are located.

 3. **Development Workflows**: Document the essential commands for:

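The customization table lists repository variables that the resolver reads. As a rough sketch of how such variables could be consumed, the hypothetical loader below keeps only the ones that are set and parses `OPENHANDS_MAX_ITER` as a positive integer. It is not the resolver's actual code, and it deliberately leaves defaults to the resolver itself rather than guessing them.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Variable names from the customization table. This loader is a
# hypothetical illustration, not the resolver's own code.
RESOLVER_VARIABLES = (
    "LLM_MODEL",
    "OPENHANDS_MAX_ITER",
    "OPENHANDS_MACRO",
    "OPENHANDS_BASE_CONTAINER_IMAGE",
)


def load_resolver_overrides(env: Mapping[str, str]) -> dict[str, str | int]:
    """Collect the customization variables that are set and non-empty.

    Unset or blank variables are omitted so the resolver's own defaults apply.
    OPENHANDS_MAX_ITER is parsed as a positive integer.
    """
    overrides: dict[str, str | int] = {}
    for name in RESOLVER_VARIABLES:
        value = env.get(name, "").strip()
        if not value:
            continue
        if name == "OPENHANDS_MAX_ITER":
            iterations = int(value)  # raises ValueError on non-numeric input
            if iterations <= 0:
                raise ValueError("OPENHANDS_MAX_ITER must be a positive integer")
            overrides[name] = iterations
        else:
            overrides[name] = value
    return overrides


if __name__ == "__main__":
    print(load_resolver_overrides(os.environ))
```

Failing fast on a malformed `OPENHANDS_MAX_ITER` surfaces a misconfigured repository variable before any agent iterations are spent.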
--- a/docs/modules/usage/how-to/openshift-example.md
+++ b/docs/modules/usage/how-to/openshift-example.md
@@ -1,429 +0,0 @@
-# Kubernetes
-
-There are different ways you might run OpenHands on Kubernetes or OpenShift. This guide goes through one possible way:
-1. Create PVs (as a cluster admin) to map the workspace_base data and the Docker socket to the pod through the worker node
-2. Create PVCs so that those PVs can be mounted into the pod
-3. Create a pod that contains two containers: the OpenHands and Sandbox containers
-
-## Detailed Steps for the Example Above
-
-> Note: Make sure you are logged in to the cluster first with the proper account for each step. PV creation requires cluster-administrator privileges!
-
-> Make sure you have read/write permissions on the hostPath used below (i.e. /tmp/workspace)
-
-1. Create the PV:
-The sample YAML files below can be used by a cluster admin to create the PVs.
- workspace-pv.yaml
-
-```yaml
-apiVersion: v1
-kind: PersistentVolume
-metadata:
-  name: workspace-pv
-spec:
-  capacity:
-    storage: 2Gi
-  accessModes:
-    - ReadWriteOnce
-  persistentVolumeReclaimPolicy: Retain
-  hostPath:
-    path: /tmp/workspace
-```
-
-```bash
-# apply yaml file
-$ oc create -f workspace-pv.yaml
-persistentvolume/workspace-pv created
-
-# review:
-$ oc get pv
-NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                STORAGECLASS     REASON   AGE
-workspace-pv                               2Gi        RWO            Retain           Available                                                  7m23s
-```
-
- docker-pv.yaml
-
-```yaml
-apiVersion: v1
-kind: PersistentVolume
-metadata:
-  name: docker-pv
-spec:
-  capacity:
-    storage: 2Gi
-  accessModes:
-    - ReadWriteOnce
-  persistentVolumeReclaimPolicy: Retain
-  hostPath:
-    path: /var/run/docker.sock
-```
-
-```bash
-# apply yaml file
-$ oc create -f docker-pv.yaml
-persistentvolume/docker-pv created
-
-# review:
-$ oc get pv
-NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                STORAGECLASS     REASON   AGE
-docker-pv                                  2Gi        RWO            Retain           Available                                                  6m55s
-workspace-pv                               2Gi        RWO            Retain           Available                                                  7m23s
-```
-
-2. Create the PVC:
-Sample PVC yaml file below:
-
- workspace-pvc.yaml
-
-```yaml
-apiVersion: v1
-kind: PersistentVolumeClaim
-metadata:
-  name: workspace-pvc
-spec:
-  accessModes:
-    - ReadWriteOnce
-  resources:
-    requests:
-      storage: 1Gi
-```
-
-```bash
-# create the pvc
-$ oc create -f workspace-pvc.yaml
-persistentvolumeclaim/workspace-pvc created
-
-# review
-$ oc get pvc
-NAME            STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS     AGE
-workspace-pvc   Pending                                      hcloud-volumes   4s
-
-$ oc get events
-LAST SEEN   TYPE     REASON                 OBJECT                                MESSAGE
-8s          Normal   WaitForFirstConsumer   persistentvolumeclaim/workspace-pvc   waiting for first consumer to be created before binding
-```
-
- docker-pvc.yaml
-
-```yaml
-apiVersion: v1
-kind: PersistentVolumeClaim
-metadata:
-  name: docker-pvc
-spec:
-  accessModes:
-    - ReadWriteOnce
-  resources:
-    requests:
-      storage: 1Gi
-```
-
-```bash
-# create pvc
-$ oc create -f docker-pvc.yaml
-persistentvolumeclaim/docker-pvc created
-
-# review
-$ oc get pvc
-NAME            STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS     AGE
-docker-pvc      Pending                                      hcloud-volumes   4s
-workspace-pvc   Pending                                      hcloud-volumes   2m53s
-
-$ oc get events
-LAST SEEN   TYPE     REASON                 OBJECT                                MESSAGE
-10s         Normal   WaitForFirstConsumer   persistentvolumeclaim/docker-pvc      waiting for first consumer to be created before binding
-10s         Normal   WaitForFirstConsumer   persistentvolumeclaim/workspace-pvc   waiting for first consumer to be created before binding
-```
-
-3. Create the pod yaml file:
-Sample pod yaml file below:
-
- pod.yaml
-
-```yaml
-apiVersion: v1
-kind: Pod
-metadata:
-  name: openhands-app-2024
-  labels:
-    app: openhands-app-2024
-spec:
-  containers:
-  - name: openhands-app-2024
-    image: docker.all-hands.dev/all-hands-ai/openhands:main
-    env:
-    - name: SANDBOX_USER_ID
-      value: "1000"
-    - name: WORKSPACE_MOUNT_PATH
-      value: "/opt/workspace_base"
-    volumeMounts:
-    - name: workspace-volume
-      mountPath: /opt/workspace_base
-    - name: docker-sock
-      mountPath: /var/run/docker.sock
-    ports:
-    - containerPort: 3000
-  - name: openhands-sandbox-2024
-    image: docker.all-hands.dev/all-hands-ai/runtime:main
-    ports:
-    - containerPort: 51963
-    command: ["/usr/sbin/sshd", "-D", "-p 51963", "-o", "PermitRootLogin=yes"]
-  volumes:
-  - name: workspace-volume
-    persistentVolumeClaim:
-      claimName: workspace-pvc
-  - name: docker-sock
-    persistentVolumeClaim:
-      claimName: docker-pvc
-```
-
-
-```bash
-# create the pod
-$ oc create -f pod.yaml
-W0716 11:22:07.776271  107626 warnings.go:70] would violate PodSecurity "restricted:v1.24": allowPrivilegeEscalation != false (containers "openhands-app-2024", "openhands-sandbox-2024" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "openhands-app-2024", "openhands-sandbox-2024" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or containers "openhands-app-2024", "openhands-sandbox-2024" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "openhands-app-2024", "openhands-sandbox-2024" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
-pod/openhands-app-2024 created
-
-# The above warning can be ignored for now, as we will not modify SCC restrictions.
-
-# review
-$ oc get pods
-NAME                 READY   STATUS    RESTARTS   AGE
-openhands-app-2024   0/2     Pending   0          5s
-
-$ oc get pods
-NAME                 READY   STATUS              RESTARTS   AGE
-openhands-app-2024   0/2     ContainerCreating   0          15s
-
-$ oc get events
-LAST SEEN   TYPE     REASON                   OBJECT                                MESSAGE
-38s         Normal   WaitForFirstConsumer     persistentvolumeclaim/docker-pvc      waiting for first consumer to be created before binding
-23s         Normal   ExternalProvisioning     persistentvolumeclaim/docker-pvc      waiting for a volume to be created, either by external provisioner "csi.hetzner.cloud" or manually created by system administrator
-27s         Normal   Provisioning             persistentvolumeclaim/docker-pvc      External provisioner is provisioning volume for claim "openhands/docker-pvc"
-17s         Normal   ProvisioningSucceeded    persistentvolumeclaim/docker-pvc      Successfully provisioned volume pvc-2b1d223a-1c8f-4990-8e3d-68061a9ae252
-16s         Normal   Scheduled                pod/openhands-app-2024                Successfully assigned All-Hands-AI/OpenHands-app-2024 to worker1.hub.internal.blakane.com
-9s          Normal   SuccessfulAttachVolume   pod/openhands-app-2024                AttachVolume.Attach succeeded for volume "pvc-2b1d223a-1c8f-4990-8e3d-68061a9ae252"
-9s          Normal   SuccessfulAttachVolume   pod/openhands-app-2024                AttachVolume.Attach succeeded for volume "pvc-31f15b25-faad-4665-a25f-201a530379af"
-6s          Normal   AddedInterface           pod/openhands-app-2024                Add eth0 [10.128.2.48/23] from openshift-sdn
-6s          Normal   Pulled                   pod/openhands-app-2024                Container image "docker.all-hands.dev/all-hands-ai/openhands:main" already present on machine
-6s          Normal   Created                  pod/openhands-app-2024                Created container openhands-app-2024
-6s          Normal   Started                  pod/openhands-app-2024                Started container openhands-app-2024
-6s          Normal   Pulled                   pod/openhands-app-2024                Container image "docker.all-hands.dev/all-hands-ai/sandbox:main" already present on machine
-5s          Normal   Created                  pod/openhands-app-2024                Created container openhands-sandbox-2024
-5s          Normal   Started                  pod/openhands-app-2024                Started container openhands-sandbox-2024
-83s         Normal   WaitForFirstConsumer     persistentvolumeclaim/workspace-pvc   waiting for first consumer to be created before binding
-27s         Normal   Provisioning             persistentvolumeclaim/workspace-pvc   External provisioner is provisioning volume for claim "openhands/workspace-pvc"
-17s         Normal   ProvisioningSucceeded    persistentvolumeclaim/workspace-pvc   Successfully provisioned volume pvc-31f15b25-faad-4665-a25f-201a530379af
-
-$ oc get pods
-NAME                 READY   STATUS    RESTARTS   AGE
-openhands-app-2024   2/2     Running   0          23s
-
-$ oc get pvc
-NAME            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS     AGE
-docker-pvc      Bound    pvc-2b1d223a-1c8f-4990-8e3d-68061a9ae252   10Gi       RWO            hcloud-volumes   10m
-workspace-pvc   Bound    pvc-31f15b25-faad-4665-a25f-201a530379af   10Gi       RWO            hcloud-volumes   13m
-
-```
-
-4. Create a NodePort service.
-Sample service creation command below:
-
-```bash
-# create the service of type NodePort
-$ oc create svc nodeport  openhands-app-2024  --tcp=3000:3000
-service/openhands-app-2024 created
-
-# review
-
-$ oc get svc
-NAME                 TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
-openhands-app-2024   NodePort   172.30.225.42   <none>        3000:30495/TCP   4s
-
-$ oc describe svc openhands-app-2024
-Name:                     openhands-app-2024
-Namespace:                openhands
-Labels:                   app=openhands-app-2024
-Annotations:              <none>
-Selector:                 app=openhands-app-2024
-Type:                     NodePort
-IP Family Policy:         SingleStack
-IP Families:              IPv4
-IP:                       172.30.225.42
-IPs:                      172.30.225.42
-Port:                     3000-3000  3000/TCP
-TargetPort:               3000/TCP
-NodePort:                 3000-3000  30495/TCP
-Endpoints:                10.128.2.48:3000
-Session Affinity:         None
-External Traffic Policy:  Cluster
-Events:                   <none>
-```
-
-5. Connect to the OpenHands UI, configure the agent, then test:
-
-![image](https://github.com/user-attachments/assets/12f94804-a0c7-4744-b873-e003c9caf40e)
-
-
-
-## GCP GKE OpenHands deployment
-
-**Warning**: this deployment grants the OpenHands application access to the worker node's Docker socket, which creates a security risk. Use at your own discretion.
-1- Create a policy for privileged access
-2- Create GKE credentials (optional)
-3- Create the OpenHands deployment
-4- Verification and UI access commands
-5- Deploy a troubleshooting pod to verify the internal container
-
-1. Create a policy for privileged access
-```yaml
-apiVersion: rbac.authorization.k8s.io/v1
-kind: ClusterRole
-metadata:
-  name: privileged-role
-rules:
-- apiGroups: [""]
-  resources: ["pods"]
-  verbs: ["create", "get", "list", "watch", "delete"]
-- apiGroups: ["apps"]
-  resources: ["deployments"]
-  verbs: ["create", "get", "list", "watch", "delete"]
-- apiGroups: [""]
-  resources: ["pods/exec"]
-  verbs: ["create"]
-- apiGroups: [""]
-  resources: ["pods/log"]
-  verbs: ["get"]
----
-apiVersion: rbac.authorization.k8s.io/v1
-kind: ClusterRoleBinding
-metadata:
-  name: privileged-role-binding
-roleRef:
-  apiGroup: rbac.authorization.k8s.io
-  kind: ClusterRole
-  name: privileged-role
-subjects:
-- kind: ServiceAccount
-  name: default  # Change to your service account name
-  namespace: default
-```
-2. Create GKE credentials (optional)
-```bash
-kubectl create secret generic google-cloud-key \
-  --from-file=key.json=/path/to/your/google-cloud-key.json
-  ```
-3. Create the OpenHands deployment
-## This was tested on a single worker node; if you have multiple nodes, pin the deployment to a single worker node
-
-```yaml
-kind: Deployment
-metadata:
-  name: openhands-app-2024
-  labels:
-    app: openhands-app-2024
-spec:
-  replicas: 1  # You can increase this number for multiple replicas
-  selector:
-    matchLabels:
-      app: openhands-app-2024
-  template:
-    metadata:
-      labels:
-        app: openhands-app-2024
-    spec:
-      containers:
-      - name: openhands-app-2024
-        image: docker.all-hands.dev/all-hands-ai/openhands:main
-        env:
-        - name: SANDBOX_USER_ID
-          value: "1000"
-        - name: SANDBOX_API_HOSTNAME
-          value: '10.164.0.4'
-        - name: WORKSPACE_MOUNT_PATH
-          value: "/tmp/workspace_base"
-        - name: GOOGLE_APPLICATION_CREDENTIALS
-          value: "/tmp/workspace_base/google-cloud-key.json"
-        volumeMounts:
-        - name: workspace-volume
-          mountPath: /tmp/workspace_base
-        - name: docker-sock
-          mountPath: /var/run/docker.sock
-        - name: google-credentials
-          mountPath: "/tmp/workspace_base/google-cloud-key.json"
-        securityContext:
-          privileged: true  # Add this to allow privileged access
-        ports:
-        - containerPort: 3000
-      - name: openhands-sandbox-2024
-        image: docker.all-hands.dev/all-hands-ai/runtime:main
-    #    securityContext:
-    #      privileged: true  # Add this to allow privileged access
-        ports:
-        - containerPort: 51963
-        command: ["/usr/sbin/sshd", "-D", "-p 51963", "-o", "PermitRootLogin=yes"]
-      volumes:
-      #- name: workspace-volume
-      #  persistentVolumeClaim:
-      #    claimName: workspace-pvc
-      - name: workspace-volume
-        emptyDir: {}
-      - name: docker-sock
-        hostPath:
-          path: /var/run/docker.sock       # Use host's Docker socket
-          type: Socket
-      - name: google-credentials
-        secret:
-          secretName: google-cloud-key
----
-apiVersion: v1
-kind: Service
-metadata:
-  name: openhands-app-2024-svc
-spec:
-  selector:
-    app: openhands-app-2024
-  ports:
-  - name: http
-    protocol: TCP
-    port: 80
-    targetPort: 3000
-  - name: ssh
-    protocol: TCP
-    port: 51963
-    targetPort: 51963
-  type: LoadBalancer
-  ```
-
-5. Deploy a troubleshooting pod to verify the internal container
-### To inspect the internal container runtime, apply the deployment below, use `kubectl exec -it` to open a shell in the container, and check the container runtime with ordinary Docker commands such as `docker ps -a`.
-
-```yaml
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: docker-in-docker
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: docker-in-docker
-  template:
-    metadata:
-      labels:
-        app: docker-in-docker
-    spec:
-      containers:
-      - name: dind
-        image: docker:20.10-dind
-        securityContext:
-          privileged: true
-        volumeMounts:
-        - name: docker-sock
-          mountPath: /var/run/docker.sock
-      volumes:
-      - name: docker-sock
-        hostPath:
-          path: /var/run/docker.sock
-          type: Socket
-```
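The `privileged-role` ClusterRole in the deleted guide grants access through additive rules: a request is permitted if any single rule matches its API group, resource, and verb, with `""` denoting the core API group and subresources such as `pods/exec` named explicitly. The Python sketch below models only that matching logic (it ignores `resourceNames`, non-resource URLs, and role aggregation), and the class and function names are hypothetical. On a live cluster the authoritative check is `kubectl auth can-i`, for example `kubectl auth can-i create pods --as=system:serviceaccount:default:default`.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    """One entry of a ClusterRole's `rules` list (resourceNames not modeled)."""

    api_groups: tuple[str, ...]
    resources: tuple[str, ...]
    verbs: tuple[str, ...]

    def allows(self, api_group: str, resource: str, verb: str) -> bool:
        # "" is the core API group; "*" acts as a wildcard in each field.
        return (
            (api_group in self.api_groups or "*" in self.api_groups)
            and (resource in self.resources or "*" in self.resources)
            and (verb in self.verbs or "*" in self.verbs)
        )


# The four rules of the `privileged-role` ClusterRole.
PRIVILEGED_ROLE = (
    PolicyRule(("",), ("pods",), ("create", "get", "list", "watch", "delete")),
    PolicyRule(("apps",), ("deployments",), ("create", "get", "list", "watch", "delete")),
    PolicyRule(("",), ("pods/exec",), ("create",)),
    PolicyRule(("",), ("pods/log",), ("get",)),
)


def is_allowed(rules: Iterable[PolicyRule], api_group: str, resource: str, verb: str) -> bool:
    """RBAC has no deny rules: a request is allowed if any rule matches it."""
    return any(rule.allows(api_group, resource, verb) for rule in rules)


print(is_allowed(PRIVILEGED_ROLE, "", "pods/exec", "create"))  # True
print(is_allowed(PRIVILEGED_ROLE, "", "secrets", "get"))  # False
```

Because the rules are purely additive, narrowing access means removing verbs or rules; there is no way to subtract permissions with an extra rule.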
--- a/docs/modules/usage/runtimes.md
+++ b/docs/modules/usage/runtimes.md
@@ -28,12 +28,22 @@ You can also [build your own runtime image](how-to/custom-sandbox-guide).
 ### Connecting to Your filesystem
 One useful feature here is the ability to connect to your local filesystem.

-To mount your filesystem into the runtime, add the following options to
-the `docker run` command:
-
+To mount your filesystem into the runtime, first set WORKSPACE_BASE:
 ```bash
 export WORKSPACE_BASE=/path/to/your/code

+# Linux and Mac Example
+# export WORKSPACE_BASE=$HOME/OpenHands
+# Will set $WORKSPACE_BASE to /home/<username>/OpenHands on Linux (/Users/<username>/OpenHands on macOS)
+#
+# WSL on Windows Example
+# export WORKSPACE_BASE=/mnt/c/dev/OpenHands
+# $WORKSPACE_BASE then refers to the Windows folder C:\dev\OpenHands
+```
+
+then add the following options to the `docker run` command:
+
+```bash
 docker run # ...
    -e SANDBOX_USER_ID=$(id -u) \
    -e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE \
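The new WSL example relies on the default automount layout, in which the Windows drive `C:` appears under `/mnt/c`. The Python sketch below shows that mapping together with the two `-e` options from the `docker run` snippet. The helper names are hypothetical, the sketch assumes the default `/mnt` automount root (which can be changed in `/etc/wsl.conf`), and inside WSL the built-in `wslpath -w` command performs the same conversion.

```python
from __future__ import annotations

import re

# Default WSL automount layout: Windows drive X: appears as /mnt/x.
WSL_DRIVE_PATH = re.compile(r"/mnt/([a-zA-Z])(/.*)?")


def wsl_to_windows(path: str) -> str:
    r"""Map /mnt/c/dev/OpenHands to C:\dev\OpenHands.

    Paths inside the Linux filesystem (for example /home/<username>/OpenHands)
    have no drive-letter equivalent and raise ValueError.
    """
    match = WSL_DRIVE_PATH.fullmatch(path)
    if match is None:
        raise ValueError(f"not under a /mnt/<drive> automount: {path!r}")
    drive = match.group(1).upper()
    tail = (match.group(2) or "/").replace("/", "\\")
    return f"{drive}:{tail}"


def docker_env_args(workspace_base: str, uid: int) -> list[str]:
    """The two -e options from the docs as an argv fragment for `docker run`."""
    return [
        "-e", f"SANDBOX_USER_ID={uid}",
        "-e", f"WORKSPACE_MOUNT_PATH={workspace_base}",
    ]


print(wsl_to_windows("/mnt/c/dev/OpenHands"))  # C:\dev\OpenHands
```

On Linux, macOS, or WSL, `docker_env_args(os.environ["WORKSPACE_BASE"], os.getuid())` mirrors the `$(id -u)` substitution in the shell snippet (`os.getuid` is not available on native Windows Python).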
--- a/docs/package-lock.json
+++ b/docs/package-lock.json
@@ -14,7 +14,7 @@
        "@docusaurus/theme-mermaid": "^3.6.3",
        "@mdx-js/react": "^3.1.0",
        "clsx": "^2.0.0",
-        "prism-react-renderer": "^2.4.0",
+        "prism-react-renderer": "^2.4.1",
        "react": "^18.3.1",
        "react-dom": "^18.3.1",
        "react-icons": "^5.4.0",
@@ -14781,9 +14781,9 @@
      }
    },
    "node_modules/prism-react-renderer": {
-      "version": "2.4.0",
-      "resolved": "https://registry.npmjs.org/prism-react-renderer/-/prism-react-renderer-2.4.0.tgz",
-      "integrity": "sha512-327BsVCD/unU4CNLZTWVHyUHKnsqcvj2qbPlQ8MiBE2eq2rgctjigPA1Gp9HLF83kZ20zNN6jgizHJeEsyFYOw==",
+      "version": "2.4.1",
+      "resolved": "https://registry.npmjs.org/prism-react-renderer/-/prism-react-renderer-2.4.1.tgz",
+      "integrity": "sha512-ey8Ls/+Di31eqzUxC46h8MksNuGx/n0AAC8uKpwFau4RPDYLuE3EXTp8N8G2vX2N7UC/+IXeNUnlWBGGcAG+Ig==",
      "dependencies": {
        "@types/prismjs": "^1.26.0",
        "clsx": "^2.0.0"
--- a/docs/package.json
+++ b/docs/package.json
@@ -21,7 +21,7 @@
    "@docusaurus/theme-mermaid": "^3.6.3",
    "@mdx-js/react": "^3.1.0",
    "clsx": "^2.0.0",
-    "prism-react-renderer": "^2.4.0",
+    "prism-react-renderer": "^2.4.1",
    "react": "^18.3.1",
    "react-dom": "^18.3.1",
    "react-icons": "^5.4.0",
--- a/docs/sidebars.ts
+++ b/docs/sidebars.ts
@@ -168,11 +168,6 @@ const sidebars: SidebarsConfig = {
          label: 'Evaluation',
          id: 'usage/how-to/evaluation-harness',
        },
-        {
-          type: 'doc',
-          label: 'Kubernetes Deployment',
-          id: 'usage/how-to/openshift-example',
-        },
      ],
    },
    {
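This hunk drops the sidebar entry for `usage/how-to/openshift-example` in the same change that deletes that page. Docusaurus reports an error when a sidebar references a doc id that does not exist, so the two edits need to land together. A rough Python sketch of that consistency check follows; it models the sidebar as plain dicts (a simplification of the TypeScript `SidebarsConfig`), and the function names and the category label are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterable


def collect_doc_ids(items: Iterable[dict | str]) -> list[str]:
    """Return every doc id referenced by a simplified Docusaurus sidebar.

    Modeled item shapes: a bare string (doc id shorthand),
    {"type": "doc", "id": ...} and {"type": "category", "items": [...]}.
    Links and autogenerated items are ignored.
    """
    ids: list[str] = []
    for item in items:
        if isinstance(item, str):
            ids.append(item)
        elif item.get("type") == "doc":
            ids.append(item["id"])
        elif item.get("type") == "category":
            ids.extend(collect_doc_ids(item.get("items", [])))
    return ids


def dangling_ids(sidebar: Iterable[dict | str], existing_docs: set[str]) -> list[str]:
    """Sidebar doc ids that have no matching document."""
    return [doc_id for doc_id in collect_doc_ids(sidebar) if doc_id not in existing_docs]


sidebar = [
    {
        "type": "category",
        "label": "Advanced",  # hypothetical label
        "items": [
            {"type": "doc", "label": "Evaluation", "id": "usage/how-to/evaluation-harness"},
            {"type": "doc", "label": "Kubernetes Deployment", "id": "usage/how-to/openshift-example"},
        ],
    }
]
# With openshift-example.md deleted but its entry still present:
print(dangling_ids(sidebar, {"usage/how-to/evaluation-harness"}))
# ['usage/how-to/openshift-example']
```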
--- a/docs/src/components/CustomFooter.tsx
+++ b/docs/src/components/CustomFooter.tsx
@@ -8,7 +8,7 @@ function CustomFooter() {
    <footer className="custom-footer">
      <div className="footer-content">
        <div className="footer-icons">
-          <a href="https://join.slack.com/t/openhands-ai/shared_invite/zt-2vbfigwev-G03twSpXaErwzYVD4CFiBg" target="_blank" rel="noopener noreferrer">
+          <a href="https://join.slack.com/t/openhands-ai/shared_invite/zt-2wkh4pklz-w~h_DVDtEe9H5kyQlcNxVw" target="_blank" rel="noopener noreferrer">
            <FaSlack />
          </a>
          <a href="https://discord.gg/ESHStjSjD4" target="_blank" rel="noopener noreferrer">
--- a/docs/src/components/HomepageHeader/HomepageHeader.tsx
+++ b/docs/src/components/HomepageHeader/HomepageHeader.tsx
@@ -23,7 +23,7 @@ export function HomepageHeader() {
          <a href="https://codecov.io/github/All-Hands-AI/OpenHands?branch=main"><img alt="CodeCov" src="https://img.shields.io/codecov/c/github/All-Hands-AI/OpenHands?style=for-the-badge&color=blue" /></a>
          <a href="https://github.com/All-Hands-AI/OpenHands/blob/main/LICENSE"><img src="https://img.shields.io/github/license/All-Hands-AI/OpenHands?style=for-the-badge&color=blue" alt="MIT License" /></a>
          <br/>
-          <a href="https://join.slack.com/t/openhands-ai/shared_invite/zt-2vbfigwev-G03twSpXaErwzYVD4CFiBg"><img src="https://img.shields.io/badge/Slack-Join%20Us-red?logo=slack&logoColor=white&style=for-the-badge" alt="Join our Slack community" /></a>
+          <a href="https://join.slack.com/t/openhands-ai/shared_invite/zt-2wkh4pklz-w~h_DVDtEe9H5kyQlcNxVw"><img src="https://img.shields.io/badge/Slack-Join%20Us-red?logo=slack&logoColor=white&style=for-the-badge" alt="Join our Slack community" /></a>
          <a href="https://discord.gg/ESHStjSjD4"><img src="https://img.shields.io/badge/Discord-Join%20Us-purple?logo=discord&logoColor=white&style=for-the-badge" alt="Join our Discord community" /></a>
          <a href="https://github.com/All-Hands-AI/OpenHands/blob/main/CREDITS.md"><img src="https://img.shields.io/badge/Project-Credits-blue?style=for-the-badge&color=FFE165&logo=github&logoColor=white" alt="Credits" /></a>
          <br/>
--- a/evaluation/README.md
+++ b/evaluation/README.md
@@ -42,7 +42,7 @@ temperature = 0.0

 ## Supported Benchmarks

-The OpenHands evaluation harness supports a wide variety of benchmarks across [software engineering](#software-engineering), [web browsing](#web-browsing), and [miscellaneous assistance](#misc-assistance) tasks.
+The OpenHands evaluation harness supports a wide variety of benchmarks across [software engineering](#software-engineering), [web browsing](#web-browsing), [miscellaneous assistance](#misc-assistance), and [real-world](#real-world) tasks.

 ### Software Engineering

@@ -73,6 +73,10 @@ The OpenHands evaluation harness supports a wide variety of benchmarks across [s
 - ProofWriter: [`evaluation/benchmarks/logic_reasoning`](./benchmarks/logic_reasoning)
 - ScienceAgentBench: [`evaluation/benchmarks/scienceagentbench`](./benchmarks/scienceagentbench)

+### Real World
+
+- TheAgentCompany: [`evaluation/benchmarks/the_agent_company`](./benchmarks/the_agent_company)
+
 ## Result Visualization

 Check [this huggingface space](https://huggingface.co/spaces/OpenHands/evaluation) for visualization of existing experimental results.
--- a/evaluation/benchmarks/logic_reasoning/run_infer.py
+++ b/evaluation/benchmarks/logic_reasoning/run_infer.py
@@ -272,7 +272,7 @@ if __name__ == '__main__':
        default='ProofWriter',
    )
    parser.add_argument(
-        '--data_split',
+        '--data-split',
        type=str,
        help='data split to evaluate on {validation}',  # right now we only support validation split
        default='validation',
--- a/evaluation/benchmarks/scienceagentbench/run_infer.py
+++ b/evaluation/benchmarks/scienceagentbench/run_infer.py
@@ -251,7 +251,7 @@ If the program uses some packages that are incompatible, please figure out alter
 if __name__ == '__main__':
    parser = get_parser()
    parser.add_argument(
-        '--use_knowledge',
+        '--use-knowledge',
        type=str,
        default='false',
        choices=['true', 'false'],
--- a/evaluation/benchmarks/scienceagentbench/scripts/run_infer.sh
+++ b/evaluation/benchmarks/scienceagentbench/scripts/run_infer.sh
@@ -35,7 +35,7 @@ echo "MODEL_CONFIG: $MODEL_CONFIG"
 COMMAND="poetry run python evaluation/benchmarks/scienceagentbench/run_infer.py \
  --agent-cls $AGENT \
  --llm-config $MODEL_CONFIG \
-  --use_knowledge $USE_KNOWLEDGE \
+  --use-knowledge $USE_KNOWLEDGE \
  --max-iterations 30 \
  --eval-num-workers $NUM_WORKERS \
  --eval-note $OPENHANDS_VERSION" \
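Renaming `--data_split` and `--use_knowledge` to dashed flags leaves the attribute names used in the Python code unchanged, because `argparse` derives an option's `dest` by replacing interior dashes with underscores. The underscored spellings are no longer recognized as these options, which is why `run_infer.sh` switches to `--use-knowledge` as well. A minimal sketch with a standalone parser (the harness itself extends `get_parser()`):

```python
import argparse

# Standalone parser for illustration only; the benchmark scripts add these to get_parser().
parser = argparse.ArgumentParser()
parser.add_argument('--use-knowledge', type=str, default='false', choices=['true', 'false'])
parser.add_argument('--data-split', type=str, default='validation')

args = parser.parse_args(['--use-knowledge', 'true'])
# '--use-knowledge' is stored as args.use_knowledge, '--data-split' as args.data_split
print(args.use_knowledge, args.data_split)  # prints: true validation
```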
--- a/evaluation/benchmarks/swe_bench/run_infer.py
+++ b/evaluation/benchmarks/swe_bench/run_infer.py
@@ -15,6 +15,7 @@ from evaluation.utils.shared import (
    EvalOutput,
    assert_and_raise,
    codeact_user_response,
+    is_fatal_evaluation_error,
    make_metadata,
    prepare_dataset,
    reset_logger_for_multiprocessing,
@@ -369,6 +370,7 @@ def process_instance(
    instance: pd.Series,
    metadata: EvalMetadata,
    reset_logger: bool = True,
+    runtime_failure_count: int = 0,
 ) -> EvalOutput:
    config = get_config(instance, metadata)

@@ -379,6 +381,15 @@ def process_instance(
    else:
        logger.info(f'Starting evaluation for instance {instance.instance_id}.')

+    # Scale up resource_factor with the number of previous runtime failures
+    if runtime_failure_count > 0:
+        config.sandbox.remote_runtime_resource_factor = min(
+            config.sandbox.remote_runtime_resource_factor * (2**runtime_failure_count),
+            2,  # hardcode maximum resource factor to 2
+        )
+        logger.warning(
+            f'Retrying instance {instance.instance_id} after {runtime_failure_count} runtime failure(s), setting resource factor to {config.sandbox.remote_runtime_resource_factor}'
+        )
    runtime = create_runtime(config)
    call_async_from_sync(runtime.connect)

@@ -400,11 +411,7 @@ def process_instance(
        )

        # if fatal error, throw EvalError to trigger re-run
-        if (
-            state.last_error
-            and 'fatal error during agent execution' in state.last_error
-            and 'stuck in a loop' not in state.last_error
-        ):
+        if is_fatal_evaluation_error(state.last_error):
            raise EvalException('Fatal error detected: ' + state.last_error)

        # ======= THIS IS SWE-Bench specific =======
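The retry logic in `process_instance` doubles `remote_runtime_resource_factor` once per previous runtime failure and clamps the result to 2. A minimal sketch of that rule as a pure function (the name `escalated_resource_factor` is hypothetical, not part of the harness):

```python
def escalated_resource_factor(
    base: float, runtime_failure_count: int, cap: float = 2.0
) -> float:
    """Double `base` once per previous runtime failure, clamped to `cap`."""
    if runtime_failure_count <= 0:
        return base  # first attempt: keep the configured factor
    return min(base * (2**runtime_failure_count), cap)


for failures in range(4):
    print(failures, escalated_resource_factor(1.0, failures))
# prints: 0 1.0, then 1 2.0, 2 2.0, 3 2.0 (one pair per line)
```

Because of the hard cap, a configured factor of 1 can only be escalated once, and a configured factor already above 2 would actually be lowered to 2 on a retry.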
--- a/evaluation/benchmarks/swe_bench/scripts/eval/summarize_outputs.py
+++ b/evaluation/benchmarks/swe_bench/scripts/eval/summarize_outputs.py
@@ -6,6 +6,8 @@ import os
 from collections import Counter

 import pandas as pd
+import random
+import numpy as np

 from openhands.events.serialization import event_from_dict
 from openhands.events.utils import get_pairs_from_events
@@ -18,6 +20,18 @@ ERROR_KEYWORDS = [
 ]


+def get_bootstrap_accuracy_error_bars(values: list[float | int | bool], num_samples: int = 1000, p_value=0.05) -> tuple[float, float]:
+    sorted_vals = np.sort(
+        [
+            np.mean(random.sample(values, len(values) // 2))
+            for _ in range(num_samples)
+        ]
+    )
+    bottom_idx = int(num_samples * p_value / 2)
+    top_idx = int(num_samples * (1.0 - p_value / 2))
+    return (sorted_vals[bottom_idx], sorted_vals[top_idx])
+
+
 def process_file(file_path):
    with open(file_path, 'r') as file:
        lines = file.readlines()
@@ -26,6 +40,7 @@ def process_file(file_path):
    num_error_lines = 0
    num_agent_stuck_in_loop = 0
    num_resolved = 0
+    resolved_arr = []
    num_empty_patch = 0
    num_unfinished_runs = 0
    error_counter = Counter()
@@ -74,6 +89,9 @@ def process_file(file_path):
        resolved = report.get('resolved', False)
        if resolved:
            num_resolved += 1
+            resolved_arr.append(1)
+        else:
+            resolved_arr.append(0)

        # Error
        error = _d.get('error', None)
@@ -100,6 +118,7 @@ def process_file(file_path):
        'resolved': {
            'count': num_resolved,
            'percentage': (num_resolved / num_lines * 100) if num_lines > 0 else 0,
+            'ci': tuple(x * 100 for x in get_bootstrap_accuracy_error_bars(resolved_arr)),
        },
        'empty_patches': {
            'count': num_empty_patch,
@@ -174,6 +193,7 @@ def aggregate_directory(input_path) -> pd.DataFrame:
    )

    df['resolve_rate'] = df['resolved'].apply(lambda x: x['percentage'])
+    df['resolve_rate_ci'] = df['resolved'].apply(lambda x: x['ci'])
    df['empty_patch_rate'] = df['empty_patches'].apply(lambda x: x['percentage'])
    df['unfinished_rate'] = df['unfinished_runs'].apply(lambda x: x['percentage'])
    df['avg_turns'] = df['statistics'].apply(lambda x: x['avg_turns'])
@@ -242,7 +262,7 @@ if __name__ == '__main__':
            # Print detailed results for single file
            print(f'\nResults for {args.input_path}:')
            print(
-                f"Number of resolved: {result['resolved']['count']} / {result['total_instances']} ({result['resolved']['percentage']:.2f}%)"
+                f"Number of resolved: {result['resolved']['count']} / {result['total_instances']} ({result['resolved']['percentage']:.2f}% [{result['resolved']['ci'][0]:.2f}%, {result['resolved']['ci'][1]:.2f}%])"
            )
            print(
                f"Number of empty patch: {result['empty_patches']['count']} / {result['total_instances']} ({result['empty_patches']['percentage']:.2f}%)"
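`get_bootstrap_accuracy_error_bars` draws half-size subsamples without replacement. For a sample mean, that gives nearly the same spread (variance about σ²/(n−1)) as the classic percentile bootstrap, which resamples n values with replacement (variance about σ²/n). A minimal sketch of the classic version for comparison; the function name, the `alpha` parameter and the seeding are assumptions for illustration:

```python
import random

import numpy as np


def percentile_bootstrap_ci(
    values: list[int], num_samples: int = 1000, alpha: float = 0.05, seed: int = 0
) -> tuple[float, float]:
    """Percentile bootstrap CI for the mean of `values` (resampling with replacement)."""
    rng = random.Random(seed)  # seeded so the interval is reproducible
    means = np.sort(
        [np.mean(rng.choices(values, k=len(values))) for _ in range(num_samples)]
    )
    lower = means[int(num_samples * alpha / 2)]
    upper = means[int(num_samples * (1 - alpha / 2))]
    return float(lower), float(upper)


# Hypothetical run: 30 of 100 instances resolved (1 = resolved, 0 = not)
resolved_arr = [1] * 30 + [0] * 70
lower, upper = percentile_bootstrap_ci(resolved_arr)
print(f'resolve rate 30.00% [{lower * 100:.2f}%, {upper * 100:.2f}%]')
```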
--- a/evaluation/benchmarks/the_agent_company/README.md
+++ b/evaluation/benchmarks/the_agent_company/README.md
@@ -0,0 +1,43 @@
+# The Agent Company Evaluation with OpenHands
+
+This folder contains the evaluation harness that we built on top of the original [The Agent Company](https://github.com/TheAgentCompany/TheAgentCompany/tree/main/evaluation) ([paper](https://arxiv.org/abs/2412.14161)).
+
+The evaluation consists of two steps:
+
+1. Environment setup: [install the Python environment](../../README.md#development-environment), [configure the LLM](../../README.md#configure-openhands-and-your-llm), and [launch the services](https://github.com/TheAgentCompany/TheAgentCompany/blob/main/docs/SETUP.md).
+2. [Run evaluation](#run-inference-on-the-agent-company-tasks): run all tasks and collect the evaluation results.
+
+## Setup Environment and LLM Configuration
+
+Please follow the instructions [here](../../README.md#setup) to set up your local development environment and LLM.
+
+## Run Inference on The Agent Company Tasks
+
+When started, the `run_infer.sh` script automatically pulls all task images. Each task image is used to build an OpenHands runtime image in which the agent operates.
+
+```bash
+./evaluation/benchmarks/the_agent_company/scripts/run_infer.sh \
+  --agent-llm-config <agent-llm-config>  \
+  --env-llm-config <env-llm-config> \
+  --outputs-path <outputs-path> \
+  --server-hostname <server-hostname> \
+  --version <version>
+
+# Example
+./evaluation/benchmarks/the_agent_company/scripts/run_infer.sh \
+  --agent-llm-config claude-3-5-sonnet-20240620 \
+  --env-llm-config claude-3-5-sonnet-20240620 \
+  --outputs-path outputs \
+  --server-hostname localhost \
+  --version 1.0.0
+```
+
+- `agent-llm-config`: the config name for the agent LLM. It must match an `[llm.<name>]` section in config.toml. This is the LLM used by the agent (e.g. CodeActAgent).
+- `env-llm-config`: the config name for the environment LLM. It must match an `[llm.<name>]` section in config.toml. This LLM is used by the chat bots (NPCs) and the LLM-based evaluators.
+- `outputs-path`: the path to save trajectories and evaluation results.
+- `server-hostname`: the hostname of the server that hosts all the web services. It can be `localhost` if you run the evaluation and the services on the same machine. If the services are hosted on a remote machine, you must use the remote machine's hostname rather than its IP address.
+- `version`: the version of the task images to use. Currently, the only supported version is 1.0.0.
+
+The script is idempotent: if you run it again, it skips tasks that already have an evaluation result and continues with the rest. A full evaluation usually takes a few days.
+
+Note: the script automatically skips a task if it encounters an error. This usually happens when the OpenHands runtime dies due to an unexpected error, so even if the script finishes, it may not have evaluated all tasks. Run the script again to resume the evaluation.
--- a/evaluation/benchmarks/the_agent_company/browsing.py
+++ b/evaluation/benchmarks/the_agent_company/browsing.py
@@ -0,0 +1,273 @@
+##################################################################################################
+# Adapted from https://github.com/TheAgentCompany/TheAgentCompany/blob/main/evaluation/browsing.py
+##################################################################################################
+
+import base64
+import os
+import re
+from dataclasses import dataclass
+from enum import Enum, auto
+from typing import Dict, List, Optional, Union
+
+from openhands.core.logger import openhands_logger as logger
+from openhands.events.action import BrowseInteractiveAction
+from openhands.events.observation import BrowserOutputObservation
+from openhands.runtime.base import Runtime
+
+
+class ActionType(Enum):
+    GOTO = auto()
+    FILL = auto()
+    CLICK = auto()
+    NOOP = auto()
+
+
+@dataclass
+class Selector:
+    """
+    Represents either a direct anchor ID or a descriptive selector
+    """
+
+    value: str
+    is_anchor: bool = False
+
+    def __str__(self) -> str:
+        return f'{self.value}'
+
+
+@dataclass
+class BrowserAction:
+    """Base class for all browser actions"""
+
+    action_type: ActionType
+
+    def to_instruction(self) -> str:
+        """Convert the action to a browser instruction string"""
+        raise NotImplementedError
+
+
+@dataclass
+class GotoAction(BrowserAction):
+    url: str
+
+    def __init__(self, url: str):
+        super().__init__(ActionType.GOTO)
+        self.url = url
+
+    def to_instruction(self) -> str:
+        return f'goto("{self.url}")'
+
+
+@dataclass
+class NoopAction(BrowserAction):
+    milliseconds: int
+
+    def __init__(self, milliseconds: int):
+        super().__init__(ActionType.NOOP)
+        self.milliseconds = milliseconds
+
+    def to_instruction(self) -> str:
+        return f'noop({self.milliseconds})'
+
+
+@dataclass
+class InputAction(BrowserAction):
+    selector: Selector
+    value: str
+
+    def __init__(self, selector: Union[str, Selector], value: str):
+        super().__init__(ActionType.FILL)
+        self.selector = (
+            selector if isinstance(selector, Selector) else Selector(selector)
+        )
+        self.value = value
+
+    def to_instruction(self) -> str:
+        return f'fill("{self.selector}", "{self.value}")'
+
+
+@dataclass
+class ClickAction(BrowserAction):
+    selector: Selector
+
+    def __init__(self, selector: Union[str, Selector]):
+        super().__init__(ActionType.CLICK)
+        self.selector = (
+            selector if isinstance(selector, Selector) else Selector(selector)
+        )
+
+    def to_instruction(self) -> str:
+        return f'click("{self.selector}")'
+
+
+def parse_content_to_elements(content: str) -> Dict[str, str]:
+    """Parse the observation content into a dictionary mapping anchors to their descriptions"""
+    elements = {}
+    current_anchor = None
+    description_lines = []
+
+    for line in content.split('\n'):
+        line = line.strip()
+        if not line:
+            continue
+
+        # Check for anchor line
+        anchor_match = re.match(r'\[(\d+)\](.*)', line)
+        if anchor_match:
+            # Save previous element if it exists
+            if current_anchor and description_lines:
+                elements[current_anchor] = ' '.join(description_lines)
+
+            # Start new element
+            current_anchor = anchor_match.group(1)
+            description_lines = [anchor_match.group(2).strip()]
+        else:
+            # Add to current description if we have an anchor
+            if current_anchor:
+                description_lines.append(line)
+
+    # Save last element
+    if current_anchor and description_lines:
+        elements[current_anchor] = ' '.join(description_lines)
+
+    return elements
+
+
+def find_matching_anchor(content: str, selector: str) -> Optional[str]:
+    """Find the anchor ID that matches the given selector description"""
+    elements = parse_content_to_elements(content)
+
+    # Normalize the selector for case-insensitive substring matching
+    selector = selector.lower().strip()
+
+    for anchor, description in elements.items():
+        description = description.lower().strip()
+        if selector in description:
+            return anchor
+
+    return None
+
+
+def resolve_action(action: BrowserAction, content: str) -> Optional[BrowserAction]:
+    """
+    Resolve any descriptive selectors in the action to anchor IDs based on the content.
+    Returns a new action with resolved selectors.
+    """
+    if isinstance(action, (InputAction, ClickAction)):
+        if not action.selector.is_anchor:
+            anchor = find_matching_anchor(content, action.selector.value)
+            if anchor:
+                new_selector = Selector(anchor, is_anchor=True)
+                if isinstance(action, InputAction):
+                    return InputAction(new_selector, action.value)
+                else:
+                    return ClickAction(new_selector)
+            else:
+                logger.error(f'NO MATCH FOUND FOR SELECTOR, {action.selector}')
+                return None
+    return action
+
+
+def pre_login(
+    runtime: Runtime,
+    services: List[str],
+    save_screenshots=True,
+    screenshots_dir='screenshots',
+):
+    """
+    Logs in to all the websites that are needed for the evaluation.
+    Once logged in, the sessions are cached in the browser, so the OpenHands
+    agent doesn't need to log in to these websites again.
+    """
+    owncloud_login_actions = [
+        GotoAction('http://the-agent-company.com:8092'),
+        NoopAction(1000),
+        InputAction("textbox '', clickable, focused, required", 'theagentcompany'),
+        NoopAction(1000),
+        InputAction("textbox '', clickable, required", 'theagentcompany'),
+        NoopAction(1000),
+        ClickAction("button '', clickable"),
+        NoopAction(1000),
+    ]
+
+    rocketchat_login_actions = [
+        GotoAction('http://the-agent-company.com:3000'),
+        NoopAction(1000),
+        InputAction("textbox '', clickable, focused", 'theagentcompany'),
+        NoopAction(1000),
+        InputAction("textbox '', clickable", 'theagentcompany'),
+        NoopAction(1000),
+        ClickAction("button 'Login', clickable"),
+    ]
+
+    gitlab_login_actions = [
+        GotoAction('http://the-agent-company.com:8929/users/sign_in'),
+        NoopAction(1000),
+        InputAction("textbox 'Username or primary email'", 'root'),
+        NoopAction(1000),
+        InputAction("textbox 'Password'", 'theagentcompany'),
+        NoopAction(1000),
+        ClickAction("button 'Sign in', clickable"),
+    ]
+
+    # devnote: plane reset is not stable, and sometimes it fails to launch
+    # in which case the login action will fail, and then we would skip the task
+    plane_login_actions = [
+        GotoAction('http://the-agent-company.com:8091'),
+        NoopAction(1000),
+        InputAction(
+            "textbox 'Email', clickable, focused",
+            'agent@company.com',
+        ),
+        NoopAction(1000),
+        ClickAction("button 'Continue'"),
+        NoopAction(1000),
+        InputAction("textbox 'Enter password', clickable", 'theagentcompany'),
+        NoopAction(1000),
+        ClickAction("button 'Go to workspace'"),
+    ]
+
+    all_login_actions = [
+        ('owncloud', owncloud_login_actions),
+        ('rocketchat', rocketchat_login_actions),
+        ('gitlab', gitlab_login_actions),
+        ('plane', plane_login_actions),
+    ]
+
+    for website_name, login_actions in all_login_actions:
+        if website_name not in services:
+            logger.info(
+                f"Skipping login for {website_name} because it's not in the list of services to reset"
+            )
+            continue
+
+        if save_screenshots:
+            directory = os.path.join(screenshots_dir, website_name)
+            if not os.path.exists(directory):
+                os.makedirs(directory)
+            image_id = 0
+        obs: BrowserOutputObservation = None
+        for action in login_actions:
+            # Resolve any descriptive selectors to anchor IDs
+            if obs:
+                action = resolve_action(action, obs.get_agent_obs_text())
+
+            if not action:
+                logger.error(f'FAILED TO RESOLVE ACTION for {website_name}')
+                raise Exception(
+                    'FAILED TO RESOLVE ACTION, maybe the service is not available'
+                )
+
+            # Convert the action to an instruction string
+            instruction = action.to_instruction()
+
+            browser_action = BrowseInteractiveAction(browser_actions=instruction)
+            browser_action.timeout = 10000
+            logger.info(browser_action, extra={'msg_type': 'ACTION'})
+            obs: BrowserOutputObservation = runtime.run_action(browser_action)
+            logger.debug(obs, extra={'msg_type': 'OBSERVATION'})
+            if save_screenshots:
+                image_data = base64.b64decode(obs.screenshot)
+                with open(os.path.join(directory, f'{image_id}.png'), 'wb') as file:
+                    file.write(image_data)
+                    image_id += 1
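`pre_login` turns descriptive selectors such as `"textbox 'Password'"` into numeric anchor IDs by parsing the `[<id>] description` lines of the browser observation and taking the first element whose description contains the selector. A standalone sketch of that resolution, re-implementing `parse_content_to_elements` and `find_matching_anchor` under the hypothetical names `parse_elements` and `match_anchor` so it runs without OpenHands; the page text is a made-up snippet in the format the parser expects:

```python
import re
from typing import Dict, Optional


def parse_elements(content: str) -> Dict[str, str]:
    """Map each '[<id>] ...' anchor to its (possibly multi-line) description."""
    elements: Dict[str, str] = {}
    anchor, lines = None, []
    for raw in content.split('\n'):
        line = raw.strip()
        if not line:
            continue
        m = re.match(r'\[(\d+)\](.*)', line)
        if m:
            if anchor and lines:
                elements[anchor] = ' '.join(lines)  # close the previous element
            anchor, lines = m.group(1), [m.group(2).strip()]
        elif anchor:
            lines.append(line)  # continuation line of the current element
    if anchor and lines:
        elements[anchor] = ' '.join(lines)
    return elements


def match_anchor(content: str, selector: str) -> Optional[str]:
    """Return the first anchor whose description contains the selector (case-insensitive)."""
    needle = selector.lower().strip()
    for anchor, desc in parse_elements(content).items():
        if needle in desc.lower():
            return anchor
    return None


# Made-up observation text for illustration
page = """
[12] textbox 'Username or primary email'
[13] textbox 'Password'
[14] button 'Sign in', clickable
"""
print(match_anchor(page, "textbox 'Password'"))  # 13
print(match_anchor(page, "button 'Register'"))  # None
```

Since the first match wins, a short selector such as `textbox ''` can resolve to an earlier element than intended, which is why the login scripts use fairly specific descriptions.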
--- a/evaluation/benchmarks/the_agent_company/run_infer.py
+++ b/evaluation/benchmarks/the_agent_company/run_infer.py
@@ -0,0 +1,319 @@
+##################################################################################################
+# Adapted from https://github.com/TheAgentCompany/TheAgentCompany/blob/main/evaluation/run_eval.py
+##################################################################################################
+
+import asyncio
+import base64
+import json
+import os
+import shutil
+import tempfile
+from typing import List
+
+import yaml
+from browsing import pre_login
+
+from openhands.controller.state.state import State
+from openhands.core.config import (
+    AppConfig,
+    LLMConfig,
+    SandboxConfig,
+    get_llm_config_arg,
+    get_parser,
+)
+from openhands.core.logger import openhands_logger as logger
+from openhands.core.main import create_runtime, run_controller
+from openhands.events.action import CmdRunAction, MessageAction
+from openhands.events.observation import BrowserOutputObservation, CmdOutputObservation
+from openhands.runtime.base import Runtime
+from openhands.utils.async_utils import call_async_from_sync
+
+
+def get_config(
+    base_container_image: str,
+    task_short_name: str,
+    mount_path_on_host: str,
+    llm_config: LLMConfig,
+) -> AppConfig:
+    config = AppConfig(
+        run_as_openhands=False,
+        max_budget_per_task=4,
+        max_iterations=100,
+        trajectories_path=os.path.join(
+            mount_path_on_host, f'traj_{task_short_name}.json'
+        ),
+        sandbox=SandboxConfig(
+            base_container_image=base_container_image,
+            enable_auto_lint=True,
+            # using host network to access the host machine from the container
+            use_host_network=True,
+            # large enough timeout, since some testcases take very long to run
+            timeout=300,
+            api_key=os.environ.get('ALLHANDS_API_KEY', None),
+        ),
+        # we mount trajectories path so that trajectories, generated by OpenHands
+        # controller, can be accessible to the evaluator file in the runtime container
+        workspace_mount_path=mount_path_on_host,
+        workspace_mount_path_in_sandbox='/outputs',
+    )
+    config.set_llm_config(llm_config)
+    return config
+
+
+def load_dependencies(runtime: Runtime) -> List[str]:
+    """
+    Every task has a dependencies.yml file, which lists all the services that the
+    task depends on. This function loads the file and returns all dependent service names.
+    """
+    command = 'cat /utils/dependencies.yml'
+    action = CmdRunAction(command=command)
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs: CmdOutputObservation = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert obs.exit_code == 0
+    dependencies = yaml.safe_load(obs.content)
+    if dependencies is None:
+        dependencies = []
+    return dependencies
+
+
+def init_task_env(runtime: Runtime, hostname: str, env_llm_config: LLMConfig):
+    command = (
+        f'SERVER_HOSTNAME={hostname} '
+        f'LITELLM_API_KEY={env_llm_config.api_key} '
+        f'LITELLM_BASE_URL={env_llm_config.base_url} '
+        f'LITELLM_MODEL={env_llm_config.model} '
+        'bash /utils/init.sh'
+    )
+    action = CmdRunAction(command=command)
+    action.timeout = 900
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert obs.exit_code == 0
+
+
+def codeact_user_response(state: State) -> str:
+    msg = (
+        'Please continue working on the task on whatever approach you think is suitable.\n'
+        'If you think you have solved the task, please finish the interaction.\n'
+        'IMPORTANT: YOU SHOULD NEVER ASK FOR HUMAN HELP.\n'
+    )
+
+    if state.history:
+        # check if the agent has tried to talk to the user 3 times, if so, let the agent know it can give up
+        user_msgs = [
+            event
+            for event in state.history
+            if isinstance(event, MessageAction) and event.source == 'user'
+        ]
+        if len(user_msgs) >= 2:
+            # let the agent know that it can give up when it has tried 3 times
+            return (
+                msg
+                + 'If you want to give up, run: <execute_bash> exit </execute_bash>.\n'
+            )
+    return msg
+
+
+def run_solver(
+    runtime: Runtime,
+    task_name: str,
+    config: AppConfig,
+    dependencies: List[str],
+    save_final_state: bool,
+    state_dir: str,
+    save_screenshots: bool,
+    screenshots_dir: str,
+) -> State:
+    instruction = 'Complete the task in /instruction/task.md'
+
+    if 'gitlab' in dependencies:
+        instruction += "\n\nGitlab username is 'root' and password is 'theagentcompany'"
+
+    state: State | None = asyncio.run(
+        run_controller(
+            config=config,
+            sid=task_name,
+            initial_user_action=MessageAction(content=instruction),
+            runtime=runtime,
+            fake_user_response_fn=codeact_user_response,
+        )
+    )
+    logger.info(state)
+
+    if save_screenshots:
+        screenshots_dir = os.path.join(screenshots_dir, task_name)
+        os.makedirs(screenshots_dir, exist_ok=True)
+        for image_id, obs in enumerate(state.history):
+            if isinstance(obs, BrowserOutputObservation):
+                image_data = base64.b64decode(obs.screenshot)
+                with open(
+                    os.path.join(screenshots_dir, f'{image_id}.png'), 'wb'
+                ) as file:
+                    file.write(image_data)
+
+    if save_final_state:
+        os.makedirs(state_dir, exist_ok=True)
+        with open(os.path.join(state_dir, f'state_{task_name}.json'), 'w') as file:
+            json.dump(str(state), file)
+
+    return state
+
+
+def run_evaluator(
+    runtime: Runtime, env_llm_config: LLMConfig, trajectory_path: str, result_path: str
+):
+    command = (
+        f'LITELLM_API_KEY={env_llm_config.api_key} '
+        f'LITELLM_BASE_URL={env_llm_config.base_url} '
+        f'LITELLM_MODEL={env_llm_config.model} '
+        f"DECRYPTION_KEY='theagentcompany is all you need' "  # Hardcoded Key
+        f'python_default /utils/eval.py --trajectory_path {trajectory_path} --result_path {result_path}'
+    )
+    action = CmdRunAction(command=command)
+    action.timeout = 600
+    logger.info(action, extra={'msg_type': 'ACTION'})
+    obs = runtime.run_action(action)
+    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+    assert obs.exit_code == 0
+
+
+if __name__ == '__main__':
+    parser = get_parser()
+    parser.add_argument(
+        '--task-image-name',
+        type=str,
+        default='ghcr.io/theagentcompany/example-image:1.0.0',
+        help='Task image name',
+    )
+    parser.add_argument(
+        '--outputs-path',
+        type=str,
+        default='./outputs',
+        help='Folder path to save trajectories and evaluation results',
+    )
+    parser.add_argument(
+        '--server-hostname',
+        type=str,
+        default='localhost',
+        help='Server hostname, e.g. localhost to access the host machine from the container, '
+        'assuming the task docker container is run with `--network host` flag',
+    )
+    parser.add_argument(
+        '--agent-llm-config',
+        type=str,
+        default=None,
+        help='LLM config for agent',
+    )
+    parser.add_argument(
+        '--env-llm-config',
+        type=str,
+        default=None,
+        help='LLM config for evaluation environment (NPC & llm-based evaluator)',
+    )
+    args, _ = parser.parse_known_args()
+
+    agent_llm_config: LLMConfig | None = None
+    if args.agent_llm_config:
+        agent_llm_config = get_llm_config_arg(args.agent_llm_config)
+
+    if agent_llm_config is None:
+        raise ValueError(
+            f'Could not find LLM config for agent: --agent-llm-config {args.agent_llm_config}'
+        )
+
+    if agent_llm_config.api_key is None:
+        raise ValueError('LLM API key is not set for agent')
+
+    env_llm_config: LLMConfig | None = None
+    if args.env_llm_config:
+        env_llm_config = get_llm_config_arg(args.env_llm_config)
+
+    if env_llm_config is None:
+        raise ValueError(
+            f'Could not find LLM config for evaluation environment: --env-llm-config {args.env_llm_config}'
+        )
+
+    if env_llm_config.api_key is None:
+        raise ValueError('LLM API key is not set for evaluation environment')
+
+    task_short_name = args.task_image_name.split('/')[-1].split(':')[0]
+    logger.info(
+        f'Task image name is {args.task_image_name}, short name is {task_short_name}'
+    )
+
+    # mount a temporary directory to pass trajectory from host to container, and to
+    # pass the evaluation result from container to host
+    # 1) trajectory is dumped by OpenHands library (on host machine), but it's needed by
+    # evaluator (in container), so we mount a temporary directory to pass it in
+    # 2) evaluation result is written by evaluator (in container), but we need to persist
+    # it on host machine, so we mount a temporary directory to pass it out
+    if os.getenv('TMPDIR') and os.path.exists(os.getenv('TMPDIR')):
+        temp_dir = os.path.abspath(os.getenv('TMPDIR'))
+    else:
+        temp_dir = tempfile.mkdtemp()
+    config: AppConfig = get_config(
+        args.task_image_name, task_short_name, temp_dir, agent_llm_config
+    )
+    runtime: Runtime = create_runtime(config)
+    call_async_from_sync(runtime.connect)
+
+    init_task_env(runtime, args.server_hostname, env_llm_config)
+
+    dependencies = load_dependencies(runtime)
+    logger.info(f'Service dependencies: {dependencies}')
+
+    try:
+        pre_login(
+            runtime,
+            dependencies,
+            save_screenshots=True,
+            screenshots_dir=os.path.join(
+                os.path.abspath(args.outputs_path), 'screenshots'
+            ),
+        )
+    except Exception as e:
+        logger.error(f'Failed to pre-login: {e}')
+
+        # before giving up, let's try to init and login again
+        init_task_env(runtime, args.server_hostname, env_llm_config)
+        pre_login(
+            runtime,
+            dependencies,
+            save_screenshots=True,
+            screenshots_dir=os.path.join(
+                os.path.abspath(args.outputs_path), 'screenshots'
+            ),
+        )
+
+    state = run_solver(
+        runtime,
+        task_short_name,
+        config,
+        dependencies,
+        save_final_state=True,
+        state_dir=os.path.abspath(args.outputs_path),
+        save_screenshots=True,
+        screenshots_dir=os.path.join(os.path.abspath(args.outputs_path), 'screenshots'),
+    )
+
+    # this path is the absolute path in the runtime container
+    trajectory_path = f'/outputs/traj_{task_short_name}.json'
+    result_path = f'/outputs/eval_{task_short_name}.json'
+
+    run_evaluator(runtime, env_llm_config, trajectory_path, result_path)
+
+    # finally, move trajectory file and evaluation result from mount path on host (temp dir) to outputs path
+    shutil.move(
+        os.path.join(temp_dir, f'traj_{task_short_name}.json'),
+        os.path.join(
+            os.path.abspath(args.outputs_path), f'traj_{task_short_name}.json'
+        ),
+    )
+    shutil.move(
+        os.path.join(temp_dir, f'eval_{task_short_name}.json'),
+        os.path.join(
+            os.path.abspath(args.outputs_path), f'eval_{task_short_name}.json'
+        ),
+    )
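Resuming relies on `run_infer.py` and `run_infer.sh` agreeing on file names: the Python side names results after the image's last path segment without its tag, while the shell side strips the registry prefix and the `-image:<tag>` suffix and re-appends `-image` when checking for an existing result. A sketch of both derivations in Python (the helper names are hypothetical; the shell logic is transcribed from the parameter expansions in `run_infer.sh`):

```python
def task_short_name(image: str) -> str:
    # Same expression as run_infer.py: keep the last path segment, drop the tag
    return image.split('/')[-1].split(':')[0]


def shell_task_name(image: str) -> str:
    # Python equivalent of run_infer.sh:
    #   ${task_image##ghcr.io/theagentcompany/}  -> strip the registry prefix
    #   ${task_name%-image:*}                    -> strip the shortest '-image:<tag>' suffix
    name = image.removeprefix('ghcr.io/theagentcompany/')
    cut = name.rfind('-image:')
    return name[:cut] if cut != -1 else name


image = 'ghcr.io/theagentcompany/example-image:1.0.0'
short, name = task_short_name(image), shell_task_name(image)
# run_infer.py writes eval_<short>.json; run_infer.sh skips a task if eval_<name>-image.json exists
print(f'eval_{short}.json', f'eval_{name}-image.json')
# prints: eval_example-image.json eval_example-image.json
```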
--- a/evaluation/benchmarks/the_agent_company/scripts/run_infer.sh
+++ b/evaluation/benchmarks/the_agent_company/scripts/run_infer.sh
@@ -0,0 +1,115 @@
+#!/bin/bash
+
+##################################################################################################
+# Adapted from https://github.com/TheAgentCompany/TheAgentCompany/blob/main/evaluation/run_eval.sh
+##################################################################################################
+
+# Exit on the first error when DEBUG is set, which is useful for debugging
+if [ -n "$DEBUG" ]; then
+    set -e
+fi
+
+# AGENT_LLM_CONFIG is the config name for the agent LLM
+# In config.toml, you should have a section with the name
+# [llm.<AGENT_LLM_CONFIG>], e.g. [llm.agent]
+AGENT_LLM_CONFIG="agent"
+
+# ENV_LLM_CONFIG is the config name for the environment LLM,
+# used by the NPCs and LLM-based evaluators.
+# In config.toml, you should have a section with the name
+# [llm.<ENV_LLM_CONFIG>], e.g. [llm.env]
+ENV_LLM_CONFIG="env"
+
+# OUTPUTS_PATH is the path to save trajectories and evaluation results
+OUTPUTS_PATH="outputs"
+
+# SERVER_HOSTNAME is the hostname of the server that hosts all the web services,
+# including RocketChat, ownCloud, GitLab, and Plane.
+SERVER_HOSTNAME="localhost"
+
+# VERSION is the version of the task images to use
+# If a task doesn't have a published image with this version, it will be skipped
+# 12/15/2024: this is for forward compatibility, in the case where we add new tasks
+# after the 1.0.0 release
+VERSION="1.0.0"
+
+# Parse command line arguments
+while [[ $# -gt 0 ]]; do
+    case "$1" in
+        --agent-llm-config)
+            AGENT_LLM_CONFIG="$2"
+            shift 2
+            ;;
+        --env-llm-config)
+            ENV_LLM_CONFIG="$2"
+            shift 2
+            ;;
+        --outputs-path)
+            OUTPUTS_PATH="$2"
+            shift 2
+            ;;
+        --server-hostname)
+            SERVER_HOSTNAME="$2"
+            shift 2
+            ;;
+        --version)
+            VERSION="$2"
+            shift 2
+            ;;
+        *)
+            echo "Unknown argument: $1"
+            exit 1
+            ;;
+    esac
+done
+
+# Convert outputs_path to absolute path
+if [[ ! "$OUTPUTS_PATH" = /* ]]; then
+    # If path is not already absolute (doesn't start with /), make it absolute
+    OUTPUTS_PATH="$(cd "$(dirname "$OUTPUTS_PATH")" 2>/dev/null && pwd)/$(basename "$OUTPUTS_PATH")"
+fi
+
+echo "Using agent LLM config: $AGENT_LLM_CONFIG"
+echo "Using environment LLM config: $ENV_LLM_CONFIG"
+echo "Outputs path: $OUTPUTS_PATH"
+echo "Server hostname: $SERVER_HOSTNAME"
+echo "Version: $VERSION"
+
+echo "Downloading tasks.md..."
+rm -f tasks.md
+wget https://github.com/TheAgentCompany/TheAgentCompany/releases/download/${VERSION}/tasks.md
+
+while IFS= read -r task_image; do
+    docker pull "$task_image"
+
+    # Remove prefix using ## to remove longest matching pattern from start
+    task_name=${task_image##ghcr.io/theagentcompany/}
+
+    # Remove suffix using % to remove shortest matching pattern from end
+    task_name=${task_name%-image:*}
+    echo "Use task image $task_image, task name $task_name..."
+
+    # Check if evaluation file exists
+    if [ -f "$OUTPUTS_PATH/eval_${task_name}-image.json" ]; then
+        echo "Skipping $task_name - evaluation file already exists"
+        continue
+    fi
+
+    export PYTHONPATH=evaluation/benchmarks/the_agent_company:$PYTHONPATH && \
+        poetry run python run_infer.py \
+            --agent-llm-config "$AGENT_LLM_CONFIG" \
+            --env-llm-config "$ENV_LLM_CONFIG" \
+            --outputs-path "$OUTPUTS_PATH" \
+            --server-hostname "$SERVER_HOSTNAME" \
+            --task-image-name "$task_image"
+
+    # Remove the task image and runtime images, then prune unused images and volumes
+    docker image rm "$task_image"
+    docker images "ghcr.io/all-hands-ai/runtime" -q | xargs -r docker rmi -f
+    docker volume prune -f
+    docker system prune -f
+done < tasks.md
+
+rm tasks.md
+
+echo "All evaluations completed successfully!"
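The two parameter expansions in the loop above derive a task name from each image reference, and the eval-file check reuses that name. A minimal Python sketch of the same transformation, assuming image references have the `ghcr.io/theagentcompany/<name>-image:<version>` shape that the script strips (the example image name is hypothetical; `str.removeprefix` needs Python 3.9+):

```python
PREFIX = "ghcr.io/theagentcompany/"


def task_name_from_image(task_image: str) -> str:
    """Mirror ${task_image##PREFIX} followed by ${task_name%-image:*}."""
    # '##' with a literal pattern removes the prefix only if it is present.
    name = task_image.removeprefix(PREFIX)
    # '%' removes the shortest suffix matching '-image:*', i.e. everything from
    # the LAST occurrence of '-image:' onward; with no match, nothing changes.
    if "-image:" in name:
        name = name.rsplit("-image:", 1)[0]
    return name


def eval_file_name(task_name: str) -> str:
    # The script skips a task when this file already exists in OUTPUTS_PATH.
    return f"eval_{task_name}-image.json"


if __name__ == "__main__":
    image = "ghcr.io/theagentcompany/example-task-image:1.0.0"  # hypothetical
    name = task_name_from_image(image)
    print(name)                  # example-task
    print(eval_file_name(name))  # eval_example-task-image.json
```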
--- a/evaluation/benchmarks/toolqa/README.md
+++ b/evaluation/benchmarks/toolqa/README.md
@@ -11,7 +11,7 @@ Please follow instruction [here](../../README.md#setup) to setup your local deve
 Make sure your Docker daemon is running, then run this bash script:

 ```bash
-bash evaluation/benchmarks/toolqa/scripts/run_infer.sh [model_config] [git-version] [agent] [eval_limit] [dataset] [hardness] [wolfram_alpha_appid]
+bash evaluation/benchmarks/toolqa/scripts/run_infer.sh [model_config] [git-version] [agent] [eval_limit] [dataset] [hardness] [wolfram-alpha-appid]
 ```

 where `model_config` is mandatory, while all other arguments are optional.
@@ -32,7 +32,7 @@ By default, the script evaluates 1 instance.

 `hardness`, the hardness to evaluate. You could choose from `easy` and `hard`. The default is `easy`.

-`wolfram_alpha_appid` is an optional argument. When given `wolfram_alpha_appid`, the agent will be able to access Wolfram Alpha's APIs.
+`wolfram-alpha-appid` is an optional argument. When it is provided, the agent can access Wolfram Alpha's APIs.

 Note: in order to use `eval_limit`, you must also set `agent`; in order to use `dataset`, you must also set `eval_limit`; in order to use `hardness`, you must also set `dataset`.

--- a/evaluation/benchmarks/toolqa/run_infer.py
+++ b/evaluation/benchmarks/toolqa/run_infer.py
@@ -171,7 +171,7 @@ if __name__ == '__main__':
        default='easy',
    )
    parser.add_argument(
-        '--wolfram_alpha_appid',
+        '--wolfram-alpha-appid',
        type=str,
        help='wolfram alpha appid to use for wolfram alpha related tests',
        default='YOUR_WOLFRAMALPHA_APPID',
--- a/evaluation/benchmarks/toolqa/scripts/run_infer.sh
+++ b/evaluation/benchmarks/toolqa/scripts/run_infer.sh
@@ -53,7 +53,7 @@ COMMAND="poetry run python evaluation/benchmarks/toolqa/run_infer.py \
  --max-iterations 30 \
  --dataset $DATASET \
  --hardness $HARDNESS \
-  --wolfram_alpha_appid $WOLFRAM_APPID\
+  --wolfram-alpha-appid $WOLFRAM_APPID\
  --data-split validation \
  --eval-num-workers $NUM_WORKERS \
  --eval-note ${OPENHANDS_VERSION}_${LEVELS}"
--- a/evaluation/utils/shared.py
+++ b/evaluation/utils/shared.py
@@ -8,6 +8,7 @@ import subprocess
 import time
 import traceback
 from contextlib import contextmanager
+from inspect import signature
 from typing import Any, Awaitable, Callable, TextIO

 import pandas as pd
@@ -16,6 +17,15 @@ from tqdm import tqdm

 from openhands.controller.state.state import State
 from openhands.core.config import LLMConfig
+from openhands.core.exceptions import (
+    AgentRuntimeBuildError,
+    AgentRuntimeDisconnectedError,
+    AgentRuntimeError,
+    AgentRuntimeNotFoundError,
+    AgentRuntimeNotReadyError,
+    AgentRuntimeTimeoutError,
+    AgentRuntimeUnavailableError,
+)
 from openhands.core.logger import get_console_handler
 from openhands.core.logger import openhands_logger as logger
 from openhands.events.action import Action
@@ -306,13 +316,20 @@ def _process_instance_wrapper(
    timeout_seconds: int | None = None,
 ) -> EvalOutput:
    """Wrap the process_instance_func to handle retries and errors."""
+    runtime_failure_count = 0
    for attempt in range(max_retries + 1):
        try:
+            kwargs = {}
+            # check if process_instance_func accepts the runtime_failure_count parameter
+            sig = signature(process_instance_func)
+            if 'runtime_failure_count' in sig.parameters:
+                kwargs['runtime_failure_count'] = runtime_failure_count
+
            if timeout_seconds is not None:
                with timeout(timeout_seconds):
-                    result = process_instance_func(instance, metadata, use_mp)
+                    result = process_instance_func(instance, metadata, use_mp, **kwargs)
            else:
-                result = process_instance_func(instance, metadata, use_mp)
+                result = process_instance_func(instance, metadata, use_mp, **kwargs)
            return result
        except EvalTimeoutException as e:
            error = f'Timeout after {timeout_seconds} seconds'
@@ -358,6 +375,11 @@ def _process_instance_wrapper(
                + '-' * 10
                + '\n'
            )
+            if isinstance(
+                e, (AgentRuntimeDisconnectedError, AgentRuntimeUnavailableError)
+            ):
+                runtime_failure_count += 1
+                msg += f'Runtime disconnected/unavailable error detected for instance {instance.instance_id}, runtime failure count: {runtime_failure_count}'
            logger.error(msg)
            if use_mp:
                print(msg)  # use print to directly print to console
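The wrapper forwards `runtime_failure_count` only to process functions that declare that parameter, so existing benchmarks keep their three-argument signature. A self-contained sketch of the same `inspect.signature` pattern; `call_with_optional_kwargs`, `legacy_process` and `aware_process` are hypothetical names used only for illustration:

```python
from inspect import signature
from typing import Any, Callable


def call_with_optional_kwargs(func: Callable[..., Any], *args: Any, **optional: Any) -> Any:
    """Call func, passing only the optional kwargs it declares by name."""
    params = signature(func).parameters
    kwargs = {k: v for k, v in optional.items() if k in params}
    return func(*args, **kwargs)


def legacy_process(instance, metadata, use_mp):
    # Does not declare runtime_failure_count, so it never receives it.
    return 'legacy'


def aware_process(instance, metadata, use_mp, runtime_failure_count=0):
    # Receives the count because it declares the parameter by name.
    return f'failures={runtime_failure_count}'


if __name__ == '__main__':
    print(call_with_optional_kwargs(legacy_process, None, None, False, runtime_failure_count=2))
    print(call_with_optional_kwargs(aware_process, None, None, False, runtime_failure_count=2))
```

As in the diff, a function that only accepts `**kwargs` does not list `runtime_failure_count` in its parameters and therefore does not receive it.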
@@ -503,3 +525,24 @@ def compatibility_for_eval_history_pairs(
        history_pairs.append((event_to_dict(action), event_to_dict(observation)))

    return history_pairs
+
+
+def is_fatal_evaluation_error(error: str | None) -> bool:
+    if not error:
+        return False
+
+    FATAL_EXCEPTIONS = [
+        AgentRuntimeError,
+        AgentRuntimeBuildError,
+        AgentRuntimeTimeoutError,
+        AgentRuntimeUnavailableError,
+        AgentRuntimeNotReadyError,
+        AgentRuntimeDisconnectedError,
+        AgentRuntimeNotFoundError,
+    ]
+
+    if any(exception.__name__ in error for exception in FATAL_EXCEPTIONS):
+        logger.error(f'Fatal evaluation error detected: {error}')
+        return True
+
+    return False
--- a/frontend/tests/components/browser.test.tsx
+++ b/frontend/tests/components/browser.test.tsx
@@ -1,10 +1,38 @@
+import { describe, it, expect, afterEach, vi } from "vitest";
+import * as router from "react-router";
+
+// Mock useParams before importing components
+vi.mock("react-router", async () => {
+  const actual = await vi.importActual("react-router");
+  return {
+    ...actual as object,
+    useParams: () => ({ conversationId: "test-conversation-id" }),
+  };
+});
+
+// Mock i18next
+vi.mock("react-i18next", async () => {
+  const actual = await vi.importActual("react-i18next");
+  return {
+    ...actual as object,
+    useTranslation: () => ({
+      t: (key: string) => key,
+      i18n: {
+        changeLanguage: () => new Promise(() => {}),
+      },
+    }),
+  };
+});
+
 import { screen } from "@testing-library/react";
-import { describe, it, expect } from "vitest";
 import { renderWithProviders } from "../../test-utils";
 import { BrowserPanel } from "#/components/features/browser/browser";


 describe("Browser", () => {
+  afterEach(() => {
+    vi.clearAllMocks();
+  });
  it("renders a message if no screenshotSrc is provided", () => {
    renderWithProviders(<BrowserPanel />, {
      preloadedState: {
--- a/frontend/tests/components/chat-message.test.tsx
+++ b/frontend/tests/components/chat-message.test.tsx
@@ -26,8 +26,6 @@ describe("ChatMessage", () => {
    expect(screen.getByText("'Hello, World!'")).toBeInTheDocument();
  });

-  it.todo("should support markdown content");
-
  it("should render the copy to clipboard button when the user hovers over the message", async () => {
    const user = userEvent.setup();
    render(<ChatMessage type="user" message="Hello, World!" />);
@@ -50,15 +48,8 @@ describe("ChatMessage", () => {
    expect(navigator.clipboard.readText()).resolves.toBe("Hello, World!");
  });

-  // BUG: vi.useFakeTimers() seems to break the tests
-  it.todo(
-    "should display a checkmark for 200ms and disable the button after copying content to clipboard",
-  );
-
  it("should display an error toast if copying content to clipboard fails", async () => {});

-  test.todo("push a toast after successfully copying content to clipboard");
-
  it("should render a component passed as a prop", () => {
    function Component() {
      return <div data-testid="custom-component">Custom Component</div>;
@@ -72,7 +63,12 @@ describe("ChatMessage", () => {
  });

  it("should apply correct styles to inline code", () => {
-    render(<ChatMessage type="assistant" message="Here is some `inline code` text" />);
+    render(
+      <ChatMessage
+        type="assistant"
+        message="Here is some `inline code` text"
+      />,
+    );
    const codeElement = screen.getByText("inline code");

    expect(codeElement.tagName.toLowerCase()).toBe("code");
--- a/frontend/tests/components/chat/chat-interface.test.tsx
+++ b/frontend/tests/components/chat/chat-interface.test.tsx
@@ -9,7 +9,7 @@ import { WsClientProviderStatus } from "#/context/ws-client-provider";
 import { ChatInterface } from "#/components/features/chat/chat-interface";

 // eslint-disable-next-line @typescript-eslint/no-unused-vars
-const renderChatInterface = (messages: (Message)[]) =>
+const renderChatInterface = (messages: Message[]) =>
  renderWithProviders(<ChatInterface />);

 describe("Empty state", () => {
@@ -20,7 +20,7 @@ describe("Empty state", () => {
  const { useWsClient: useWsClientMock } = vi.hoisted(() => ({
    useWsClient: vi.fn(() => ({
      send: sendMock,
-      status: WsClientProviderStatus.ACTIVE,
+      status: WsClientProviderStatus.CONNECTED,
      isLoadingMessages: false,
    })),
  }));
@@ -90,7 +90,7 @@ describe("Empty state", () => {
      // this is to test that the message is in the UI before the socket is called
      useWsClientMock.mockImplementation(() => ({
        send: sendMock,
-        status: WsClientProviderStatus.ACTIVE,
+        status: WsClientProviderStatus.CONNECTED,
        isLoadingMessages: false,
      }));
      const addUserMessageSpy = vi.spyOn(ChatSlice, "addUserMessage");
@@ -120,7 +120,7 @@ describe("Empty state", () => {
    async () => {
      useWsClientMock.mockImplementation(() => ({
        send: sendMock,
-        status: WsClientProviderStatus.ACTIVE,
+        status: WsClientProviderStatus.CONNECTED,
        isLoadingMessages: false,
      }));
      const user = userEvent.setup();
@@ -138,7 +138,7 @@ describe("Empty state", () => {

      useWsClientMock.mockImplementation(() => ({
        send: sendMock,
-        status: WsClientProviderStatus.ACTIVE,
+        status: WsClientProviderStatus.CONNECTED,
        isLoadingMessages: false,
      }));
      rerender(<ChatInterface />);
@@ -195,7 +195,7 @@ describe.skip("ChatInterface", () => {
    expect(screen.getByTestId("chat-input")).toBeInTheDocument();
  });

-  it.todo("should call socket send when submitting a message", async () => {
+  it("should call socket send when submitting a message", async () => {
    const user = userEvent.setup();
    const messages: Message[] = [];
    renderChatInterface(messages);
@@ -240,8 +240,6 @@ describe.skip("ChatInterface", () => {
    );
  });

-  it.todo("should render confirmation buttons");
-
  it("should render a 'continue' action when there are more than 2 messages and awaiting user input", () => {
    const messages: Message[] = [
      {
@@ -278,7 +276,7 @@ describe.skip("ChatInterface", () => {
  });

  it("should render inline errors", () => {
-    const messages: (Message)[] = [
+    const messages: Message[] = [
      {
        sender: "assistant",
        content: "Hello",
@@ -402,12 +400,4 @@ describe.skip("ChatInterface", () => {

    expect(screen.getByTestId("feedback-actions")).toBeInTheDocument();
  });
-
-  describe("feedback", () => {
-    it.todo("should open the feedback modal when a feedback action is clicked");
-    it.todo(
-      "should submit feedback and hide the actions when feedback is shared",
-    );
-    it.todo("should render the actions once more after new messages are added");
-  });
 });
--- a/frontend/tests/components/features/waitlist-modal.test.tsx
+++ b/frontend/tests/components/features/waitlist-modal.test.tsx
@@ -1,10 +1,18 @@
 import { render, screen } from "@testing-library/react";
-import { it, describe, expect, vi } from "vitest";
+import { it, describe, expect, vi, beforeAll, afterAll } from "vitest";
 import userEvent from "@testing-library/user-event";
 import { WaitlistModal } from "#/components/features/waitlist/waitlist-modal";
 import * as CaptureConsent from "#/utils/handle-capture-consent";

 describe("WaitlistModal", () => {
+  beforeAll(() => {
+    vi.stubGlobal("location", { href: "" });
+  });
+
+  afterAll(() => {
+    vi.unstubAllGlobals();
+  });
+
  it("should render a tos checkbox that is unchecked by default", () => {
    render(<WaitlistModal ghToken={null} githubAuthUrl={null} />);
    const checkbox = screen.getByRole("checkbox");
--- a/frontend/tests/components/feedback-form.test.tsx
+++ b/frontend/tests/components/feedback-form.test.tsx
@@ -1,6 +1,17 @@
+import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
+import * as router from "react-router";
+
+// Mock useParams before importing components
+vi.mock("react-router", async () => {
+  const actual = await vi.importActual("react-router");
+  return {
+    ...actual as object,
+    useParams: () => ({ conversationId: "test-conversation-id" }),
+  };
+});
+
 import { screen } from "@testing-library/react";
 import userEvent from "@testing-library/user-event";
-import { afterEach, describe, expect, it, vi } from "vitest";
 import { renderWithProviders } from "test-utils";
 import { FeedbackForm } from "#/components/features/feedback/feedback-form";

--- a/frontend/tests/components/file-explorer/explorer-tree.test.tsx
+++ b/frontend/tests/components/file-explorer/explorer-tree.test.tsx
@@ -25,10 +25,4 @@ describe.skip("ExplorerTree", () => {
    expect(screen.queryByText("folder-1-2")).toBeInTheDocument();
    // TODO: make sure children don't render
  });
-
-  it.todo("should render all children as collapsed when defaultOpen is false");
-
-  it.todo(
-    "should maintain the expanded state of child folders when closing and opening a parent folder",
-  );
 });
--- a/frontend/tests/components/file-explorer/file-explorer.test.tsx
+++ b/frontend/tests/components/file-explorer/file-explorer.test.tsx
@@ -3,7 +3,7 @@ import userEvent from "@testing-library/user-event";
 import { renderWithProviders } from "test-utils";
 import { describe, it, expect, vi, Mock, afterEach } from "vitest";
 import toast from "#/utils/toast";
-import AgentState from "#/types/agent-state";
+import { AgentState } from "#/types/agent-state";
 import OpenHands from "#/api/open-hands";
 import { FileExplorer } from "#/components/features/file-explorer/file-explorer";

@@ -37,8 +37,6 @@ describe.skip("FileExplorer", () => {
    expect(getFilesSpy).toHaveBeenCalledTimes(1); // once for root
  });

-  it.todo("should render an empty workspace");
-
  it("should refetch the workspace when clicking the refresh button", async () => {
    const user = userEvent.setup();
    renderFileExplorerWithRunningAgentState();
@@ -87,14 +85,6 @@ describe.skip("FileExplorer", () => {
    expect(getFilesSpy).toHaveBeenCalled();
  });

-  it.todo("should upload files when dragging them to the explorer", () => {
-    // It will require too much work to mock drag logic, especially for our case
-    // https://github.com/testing-library/user-event/issues/440#issuecomment-685010755
-    // TODO: should be tested in an e2e environment such as Cypress/Playwright
-  });
-
-  it.todo("should download a file");
-
  it("should display an error toast if file upload fails", async () => {
    (uploadFilesSpy as Mock).mockRejectedValue(new Error());
    const user = userEvent.setup();
--- a/frontend/tests/components/modals/settings/model-selector.test.tsx
+++ b/frontend/tests/components/modals/settings/model-selector.test.tsx
@@ -109,11 +109,4 @@ describe("ModelSelector", () => {
    expect(screen.getByLabelText("LLM Provider")).toHaveValue("Azure");
    expect(screen.getByLabelText("LLM Model")).toHaveValue("ada");
  });
-
-  it.todo("should disable provider if isDisabled is true");
-
-  it.todo(
-    "should display the verified models in the correct order",
-    async () => {},
-  );
 });
--- a/frontend/tests/components/project-menu/project-menu-card.test.tsx
+++ b/frontend/tests/components/project-menu/project-menu-card.test.tsx
@@ -1,8 +0,0 @@
-import { describe, it } from "vitest";
-
-describe("PlayMenuCard", () => {
-  it.todo("should render the initial project title");
-  it.todo("should be able to edit the project title");
-  it.todo("should render the menu list items when clicking the ellipses");
-  it.todo("should close the menu list when clicking outside");
-});
--- a/frontend/tests/components/settings/ai-config-form.test.tsx
+++ b/frontend/tests/components/settings/ai-config-form.test.tsx
@@ -1,9 +0,0 @@
-import { describe, it } from "vitest";
-
-describe("AIConfigForm", () => {
-  it.todo("should render the AI config form");
-  it.todo("should toggle the advanced settings when clicked");
-  it.todo("should call the onSubmit callback when the form is submitted");
-  it.todo("should call the onReset callback when the reset button is clicked");
-  it.todo("should call the onClose callback when the close button is clicked");
-});
--- a/frontend/tests/components/settings/dropdown-input.test.tsx
+++ b/frontend/tests/components/settings/dropdown-input.test.tsx
@@ -1,9 +0,0 @@
-import { describe, it } from "vitest";
-
-describe("DropdownInput", () => {
-  it.todo("should render the input");
-  it.todo("should render the placeholder");
-  it.todo("should render the dropdown when clicked");
-  it.todo("should select an option when clicked");
-  it.todo("should filter the options when typing");
-});
--- a/frontend/tests/components/settings/model-selector.test.tsx
+++ b/frontend/tests/components/settings/model-selector.test.tsx
@@ -1,12 +0,0 @@
-import { describe, it } from "vitest";
-
-describe("ModelSelector", () => {
-  it.todo("should render the model selector");
-  it.todo("should display and select the providers");
-  it.todo("should display and select the models");
-  it.todo("should disable the models if a provider is not selected");
-  it.todo("should disable the inputs if isDisabled is true");
-  it.todo(
-    "should set the selected model and provider if the currentModel prop is set",
-  );
-});
--- a/frontend/tests/routes/_oh.app.test.tsx
+++ b/frontend/tests/routes/_oh.app.test.tsx
@@ -1,5 +0,0 @@
-import { describe, it } from "vitest";
-
-describe("App", () => {
-  it.todo("should render");
-});
--- a/frontend/tests/routes/_oh.test.tsx
+++ b/frontend/tests/routes/_oh.test.tsx
@@ -1,4 +1,5 @@
 import { afterEach, beforeAll, describe, expect, it, vi } from "vitest";
+import * as router from "react-router";
 import { createRoutesStub } from "react-router";
 import { screen, waitFor, within } from "@testing-library/react";
 import { renderWithProviders } from "test-utils";
--- a/frontend/tests/services/auth.test.ts
+++ b/frontend/tests/services/auth.test.ts
@@ -1,21 +0,0 @@
-import { beforeEach, describe, expect, it, vi, type Mock } from "vitest";
-import { getToken } from "../../src/services/auth";
-
-Storage.prototype.getItem = vi.fn();
-Storage.prototype.setItem = vi.fn();
-
-describe("Auth Service", () => {
-  beforeEach(() => {
-    vi.clearAllMocks();
-  });
-
-  describe("getToken", () => {
-    it("should fetch and return a token", () => {
-      (Storage.prototype.getItem as Mock).mockReturnValue("newToken");
-
-      const data = getToken();
-      expect(localStorage.getItem).toHaveBeenCalledWith("token"); // Used to set Authorization header
-      expect(data).toEqual("newToken");
-    });
-  });
-});
--- a/frontend/package-lock.json
+++ b/frontend/package-lock.json
@@ -1,12 +1,12 @@
 {
  "name": "openhands-frontend",
-  "version": "0.16.0",
+  "version": "0.16.1",
  "lockfileVersion": 3,
  "requires": true,
  "packages": {
    "": {
      "name": "openhands-frontend",
-      "version": "0.16.0",
+      "version": "0.16.1",
      "dependencies": {
        "@monaco-editor/react": "^4.6.0",
        "@nextui-org/react": "^2.4.8",
--- a/frontend/package.json
+++ b/frontend/package.json
@@ -1,6 +1,6 @@
 {
  "name": "openhands-frontend",
-  "version": "0.16.0",
+  "version": "0.16.1",
  "private": true,
  "type": "module",
  "engines": {
--- a/frontend/src/api/open-hands.ts
+++ b/frontend/src/api/open-hands.ts
@@ -53,8 +53,12 @@ class OpenHands {
   * @param path Path to list files from
   * @returns List of files available in the given path. If path is not provided, it lists all the files in the workspace
   */
-  static async getFiles(path?: string): Promise<string[]> {
-    const { data } = await openHands.get<string[]>("/api/list-files", {
+  static async getFiles(
+    conversationId: string,
+    path?: string,
+  ): Promise<string[]> {
+    const url = `/api/conversations/${conversationId}/list-files`;
+    const { data } = await openHands.get<string[]>(url, {
      params: { path },
    });
    return data;
@@ -65,8 +69,9 @@ class OpenHands {
   * @param path Full path of the file to retrieve
   * @returns Content of the file
   */
-  static async getFile(path: string): Promise<string> {
-    const { data } = await openHands.get<{ code: string }>("/api/select-file", {
+  static async getFile(conversationId: string, path: string): Promise<string> {
+    const url = `/api/conversations/${conversationId}/select-file`;
+    const { data } = await openHands.get<{ code: string }>(url, {
      params: { file: path },
    });

@@ -80,12 +85,14 @@ class OpenHands {
   * @returns Success message or error message
   */
  static async saveFile(
+    conversationId: string,
    path: string,
    content: string,
  ): Promise<SaveFileSuccessResponse> {
+    const url = `/api/conversations/${conversationId}/save-file`;
    const { data } = await openHands.post<
      SaveFileSuccessResponse | ErrorResponse
-    >("/api/save-file", {
+    >(url, {
      filePath: path,
      content,
    });
@@ -99,13 +106,17 @@ class OpenHands {
   * @param file File to upload
   * @returns Success message or error message
   */
-  static async uploadFiles(files: File[]): Promise<FileUploadSuccessResponse> {
+  static async uploadFiles(
+    conversationId: string,
+    files: File[],
+  ): Promise<FileUploadSuccessResponse> {
+    const url = `/api/conversations/${conversationId}/upload-files`;
    const formData = new FormData();
    files.forEach((file) => formData.append("files", file));

    const { data } = await openHands.post<
      FileUploadSuccessResponse | ErrorResponse
-    >("/api/upload-files", formData);
+    >(url, formData);

    if ("error" in data) throw new Error(data.error);
    return data;
@@ -116,11 +127,12 @@ class OpenHands {
   * @param data Feedback data
   * @returns The stored feedback data
   */
-  static async submitFeedback(feedback: Feedback): Promise<FeedbackResponse> {
-    const { data } = await openHands.post<FeedbackResponse>(
-      "/api/submit-feedback",
-      feedback,
-    );
+  static async submitFeedback(
+    conversationId: string,
+    feedback: Feedback,
+  ): Promise<FeedbackResponse> {
+    const url = `/api/conversations/${conversationId}/submit-feedback`;
+    const { data } = await openHands.post<FeedbackResponse>(url, feedback);
    return data;
  }

@@ -144,11 +156,16 @@ class OpenHands {
   */
  static async refreshToken(
    appMode: GetConfigResponse["APP_MODE"],
+    userId: string,
  ): Promise<string> {
    if (appMode === "oss") return "";

-    const response =
-      await openHands.post<GitHubAccessTokenResponse>("/api/refresh-token");
+    const response = await openHands.post<GitHubAccessTokenResponse>(
+      "/api/refresh-token",
+      {
+        userId,
+      },
+    );
    return response.data.access_token;
  }

@@ -156,8 +173,9 @@ class OpenHands {
   * Get the blob of the workspace zip
   * @returns Blob of the workspace zip
   */
-  static async getWorkspaceZip(): Promise<Blob> {
-    const response = await openHands.get("/api/zip-directory", {
+  static async getWorkspaceZip(conversationId: string): Promise<Blob> {
+    const url = `/api/conversations/${conversationId}/zip-directory`;
+    const response = await openHands.get(url, {
      responseType: "blob",
    });
    return response.data;
@@ -183,18 +201,69 @@ class OpenHands {
   * Get the VSCode URL
   * @returns VSCode URL
   */
-  static async getVSCodeUrl(): Promise<GetVSCodeUrlResponse> {
-    const { data } =
-      await openHands.get<GetVSCodeUrlResponse>("/api/vscode-url");
+  static async getVSCodeUrl(
+    conversationId: string,
+  ): Promise<GetVSCodeUrlResponse> {
+    const { data } = await openHands.get<GetVSCodeUrlResponse>(
+      `/api/conversations/${conversationId}/vscode-url`,
+    );
    return data;
  }

-  static async getRuntimeId(): Promise<{ runtime_id: string }> {
+  static async getRuntimeId(
+    conversationId: string,
+  ): Promise<{ runtime_id: string }> {
    const { data } = await openHands.get<{ runtime_id: string }>(
-      "/api/conversation",
+      `/api/conversations/${conversationId}/config`,
    );
    return data;
  }
+
+  static async searchEvents(
+    conversationId: string,
+    params: {
+      query?: string;
+      startId?: number;
+      limit?: number;
+      eventType?: string;
+      source?: string;
+      startDate?: string;
+      endDate?: string;
+    },
+  ): Promise<{ events: Record<string, unknown>[]; has_more: boolean }> {
+    const { data } = await openHands.get<{
+      events: Record<string, unknown>[];
+      has_more: boolean;
+    }>(`/api/conversations/${conversationId}/events/search`, {
+      params: {
+        query: params.query,
+        start_id: params.startId,
+        limit: params.limit,
+        event_type: params.eventType,
+        source: params.source,
+        start_date: params.startDate,
+        end_date: params.endDate,
+      },
+    });
+    return data;
+  }
+
+  static async newConversation(params: {
+    githubToken?: string;
+    args?: Record<string, unknown>;
+    selectedRepository?: string;
+  }): Promise<{ conversation_id: string }> {
+    const { data } = await openHands.post<{
+      conversation_id: string;
+    }>("/api/conversations", {
+      github_token: params.githubToken,
+      args: params.args,
+      selected_repository: params.selectedRepository,
+    });
+    // TODO: remove this once we have a multi-conversation UI
+    localStorage.setItem("latest_conversation_id", data.conversation_id);
+    return data;
+  }
 }

 export default OpenHands;
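After this change every workspace endpoint is scoped under `/api/conversations/<id>/`, and `searchEvents` renames its camelCase options to snake_case query parameters. A minimal Python sketch of that URL construction, assuming unset options are omitted from the query string (axios skips `undefined` params by default); `conversation_url` and `search_events_url` are hypothetical helpers, not part of the frontend:

```python
from urllib.parse import urlencode

# camelCase option -> snake_case query parameter, as mapped in searchEvents.
SEARCH_PARAM_NAMES = {
    'query': 'query',
    'startId': 'start_id',
    'limit': 'limit',
    'eventType': 'event_type',
    'source': 'source',
    'startDate': 'start_date',
    'endDate': 'end_date',
}


def conversation_url(conversation_id: str, endpoint: str) -> str:
    # e.g. list-files, select-file, save-file, upload-files, zip-directory
    return f'/api/conversations/{conversation_id}/{endpoint}'


def search_events_url(conversation_id: str, **options: object) -> str:
    # Unknown option names raise KeyError; None values are dropped.
    params = {
        SEARCH_PARAM_NAMES[name]: value
        for name, value in options.items()
        if value is not None
    }
    url = conversation_url(conversation_id, 'events/search')
    return f'{url}?{urlencode(params)}' if params else url


if __name__ == '__main__':
    print(search_events_url('abc', startId=5, limit=20, eventType=None))
    # /api/conversations/abc/events/search?start_id=5&limit=20
```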
--- a/frontend/src/components/agent-status-map.constant.ts
+++ b/frontend/src/components/agent-status-map.constant.ts
@@ -1,5 +1,5 @@
 import { I18nKey } from "#/i18n/declaration";
-import AgentState from "#/types/agent-state";
+import { AgentState } from "#/types/agent-state";

 enum IndicatorColor {
  BLUE = "bg-blue-500",
--- a/frontend/src/components/features/chat/chat-interface.tsx
+++ b/frontend/src/components/features/chat/chat-interface.tsx
@@ -7,7 +7,7 @@ import { createChatMessage } from "#/services/chat-service";
 import { InteractiveChatBox } from "./interactive-chat-box";
 import { addUserMessage } from "#/state/chat-slice";
 import { RootState } from "#/store";
-import AgentState from "#/types/agent-state";
+import { AgentState } from "#/types/agent-state";
 import { generateAgentStateChangeEvent } from "#/services/agent-state-service";
 import { FeedbackModal } from "../feedback/feedback-modal";
 import { useScrollToBottom } from "#/hooks/use-scroll-to-bottom";
--- a/frontend/src/components/features/chat/chat-message.tsx
+++ b/frontend/src/components/features/chat/chat-message.tsx
@@ -5,6 +5,7 @@ import { code } from "../markdown/code";
 import { cn } from "#/utils/utils";
 import { ul, ol } from "../markdown/list";
 import { CopyToClipboardButton } from "#/components/shared/buttons/copy-to-clipboard-button";
+import { anchor } from "../markdown/anchor";

 interface ChatMessageProps {
  type: "user" | "assistant";
@@ -62,6 +63,7 @@ export function ChatMessage({
          code,
          ul,
          ol,
+          a: anchor,
        }}
        remarkPlugins={[remarkGfm]}
      >
--- a/frontend/src/components/features/controls/agent-control-bar.tsx
+++ b/frontend/src/components/features/controls/agent-control-bar.tsx
@@ -3,7 +3,7 @@ import PauseIcon from "#/assets/pause";
 import PlayIcon from "#/assets/play";
 import { generateAgentStateChangeEvent } from "#/services/agent-state-service";
 import { RootState } from "#/store";
-import AgentState from "#/types/agent-state";
+import { AgentState } from "#/types/agent-state";
 import { useWsClient } from "#/context/ws-client-provider";
 import { IGNORE_TASK_STATE_MAP } from "#/ignore-task-state-map.constant";
 import { ActionButton } from "#/components/shared/buttons/action-button";
--- a/frontend/src/components/features/controls/agent-status-bar.tsx
+++ b/frontend/src/components/features/controls/agent-status-bar.tsx
@@ -3,7 +3,7 @@ import { useTranslation } from "react-i18next";
 import { useSelector } from "react-redux";
 import toast from "react-hot-toast";
 import { RootState } from "#/store";
-import AgentState from "#/types/agent-state";
+import { AgentState } from "#/types/agent-state";
 import { AGENT_STATUS_MAP } from "../../agent-status-map.constant";

 export function AgentStatusBar() {
--- a/frontend/src/components/features/file-explorer/file-explorer.tsx
+++ b/frontend/src/components/features/file-explorer/file-explorer.tsx
@@ -1,7 +1,7 @@
 import React from "react";
 import { useDispatch, useSelector } from "react-redux";
 import { useTranslation } from "react-i18next";
-import AgentState from "#/types/agent-state";
+import { RUNTIME_INACTIVE_STATES } from "#/types/agent-state";
 import { ExplorerTree } from "#/components/features/file-explorer/explorer-tree";
 import toast from "#/utils/toast";
 import { RootState } from "#/store";
@@ -15,10 +15,6 @@ import { FileExplorerHeader } from "./file-explorer-header";
 import { useVSCodeUrl } from "#/hooks/query/use-vscode-url";
 import { OpenVSCodeButton } from "#/components/shared/buttons/open-vscode-button";
 import { addAssistantMessage } from "#/state/chat-slice";
-import {
-  useWsClient,
-  WsClientProviderStatus,
-} from "#/context/ws-client-provider";

 interface FileExplorerProps {
  isOpen: boolean;
@@ -26,7 +22,6 @@ interface FileExplorerProps {
 }

 export function FileExplorer({ isOpen, onToggle }: FileExplorerProps) {
-  const { status } = useWsClient();
  const { t } = useTranslation();
  const dispatch = useDispatch();

@@ -38,7 +33,7 @@ export function FileExplorer({ isOpen, onToggle }: FileExplorerProps) {
  const { data: paths, refetch, error } = useListFiles();
  const { mutate: uploadFiles } = useUploadFiles();
  const { data: vscodeUrl } = useVSCodeUrl({
-    enabled: status === WsClientProviderStatus.ACTIVE,
+    enabled: !RUNTIME_INACTIVE_STATES.includes(curAgentState),
  });

  const handleOpenVSCode = () => {
@@ -96,10 +91,7 @@ export function FileExplorer({ isOpen, onToggle }: FileExplorerProps) {
  };

  const refreshWorkspace = () => {
-    if (
-      curAgentState !== AgentState.LOADING &&
-      curAgentState !== AgentState.STOPPED
-    ) {
+    if (!RUNTIME_INACTIVE_STATES.includes(curAgentState)) {
      refetch();
    }
  };
@@ -170,7 +162,7 @@ export function FileExplorer({ isOpen, onToggle }: FileExplorerProps) {
          {isOpen && (
            <OpenVSCodeButton
              onClick={handleOpenVSCode}
-              isDisabled={status === WsClientProviderStatus.OPENING}
+              isDisabled={RUNTIME_INACTIVE_STATES.includes(curAgentState)}
            />
          )}
        </div>
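The explorer, terminal and VS Code button now gate on `RUNTIME_INACTIVE_STATES` instead of the websocket status, and `AgentState` becomes a named export. Neither definition appears in this excerpt. The following is therefore a hypothetical sketch of `#/types/agent-state`. The enum values are illustrative, and the list is assumed to hold at least `LOADING` and `STOPPED`, the two states the old `refreshWorkspace` guard excluded. `isRuntimeActive` and `legacyCanRefresh` are illustrative helpers, not part of the PR.

```typescript
// Hypothetical sketch of #/types/agent-state: the real file may define
// more states and a different RUNTIME_INACTIVE_STATES list.
export enum AgentState {
  LOADING = "loading",
  INIT = "init",
  RUNNING = "running",
  PAUSED = "paused",
  STOPPED = "stopped",
  ERROR = "error",
}

// Assumed minimal set: the two states the old refreshWorkspace guard excluded.
export const RUNTIME_INACTIVE_STATES: AgentState[] = [
  AgentState.LOADING,
  AgentState.STOPPED,
];

// Mirrors `!RUNTIME_INACTIVE_STATES.includes(curAgentState)` in the components.
export function isRuntimeActive(state: AgentState): boolean {
  return !RUNTIME_INACTIVE_STATES.includes(state);
}

// The two-comparison guard that the includes() check replaces.
function legacyCanRefresh(state: AgentState): boolean {
  return state !== AgentState.LOADING && state !== AgentState.STOPPED;
}
```

Under that assumption, the single `includes` check is equivalent to the comparisons it replaces, and every component shares one definition of "inactive". If the real list also contains states such as `INIT` or `ERROR`, the refactored components will also be disabled in those states.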
--- a/frontend/src/components/features/markdown/anchor.tsx
+++ b/frontend/src/components/features/markdown/anchor.tsx
@@ -0,0 +1,20 @@
+import React from "react";
+import { ExtraProps } from "react-markdown";
+
+export function anchor({
+  href,
+  children,
+}: React.ClassAttributes<HTMLAnchorElement> &
+  React.AnchorHTMLAttributes<HTMLAnchorElement> &
+  ExtraProps) {
+  return (
+    <a
+      className="text-blue-500 hover:underline"
+      href={href}
+      target="_blank"
+      rel="noopener noreferrer"
+    >
+      {children}
+    </a>
+  );
+}
--- a/frontend/src/components/features/sidebar/sidebar.tsx
+++ b/frontend/src/components/features/sidebar/sidebar.tsx
@@ -20,7 +20,7 @@ export function Sidebar() {
  const user = useGitHubUser();
  const { data: isAuthed } = useIsAuthed();

-  const { token, logout } = useAuth();
+  const { logout } = useAuth();
  const { settingsAreUpToDate } = useUserPrefs();

  const [accountSettingsModalOpen, setAccountSettingsModalOpen] =
@@ -45,7 +45,7 @@ export function Sidebar() {
  };

  const handleClickLogo = () => {
-    if (location.pathname.startsWith("/app"))
+    if (location.pathname.startsWith("/conversations/"))
      setStartNewProjectModalIsOpen(true);
  };

@@ -68,11 +68,9 @@ export function Sidebar() {
          />
          <SettingsButton onClick={() => setSettingsModalIsOpen(true)} />
          <DocsButton />
-          {!!token && (
-            <ExitProjectButton
-              onClick={() => setStartNewProjectModalIsOpen(true)}
-            />
-          )}
+          <ExitProjectButton
+            onClick={() => setStartNewProjectModalIsOpen(true)}
+          />
        </nav>
      </aside>
      {accountSettingsModalOpen && (
--- a/frontend/src/components/features/terminal/terminal-status-label.tsx
+++ b/frontend/src/components/features/terminal/terminal-status-label.tsx
@@ -1,20 +1,20 @@
-import {
-  useWsClient,
-  WsClientProviderStatus,
-} from "#/context/ws-client-provider";
+import { useSelector } from "react-redux";
 import { cn } from "#/utils/utils";
+import { AgentState } from "#/types/agent-state";
+import { RootState } from "#/store";

 export function TerminalStatusLabel() {
-  const { status } = useWsClient();
+  const { curAgentState } = useSelector((state: RootState) => state.agent);

  return (
    <div className="flex items-center gap-2">
      <div
        className={cn(
          "w-2 h-2 rounded-full",
-          status === WsClientProviderStatus.ACTIVE && "bg-green-500",
-          status !== WsClientProviderStatus.ACTIVE &&
-            "bg-red-500 animate-pulse",
+          curAgentState === AgentState.LOADING ||
+            curAgentState === AgentState.STOPPED
+            ? "bg-red-500 animate-pulse"
+            : "bg-green-500",
        )}
      />
      Terminal
--- a/frontend/src/components/features/terminal/terminal.tsx
+++ b/frontend/src/components/features/terminal/terminal.tsx
@@ -2,23 +2,20 @@ import { useSelector } from "react-redux";
 import { RootState } from "#/store";
 import { useTerminal } from "#/hooks/use-terminal";
 import "@xterm/xterm/css/xterm.css";
-import {
-  useWsClient,
-  WsClientProviderStatus,
-} from "#/context/ws-client-provider";
+import { RUNTIME_INACTIVE_STATES } from "#/types/agent-state";

 interface TerminalProps {
  secrets: string[];
 }

 function Terminal({ secrets }: TerminalProps) {
-  const { status } = useWsClient();
  const { commands } = useSelector((state: RootState) => state.cmd);
+  const { curAgentState } = useSelector((state: RootState) => state.agent);

  const ref = useTerminal({
    commands,
    secrets,
-    disabled: status === WsClientProviderStatus.OPENING,
+    disabled: RUNTIME_INACTIVE_STATES.includes(curAgentState),
  });

  return (
--- a/frontend/src/components/shared/buttons/action-button.tsx
+++ b/frontend/src/components/shared/buttons/action-button.tsx
@@ -1,5 +1,5 @@
 import { Tooltip } from "@nextui-org/react";
-import AgentState from "#/types/agent-state";
+import { AgentState } from "#/types/agent-state";

 interface ActionButtonProps {
  isDisabled?: boolean;
--- a/frontend/src/components/shared/buttons/confirmation-buttons.tsx
+++ b/frontend/src/components/shared/buttons/confirmation-buttons.tsx
@@ -1,6 +1,6 @@
 import { useTranslation } from "react-i18next";
 import { I18nKey } from "#/i18n/declaration";
-import AgentState from "#/types/agent-state";
+import { AgentState } from "#/types/agent-state";
 import { generateAgentStateChangeEvent } from "#/services/agent-state-service";
 import { useWsClient } from "#/context/ws-client-provider";
 import { ActionTooltip } from "../action-tooltip";
--- a/frontend/src/components/shared/modals/exit-project-confirmation-modal.tsx
+++ b/frontend/src/components/shared/modals/exit-project-confirmation-modal.tsx
@@ -1,7 +1,7 @@
 import { useDispatch } from "react-redux";
 import { useEndSession } from "#/hooks/use-end-session";
 import { setCurrentAgentState } from "#/state/agent-slice";
-import AgentState from "#/types/agent-state";
+import { AgentState } from "#/types/agent-state";
 import { DangerModal } from "./confirmation-modals/danger-modal";
 import { ModalBackdrop } from "./modal-backdrop";

--- a/frontend/src/components/shared/modals/settings/settings-form.tsx
+++ b/frontend/src/components/shared/modals/settings/settings-form.tsx
@@ -87,7 +87,7 @@ export function SettingsForm({
  const [showWarningModal, setShowWarningModal] = React.useState(false);

  const resetOngoingSession = () => {
-    if (location.pathname.startsWith("/app")) {
+    if (location.pathname.startsWith("/conversations/")) {
      endSession();
      onClose();
    }
@@ -129,7 +129,7 @@ export function SettingsForm({

    if (!apiKey) {
      setShowWarningModal(true);
-    } else if (location.pathname.startsWith("/app")) {
+    } else if (location.pathname.startsWith("/conversations/")) {
      setConfirmEndSessionModalOpen(true);
    } else {
      handleFormSubmission(formData);
--- a/frontend/src/components/shared/task-form.tsx
+++ b/frontend/src/components/shared/task-form.tsx
@@ -1,6 +1,7 @@
 import React from "react";
 import { useNavigate, useNavigation } from "react-router";
 import { useDispatch, useSelector } from "react-redux";
+import { useMutation } from "@tanstack/react-query";
 import posthog from "posthog-js";
 import { RootState } from "#/store";
 import {
@@ -8,6 +9,10 @@ import {
  removeFile,
  setInitialQuery,
 } from "#/state/initial-query-slice";
+import OpenHands from "#/api/open-hands";
+import { useAuth } from "#/context/auth-context";
+import { useUserPrefs } from "#/context/user-prefs-context";
+
 import { SuggestionBubble } from "#/components/features/suggestions/suggestion-bubble";
 import { SUGGESTIONS } from "#/utils/suggestions";
 import { convertImageToBase64 } from "#/utils/convert-image-to-base-64";
@@ -22,6 +27,8 @@ export const TaskForm = React.forwardRef<HTMLFormElement>((_, ref) => {
  const dispatch = useDispatch();
  const navigation = useNavigation();
  const navigate = useNavigate();
+  const { gitHubToken } = useAuth();
+  const { settings } = useUserPrefs();

  const { selectedRepository, files } = useSelector(
    (state: RootState) => state.initalQuery,
@@ -32,6 +39,25 @@ export const TaskForm = React.forwardRef<HTMLFormElement>((_, ref) => {
    getRandomKey(SUGGESTIONS["non-repo"]),
  );
  const [inputIsFocused, setInputIsFocused] = React.useState(false);
+  const newConversationMutation = useMutation({
+    mutationFn: (variables: { q?: string }) => {
+      if (variables.q) dispatch(setInitialQuery(variables.q));
+      return OpenHands.newConversation({
+        githubToken: gitHubToken || undefined,
+        selectedRepository: selectedRepository || undefined,
+        args: settings || undefined,
+      });
+    },
+    onSuccess: ({ conversation_id: conversationId }, { q }) => {
+      posthog.capture("initial_query_submitted", {
+        entry_point: "task_form",
+        query_character_length: q?.length,
+        has_repository: !!selectedRepository,
+        has_files: files.length > 0,
+      });
+      navigate(`/conversations/${conversationId}`);
+    },
+  });

  const onRefreshSuggestion = () => {
    const suggestions = SUGGESTIONS["non-repo"];
@@ -62,16 +88,7 @@ export const TaskForm = React.forwardRef<HTMLFormElement>((_, ref) => {
    const formData = new FormData(event.currentTarget);

    const q = formData.get("q")?.toString();
-    if (q) dispatch(setInitialQuery(q));
-
-    posthog.capture("initial_query_submitted", {
-      entry_point: "task_form",
-      query_character_length: q?.length,
-      has_repository: !!selectedRepository,
-      has_files: files.length > 0,
-    });
-
-    navigate("/app");
+    newConversationMutation.mutate({ q });
  };

  return (
@@ -114,7 +131,10 @@ export const TaskForm = React.forwardRef<HTMLFormElement>((_, ref) => {
            showButton={!!text}
            className="text-[17px] leading-5 py-[17px]"
            buttonClassName="pb-[17px]"
-            disabled={navigation.state === "submitting"}
+            disabled={
+              navigation.state === "submitting" ||
+              newConversationMutation.isPending
+            }
          />
        </div>
      </form>
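Submitting the task form now creates the conversation on the backend before navigating. The following is a minimal sketch of the request payload and target path. It assumes `OpenHands.newConversation` accepts an options object shaped like the one built in `mutationFn`. The `NewConversationArgs` interface and both helpers are hypothetical.

```typescript
// Hypothetical request shape; the real OpenHands.newConversation types may differ.
interface NewConversationArgs {
  githubToken?: string;
  selectedRepository?: string;
  args?: Record<string, unknown>;
}

// Mirrors the `|| undefined` coercion in mutationFn: React state uses null
// (or "") for "absent", while the request simply omits the field.
function buildNewConversationArgs(
  gitHubToken: string | null,
  selectedRepository: string | null,
  settings: Record<string, unknown> | null,
): NewConversationArgs {
  return {
    githubToken: gitHubToken || undefined,
    selectedRepository: selectedRepository || undefined,
    args: settings || undefined,
  };
}

// onSuccess navigates to the route registered as "conversations/:conversationId".
function conversationPath(conversationId: string): string {
  return `/conversations/${encodeURIComponent(conversationId)}`;
}
```

The `|| undefined` coercion matters if the client serializes the body as JSON, as axios does by default for plain objects. `JSON.stringify` drops `undefined` properties, so absent values are omitted instead of being sent as `null`. The sketch wraps the ID in `encodeURIComponent` as a precaution, whereas the diff interpolates it directly. Because `navigate` now runs in `onSuccess`, the user leaves the home page only after the backend returns a `conversation_id`. Until then, `newConversationMutation.isPending` keeps the input disabled.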
--- a/frontend/src/context/auth-context.tsx
+++ b/frontend/src/context/auth-context.tsx
@@ -2,10 +2,8 @@ import posthog from "posthog-js";
 import React from "react";
 import OpenHands from "#/api/open-hands";
 import {
-  removeAuthTokenHeader as removeOpenHandsAuthTokenHeader,
  removeGitHubTokenHeader as removeOpenHandsGitHubTokenHeader,
  setGitHubTokenHeader as setOpenHandsGitHubTokenHeader,
-  setAuthTokenHeader as setOpenHandsAuthTokenHeader,
 } from "#/api/open-hands-axios";
 import {
  setAuthTokenHeader as setGitHubAuthTokenHeader,
@@ -14,11 +12,9 @@ import {
 } from "#/api/github-axios-instance";

 interface AuthContextType {
-  token: string | null;
  gitHubToken: string | null;
-  setToken: (token: string | null) => void;
+  setUserId: (userId: string) => void;
  setGitHubToken: (token: string | null) => void;
-  clearToken: () => void;
  clearGitHubToken: () => void;
  refreshToken: () => Promise<boolean>;
  logout: () => void;
@@ -27,39 +23,24 @@ interface AuthContextType {
 const AuthContext = React.createContext<AuthContextType | undefined>(undefined);

 function AuthProvider({ children }: React.PropsWithChildren) {
-  const [tokenState, setTokenState] = React.useState<string | null>(() =>
-    localStorage.getItem("token"),
-  );
  const [gitHubTokenState, setGitHubTokenState] = React.useState<string | null>(
    () => localStorage.getItem("ghToken"),
  );

-  const clearToken = () => {
-    setTokenState(null);
-    localStorage.removeItem("token");
-
-    removeOpenHandsAuthTokenHeader();
-  };
+  const [userIdState, setUserIdState] = React.useState<string>(
+    () => localStorage.getItem("userId") || "",
+  );

  const clearGitHubToken = () => {
    setGitHubTokenState(null);
+    setUserIdState("");
    localStorage.removeItem("ghToken");
+    localStorage.removeItem("userId");

    removeOpenHandsGitHubTokenHeader();
    removeGitHubAuthTokenHeader();
  };

-  const setToken = (token: string | null) => {
-    setTokenState(token);
-
-    if (token) {
-      localStorage.setItem("token", token);
-      setOpenHandsAuthTokenHeader(token);
-    } else {
-      clearToken();
-    }
-  };
-
  const setGitHubToken = (token: string | null) => {
    setGitHubTokenState(token);

@@ -72,6 +53,11 @@ function AuthProvider({ children }: React.PropsWithChildren) {
    }
  };

+  const setUserId = (userId: string) => {
+    setUserIdState(userId);
+    localStorage.setItem("userId", userId);
+  };
+
  const logout = () => {
    clearGitHubToken();
    posthog.reset();
@@ -84,7 +70,7 @@ function AuthProvider({ children }: React.PropsWithChildren) {
      return false;
    }

-    const newToken = await OpenHands.refreshToken(config.APP_MODE);
+    const newToken = await OpenHands.refreshToken(config.APP_MODE, userIdState);
    if (newToken) {
      setGitHubToken(newToken);
      return true;
@@ -95,26 +81,25 @@ function AuthProvider({ children }: React.PropsWithChildren) {
  };

  React.useEffect(() => {
-    const storedToken = localStorage.getItem("token");
    const storedGitHubToken = localStorage.getItem("ghToken");

-    setToken(storedToken);
+    const userId = localStorage.getItem("userId") || "";
+
    setGitHubToken(storedGitHubToken);
+    setUserId(userId);
    setupGithubAxiosInterceptors(refreshToken, logout);
  }, []);

  const value = React.useMemo(
    () => ({
-      token: tokenState,
      gitHubToken: gitHubTokenState,
-      setToken,
      setGitHubToken,
-      clearToken,
+      setUserId,
      clearGitHubToken,
      refreshToken,
      logout,
    }),
-    [tokenState, gitHubTokenState],
+    [gitHubTokenState],
  );

  return <AuthContext.Provider value={value}>{children}</AuthContext.Provider>;
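`AuthProvider` now persists a GitHub user ID alongside the GitHub token and passes it to `OpenHands.refreshToken`. The sketch below models that flow outside React. A `Storage`-like object stands in for `localStorage`, and `MemoryStore` and `UserIdState` are hypothetical names. `setUserId` has to write its `userId` argument into state. Passing the current `userIdState` back in would update `localStorage` but leave React state stale until the next reload.

```typescript
// Minimal stand-in for the browser Storage API used by AuthProvider.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// In-memory implementation so the sketch runs outside a browser.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();

  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }

  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }

  removeItem(key: string): void {
    this.data.delete(key);
  }
}

// Hypothetical model of the userId handling in AuthProvider.
class UserIdState {
  userId: string;
  private store: KeyValueStore;

  constructor(store: KeyValueStore) {
    this.store = store;
    // Same initializer as useState(() => localStorage.getItem("userId") || "").
    this.userId = store.getItem("userId") || "";
  }

  setUserId(userId: string): void {
    // Write the argument to both in-memory state and storage.
    this.userId = userId;
    this.store.setItem("userId", userId);
  }

  clear(): void {
    // clearGitHubToken also resets and removes the stored userId.
    this.userId = "";
    this.store.removeItem("userId");
  }
}
```

`setupGithubAxiosInterceptors` appears to receive the `refreshToken` closure from the first render. That closure sees the `userIdState` held at mount rather than a value set later by `useGitHubUser`. Reading `localStorage.getItem("userId")` inside `refreshToken` at call time would avoid that.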
--- a/frontend/src/context/conversation-context.tsx
+++ b/frontend/src/context/conversation-context.tsx
@@ -0,0 +1,42 @@
+import React, { useMemo } from "react";
+import { useParams } from "react-router";
+
+interface ConversationContextType {
+  conversationId: string;
+}
+
+const ConversationContext = React.createContext<ConversationContextType | null>(
+  null,
+);
+
+export function ConversationProvider({
+  children,
+}: {
+  children: React.ReactNode;
+}) {
+  const { conversationId } = useParams<{ conversationId: string }>();
+
+  if (!conversationId) {
+    throw new Error(
+      "ConversationProvider must be used within a route that has a conversationId parameter",
+    );
+  }
+
+  const value = useMemo(() => ({ conversationId }), [conversationId]);
+
+  return (
+    <ConversationContext.Provider value={value}>
+      {children}
+    </ConversationContext.Provider>
+  );
+}
+
+export function useConversation() {
+  const context = React.useContext(ConversationContext);
+  if (!context) {
+    throw new Error(
+      "useConversation must be used within a ConversationProvider",
+    );
+  }
+  return context;
+}
--- a/frontend/src/context/ws-client-provider.tsx
+++ b/frontend/src/context/ws-client-provider.tsx
@@ -1,21 +1,17 @@
 import posthog from "posthog-js";
 import React from "react";
 import { io, Socket } from "socket.io-client";
-import { Settings } from "#/services/settings";
-import ActionType from "#/types/action-type";
+
 import EventLogger from "#/utils/event-logger";
 import { handleAssistantMessage } from "#/services/actions";
 import { useRate } from "#/hooks/use-rate";
-import AgentState from "#/types/agent-state";

 const isOpenHandsMessage = (event: Record<string, unknown>) =>
  event.action === "message";

 export enum WsClientProviderStatus {
-  STOPPED,
-  OPENING,
-  ACTIVE,
-  ERROR,
+  CONNECTED,
+  DISCONNECTED,
 }

 interface UseWsClient {
@@ -26,7 +22,7 @@ interface UseWsClient {
 }

 const WsClientContext = React.createContext<UseWsClient>({
-  status: WsClientProviderStatus.STOPPED,
+  status: WsClientProviderStatus.DISCONNECTED,
  isLoadingMessages: true,
  events: [],
  send: () => {
@@ -35,29 +31,23 @@ const WsClientContext = React.createContext<UseWsClient>({
 });

 interface WsClientProviderProps {
-  enabled: boolean;
-  token: string | null;
+  conversationId: string;
  ghToken: string | null;
-  selectedRepository: string | null;
-  settings: Settings | null;
 }

 export function WsClientProvider({
-  enabled,
-  token,
  ghToken,
-  selectedRepository,
-  settings,
+  conversationId,
  children,
 }: React.PropsWithChildren<WsClientProviderProps>) {
  const sioRef = React.useRef<Socket | null>(null);
-  const tokenRef = React.useRef<string | null>(token);
  const ghTokenRef = React.useRef<string | null>(ghToken);
-  const selectedRepositoryRef = React.useRef<string | null>(selectedRepository);
  const disconnectRef = React.useRef<ReturnType<typeof setTimeout> | null>(
    null,
  );
-  const [status, setStatus] = React.useState(WsClientProviderStatus.STOPPED);
+  const [status, setStatus] = React.useState(
+    WsClientProviderStatus.DISCONNECTED,
+  );
  const [events, setEvents] = React.useState<Record<string, unknown>[]>([]);
  const lastEventRef = React.useRef<Record<string, unknown> | null>(null);

@@ -72,26 +62,7 @@ export function WsClientProvider({
  }

  function handleConnect() {
-    setStatus(WsClientProviderStatus.OPENING);
-
-    const initEvent: Record<string, unknown> = {
-      action: ActionType.INIT,
-      args: settings,
-    };
-    if (token) {
-      initEvent.token = token;
-    }
-    if (ghToken) {
-      initEvent.github_token = ghToken;
-    }
-    if (selectedRepository) {
-      initEvent.selected_repository = selectedRepository;
-    }
-    const lastEvent = lastEventRef.current;
-    if (lastEvent) {
-      initEvent.latest_event_id = lastEvent.id;
-    }
-    send(initEvent);
+    setStatus(WsClientProviderStatus.CONNECTED);
  }

  function handleMessage(event: Record<string, unknown>) {
@@ -103,60 +74,47 @@ export function WsClientProvider({
      lastEventRef.current = event;
    }

-    const extras = event.extras as Record<string, unknown>;
-    if (extras?.agent_state === AgentState.INIT) {
-      setStatus(WsClientProviderStatus.ACTIVE);
-    }
-
-    if (
-      status !== WsClientProviderStatus.ACTIVE &&
-      event?.observation === "error"
-    ) {
-      setStatus(WsClientProviderStatus.ERROR);
-      return;
-    }
-
-    if (!event.token) {
-      handleAssistantMessage(event);
-    }
+    handleAssistantMessage(event);
  }

  function handleDisconnect() {
-    setStatus(WsClientProviderStatus.STOPPED);
+    setStatus(WsClientProviderStatus.DISCONNECTED);
+    const sio = sioRef.current;
+    if (!sio) {
+      return;
+    }
+    sio.io.opts.query = sio.io.opts.query || {};
+    sio.io.opts.query.latest_event_id = lastEventRef.current?.id ?? -1;
  }

  function handleError() {
    posthog.capture("socket_error");
-    setStatus(WsClientProviderStatus.ERROR);
+    setStatus(WsClientProviderStatus.DISCONNECTED);
  }

-  // Connect websocket
  React.useEffect(() => {
+    if (!conversationId) {
+      throw new Error("No conversation ID provided");
+    }
+
    let sio = sioRef.current;

-    // If disabled disconnect any existing websockets...
-    if (!enabled) {
-      if (sio) {
-        sio.disconnect();
-      }
-      return () => {};
-    }
+    const lastEvent = lastEventRef.current;
+    const query = {
+      latest_event_id: lastEvent?.id ?? -1,
+      conversation_id: conversationId,
+    };

-    // If there is no websocket or the tokens have changed or the current websocket is disconnected,
-    // create a new one
-    if (
-      !sio ||
-      (tokenRef.current && token && token !== tokenRef.current) ||
-      ghToken !== ghTokenRef.current
-    ) {
-      sio?.disconnect();
+    const baseUrl =
+      import.meta.env.VITE_BACKEND_BASE_URL || window?.location.host;

-      const baseUrl =
-        import.meta.env.VITE_BACKEND_BASE_URL || window?.location.host;
-      sio = io(baseUrl, {
-        transports: ["websocket"],
-      });
-    }
+    sio = io(baseUrl, {
+      transports: ["websocket"],
+      auth: {
+        github_token: ghToken || undefined,
+      },
+      query,
+    });
    sio.on("connect", handleConnect);
    sio.on("oh_event", handleMessage);
    sio.on("connect_error", handleError);
@@ -164,9 +122,7 @@ export function WsClientProvider({
    sio.on("disconnect", handleDisconnect);

    sioRef.current = sio;
-    tokenRef.current = token;
    ghTokenRef.current = ghToken;
-    selectedRepositoryRef.current = selectedRepository;

    return () => {
      sio.off("connect", handleConnect);
@@ -175,7 +131,7 @@ export function WsClientProvider({
      sio.off("connect_failed", handleError);
      sio.off("disconnect", handleDisconnect);
    };
-  }, [enabled, token, ghToken, selectedRepository]);
+  }, [ghToken, conversationId]);

  // Strict mode mounts and unmounts each component twice, so we have to wait in the destructor
  // before actually disconnecting the socket and cancel the operation if the component gets remounted.
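The provider no longer sends an `INIT` event. Instead, the conversation ID and resume point travel in the socket.io handshake through the standard `auth` and `query` client options. `handleDisconnect` rewrites `latest_event_id` in `sio.io.opts.query`, so the manager's automatic reconnect requests only newer events. Below is a pure-function sketch of that bookkeeping; `initialQuery` and `updateQueryOnDisconnect` are hypothetical names. It falls back to `-1` in both places, matching the initial connect. Engine.io encodes query values into the URL, so an `undefined` id would likely reach the server as the literal string `undefined`.

```typescript
// Events arrive as loosely typed records, as in WsClientProvider.
type OhEvent = Record<string, unknown>;

interface HandshakeQuery {
  latest_event_id: unknown;
  conversation_id: string;
}

// Initial connect: -1 means "no events seen yet".
function initialQuery(
  lastEvent: OhEvent | null,
  conversationId: string,
): HandshakeQuery {
  return {
    latest_event_id: lastEvent?.id ?? -1,
    conversation_id: conversationId,
  };
}

// On disconnect: mutate (or create) the query options in place so the next
// automatic reconnect resumes after the last event that was received.
function updateQueryOnDisconnect(
  query: Record<string, unknown> | undefined,
  lastEvent: OhEvent | null,
): Record<string, unknown> {
  const next = query || {};
  next.latest_event_id = lastEvent?.id ?? -1;
  return next;
}
```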
--- a/frontend/src/hooks/mutation/use-save-file.ts
+++ b/frontend/src/hooks/mutation/use-save-file.ts
@@ -1,17 +1,20 @@
 import { useMutation } from "@tanstack/react-query";
 import toast from "react-hot-toast";
 import OpenHands from "#/api/open-hands";
+import { useConversation } from "#/context/conversation-context";

 type SaveFileArgs = {
  path: string;
  content: string;
 };

-export const useSaveFile = () =>
-  useMutation({
+export const useSaveFile = () => {
+  const { conversationId } = useConversation();
+  return useMutation({
    mutationFn: ({ path, content }: SaveFileArgs) =>
-      OpenHands.saveFile(path, content),
+      OpenHands.saveFile(conversationId, path, content),
    onError: (error) => {
      toast.error(error.message);
    },
  });
+};
--- a/frontend/src/hooks/mutation/use-submit-feedback.ts
+++ b/frontend/src/hooks/mutation/use-submit-feedback.ts
@@ -2,16 +2,19 @@ import { useMutation } from "@tanstack/react-query";
 import toast from "react-hot-toast";
 import { Feedback } from "#/api/open-hands.types";
 import OpenHands from "#/api/open-hands";
+import { useConversation } from "#/context/conversation-context";

 type SubmitFeedbackArgs = {
  feedback: Feedback;
 };

-export const useSubmitFeedback = () =>
-  useMutation({
+export const useSubmitFeedback = () => {
+  const { conversationId } = useConversation();
+  return useMutation({
    mutationFn: ({ feedback }: SubmitFeedbackArgs) =>
-      OpenHands.submitFeedback(feedback),
+      OpenHands.submitFeedback(conversationId, feedback),
    onError: (error) => {
      toast.error(error.message);
    },
  });
+};
--- a/frontend/src/hooks/mutation/use-upload-files.ts
+++ b/frontend/src/hooks/mutation/use-upload-files.ts
@@ -1,11 +1,15 @@
 import { useMutation } from "@tanstack/react-query";
 import OpenHands from "#/api/open-hands";
+import { useConversation } from "#/context/conversation-context";

 type UploadFilesArgs = {
  files: File[];
 };

-export const useUploadFiles = () =>
-  useMutation({
-    mutationFn: ({ files }: UploadFilesArgs) => OpenHands.uploadFiles(files),
+export const useUploadFiles = () => {
+  const { conversationId } = useConversation();
+  return useMutation({
+    mutationFn: ({ files }: UploadFilesArgs) =>
+      OpenHands.uploadFiles(conversationId, files),
  });
+};
--- a/frontend/src/hooks/query/use-conversation-config.ts
+++ b/frontend/src/hooks/query/use-conversation-config.ts
@@ -4,15 +4,20 @@ import {
  useWsClient,
  WsClientProviderStatus,
 } from "#/context/ws-client-provider";
+import { useConversation } from "#/context/conversation-context";
 import OpenHands from "#/api/open-hands";

 export const useConversationConfig = () => {
  const { status } = useWsClient();
+  const { conversationId } = useConversation();

  const query = useQuery({
-    queryKey: ["conversation_config"],
-    queryFn: OpenHands.getRuntimeId,
-    enabled: status === WsClientProviderStatus.ACTIVE,
+    queryKey: ["conversation_config", conversationId],
+    queryFn: () => {
+      if (!conversationId) throw new Error("No conversation ID");
+      return OpenHands.getRuntimeId(conversationId);
+    },
+    enabled: status === WsClientProviderStatus.CONNECTED && !!conversationId,
  });

  React.useEffect(() => {
--- a/frontend/src/hooks/query/use-github-user.ts
+++ b/frontend/src/hooks/query/use-github-user.ts
@@ -6,7 +6,7 @@ import { useAuth } from "#/context/auth-context";
 import { useConfig } from "./use-config";

 export const useGitHubUser = () => {
-  const { gitHubToken } = useAuth();
+  const { gitHubToken, setUserId } = useAuth();
  const { data: config } = useConfig();

  const user = useQuery({
@@ -18,6 +18,7 @@ export const useGitHubUser = () => {

  React.useEffect(() => {
    if (user.data) {
+      setUserId(user.data.id.toString());
      posthog.identify(user.data.login, {
        company: user.data.company,
        name: user.data.name,
--- a/frontend/src/hooks/query/use-list-file.ts
+++ b/frontend/src/hooks/query/use-list-file.ts
@@ -1,13 +1,16 @@
 import { useQuery } from "@tanstack/react-query";
 import OpenHands from "#/api/open-hands";
+import { useConversation } from "#/context/conversation-context";

 interface UseListFileConfig {
  path: string;
 }

-export const useListFile = (config: UseListFileConfig) =>
-  useQuery({
-    queryKey: ["file", config.path],
-    queryFn: () => OpenHands.getFile(config.path),
+export const useListFile = (config: UseListFileConfig) => {
+  const { conversationId } = useConversation();
+  return useQuery({
+    queryKey: ["file", conversationId, config.path],
+    queryFn: () => OpenHands.getFile(conversationId, config.path),
    enabled: false, // don't fetch by default, trigger manually via `refetch`
  });
+};
--- a/frontend/src/hooks/query/use-list-files.ts
+++ b/frontend/src/hooks/query/use-list-files.ts
@@ -4,7 +4,7 @@ import {
  WsClientProviderStatus,
 } from "#/context/ws-client-provider";
 import OpenHands from "#/api/open-hands";
-import { useAuth } from "#/context/auth-context";
+import { useConversation } from "#/context/conversation-context";

 interface UseListFilesConfig {
  path?: string;
@@ -12,13 +12,13 @@ interface UseListFilesConfig {
 }

 export const useListFiles = (config?: UseListFilesConfig) => {
-  const { token } = useAuth();
+  const { conversationId } = useConversation();
  const { status } = useWsClient();
-  const isActive = status === WsClientProviderStatus.ACTIVE;
+  const isActive = status === WsClientProviderStatus.CONNECTED;

  return useQuery({
-    queryKey: ["files", token, config?.path],
-    queryFn: () => OpenHands.getFiles(config?.path),
-    enabled: !!(isActive && config?.enabled && token),
+    queryKey: ["files", conversationId, config?.path],
+    queryFn: () => OpenHands.getFiles(conversationId, config?.path),
+    enabled: !!(isActive && config?.enabled),
  });
 };
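The file, config and VS Code queries now include `conversationId` in their `queryKey` (replacing the old `token`), so results cached for one conversation are not served to another. TanStack Query identifies cache entries by a stable hash of the key array. The sketch below uses a simplified, hypothetical `KeyedCache` keyed by `JSON.stringify(queryKey)` to illustrate the effect. It is not TanStack's implementation.

```typescript
type QueryKey = readonly unknown[];

// Simplified stand-in for a query cache: one entry per serialized key.
class KeyedCache<T> {
  private entries = new Map<string, T>();

  get(key: QueryKey, fetcher: () => T): T {
    const hash = JSON.stringify(key);
    const hit = this.entries.get(hash);
    if (hit !== undefined) return hit; // cache hit: fetcher is not called
    const value = fetcher();
    this.entries.set(hash, value);
    return value;
  }

  get size(): number {
    return this.entries.size;
  }
}

// Key shape used by useListFiles after this change.
const filesKey = (conversationId: string, path?: string): QueryKey => [
  "files",
  conversationId,
  path,
];
```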
--- a/frontend/src/hooks/query/use-search-events.ts
+++ b/frontend/src/hooks/query/use-search-events.ts
@@ -0,0 +1,24 @@
+import { useQuery } from "@tanstack/react-query";
+import { useConversation } from "#/context/conversation-context";
+import OpenHands from "#/api/open-hands";
+
+export const useSearchEvents = (params: {
+  query?: string;
+  startId?: number;
+  limit?: number;
+  eventType?: string;
+  source?: string;
+  startDate?: string;
+  endDate?: string;
+}) => {
+  const { conversationId } = useConversation();
+
+  return useQuery({
+    queryKey: ["search_events", conversationId, params],
+    queryFn: () => {
+      if (!conversationId) throw new Error("No conversation ID");
+      return OpenHands.searchEvents(conversationId, params);
+    },
+    enabled: !!conversationId,
+  });
+};
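`useSearchEvents` forwards an optional filter object to `OpenHands.searchEvents`, whose implementation is not in this excerpt. If that client sends the filters as URL query parameters, undefined fields should be dropped first. The hypothetical `toSearchParams` below shows one way to do that with the standard `URLSearchParams` API. Keeping the parameter names in camelCase is an assumption, and the backend may expect different names.

```typescript
// Same shape as the params accepted by useSearchEvents.
interface SearchEventsParams {
  query?: string;
  startId?: number;
  limit?: number;
  eventType?: string;
  source?: string;
  startDate?: string;
  endDate?: string;
}

// Hypothetical: serialize only the filters that were actually provided,
// in the order they appear on the params object.
function toSearchParams(params: SearchEventsParams): URLSearchParams {
  const search = new URLSearchParams();
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined) search.set(key, String(value));
  }
  return search;
}
```

The whole `params` object is also part of the `queryKey`. TanStack Query hashes plain objects in keys with their properties sorted, so callers that build the same filters in a different property order still share one cache entry.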
--- a/frontend/src/hooks/query/use-vscode-url.ts
+++ b/frontend/src/hooks/query/use-vscode-url.ts
@@ -1,11 +1,17 @@
 import { useQuery } from "@tanstack/react-query";
 import OpenHands from "#/api/open-hands";
+import { useConversation } from "#/context/conversation-context";

 export const useVSCodeUrl = (config: { enabled: boolean }) => {
+  const { conversationId } = useConversation();
+
  const data = useQuery({
-    queryKey: ["vscode_url"],
-    queryFn: OpenHands.getVSCodeUrl,
-    enabled: config.enabled,
+    queryKey: ["vscode_url", conversationId],
+    queryFn: () => {
+      if (!conversationId) throw new Error("No conversation ID");
+      return OpenHands.getVSCodeUrl(conversationId);
+    },
+    enabled: !!conversationId && config.enabled,
    refetchOnMount: false,
  });

--- a/frontend/src/hooks/use-download-progress.ts
+++ b/frontend/src/hooks/use-download-progress.ts
@@ -1,6 +1,7 @@
 import { useCallback, useEffect, useRef, useState } from "react";
 import { downloadFiles } from "#/utils/download-files";
 import { DownloadProgressState } from "#/components/shared/download-progress";
+import { useConversation } from "#/context/conversation-context";

 export const INITIAL_PROGRESS: DownloadProgressState = {
  filesTotal: 0,
@@ -20,6 +21,7 @@ export function useDownloadProgress(
    useState<DownloadProgressState>(INITIAL_PROGRESS);
  const progressRef = useRef<DownloadProgressState>(INITIAL_PROGRESS);
  const abortController = useRef<AbortController>();
+  const { conversationId } = useConversation();

  // Create AbortController on mount
  useEffect(() => {
@@ -45,7 +47,7 @@ export function useDownloadProgress(
    // Start download
    const download = async () => {
      try {
-        await downloadFiles(initialPath, {
+        await downloadFiles(conversationId, initialPath, {
          onProgress: (p) => {
            // Update both the ref and state
            progressRef.current = { ...p };
--- a/frontend/src/hooks/use-end-session.ts
+++ b/frontend/src/hooks/use-end-session.ts
@@ -1,6 +1,5 @@
 import { useDispatch } from "react-redux";
 import { useNavigate } from "react-router";
-import { useAuth } from "#/context/auth-context";
 import {
  initialState as browserInitialState,
  setScreenshotSrc,
@@ -11,13 +10,11 @@ import { clearSelectedRepository } from "#/state/initial-query-slice";
 export const useEndSession = () => {
  const navigate = useNavigate();
  const dispatch = useDispatch();
-  const { clearToken } = useAuth();

  /**
   * End the current session by clearing the token and redirecting to the home page.
   */
  const endSession = () => {
-    clearToken();
    dispatch(clearSelectedRepository());

    // Reset browser state to initial values
--- a/frontend/src/i18n/translation.json
+++ b/frontend/src/i18n/translation.json
@@ -2014,6 +2014,9 @@
  "ACTION_MESSAGE$READ": {
    "en": "Reading the contents of a file"
  },
+  "ACTION_MESSAGE$EDIT": {
+    "en": "Editing the contents of a file"
+  },
  "ACTION_MESSAGE$WRITE": {
    "en": "Writing to a file"
  },
@@ -2029,6 +2032,9 @@
  "OBSERVATION_MESSAGE$READ": {
    "en": "Read the contents of a file"
  },
+  "OBSERVATION_MESSAGE$EDIT": {
+    "en": "Edited the contents of a file"
+  },
  "OBSERVATION_MESSAGE$WRITE": {
    "en": "Wrote to a file"
  },
--- a/frontend/src/ignore-task-state-map.constant.ts
+++ b/frontend/src/ignore-task-state-map.constant.ts
@@ -1,4 +1,4 @@
-import AgentState from "./types/agent-state";
+import { AgentState } from "./types/agent-state";

 export const IGNORE_TASK_STATE_MAP: Record<string, AgentState[]> = {
  [AgentState.PAUSED]: [
--- a/frontend/src/mocks/handlers.ws.ts
+++ b/frontend/src/mocks/handlers.ws.ts
@@ -1,5 +1,5 @@
 import { delay, WebSocketHandler, ws } from "msw";
-import AgentState from "#/types/agent-state";
+import { AgentState } from "#/types/agent-state";
 import {
  AgentStateChangeObservation,
  CommandObservation,
--- a/frontend/src/routes.ts
+++ b/frontend/src/routes.ts
@@ -8,7 +8,7 @@ import {
 export default [
  layout("routes/_oh/route.tsx", [
    index("routes/_oh._index/route.tsx"),
-    route("app", "routes/_oh.app/route.tsx", [
+    route("conversations/:conversationId", "routes/_oh.app/route.tsx", [
      index("routes/_oh.app._index/route.tsx"),
      route("browser", "routes/_oh.app.browser.tsx"),
      route("jupyter", "routes/_oh.app.jupyter.tsx"),
--- a/frontend/src/routes/_oh._index/route.tsx
+++ b/frontend/src/routes/_oh._index/route.tsx
@@ -1,4 +1,3 @@
-import { useLocation, useNavigate } from "react-router";
 import React from "react";
 import { useDispatch } from "react-redux";
 import posthog from "posthog-js";
@@ -17,12 +16,8 @@ import { HeroHeading } from "#/components/shared/hero-heading";
 import { TaskForm } from "#/components/shared/task-form";

 function Home() {
-  const { token, gitHubToken } = useAuth();
-
+  const { gitHubToken } = useAuth();
  const dispatch = useDispatch();
-  const location = useLocation();
-  const navigate = useNavigate();
-
  const formRef = React.useRef<HTMLFormElement>(null);

  const { data: config } = useConfig();
@@ -36,9 +31,7 @@ function Home() {
    gitHubClientId: config?.GITHUB_CLIENT_ID || null,
  });

-  React.useEffect(() => {
-    if (token) navigate("/app");
-  }, [location.pathname]);
+  const latestConversation = localStorage.getItem("latest_conversation_id");

  return (
    <div
@@ -46,7 +39,7 @@ function Home() {
      className="bg-root-secondary h-full rounded-xl flex flex-col items-center justify-center relative overflow-y-auto"
    >
      <HeroHeading />
-      <div className="flex flex-col gap-16 w-[600px] items-center">
+      <div className="flex flex-col gap-8 w-[600px] items-center">
        <div className="flex flex-col gap-2 w-full">
          <TaskForm ref={formRef} />
        </div>
@@ -76,6 +69,19 @@ function Home() {
          />
        </div>
      </div>
+      {latestConversation && (
+        <div className="flex gap-4 w-full text-center mt-8">
+          <p className="text-center w-full">
+            Or&nbsp;
+            <a
+              className="underline"
+              href={`/conversations/${latestConversation}`}
+            >
+              jump back to your most recent conversation
+            </a>
+          </p>
+        </div>
+      )}
    </div>
  );
 }
--- a/frontend/src/routes/_oh.app._index/route.tsx
+++ b/frontend/src/routes/_oh.app._index/route.tsx
@@ -4,7 +4,7 @@ import { useRouteError } from "react-router";
 import { editor } from "monaco-editor";
 import { EditorProps } from "@monaco-editor/react";
 import { RootState } from "#/store";
-import AgentState from "#/types/agent-state";
+import { AgentState } from "#/types/agent-state";
 import CodeEditorComponent from "../../components/features/editor/code-editor-component";
 import { useFiles } from "#/context/files";
 import { useSaveFile } from "#/hooks/mutation/use-save-file";
--- a/frontend/src/routes/_oh.app/hooks/use-handle-runtime-active.ts
+++ b/frontend/src/routes/_oh.app/hooks/use-handle-runtime-active.ts
@@ -2,10 +2,7 @@ import React from "react";
 import toast from "react-hot-toast";
 import { useDispatch, useSelector } from "react-redux";
 import { useAuth } from "#/context/auth-context";
-import {
-  useWsClient,
-  WsClientProviderStatus,
-} from "#/context/ws-client-provider";
+import { useWsClient } from "#/context/ws-client-provider";
 import { getGitHubTokenCommand } from "#/services/terminal-service";
 import { setImportedProjectZip } from "#/state/initial-query-slice";
 import { RootState } from "#/store";
@@ -13,17 +10,19 @@ import { base64ToBlob } from "#/utils/base64-to-blob";
 import { useUploadFiles } from "../../../hooks/mutation/use-upload-files";
 import { useGitHubUser } from "../../../hooks/query/use-github-user";
 import { isGitHubErrorReponse } from "#/api/github-axios-instance";
+import { RUNTIME_INACTIVE_STATES } from "#/types/agent-state";

 export const useHandleRuntimeActive = () => {
  const { gitHubToken } = useAuth();
-  const { status, send } = useWsClient();
+  const { send } = useWsClient();
+  const { curAgentState } = useSelector((state: RootState) => state.agent);

  const dispatch = useDispatch();

  const { data: user } = useGitHubUser();
  const { mutate: uploadFiles } = useUploadFiles();

-  const runtimeActive = status === WsClientProviderStatus.ACTIVE;
+  const runtimeActive = !RUNTIME_INACTIVE_STATES.includes(curAgentState);

  const { importedProjectZip } = useSelector(
    (state: RootState) => state.initalQuery,
--- a/frontend/src/routes/_oh.app/hooks/use-handle-ws-events.ts
+++ b/frontend/src/routes/_oh.app/hooks/use-handle-ws-events.ts
@@ -1,11 +1,10 @@
 import React from "react";
 import toast from "react-hot-toast";
 import { useDispatch } from "react-redux";
-import { useAuth } from "#/context/auth-context";
 import { useWsClient } from "#/context/ws-client-provider";
 import { generateAgentStateChangeEvent } from "#/services/agent-state-service";
 import { addErrorMessage } from "#/state/chat-slice";
-import AgentState from "#/types/agent-state";
+import { AgentState } from "#/types/agent-state";
 import { ErrorObservation } from "#/types/core/observations";
 import { useEndSession } from "../../../hooks/use-end-session";

@@ -22,7 +21,6 @@ const isErrorObservation = (data: object): data is ErrorObservation =>

 export const useHandleWSEvents = () => {
  const { events, send } = useWsClient();
-  const { setToken } = useAuth();
  const endSession = useEndSession();
  const dispatch = useDispatch();

@@ -31,10 +29,6 @@ export const useHandleWSEvents = () => {
      return;
    }
    const event = events[events.length - 1];
-    if (event.token && typeof event.token === "string") {
-      setToken(event.token);
-      return;
-    }

    if (isServerError(event)) {
      if (event.error_code === 401) {
--- a/frontend/src/routes/_oh.app/hooks/use-ws-status-change.ts
+++ b/frontend/src/routes/_oh.app/hooks/use-ws-status-change.ts
@@ -14,11 +14,12 @@ import {
  clearInitialQuery,
 } from "#/state/initial-query-slice";
 import { RootState } from "#/store";
-import AgentState from "#/types/agent-state";
+import { AgentState } from "#/types/agent-state";

 export const useWSStatusChange = () => {
  const { send, status } = useWsClient();
  const { gitHubToken } = useAuth();
+  const { curAgentState } = useSelector((state: RootState) => state.agent);
  const dispatch = useDispatch();

  const statusRef = React.useRef<WsClientProviderStatus | null>(null);
@@ -47,7 +48,7 @@ export const useWSStatusChange = () => {
    dispatch(clearInitialQuery()); // reset initial query
  };

-  const handleOnWSActive = () => {
+  const handleAgentInit = () => {
    let additionalInfo = "";

    if (gitHubToken && selectedRepository) {
@@ -63,6 +64,11 @@ export const useWSStatusChange = () => {
      dispatchInitialQuery(initialQuery, additionalInfo);
    }
  };
+  React.useEffect(() => {
+    if (curAgentState === AgentState.INIT) {
+      handleAgentInit();
+    }
+  }, [curAgentState]);

  React.useEffect(() => {
    if (statusRef.current === status) {
@@ -70,11 +76,7 @@ export const useWSStatusChange = () => {
    }
    statusRef.current = status;

-    if (status === WsClientProviderStatus.ACTIVE) {
-      handleOnWSActive();
-    }
-
-    if (status === WsClientProviderStatus.OPENING && initialQuery) {
+    if (status === WsClientProviderStatus.CONNECTED && initialQuery) {
      dispatch(
        addUserMessage({
          content: initialQuery,
@@ -85,7 +87,7 @@ export const useWSStatusChange = () => {
      );
    }

-    if (status === WsClientProviderStatus.STOPPED) {
+    if (status === WsClientProviderStatus.DISCONNECTED) {
      dispatch(setCurrentAgentState(AgentState.STOPPED));
    }
  }, [status]);
--- a/frontend/src/routes/_oh.app/route.tsx
+++ b/frontend/src/routes/_oh.app/route.tsx
@@ -2,6 +2,10 @@ import { useDisclosure } from "@nextui-org/react";
 import React from "react";
 import { Outlet } from "react-router";
 import { useDispatch, useSelector } from "react-redux";
+import {
+  ConversationProvider,
+  useConversation,
+} from "#/context/conversation-context";
 import { Controls } from "#/components/features/controls/controls";
 import { RootState } from "#/store";
 import { clearMessages } from "#/state/chat-slice";
@@ -24,9 +28,10 @@ import Security from "#/components/shared/modals/security/security";
 import { CountBadge } from "#/components/layout/count-badge";
 import { TerminalStatusLabel } from "#/components/features/terminal/terminal-status-label";

-function App() {
-  const { token, gitHubToken } = useAuth();
+function AppContent() {
+  const { gitHubToken } = useAuth();
  const { settings } = useUserPrefs();
+  const { conversationId } = useConversation();

  const dispatch = useDispatch();
  useConversationConfig();
@@ -42,8 +47,8 @@ function App() {
  });

  const secrets = React.useMemo(
-    () => [gitHubToken, token].filter((secret) => secret !== null),
-    [gitHubToken, token],
+    () => [gitHubToken].filter((secret) => secret !== null),
+    [gitHubToken],
  );

  const Terminal = React.useMemo(
@@ -64,13 +69,7 @@ function App() {
  } = useDisclosure();

  return (
-    <WsClientProvider
-      enabled
-      token={token}
-      ghToken={gitHubToken}
-      selectedRepository={selectedRepository}
-      settings={settings}
-    >
+    <WsClientProvider ghToken={gitHubToken} conversationId={conversationId}>
      <EventHandler>
        <div className="flex flex-col h-full gap-3">
          <div className="flex h-full overflow-auto gap-3">
@@ -131,4 +130,12 @@ function App() {
  );
 }

+function App() {
+  return (
+    <ConversationProvider>
+      <AppContent />
+    </ConversationProvider>
+  );
+}
+
 export default App;
--- a/frontend/src/routes/_oh/route.tsx
+++ b/frontend/src/routes/_oh/route.tsx
@@ -44,7 +44,7 @@ export function ErrorBoundary() {
 }

 export default function MainApp() {
-  const { gitHubToken, clearToken } = useAuth();
+  const { gitHubToken } = useAuth();
  const { settings, settingsAreUpToDate } = useUserPrefs();

  const [consentFormIsOpen, setConsentFormIsOpen] = React.useState(
@@ -55,11 +55,7 @@ export default function MainApp() {
    React.useState(!settingsAreUpToDate);

  const config = useConfig();
-  const {
-    data: isAuthed,
-    isFetched,
-    isFetching: isFetchingAuth,
-  } = useIsAuthed();
+  const { data: isAuthed, isFetching: isFetchingAuth } = useIsAuthed();

  const gitHubAuthUrl = useGitHubAuthUrl({
    gitHubToken,
@@ -67,10 +63,6 @@ export default function MainApp() {
    gitHubClientId: config.data?.GITHUB_CLIENT_ID || null,
  });

-  React.useEffect(() => {
-    if (isFetched && !isAuthed) clearToken();
-  }, [isFetched, isAuthed]);
-
  React.useEffect(() => {
    if (settings.LANGUAGE) {
      i18n.changeLanguage(settings.LANGUAGE);
--- a/frontend/src/services/agent-state-service.ts
+++ b/frontend/src/services/agent-state-service.ts
@@ -1,5 +1,5 @@
 import ActionType from "#/types/action-type";
-import AgentState from "#/types/agent-state";
+import { AgentState } from "#/types/agent-state";

 export const generateAgentStateChangeEvent = (state: AgentState) => ({
  action: ActionType.CHANGE_AGENT_STATE,
--- a/frontend/src/services/auth.ts
+++ b/frontend/src/services/auth.ts
@@ -1,5 +0,0 @@
-const TOKEN_KEY = "token";
-const GITHUB_TOKEN_KEY = "ghToken";
-
-export const getToken = () => localStorage.getItem(TOKEN_KEY);
-export const getGitHubToken = () => localStorage.getItem(GITHUB_TOKEN_KEY);
--- a/frontend/src/services/observations.ts
+++ b/frontend/src/services/observations.ts
@@ -2,7 +2,7 @@ import { setCurrentAgentState } from "#/state/agent-slice";
 import { setUrl, setScreenshotSrc } from "#/state/browser-slice";
 import store from "#/store";
 import { ObservationMessage } from "#/types/message";
-import AgentState from "#/types/agent-state";
+import { AgentState } from "#/types/agent-state";
 import { appendOutput } from "#/state/command-slice";
 import { appendJupyterOutput } from "#/state/jupyter-slice";
 import ObservationType from "#/types/observation-type";
@@ -46,6 +46,9 @@ export function handleObservationMessage(message: ObservationMessage) {
        store.dispatch(addAssistantMessage(message.content));
      }
      break;
+    case ObservationType.READ:
+    case ObservationType.EDIT:
+      break; // We don't display the default message for these observations
    default:
      store.dispatch(addAssistantMessage(message.message));
      break;
@@ -84,6 +87,18 @@ export function handleObservationMessage(message: ObservationMessage) {
          }),
        );
        break;
+      case "read":
+      case "edit":
+        store.dispatch(
+          addAssistantObservation({
+            ...baseObservation,
+            observation,
+            extras: {
+              path: String(message.extras.path || ""),
+            },
+          }),
+        );
+        break;
      case "run_ipython":
        store.dispatch(
          addAssistantObservation({
--- a/frontend/src/state/agent-slice.tsx
+++ b/frontend/src/state/agent-slice.tsx
@@ -1,5 +1,5 @@
 import { createSlice } from "@reduxjs/toolkit";
-import AgentState from "#/types/agent-state";
+import { AgentState } from "#/types/agent-state";

 export const agentSlice = createSlice({
  name: "agent",
--- a/frontend/src/state/chat-slice.ts
+++ b/frontend/src/state/chat-slice.ts
@@ -19,6 +19,7 @@ const HANDLED_ACTIONS: OpenHandsEventType[] = [
  "write",
  "read",
  "browse",
+  "edit",
 ];

 function getRiskText(risk: ActionSecurityRisk) {
@@ -101,8 +102,6 @@ export const chatSlice = createSlice({
          content = `${content.slice(0, MAX_CONTENT_LENGTH)}...`;
        }
        text = `${action.payload.args.path}\n${content}`;
-      } else if (actionID === "read") {
-        text = action.payload.args.path;
      } else if (actionID === "browse") {
        text = `Browsing ${action.payload.args.url}`;
      }
@@ -161,6 +160,9 @@ export const chatSlice = createSlice({
        }
        content = `\`\`\`\n${content}\n\`\`\``;
        causeMessage.content = content; // Observation content includes the action
+      } else if (observationID === "read" || observationID === "edit") {
+        const { content } = observation.payload;
+        causeMessage.content = `\`\`\`${observationID === "edit" ? "diff" : "python"}\n${content}\n\`\`\``; // Content is already truncated by the ACI
      } else if (observationID === "browse") {
        let content = `**URL:** ${observation.payload.extras.url}\n`;
        if (observation.payload.extras.error) {
--- a/frontend/src/types/agent-state.tsx
+++ b/frontend/src/types/agent-state.tsx
@@ -1,4 +1,4 @@
-enum AgentState {
+export enum AgentState {
  LOADING = "loading",
  INIT = "init",
  RUNNING = "running",
@@ -13,4 +13,8 @@ enum AgentState {
  USER_REJECTED = "user_rejected",
 }

-export default AgentState;
+export const RUNTIME_INACTIVE_STATES = [
+  AgentState.LOADING,
+  AgentState.STOPPED,
+  AgentState.ERROR,
+];
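With `RUNTIME_INACTIVE_STATES` exported from `agent-state.tsx`, `useHandleRuntimeActive` now derives `runtimeActive` from the Redux agent state instead of the WebSocket status. Below is a minimal TypeScript sketch of that check. It uses a trimmed `AgentState` enum. The string values for `STOPPED` and `ERROR` are not visible in this diff and are assumed, and `isRuntimeActive` is a hypothetical helper name.

```typescript
// Trimmed stand-in for AgentState; the STOPPED/ERROR string values are assumed.
enum AgentState {
  LOADING = "loading",
  INIT = "init",
  RUNNING = "running",
  STOPPED = "stopped",
  ERROR = "error",
}

// Same membership as the list added in agent-state.tsx.
const RUNTIME_INACTIVE_STATES: AgentState[] = [
  AgentState.LOADING,
  AgentState.STOPPED,
  AgentState.ERROR,
];

// Hypothetical helper mirroring `!RUNTIME_INACTIVE_STATES.includes(curAgentState)`.
function isRuntimeActive(state: AgentState): boolean {
  return !RUNTIME_INACTIVE_STATES.includes(state);
}

console.log(isRuntimeActive(AgentState.RUNNING)); // true
console.log(isRuntimeActive(AgentState.LOADING)); // false
```

The check is a deny-list. Any state not listed, such as `INIT`, `RUNNING`, or `USER_REJECTED`, is treated as an active runtime.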
--- a/frontend/src/types/core/actions.ts
+++ b/frontend/src/types/core/actions.ts
@@ -104,6 +104,7 @@ export interface FileReadAction extends OpenHandsActionEvent<"read"> {
  args: {
    path: string;
    thought: string;
+    translated_ipython_code: string | null;
  };
 }

@@ -116,6 +117,14 @@ export interface FileWriteAction extends OpenHandsActionEvent<"write"> {
  };
 }

+export interface FileEditAction extends OpenHandsActionEvent<"edit"> {
+  source: "agent";
+  args: {
+    path: string;
+    translated_ipython_code: string;
+  };
+}
+
 export interface RejectAction extends OpenHandsActionEvent<"reject"> {
  source: "agent";
  args: {
@@ -133,6 +142,7 @@ export type OpenHandsAction =
  | BrowseAction
  | BrowseInteractiveAction
  | FileReadAction
+  | FileEditAction
  | FileWriteAction
  | AddTaskAction
  | ModifyTaskAction
--- a/frontend/src/types/core/base.ts
+++ b/frontend/src/types/core/base.ts
@@ -4,6 +4,7 @@ export type OpenHandsEventType =
  | "run"
  | "read"
  | "write"
+  | "edit"
  | "run_ipython"
  | "delegate"
  | "browse"
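For `read` and `edit` observations, the new branch in `chat-slice.ts` replaces the cause message's content with the observation content wrapped in a fenced block. Edits are labelled `diff` and reads are labelled `python`. No further truncation is applied because the ACI has already truncated the content. Below is a standalone sketch of that formatting step. `formatFileObservation` and `FileObservationID` are hypothetical names, not part of the codebase.

```typescript
// Hypothetical type covering the two observation IDs handled by the new branch.
type FileObservationID = "read" | "edit";

// Hypothetical helper mirroring the chatSlice branch:
// edits are fenced as "diff", reads use the "python" fence label.
function formatFileObservation(
  observationID: FileObservationID,
  content: string,
): string {
  const fenceLang = observationID === "edit" ? "diff" : "python";
  const fence = "`".repeat(3); // a Markdown code fence
  return `${fence}${fenceLang}\n${content}\n${fence}`;
}

console.log(formatFileObservation("edit", "-old line\n+new line"));
```

Reads are always labelled `python`, so a non-Python file read by the agent would presumably still be highlighted as Python. One alternative would be to derive the fence language from the file extension in `extras.path`, which `observations.ts` now forwards for both observation types.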
Author	SHA1	Message	Date
openhands	2959abf4ba	Add missing imports to ActionExecutionClient	2024-12-25 15:54:31 +00:00
openhands	0125a5415f	Move all run_action logic to ActionExecutionClient	2024-12-25 15:52:08 +00:00
openhands	65de07299f	Refactor runtime action execution - Create ActionExecutionClient base class for shared HTTP server interaction logic - Update EventStreamRuntime and RemoteRuntime to inherit from ActionExecutionClient - Remove duplicate code and clean up imports - Update ModalRuntime and RunloopRuntime to use super().__init__()	2024-12-25 15:47:02 +00:00
Robert Brennan	642e962f89	randomize branch names (#5784)	2024-12-24 15:28:27 -05:00
Robert Brennan	d4e670a3e7	fix latest event id (#5789)	2024-12-24 18:08:33 +00:00
Robert Brennan	f9cc0bce53	Fix connection check (#5787)	2024-12-24 16:21:31 +00:00
dependabot[bot]	2c8b1ee136	chore(deps-dev): bump llama-index from 0.12.7 to 0.12.8 in the llama group (#5765) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-12-24 15:24:36 +00:00
Robert Brennan	31dda63f43	Don't enforce user IDs in oss mode (#5776)	2024-12-24 06:30:33 -05:00
Boxuan Li	ecff5c67fb	Evaluation README: Add TheAgentCompany (#5777)	2024-12-24 02:37:42 +00:00
mamoodi	725e71ad22	Update Slack links again (#5773)	2024-12-23 21:20:08 +00:00
OpenHands	200270ba8f	Fix issue #5752: Install "jq" by default in OpenHands runtime (#5753)	2024-12-23 16:16:36 -05:00
Robert Brennan	5bf55a0035	show most recent convo on homepage (#5769)	2024-12-23 20:04:05 +00:00
Robert Brennan	96329190d1	Session fixes for HA mode (#5766)	2024-12-23 18:07:56 +00:00
Robert Brennan	faf8b5829c	Fix for dying sessions/runtimes (#5755)	2024-12-23 16:00:05 +00:00
sp.wack	d62cf7e731	refactor(frontend): Remove test todos and fix light warning (#5554)	2024-12-23 18:43:36 +04:00
Engel Nyst	4a8bf3d2d0	Fix not initialized response latencies (#5679)	2024-12-22 16:31:05 -05:00
Robert Brennan	2cfbd26df7	Fixes for VS Code Button (#5754)	2024-12-22 16:27:30 -05:00
tofarr	b51dd3bc75	Fix stack trace in logs (#5751)	2024-12-22 14:51:22 -05:00
Boxuan Li	b1719bb3db	Add TheAgentCompany evaluation harness (#5731)	2024-12-22 14:12:30 -05:00
Rohit Malhotra	ee5f49afc1	[Bug]: Missing path import (#5747)	2024-12-22 15:58:17 +00:00
Rohit Malhotra	7fe692a7bd	Revert "[Resolver]: Add target branch param" (#5743)	2024-12-22 01:28:23 +00:00
OpenHands	21948fa81b	Fix issue #5735: [Bug]: Inconsistent command line arguments in evaluation directory (#5736)	2024-12-22 04:41:39 +08:00
Robert Brennan	d646b2089d	Fix several async lockups (#5734)	2024-12-21 19:07:31 +00:00
Robert Brennan	f54d953fe1	Fix unclosed github client (#5733)	2024-12-21 13:51:37 -05:00
Rohit Malhotra	4e7af78b39	Fix missing closing brace in openhands-resolver.yml (#5729) Co-authored-by: openhands <openhands@all-hands.dev>	2024-12-21 15:22:41 +00:00
Rohit Malhotra	252c70984c	[Resolver]: Rename success_explanation to result_explanation for better clarity (#5724) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-12-21 01:31:05 +00:00
Rohit Malhotra	5ea096e95b	[Resolver]: Add target branch param (#5642)	2024-12-21 00:33:45 +00:00
Robert Brennan	a01fb9dca3	Fixes for listing files, clean up references to tokens (#5718)	2024-12-20 23:13:14 +00:00
Rohit Malhotra	51af29208f	[Resolver]: Indicating more informative failures (#5685)	2024-12-20 17:22:24 -05:00
mamoodi	e77f435901	Add note about custom configurations (#5721)	2024-12-20 17:20:11 -05:00
mamoodi	5fb0eec61e	Fix resolver workflow and update docs (#5713)	2024-12-20 15:59:13 -05:00
Rohit Malhotra	4af84a29dc	Adding more resilience to refresh token logic (#5704)	2024-12-20 14:37:04 -05:00
Ryan H. Tran	7a0488c012	Use more specific action types for openhands-aci commands (#5508) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2024-12-21 02:48:27 +08:00
Xingyao Wang	581d5ec7a8	feat(eval): increase resource factor for remote runtime when previous run failed due to resource (#5709)	2024-12-21 01:47:06 +08:00
Xingyao Wang	cfbe77b367	fix: only register atexit when EventStreamRuntime is initialized (#5712)	2024-12-20 16:29:45 +00:00
sp.wack	3236602919	fix(frontend): Create a conversation without a query (#5711)	2024-12-20 16:24:30 +00:00
dependabot[bot]	aa2f34a1f5	chore(deps-dev): bump llama-index from 0.12.6 to 0.12.7 in the llama group (#5708) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-12-20 17:16:32 +01:00
Robert Brennan	73c38f1163	refactor: move session initialization from WebSocket to REST API (#5493) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>	2024-12-20 15:50:09 +00:00
dependabot[bot]	0dd919bacf	Bump prism-react-renderer from 2.4.0 to 2.4.1 in /docs in the version-all group (#5668) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-12-20 16:43:12 +04:00
d-walsh	5ad361623d	feat: add support for custom PR titles (#5706) Co-authored-by: David Walsh <walsha@gmail.com>	2024-12-20 04:00:00 +00:00
Xingyao Wang	c333938384	feat(eval): add standard error to swebench summarize outputs (#5700) Co-authored-by: openhands <openhands@all-hands.dev>	2024-12-20 08:39:43 +08:00
tofarr	ebf3bf606a	Settings store type is defined in openhands_config rather than main config (#5701)	2024-12-19 12:44:35 -07:00
dependabot[bot]	c2293ad1dd	Bump the version-all group across 1 directory with 13 updates (#5699) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-12-19 20:08:22 +01:00
mamoodi	6f7d054385	Add examples for filesystem use (#5697)	2024-12-19 13:13:09 -05:00
Xingyao Wang	e9cafb0372	chore: Cleanup runtime exception handling (#5696)	2024-12-19 17:28:29 +00:00
mamoodi	13097f9d1d	Release 0.16.1 (#5693)	2024-12-19 11:13:26 -05:00
OpenHands	2a66439ca6	Fix issue #5676: [Bug]: Frontend Hyperlink in Chat window should open link in a new tab (#5677) Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>	2024-12-19 14:39:00 +00:00