-
-
-
## 🎯 Mission
-[Project Demo Video](https://github.com/OpenDevin/OpenDevin/assets/38853559/71a472cc-df34-430c-8b1d-4d7286c807c9)
-
Welcome to OpenDevin, an open-source project aiming to replicate Devin, an autonomous AI software engineer who is capable of executing complex engineering tasks and collaborating actively with users on software development projects. This project aspires to replicate, enhance, and innovate upon Devin through the power of the open-source community.
-
-
-## 🤔 What is Devin?
-
-Devin represents a cutting-edge autonomous agent designed to navigate the complexities of software engineering. It leverages a combination of tools such as a shell, code editor, and web browser, showcasing the untapped potential of LLMs in software development. Our goal is to explore and expand upon Devin's capabilities, identifying both its strengths and areas for improvement, to guide the progress of open code models.
-
-
-
-## 🐚 Why OpenDevin?
-
-The OpenDevin project is born out of a desire to replicate, enhance, and innovate beyond the original Devin model. By engaging the open-source community, we aim to tackle the challenges faced by Code LLMs in practical scenarios, producing works that significantly contribute to the community and pave the way for future advancements.
-
-
-
-## 🚧 Project Status
-
-OpenDevin is currently a work in progress, but you can already run the alpha version to see the end-to-end system in action. The project team is actively working on the following key milestones:
-
-- **UI**: Developing a user-friendly interface, including a chat interface, a shell demonstrating commands, and a web browser.
-- **Architecture**: Building a stable agent framework with a robust backend that can read, write, and run simple commands.
-- **Agent Capabilities**: Enhancing the agent's abilities to generate bash scripts, run tests, and perform other software engineering tasks.
-- **Evaluation**: Establishing a minimal evaluation pipeline that is consistent with Devin's evaluation criteria.
-
-After completing the MVP, the team will focus on research in various areas, including foundation models, specialist capabilities, evaluation, and agent studies.
-
-
-
-## ⚠️ Caveats and Warnings
-
-- OpenDevin is still an alpha project. It is changing very quickly and is unstable. We are working on getting a stable release out in the coming weeks.
-- OpenDevin will issue many prompts to the LLM you configure. Most of these LLMs cost money--be sure to set spending limits and monitor usage.
-- OpenDevin runs `bash` commands within a Docker sandbox, so it should not affect your machine. But your workspace directory will be attached to that sandbox, and files in the directory may be modified or deleted.
-- Our default Agent is currently the MonologueAgent, which has limited capabilities, but is fairly stable. We're working on other Agent implementations, including [SWE Agent](https://swe-agent.com/). You can [read about our current set of agents here](./docs/Agents.md).
-
-## 🚀 Get Started
-
-The easiest way to run OpenDevin is inside a Docker container.
-
-To start the app, run these commands, replacing `$(pwd)/workspace` with the path to the code you want OpenDevin to work with.
-
-```bash
-# Your OpenAI API key, or any other LLM API key
-export LLM_API_KEY="sk-..."
-
-# The directory you want OpenDevin to modify. MUST be an absolute path!
-export WORKSPACE_BASE=$(pwd)/workspace
-
-docker run \
- -e LLM_API_KEY \
- -e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE \
- -v $WORKSPACE_BASE:/opt/workspace_base \
- -v /var/run/docker.sock:/var/run/docker.sock \
- -p 3000:3000 \
- --add-host host.docker.internal=host-gateway \
- ghcr.io/opendevin/opendevin:0.4.0
-```
-
-You'll find opendevin running at `http://localhost:3000`.
-
-If you want to use the (unstable!) bleeding edge, you can use `ghcr.io/opendevin/opendevin:main` as the image.
-
-See [Development.md](Development.md) for instructions on running OpenDevin without Docker.
-
-Having trouble? Check out our [Troubleshooting Guide](./docs/guides/Troubleshooting.md).
-
-## 🤖 LLM Backends
-
-OpenDevin can work with any LLM backend.
-For a full list of the LM providers and models available, please consult the
-[litellm documentation](https://docs.litellm.ai/docs/providers).
-
-The `LLM_MODEL` environment variable controls which model is used in programmatic interactions.
-But when using the OpenDevin UI, you'll need to choose your model in the settings window (the gear
-wheel on the bottom left).
-
-The following environment variables might be necessary for some LLMs:
-
-- `LLM_API_KEY`
-- `LLM_BASE_URL`
-- `LLM_EMBEDDING_MODEL`
-- `LLM_EMBEDDING_DEPLOYMENT_NAME`
-- `LLM_API_VERSION`
-
-We have a few guides for running OpenDevin with specific model providers:
-
-- [ollama](./docs/guides/LocalLLMs.md)
-- [Azure](./docs/guides/AzureLLMs.md)
-
-If you're using another provider, we encourage you to open a PR to share your setup!
-
-**Note on Alternative Models:**
-The best models are GPT-4 and Claude 3. Current local and open source models are
-not nearly as powerful. When using an alternative model,
-you may see long wait times between messages,
-poor responses, or errors about malformed JSON. OpenDevin
-can only be as powerful as the models driving it--fortunately folks on our team
-are actively working on building better open source models!
-
-**Note on API retries and rate limits:**
-Some LLMs have rate limits and may require retries. OpenDevin will automatically retry requests if it receives a 429 error or API connection error.
-You can set LLM_NUM_RETRIES, LLM_RETRY_MIN_WAIT, LLM_RETRY_MAX_WAIT environment variables to control the number of retries and the time between retries.
-By default, LLM_NUM_RETRIES is 5 and LLM_RETRY_MIN_WAIT, LLM_RETRY_MAX_WAIT are 3 seconds and respectively 60 seconds.
-
-## ⭐️ Research Strategy
-
-Achieving full replication of production-grade applications with LLMs is a complex endeavor. Our strategy involves:
-
-1. **Core Technical Research:** Focusing on foundational research to understand and improve the technical aspects of code generation and handling.
-2. **Specialist Abilities:** Enhancing the effectiveness of core components through data curation, training methods, and more.
-3. **Task Planning:** Developing capabilities for bug detection, codebase management, and optimization.
-4. **Evaluation:** Establishing comprehensive evaluation metrics to better understand and improve our models.
+To learn more and to use OpenDevin, check out our [documentation](https://opendevin.github.io/OpenDevin/).
+ The generation of the backend architecture diagram is partially automated.
+ The diagram is generated from the type hints in the code using the py2puml
+ tool. The diagram is then manually reviewed, adjusted and exported to PNG
+ and SVG.
+
+## Prerequisites
+
+- A working Python environment in which OpenDevin runs
+  (set up according to the instructions in the README.md file at the root of the repository)
+- [py2puml](https://github.com/lucsorel/py2puml) installed
+
+## Steps
+
+1. Autogenerate the diagram by running the following command from the root of the repository:
+ `py2puml opendevin opendevin > docs/architecture/backend_architecture.puml`
+
+2. Open the generated file in a PlantUML editor, e.g., Visual Studio Code with the PlantUML extension, or [PlantText](https://www.planttext.com/).
+
+3. Review the generated PUML and make all necessary adjustments to the diagram (add missing parts, fix mistakes, improve positioning).
+ _py2puml creates the diagram based on the type hints in the code, so missing or incorrect type hints may result in an incomplete or incorrect diagram._
+
+4. Review the diff between the new and the previous diagram and manually check if the changes are correct.
+ _Make sure not to remove parts that were manually added to the diagram in the past and are still relevant._
+
+5. Add the commit hash of the commit that was used to generate the diagram to the diagram footer.
+
+6. Export the diagram as PNG and SVG files and replace the existing diagrams in the `docs/architecture` directory. This can be done with a PlantUML tool such as [PlantText](https://www.planttext.com/).
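Step 5 can be partly scripted. The following is a minimal sketch and is not part of the repository. `add_commit_footer` and `current_commit` are hypothetical helper names. The sketch assumes the hash is recorded with PlantUML's `footer` command placed just before `@enduml`. An existing `footer` line is replaced in place, so manual adjustments elsewhere in the file are left untouched.

```python
import subprocess


# Hypothetical helper (not in the OpenDevin codebase): records the commit
# hash in the diagram footer, assuming PlantUML's `footer` command is used.
def add_commit_footer(puml: str, commit: str) -> str:
    """Return `puml` with a footer naming `commit`.

    An existing `footer ...` line is replaced; otherwise the footer is
    inserted directly before the last `@enduml` line.
    """
    footer = f"footer Generated from commit {commit}"
    lines = puml.splitlines()

    # Replace a footer left over from a previous generation, if any.
    for i, line in enumerate(lines):
        if line.lstrip().startswith("footer "):
            lines[i] = footer
            return "\n".join(lines) + "\n"

    # Otherwise insert the footer before the last @enduml line.
    for i in range(len(lines) - 1, -1, -1):
        if lines[i].strip() == "@enduml":
            lines.insert(i, footer)
            return "\n".join(lines) + "\n"

    raise ValueError("no @enduml line found in the PlantUML source")


# Hypothetical helper: short hash of HEAD in the current git checkout.
def current_commit() -> str:
    result = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

For example, with `path = Path("docs/architecture/backend_architecture.puml")`, the call `path.write_text(add_commit_footer(path.read_text(), current_commit()))` would update the file in place. Review the result in a PlantUML editor before exporting.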
+
+
+ Welcome to OpenDevin, an open-source project aiming to replicate
+ Devin, an autonomous AI software engineer who is capable of executing
+ complex engineering tasks and collaborating actively with users on
+ software development projects. This project aspires to replicate,
+ enhance, and innovate upon Devin through the power of the open-source
+ community.
+
+
+
+ );
+}
diff --git a/docs/src/components/Welcome/styles.module.css b/docs/src/components/Welcome/styles.module.css
new file mode 100644
index 0000000000..f3ec5dfc24
--- /dev/null
+++ b/docs/src/components/Welcome/styles.module.css
@@ -0,0 +1,27 @@
+.container {
+ display: flex;
+ flex-direction: column;
+ padding-top: 25px;
+ padding-bottom: 25px;
+ width: 100%;
+}
+
+.innerContainer {
+ padding: 50px;
+ width: 100%;
+ max-width: 1300px;
+ padding-top: 30px;
+ margin: auto;
+ display: flex;
+ align-items: center;
+}
+
+.sidebarImage {
+ max-width: 400px;
+ padding-right: 30px;
+}
+
+.welcomeText {
+ text-align: justify;
+ font-size: larger;
+}
diff --git a/docs/src/css/custom.css b/docs/src/css/custom.css
new file mode 100644
index 0000000000..a9af60ee53
--- /dev/null
+++ b/docs/src/css/custom.css
@@ -0,0 +1,36 @@
+/**
+ * Any CSS included here will be global. The classic template
+ * bundles Infima by default. Infima is a CSS framework designed to
+ * work well for content-centric websites.
+ */
+
+/* You can override the default Infima variables here. */
+:root {
+ --ifm-color-primary: #4465db;
+ --ifm-code-font-size: 95%;
+ --docusaurus-highlighted-code-line-bg: rgba(0, 0, 0, 0.1);
+ --secondary: #171717;
+ --secondary-dark: #0a0a0a;
+ --secondary-light: #737373;
+}
+
+/* For readability concerns, you should choose a lighter palette in dark mode. */
+[data-theme="dark"] {
+ --ifm-color-primary: #4465db;
+ --docusaurus-highlighted-code-line-bg: rgba(0, 0, 0, 0.3);
+ --secondary: #737373;
+ --secondary-dark: #171717;
+ --secondary-light: #d4d4d4;
+}
+
+.footer--dark {
+ background-image: linear-gradient(
+ 140deg,
+ var(--secondary) 20%,
+ var(--secondary-light) 100%
+ );
+}
+
+.a {
+ text-decoration: underline;
+}
diff --git a/docs/src/pages/faq.tsx b/docs/src/pages/faq.tsx
new file mode 100644
index 0000000000..4745ec5bb1
--- /dev/null
+++ b/docs/src/pages/faq.tsx
@@ -0,0 +1,56 @@
+import Layout from "@theme/Layout";
+
+export default function FAQ() {
+ return (
+
+
+
Frequently Asked Questions
+
Support
+
How can I report an issue with OpenDevin?
+
+ Please file a bug on{" "}
+ GitHub if
+ you notice a problem that likely affects others.
+ If you're having trouble installing, or have general questions, reach out on{" "}
+ Discord or{" "}
+ Slack.
+
+
General
+
What is Devin?
+
+ Devin{" "}
+ represents a cutting-edge autonomous agent designed to navigate the
+ complexities of software engineering. It leverages a combination of
+ tools such as a shell, code editor, and web browser, showcasing the
+ untapped potential of LLMs in software development. Our goal is to
+ explore and expand upon Devin's capabilities, identifying both its
+ strengths and areas for improvement, to guide the progress of open
+ code models.
+
+
Why OpenDevin?
+
+ The{" "}
+
+ OpenDevin
+ {" "}
+ project was born out of a desire to replicate, enhance, and innovate
+ beyond the original Devin model. By engaging the{" "}
+
+ open-source community
+
+ , we aim to tackle the challenges faced by Code LLMs in practical
+ scenarios, producing works that significantly contribute to the
+ community and pave the way for future advancements.
+
+
+
+ );
+}
diff --git a/docs/src/pages/index.module.css b/docs/src/pages/index.module.css
new file mode 100644
index 0000000000..9f71a5da77
--- /dev/null
+++ b/docs/src/pages/index.module.css
@@ -0,0 +1,23 @@
+/**
+ * CSS files with the .module.css suffix will be treated as CSS modules
+ * and scoped locally.
+ */
+
+.heroBanner {
+ padding: 4rem 0;
+ text-align: center;
+ position: relative;
+ overflow: hidden;
+}
+
+@media screen and (max-width: 996px) {
+ .heroBanner {
+ padding: 2rem;
+ }
+}
+
+.buttons {
+ display: flex;
+ align-items: center;
+ justify-content: center;
+}
diff --git a/docs/src/pages/index.tsx b/docs/src/pages/index.tsx
new file mode 100644
index 0000000000..86ce7e6783
--- /dev/null
+++ b/docs/src/pages/index.tsx
@@ -0,0 +1,33 @@
+import useDocusaurusContext from "@docusaurus/useDocusaurusContext";
+import Layout from "@theme/Layout";
+
+import { Code } from "../components/Code/Code";
+import { HomepageHeader } from "../components/HomepageHeader/HomepageHeader";
+import { Welcome } from "../components/Welcome/Welcome";
+
+export function Header({ title, summary, description }): JSX.Element {
+ return (
+
'
+9:
+10:if __name__ == '__main__':
+11: app.run(port=5000)
+File updated. Please review the changes and make sure they are correct (correct indentation, no duplicate lines, etc). Edit the file again if necessary.
+
+ASSISTANT:
+The file has been updated. Let me run the Python file again with the new changes:
+
+python3 app.py > server.log 2>&1 &
+
+
+USER:
+Observation:
+[1] 126
+
+ASSISTANT:
+The server is running on port 5000 with PID 126. You can access the list of numbers in a table format by visiting http://127.0.0.1:5000. Free free to let me know if you have any further requests!
+
+--- END OF EXAMPLE ---
+
+
+NOW, LET'S START!
+
+Write a shell script 'hello.sh' that prints 'hello'.
diff --git a/tests/integration/mock/CodeActAgent/test_write_simple_script/prompt_002.log b/tests/integration/mock/CodeActAgent/test_write_simple_script/prompt_002.log
new file mode 100644
index 0000000000..5d18551a1b
--- /dev/null
+++ b/tests/integration/mock/CodeActAgent/test_write_simple_script/prompt_002.log
@@ -0,0 +1,229 @@
+
+
+----------
+
+A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
+The assistant can interact with an interactive Python (Jupyter Notebook) environment and receive the corresponding output when needed. The code should be enclosed using "" tag, for example:
+
+print("Hello World!")
+
+The assistant can execute bash commands on behalf of the user by wrapping them with and .
+For example, you can list the files in the current directory by ls .
+The assistant should attempt fewer things at a time instead of putting too much commands OR code in one "execute" block.
+The assistant can install Python packages through bash by pip install [package needed] and should always import packages and define variables before starting to use them.
+The assistant should stop and provide an answer when they have already obtained the answer from the execution result.
+
+
+Apart from the standard bash commands, you can also use the following special commands in environment:
+open [] - opens the file at the given path in the editor. If line_number is provided, the window will be move to include that line
+goto - moves the window to show
+scroll_down - moves the window down {WINDOW} lines
+scroll_up - moves the window down {WINDOW} lines
+create - creates and opens a new file with the given name
+search_dir [] - searches for search_term in all files in dir. If dir is not provided, searches in the current directory
+search_file [] - searches for search_term in file. If file is not provided, searches in the current open file
+find_file [] - finds all files with the given name in dir. If dir is not provided, searches in the current directory
+edit :
+
+end_of_edit - replaces lines through (inclusive) with the given text in the open file. The replacement text is terminated by a line with only end_of_edit on it. All of the will be entered, so make sure your indentation is formatted properly. Python files will be checked for syntax errors after the edit. If the system detects a syntax error, the edit will not be executed. Simply try to edit the file again, but make sure to read the error message and modify the edit command you issue accordingly. Issuing the same command a second time will just lead to the same error message again.
+Please note that THE EDIT COMMAND REQUIRES PROPER INDENTATION. If you'd like to add the line ' print(x)' you must fully write that out, with all those spaces before the code! Indentation is important and code that is not indented correctly will fail and require fixing before it can be run.
+
+The assistant's response should be concise, but do express their thoughts.
+Try to include one of or in each of your responses, unless it is a direct answer to a question OR a message to the user.
+IMPORTANT: Whenever possible, execute the code for the user using or instead of providing it.
+
+
+----------
+
+Here is an example of how you can interact with the environment for task solving:
+
+--- START OF EXAMPLE ---
+
+USER: Can you create a list of numbers from 1 to 10, and create a web page to display them at port 5000?
+
+ASSISTANT:
+Sure! Let me write the Python code for starting a web server and save it to a file `app.py`:
+
+CODE='''
+from flask import Flask
+app = Flask(__name__)
+
+@app.route('/')
+def index():
+ numbers = list(range(1, 11))
+ return str(numbers)
+
+if __name__ == '__main__':
+ app.run(port=5000)
+'''
+with open('app.py', 'w') as f:
+ f.write(CODE)
+
+
+USER:
+Observation:
+[Code executed successfully with no output]
+
+ASSISTANT:
+I have created a Python file `app.py` that will display a list of numbers from 1 to 10 when you run it. Let me run the Python file for you:
+
+python3 app.py > server.log 2>&1 &
+
+
+USER:
+Observation:
+[1] 121[1]+ Exit 1 python3 app.py > server.log 2>&1
+
+ASSISTANT: Looks like the server is running with PID 121 then crashed. Let me check the server log:
+
+cat server.log
+
+
+USER:
+Observation:
+Traceback (most recent call last):
+ File "/workspace/app.py", line 2, in
+ from flask import Flask
+ModuleNotFoundError: No module named 'flask'
+
+ASSISTANT:
+It seems that Flask is not installed. Let me install Flask for you:
+
+pip install flask
+
+
+USER:
+Observation:
+Defaulting to user installation because normal site-packages is not writeable
+Collecting flask
+ Using cached flask-3.0.3-py3-none-any.whl (101 kB)
+Collecting blinker>=1.6.2
+ Using cached blinker-1.7.0-py3-none-any.whl (13 kB)
+Collecting Werkzeug>=3.0.0
+ Using cached werkzeug-3.0.2-py3-none-any.whl (226 kB)
+Collecting click>=8.1.3
+ Using cached click-8.1.7-py3-none-any.whl (97 kB)
+Collecting itsdangerous>=2.1.2
+ Using cached itsdangerous-2.2.0-py3-none-any.whl (16 kB)
+Requirement already satisfied: Jinja2>=3.1.2 in /home/opendevin/.local/lib/python3.10/site-packages (from flask) (3.1.3)
+Requirement already satisfied: MarkupSafe>=2.0 in /home/opendevin/.local/lib/python3.10/site-packages (from Jinja2>=3.1.2->flask) (2.1.5)
+Installing collected packages: Werkzeug, itsdangerous, click, blinker, flask
+Successfully installed Werkzeug-3.0.2 blinker-1.7.0 click-8.1.7 flask-3.0.3 itsdangerous-2.2.0
+
+ASSISTANT:
+Now that Flask is installed, let me run the Python file again:
+
+python3 app.py > server.log 2>&1 &
+
+
+USER:
+Observation:
+[1] 124
+
+ASSISTANT:
+Let me check the server log again:
+
+cat server.log
+
+
+USER:
+Observation:
+* Serving Flask app 'app'
+ * Debug mode: off
+WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
+ * Running on http://127.0.0.1:5000
+Press CTRL+C to quit
+
+ASSISTANT:
+The server is running on port 5000 with PID 124. You can access the list of numbers by visiting http://127.0.0.1:5000. If you have any further questions, feel free to ask!
+
+USER: Now kill the server, make it display the numbers in a table format.
+
+ASSISTANT:
+Sure! Let me stop the server first:
+
+kill 124
+
+
+USER:
+Observation:
+[1]+ Terminated python3 app.py > server.log 2>&1
+
+ASSISTANT:
+The server has been stopped. Let me open the Python file and modify it.
+
+open app.py
+
+
+USER:
+[File: /workspace/app.py (11 lines total)]
+1:
+2:from flask import Flask
+3:app = Flask(__name__)
+4:
+5:@app.route('/')
+6:def index():
+7: numbers = list(range(1, 11))
+8: return str(numbers)
+9:
+10:if __name__ == '__main__':
+11: app.run(port=5000)
+
+ASSISTANT:
+I should edit the file to display the numbers in a table format. Let me do that for you:
+
+
+edit 8:8 <' + ''.join([f'
'
+9:
+10:if __name__ == '__main__':
+11: app.run(port=5000)
+File updated. Please review the changes and make sure they are correct (correct indentation, no duplicate lines, etc). Edit the file again if necessary.
+
+ASSISTANT:
+The file has been updated. Let me run the Python file again with the new changes:
+
+python3 app.py > server.log 2>&1 &
+
+
+USER:
+Observation:
+[1] 126
+
+ASSISTANT:
+The server is running on port 5000 with PID 126. You can access the list of numbers in a table format by visiting http://127.0.0.1:5000. Free free to let me know if you have any further requests!
+
+--- END OF EXAMPLE ---
+
+
+NOW, LET'S START!
+
+Write a shell script 'hello.sh' that prints 'hello'.
+
+----------
+
+Sure, I'll create a shell script named `hello.sh` that prints "hello". Let's start by writing the script.
+
+
+echo "echo 'hello'" > hello.sh
+
+
+----------
+
+OBSERVATION:
+
+[Command -1 finished with exit code 0]]
diff --git a/tests/integration/mock/CodeActAgent/test_write_simple_script/prompt_003.log b/tests/integration/mock/CodeActAgent/test_write_simple_script/prompt_003.log
new file mode 100644
index 0000000000..b97024f588
--- /dev/null
+++ b/tests/integration/mock/CodeActAgent/test_write_simple_script/prompt_003.log
@@ -0,0 +1,237 @@
+
+
+----------
+
+A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
+The assistant can interact with an interactive Python (Jupyter Notebook) environment and receive the corresponding output when needed. The code should be enclosed using "" tag, for example:
+
+print("Hello World!")
+
+The assistant can execute bash commands on behalf of the user by wrapping them with and .
+For example, you can list the files in the current directory by ls .
+The assistant should attempt fewer things at a time instead of putting too much commands OR code in one "execute" block.
+The assistant can install Python packages through bash by pip install [package needed] and should always import packages and define variables before starting to use them.
+The assistant should stop and provide an answer when they have already obtained the answer from the execution result.
+
+
+Apart from the standard bash commands, you can also use the following special commands in environment:
+open [] - opens the file at the given path in the editor. If line_number is provided, the window will be move to include that line
+goto - moves the window to show
+scroll_down - moves the window down {WINDOW} lines
+scroll_up - moves the window down {WINDOW} lines
+create - creates and opens a new file with the given name
+search_dir [] - searches for search_term in all files in dir. If dir is not provided, searches in the current directory
+search_file [] - searches for search_term in file. If file is not provided, searches in the current open file
+find_file [] - finds all files with the given name in dir. If dir is not provided, searches in the current directory
+edit :
+
+end_of_edit - replaces lines through (inclusive) with the given text in the open file. The replacement text is terminated by a line with only end_of_edit on it. All of the will be entered, so make sure your indentation is formatted properly. Python files will be checked for syntax errors after the edit. If the system detects a syntax error, the edit will not be executed. Simply try to edit the file again, but make sure to read the error message and modify the edit command you issue accordingly. Issuing the same command a second time will just lead to the same error message again.
+Please note that THE EDIT COMMAND REQUIRES PROPER INDENTATION. If you'd like to add the line ' print(x)' you must fully write that out, with all those spaces before the code! Indentation is important and code that is not indented correctly will fail and require fixing before it can be run.
+
+The assistant's response should be concise, but do express their thoughts.
+Try to include one of or in each of your responses, unless it is a direct answer to a question OR a message to the user.
+IMPORTANT: Whenever possible, execute the code for the user using or instead of providing it.
+
+
+----------
+
+Here is an example of how you can interact with the environment for task solving:
+
+--- START OF EXAMPLE ---
+
+USER: Can you create a list of numbers from 1 to 10, and create a web page to display them at port 5000?
+
+ASSISTANT:
+Sure! Let me write the Python code for starting a web server and save it to a file `app.py`:
+
+CODE='''
+from flask import Flask
+app = Flask(__name__)
+
+@app.route('/')
+def index():
+ numbers = list(range(1, 11))
+ return str(numbers)
+
+if __name__ == '__main__':
+ app.run(port=5000)
+'''
+with open('app.py', 'w') as f:
+ f.write(CODE)
+
+
+USER:
+Observation:
+[Code executed successfully with no output]
+
+ASSISTANT:
+I have created a Python file `app.py` that will display a list of numbers from 1 to 10 when you run it. Let me run the Python file for you:
+
+python3 app.py > server.log 2>&1 &
+
+
+USER:
+Observation:
+[1] 121[1]+ Exit 1 python3 app.py > server.log 2>&1
+
+ASSISTANT: Looks like the server is running with PID 121 then crashed. Let me check the server log:
+
+cat server.log
+
+
+USER:
+Observation:
+Traceback (most recent call last):
+ File "/workspace/app.py", line 2, in
+ from flask import Flask
+ModuleNotFoundError: No module named 'flask'
+
+ASSISTANT:
+It seems that Flask is not installed. Let me install Flask for you:
+
+pip install flask
+
+
+USER:
+Observation:
+Defaulting to user installation because normal site-packages is not writeable
+Collecting flask
+ Using cached flask-3.0.3-py3-none-any.whl (101 kB)
+Collecting blinker>=1.6.2
+ Using cached blinker-1.7.0-py3-none-any.whl (13 kB)
+Collecting Werkzeug>=3.0.0
+ Using cached werkzeug-3.0.2-py3-none-any.whl (226 kB)
+Collecting click>=8.1.3
+ Using cached click-8.1.7-py3-none-any.whl (97 kB)
+Collecting itsdangerous>=2.1.2
+ Using cached itsdangerous-2.2.0-py3-none-any.whl (16 kB)
+Requirement already satisfied: Jinja2>=3.1.2 in /home/opendevin/.local/lib/python3.10/site-packages (from flask) (3.1.3)
+Requirement already satisfied: MarkupSafe>=2.0 in /home/opendevin/.local/lib/python3.10/site-packages (from Jinja2>=3.1.2->flask) (2.1.5)
+Installing collected packages: Werkzeug, itsdangerous, click, blinker, flask
+Successfully installed Werkzeug-3.0.2 blinker-1.7.0 click-8.1.7 flask-3.0.3 itsdangerous-2.2.0
+
+ASSISTANT:
+Now that Flask is installed, let me run the Python file again:
+
+python3 app.py > server.log 2>&1 &
+
+
+USER:
+Observation:
+[1] 124
+
+ASSISTANT:
+Let me check the server log again:
+
+cat server.log
+
+
+USER:
+Observation:
+* Serving Flask app 'app'
+ * Debug mode: off
+WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
+ * Running on http://127.0.0.1:5000
+Press CTRL+C to quit
+
+ASSISTANT:
+The server is running on port 5000 with PID 124. You can access the list of numbers by visiting http://127.0.0.1:5000. If you have any further questions, feel free to ask!
+
+USER: Now kill the server, make it display the numbers in a table format.
+
+ASSISTANT:
+Sure! Let me stop the server first:
+
+kill 124
+
+
+USER:
+Observation:
+[1]+ Terminated python3 app.py > server.log 2>&1
+
+ASSISTANT:
+The server has been stopped. Let me open the Python file and modify it.
+
+open app.py
+
+
+USER:
+[File: /workspace/app.py (11 lines total)]
+1:
+2:from flask import Flask
+3:app = Flask(__name__)
+4:
+5:@app.route('/')
+6:def index():
+7: numbers = list(range(1, 11))
+8: return str(numbers)
+9:
+10:if __name__ == '__main__':
+11: app.run(port=5000)
+
+ASSISTANT:
+I should edit the file to display the numbers in a table format. Let me do that for you:
+
+
+edit 8:8 <' + ''.join([f'
'
+9:
+10:if __name__ == '__main__':
+11: app.run(port=5000)
+File updated. Please review the changes and make sure they are correct (correct indentation, no duplicate lines, etc). Edit the file again if necessary.
+
+ASSISTANT:
+The file has been updated. Let me run the Python file again with the new changes:
+
+python3 app.py > server.log 2>&1 &
+
+
+USER:
+Observation:
+[1] 126
+
+ASSISTANT:
+The server is running on port 5000 with PID 126. You can access the list of numbers in a table format by visiting http://127.0.0.1:5000. Free free to let me know if you have any further requests!
+
+--- END OF EXAMPLE ---
+
+
+NOW, LET'S START!
+
+Write a shell script 'hello.sh' that prints 'hello'.
+
+----------
+
+Sure, I'll create a shell script named `hello.sh` that prints "hello". Let's start by writing the script.
+
+
+echo "echo 'hello'" > hello.sh
+
+
+----------
+
+OBSERVATION:
+
+[Command -1 finished with exit code 0]]
+
+----------
+
+The shell script `hello.sh` has been successfully created. If you'd like, I can run the script to show you the output. Would you like me to proceed with executing the script?
+
+----------
+
+can you run it to make sure it is working?
diff --git a/tests/integration/mock/CodeActAgent/test_write_simple_script/prompt_004.log b/tests/integration/mock/CodeActAgent/test_write_simple_script/prompt_004.log
new file mode 100644
index 0000000000..0d25276e4d
--- /dev/null
+++ b/tests/integration/mock/CodeActAgent/test_write_simple_script/prompt_004.log
@@ -0,0 +1,251 @@
+
+
+----------
+
+A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
+The assistant can interact with an interactive Python (Jupyter Notebook) environment and receive the corresponding output when needed. The code should be enclosed using "" tag, for example:
+
+print("Hello World!")
+
+The assistant can execute bash commands on behalf of the user by wrapping them with and .
+For example, you can list the files in the current directory by ls .
+The assistant should attempt fewer things at a time instead of putting too much commands OR code in one "execute" block.
+The assistant can install Python packages through bash by pip install [package needed] and should always import packages and define variables before starting to use them.
+The assistant should stop and provide an answer when they have already obtained the answer from the execution result.
+
+
+Apart from the standard bash commands, you can also use the following special commands in environment:
+open [] - opens the file at the given path in the editor. If line_number is provided, the window will be move to include that line
+goto - moves the window to show
+scroll_down - moves the window down {WINDOW} lines
+scroll_up - moves the window down {WINDOW} lines
+create - creates and opens a new file with the given name
+search_dir [] - searches for search_term in all files in dir. If dir is not provided, searches in the current directory
+search_file [] - searches for search_term in file. If file is not provided, searches in the current open file
+find_file [] - finds all files with the given name in dir. If dir is not provided, searches in the current directory
+edit :
+
+end_of_edit - replaces lines through (inclusive) with the given text in the open file. The replacement text is terminated by a line with only end_of_edit on it. All of the will be entered, so make sure your indentation is formatted properly. Python files will be checked for syntax errors after the edit. If the system detects a syntax error, the edit will not be executed. Simply try to edit the file again, but make sure to read the error message and modify the edit command you issue accordingly. Issuing the same command a second time will just lead to the same error message again.
+Please note that THE EDIT COMMAND REQUIRES PROPER INDENTATION. If you'd like to add the line ' print(x)' you must fully write that out, with all those spaces before the code! Indentation is important and code that is not indented correctly will fail and require fixing before it can be run.
+
+The assistant's response should be concise, but do express their thoughts.
+Try to include one of <execute_ipython> or <execute_bash> in each of your responses, unless it is a direct answer to a question OR a message to the user.
+IMPORTANT: Whenever possible, execute the code for the user using <execute_ipython> or <execute_bash> instead of providing it.
+
+
+----------
+
+Here is an example of how you can interact with the environment for task solving:
+
+--- START OF EXAMPLE ---
+
+USER: Can you create a list of numbers from 1 to 10, and create a web page to display them at port 5000?
+
+ASSISTANT:
+Sure! Let me write the Python code for starting a web server and save it to a file `app.py`:
+
+CODE='''
+from flask import Flask
+app = Flask(__name__)
+
+@app.route('/')
+def index():
+ numbers = list(range(1, 11))
+ return str(numbers)
+
+if __name__ == '__main__':
+ app.run(port=5000)
+'''
+with open('app.py', 'w') as f:
+ f.write(CODE)
+
+
+USER:
+Observation:
+[Code executed successfully with no output]
+
+ASSISTANT:
+I have created a Python file `app.py` that will display a list of numbers from 1 to 10 when you run it. Let me run the Python file for you:
+
+python3 app.py > server.log 2>&1 &
+
+
+USER:
+Observation:
+[1] 121[1]+ Exit 1 python3 app.py > server.log 2>&1
+
+ASSISTANT: Looks like the server is running with PID 121 then crashed. Let me check the server log:
+
+cat server.log
+
+
+USER:
+Observation:
+Traceback (most recent call last):
+ File "/workspace/app.py", line 2, in
+ from flask import Flask
+ModuleNotFoundError: No module named 'flask'
+
+ASSISTANT:
+It seems that Flask is not installed. Let me install Flask for you:
+
+pip install flask
+
+
+USER:
+Observation:
+Defaulting to user installation because normal site-packages is not writeable
+Collecting flask
+ Using cached flask-3.0.3-py3-none-any.whl (101 kB)
+Collecting blinker>=1.6.2
+ Using cached blinker-1.7.0-py3-none-any.whl (13 kB)
+Collecting Werkzeug>=3.0.0
+ Using cached werkzeug-3.0.2-py3-none-any.whl (226 kB)
+Collecting click>=8.1.3
+ Using cached click-8.1.7-py3-none-any.whl (97 kB)
+Collecting itsdangerous>=2.1.2
+ Using cached itsdangerous-2.2.0-py3-none-any.whl (16 kB)
+Requirement already satisfied: Jinja2>=3.1.2 in /home/opendevin/.local/lib/python3.10/site-packages (from flask) (3.1.3)
+Requirement already satisfied: MarkupSafe>=2.0 in /home/opendevin/.local/lib/python3.10/site-packages (from Jinja2>=3.1.2->flask) (2.1.5)
+Installing collected packages: Werkzeug, itsdangerous, click, blinker, flask
+Successfully installed Werkzeug-3.0.2 blinker-1.7.0 click-8.1.7 flask-3.0.3 itsdangerous-2.2.0
+
+ASSISTANT:
+Now that Flask is installed, let me run the Python file again:
+
+python3 app.py > server.log 2>&1 &
+
+
+USER:
+Observation:
+[1] 124
+
+ASSISTANT:
+Let me check the server log again:
+
+cat server.log
+
+
+USER:
+Observation:
+* Serving Flask app 'app'
+ * Debug mode: off
+WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
+ * Running on http://127.0.0.1:5000
+Press CTRL+C to quit
+
+ASSISTANT:
+The server is running on port 5000 with PID 124. You can access the list of numbers by visiting http://127.0.0.1:5000. If you have any further questions, feel free to ask!
+
+USER: Now kill the server, make it display the numbers in a table format.
+
+ASSISTANT:
+Sure! Let me stop the server first:
+
+kill 124
+
+
+USER:
+Observation:
+[1]+ Terminated python3 app.py > server.log 2>&1
+
+ASSISTANT:
+The server has been stopped. Let me open the Python file and modify it.
+
+open app.py
+
+
+USER:
+[File: /workspace/app.py (11 lines total)]
+1:
+2:from flask import Flask
+3:app = Flask(__name__)
+4:
+5:@app.route('/')
+6:def index():
+7: numbers = list(range(1, 11))
+8: return str(numbers)
+9:
+10:if __name__ == '__main__':
+11: app.run(port=5000)
+
+ASSISTANT:
+I should edit the file to display the numbers in a table format. Let me do that for you:
+
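+<execute_bash>
+edit 8:8 <<EOF
+ return '<table>' + ''.join([f'<tr><td>{i}</td></tr>' for i in numbers]) + '</table>'
+EOF
+</execute_bash>
+
+USER:
+Observation:
+[File: /workspace/app.py (11 lines total)]
+1:
+2:from flask import Flask
+3:app = Flask(__name__)
+4:
+5:@app.route('/')
+6:def index():
+7: numbers = list(range(1, 11))
+8: return '<table>' + ''.join([f'<tr><td>{i}</td></tr>' for i in numbers]) + '</table>'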
+9:
+10:if __name__ == '__main__':
+11: app.run(port=5000)
+File updated. Please review the changes and make sure they are correct (correct indentation, no duplicate lines, etc). Edit the file again if necessary.
+
+ASSISTANT:
+The file has been updated. Let me run the Python file again with the new changes:
+
+python3 app.py > server.log 2>&1 &
+
+
+USER:
+Observation:
+[1] 126
+
+ASSISTANT:
+The server is running on port 5000 with PID 126. You can access the list of numbers in a table format by visiting http://127.0.0.1:5000. Feel free to let me know if you have any further requests!
+
+--- END OF EXAMPLE ---
+
+
+NOW, LET'S START!
+
+Write a shell script 'hello.sh' that prints 'hello'.
+
+----------
+
+Sure, I'll create a shell script named `hello.sh` that prints "hello". Let's start by writing the script.
+
+
+echo "echo 'hello'" > hello.sh
+
+
+----------
+
+OBSERVATION:
+
+[Command -1 finished with exit code 0]]
+
+----------
+
+The shell script `hello.sh` has been successfully created. If you'd like, I can run the script to show you the output. Would you like me to proceed with executing the script?
+
+----------
+
+can you run it to make sure it is working?
+
+----------
+
+Certainly! Let's run the `hello.sh` script to ensure it's working as expected.
+
+
+bash hello.sh
+
+
+----------
+
+OBSERVATION:
+hello
+[Command -1 finished with exit code 0]]
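The `edit` special command documented in the CodeActAgent prompt above replaces an inclusive, 1-indexed range of lines in the open file. As a minimal sketch of just that range arithmetic, assuming the file is held as a list of lines (`apply_edit` is a hypothetical helper, not the SWE-agent or OpenDevin implementation, and it skips the syntax check the real command runs on Python files):

```python
def apply_edit(lines: list[str], start: int, end: int, replacement: str) -> list[str]:
    """Return a copy of `lines` with lines start..end (1-indexed, inclusive) replaced.

    Hypothetical helper mirroring the documented `edit <start_line>:<end_line>`
    semantics; an empty `replacement` deletes the range.
    """
    if start < 1 or end < start or end > len(lines):
        raise ValueError(f"invalid range {start}:{end} for a {len(lines)}-line file")
    # Python slices are 0-based and end-exclusive, so keep lines[:start-1]
    # and lines[end:], and splice the replacement lines in between.
    return lines[: start - 1] + replacement.splitlines() + lines[end:]


if __name__ == "__main__":
    app_py = [
        "",
        "from flask import Flask",
        "app = Flask(__name__)",
        "",
        "@app.route('/')",
        "def index():",
        "    numbers = list(range(1, 11))",
        "    return str(numbers)",
        "",
        "if __name__ == '__main__':",
        "    app.run(port=5000)",
    ]
    # Like `edit 8:8` in the example transcript; this new line 8 is illustrative.
    # Indentation must be written out in full, as the prompt warns.
    edited = apply_edit(app_py, 8, 8, "    return ', '.join(str(n) for n in numbers)")
    print(len(edited), repr(edited[7]))
```

Because the range is inclusive, `edit 8:8` touches exactly one line, so the file keeps its 11 lines when the replacement is a single line.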
diff --git a/tests/integration/mock/CodeActAgent/test_write_simple_script/response_001.log b/tests/integration/mock/CodeActAgent/test_write_simple_script/response_001.log
new file mode 100644
index 0000000000..d37476a5d2
--- /dev/null
+++ b/tests/integration/mock/CodeActAgent/test_write_simple_script/response_001.log
@@ -0,0 +1,4 @@
+Sure, I'll create a shell script named `hello.sh` that prints "hello". Let's start by writing the script.
+
+
+echo "echo 'hello'" > hello.sh
diff --git a/tests/integration/mock/CodeActAgent/test_write_simple_script/response_002.log b/tests/integration/mock/CodeActAgent/test_write_simple_script/response_002.log
new file mode 100644
index 0000000000..1d10fb6f51
--- /dev/null
+++ b/tests/integration/mock/CodeActAgent/test_write_simple_script/response_002.log
@@ -0,0 +1 @@
+The shell script `hello.sh` has been successfully created. If you'd like, I can run the script to show you the output. Would you like me to proceed with executing the script?
diff --git a/tests/integration/mock/CodeActAgent/test_write_simple_script/response_003.log b/tests/integration/mock/CodeActAgent/test_write_simple_script/response_003.log
new file mode 100644
index 0000000000..6d0b936aae
--- /dev/null
+++ b/tests/integration/mock/CodeActAgent/test_write_simple_script/response_003.log
@@ -0,0 +1,4 @@
+Certainly! Let's run the `hello.sh` script to ensure it's working as expected.
+
+
+bash hello.sh
diff --git a/tests/integration/mock/CodeActAgent/test_write_simple_script/response_004.log b/tests/integration/mock/CodeActAgent/test_write_simple_script/response_004.log
new file mode 100644
index 0000000000..6961909c08
--- /dev/null
+++ b/tests/integration/mock/CodeActAgent/test_write_simple_script/response_004.log
@@ -0,0 +1 @@
+The script `hello.sh` is working correctly and prints "hello" as expected. If you have any more tasks or need further modifications, feel free to let me know!
diff --git a/tests/integration/mock/CodeActAgent/test_write_simple_script/user_responses.log b/tests/integration/mock/CodeActAgent/test_write_simple_script/user_responses.log
new file mode 100644
index 0000000000..9e92a50a7d
--- /dev/null
+++ b/tests/integration/mock/CodeActAgent/test_write_simple_script/user_responses.log
@@ -0,0 +1,2 @@
+can you run it to make sure it is working?
+/exit
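`user_responses.log` appears to script the mock user's replies in order, one per line: the first line is the follow-up request in the CodeActAgent transcript above, and `/exit` presumably ends the session. A hypothetical replay helper under those assumptions (the name `scripted_user` and the blank-line skipping are illustrative, not the actual test harness):

```python
from typing import Iterator


def scripted_user(log_text: str, exit_command: str = "/exit") -> Iterator[str]:
    """Yield scripted user replies in order, stopping at the exit command.

    Hypothetical helper: assumes one reply per line and that `exit_command`
    terminates the conversation rather than being sent to the agent.
    """
    for line in log_text.splitlines():
        reply = line.strip()
        if not reply:
            continue  # ignore blank lines
        if reply == exit_command:
            return  # end of the scripted conversation
        yield reply


if __name__ == "__main__":
    log = "can you run it to make sure it is working?\n/exit\n"
    for reply in scripted_user(log):
        print("USER:", reply)
```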
diff --git a/tests/integration/mock/MonologueAgent/test_write_simple_script/prompt_001.log b/tests/integration/mock/MonologueAgent/test_write_simple_script/prompt_001.log
index 14a75d24e9..00848e7077 100644
--- a/tests/integration/mock/MonologueAgent/test_write_simple_script/prompt_001.log
+++ b/tests/integration/mock/MonologueAgent/test_write_simple_script/prompt_001.log
@@ -92,7 +92,8 @@ This is your internal monologue, in JSON format:
{
"action": "recall",
"args": {
- "query": "what it is I want to do"
+ "query": "what it is I want to do",
+ "thought": ""
}
},
{
@@ -119,7 +120,8 @@ This is your internal monologue, in JSON format:
"action": "run",
"args": {
"command": "echo \"hello world\"",
- "background": false
+ "background": false,
+ "thought": ""
}
},
{
@@ -144,7 +146,7 @@ This is your internal monologue, in JSON format:
"content": "echo \"console.log('hello world')\"",
"start": 0,
"end": -1,
- "thoughts": ""
+ "thought": ""
}
},
{
@@ -163,7 +165,8 @@ This is your internal monologue, in JSON format:
"action": "run",
"args": {
"command": "node test.js",
- "background": false
+ "background": false,
+ "thought": ""
}
},
{
@@ -193,7 +196,7 @@ This is your internal monologue, in JSON format:
"path": "test.js",
"start": 0,
"end": -1,
- "thoughts": ""
+ "thought": ""
}
},
{
@@ -224,7 +227,8 @@ This is your internal monologue, in JSON format:
{
"action": "browse",
"args": {
- "url": "google.com"
+ "url": "google.com",
+ "thought": ""
}
},
{
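This hunk and the matching ones below rename `thoughts` to `thought` and add an empty `"thought": ""` to the `args` of every `recall`, `run`, `write`, `read`, and `browse` action. One way such uniform output arises is when each action is a dataclass whose `thought` field defaults to an empty string and the history serializes all fields. The classes below are hypothetical stand-ins sketched from the JSON, not OpenDevin's action classes:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RunAction:
    # Hypothetical stand-in for a "run" action; field names mirror the JSON args.
    command: str
    background: bool = False
    thought: str = ""


@dataclass
class WriteAction:
    # Hypothetical stand-in for a "write" action.
    path: str
    content: str
    start: int = 0
    end: int = -1
    thought: str = ""


def to_history_entry(action_name: str, action) -> dict:
    """Build the {"action": ..., "args": ...} object seen in the monologue."""
    return {"action": action_name, "args": asdict(action)}


if __name__ == "__main__":
    print(json.dumps(to_history_entry("run", RunAction(command="ls")), indent=2))
```

Since `asdict` emits fields in declaration order, the `run` entry comes out with `command`, `background`, then `thought`, matching the fixtures.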
diff --git a/tests/integration/mock/MonologueAgent/test_write_simple_script/prompt_002.log b/tests/integration/mock/MonologueAgent/test_write_simple_script/prompt_002.log
index 1b29542bdb..b95b7275b1 100644
--- a/tests/integration/mock/MonologueAgent/test_write_simple_script/prompt_002.log
+++ b/tests/integration/mock/MonologueAgent/test_write_simple_script/prompt_002.log
@@ -92,7 +92,8 @@ This is your internal monologue, in JSON format:
{
"action": "recall",
"args": {
- "query": "what it is I want to do"
+ "query": "what it is I want to do",
+ "thought": ""
}
},
{
@@ -119,7 +120,8 @@ This is your internal monologue, in JSON format:
"action": "run",
"args": {
"command": "echo \"hello world\"",
- "background": false
+ "background": false,
+ "thought": ""
}
},
{
@@ -144,7 +146,7 @@ This is your internal monologue, in JSON format:
"content": "echo \"console.log('hello world')\"",
"start": 0,
"end": -1,
- "thoughts": ""
+ "thought": ""
}
},
{
@@ -163,7 +165,8 @@ This is your internal monologue, in JSON format:
"action": "run",
"args": {
"command": "node test.js",
- "background": false
+ "background": false,
+ "thought": ""
}
},
{
@@ -193,7 +196,7 @@ This is your internal monologue, in JSON format:
"path": "test.js",
"start": 0,
"end": -1,
- "thoughts": ""
+ "thought": ""
}
},
{
@@ -224,7 +227,8 @@ This is your internal monologue, in JSON format:
{
"action": "browse",
"args": {
- "url": "google.com"
+ "url": "google.com",
+ "thought": ""
}
},
{
@@ -317,7 +321,8 @@ This is your internal monologue, in JSON format:
"action": "run",
"args": {
"command": "ls",
- "background": false
+ "background": false,
+ "thought": ""
}
},
{
diff --git a/tests/integration/mock/MonologueAgent/test_write_simple_script/prompt_003.log b/tests/integration/mock/MonologueAgent/test_write_simple_script/prompt_003.log
index 3b2f94b6dd..1f75f642ad 100644
--- a/tests/integration/mock/MonologueAgent/test_write_simple_script/prompt_003.log
+++ b/tests/integration/mock/MonologueAgent/test_write_simple_script/prompt_003.log
@@ -92,7 +92,8 @@ This is your internal monologue, in JSON format:
{
"action": "recall",
"args": {
- "query": "what it is I want to do"
+ "query": "what it is I want to do",
+ "thought": ""
}
},
{
@@ -119,7 +120,8 @@ This is your internal monologue, in JSON format:
"action": "run",
"args": {
"command": "echo \"hello world\"",
- "background": false
+ "background": false,
+ "thought": ""
}
},
{
@@ -144,7 +146,7 @@ This is your internal monologue, in JSON format:
"content": "echo \"console.log('hello world')\"",
"start": 0,
"end": -1,
- "thoughts": ""
+ "thought": ""
}
},
{
@@ -163,7 +165,8 @@ This is your internal monologue, in JSON format:
"action": "run",
"args": {
"command": "node test.js",
- "background": false
+ "background": false,
+ "thought": ""
}
},
{
@@ -193,7 +196,7 @@ This is your internal monologue, in JSON format:
"path": "test.js",
"start": 0,
"end": -1,
- "thoughts": ""
+ "thought": ""
}
},
{
@@ -224,7 +227,8 @@ This is your internal monologue, in JSON format:
{
"action": "browse",
"args": {
- "url": "google.com"
+ "url": "google.com",
+ "thought": ""
}
},
{
@@ -317,7 +321,8 @@ This is your internal monologue, in JSON format:
"action": "run",
"args": {
"command": "ls",
- "background": false
+ "background": false,
+ "thought": ""
}
},
{
@@ -332,7 +337,7 @@ This is your internal monologue, in JSON format:
{
"action": "think",
"args": {
- "thought": "Before I create the 'hello.sh' file, I need to think about the content. It should contain a shebang line to specify the shell interpreter and the command to print 'hello'."
+ "thought": "Since there are no existing files listed by the 'ls' command, I should create the 'hello.sh' script now."
}
},
{
diff --git a/tests/integration/mock/MonologueAgent/test_write_simple_script/prompt_004.log b/tests/integration/mock/MonologueAgent/test_write_simple_script/prompt_004.log
index 644be27820..b37d8b7b85 100644
--- a/tests/integration/mock/MonologueAgent/test_write_simple_script/prompt_004.log
+++ b/tests/integration/mock/MonologueAgent/test_write_simple_script/prompt_004.log
@@ -92,7 +92,8 @@ This is your internal monologue, in JSON format:
{
"action": "recall",
"args": {
- "query": "what it is I want to do"
+ "query": "what it is I want to do",
+ "thought": ""
}
},
{
@@ -119,7 +120,8 @@ This is your internal monologue, in JSON format:
"action": "run",
"args": {
"command": "echo \"hello world\"",
- "background": false
+ "background": false,
+ "thought": ""
}
},
{
@@ -144,7 +146,7 @@ This is your internal monologue, in JSON format:
"content": "echo \"console.log('hello world')\"",
"start": 0,
"end": -1,
- "thoughts": ""
+ "thought": ""
}
},
{
@@ -163,7 +165,8 @@ This is your internal monologue, in JSON format:
"action": "run",
"args": {
"command": "node test.js",
- "background": false
+ "background": false,
+ "thought": ""
}
},
{
@@ -193,7 +196,7 @@ This is your internal monologue, in JSON format:
"path": "test.js",
"start": 0,
"end": -1,
- "thoughts": ""
+ "thought": ""
}
},
{
@@ -224,7 +227,8 @@ This is your internal monologue, in JSON format:
{
"action": "browse",
"args": {
- "url": "google.com"
+ "url": "google.com",
+ "thought": ""
}
},
{
@@ -317,7 +321,8 @@ This is your internal monologue, in JSON format:
"action": "run",
"args": {
"command": "ls",
- "background": false
+ "background": false,
+ "thought": ""
}
},
{
@@ -332,7 +337,7 @@ This is your internal monologue, in JSON format:
{
"action": "think",
"args": {
- "thought": "Before I create the 'hello.sh' file, I need to think about the content. It should contain a shebang line to specify the shell interpreter and the command to print 'hello'."
+ "thought": "Since there are no existing files listed by the 'ls' command, I should create the 'hello.sh' script now."
}
},
{
@@ -347,7 +352,7 @@ This is your internal monologue, in JSON format:
"content": "#!/bin/bash\necho 'hello'",
"start": 0,
"end": -1,
- "thoughts": ""
+ "thought": ""
}
},
{
diff --git a/tests/integration/mock/MonologueAgent/test_write_simple_script/prompt_005.log b/tests/integration/mock/MonologueAgent/test_write_simple_script/prompt_005.log
index d3b2cb56e8..43f06a3d8a 100644
--- a/tests/integration/mock/MonologueAgent/test_write_simple_script/prompt_005.log
+++ b/tests/integration/mock/MonologueAgent/test_write_simple_script/prompt_005.log
@@ -92,7 +92,8 @@ This is your internal monologue, in JSON format:
{
"action": "recall",
"args": {
- "query": "what it is I want to do"
+ "query": "what it is I want to do",
+ "thought": ""
}
},
{
@@ -119,7 +120,8 @@ This is your internal monologue, in JSON format:
"action": "run",
"args": {
"command": "echo \"hello world\"",
- "background": false
+ "background": false,
+ "thought": ""
}
},
{
@@ -144,7 +146,7 @@ This is your internal monologue, in JSON format:
"content": "echo \"console.log('hello world')\"",
"start": 0,
"end": -1,
- "thoughts": ""
+ "thought": ""
}
},
{
@@ -163,7 +165,8 @@ This is your internal monologue, in JSON format:
"action": "run",
"args": {
"command": "node test.js",
- "background": false
+ "background": false,
+ "thought": ""
}
},
{
@@ -193,7 +196,7 @@ This is your internal monologue, in JSON format:
"path": "test.js",
"start": 0,
"end": -1,
- "thoughts": ""
+ "thought": ""
}
},
{
@@ -224,7 +227,8 @@ This is your internal monologue, in JSON format:
{
"action": "browse",
"args": {
- "url": "google.com"
+ "url": "google.com",
+ "thought": ""
}
},
{
@@ -317,7 +321,8 @@ This is your internal monologue, in JSON format:
"action": "run",
"args": {
"command": "ls",
- "background": false
+ "background": false,
+ "thought": ""
}
},
{
@@ -332,7 +337,7 @@ This is your internal monologue, in JSON format:
{
"action": "think",
"args": {
- "thought": "Before I create the 'hello.sh' file, I need to think about the content. It should contain a shebang line to specify the shell interpreter and the command to print 'hello'."
+ "thought": "Since there are no existing files listed by the 'ls' command, I should create the 'hello.sh' script now."
}
},
{
@@ -347,7 +352,7 @@ This is your internal monologue, in JSON format:
"content": "#!/bin/bash\necho 'hello'",
"start": 0,
"end": -1,
- "thoughts": ""
+ "thought": ""
}
},
{
@@ -361,7 +366,8 @@ This is your internal monologue, in JSON format:
"action": "run",
"args": {
"command": "bash hello.sh",
- "background": false
+ "background": false,
+ "thought": ""
}
},
{
diff --git a/tests/integration/mock/MonologueAgent/test_write_simple_script/prompt_006.log b/tests/integration/mock/MonologueAgent/test_write_simple_script/prompt_006.log
new file mode 100644
index 0000000000..d29d4c4358
--- /dev/null
+++ b/tests/integration/mock/MonologueAgent/test_write_simple_script/prompt_006.log
@@ -0,0 +1,403 @@
+
+
+----------
+
+
+You're a thoughtful robot. Your main task is this:
+Write a shell script 'hello.sh' that prints 'hello'.
+
+Don't expand the scope of your task--just complete it as written.
+
+This is your internal monologue, in JSON format:
+
+[
+ {
+ "action": "think",
+ "args": {
+ "thought": "I exist!"
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "Hmm...looks like I can type in a command line prompt"
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "Looks like I have a web browser too!"
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "Here's what I want to do: Write a shell script 'hello.sh' that prints 'hello'."
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "How am I going to get there though?"
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "It seems like I have some kind of short term memory."
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "Each of my thoughts seems to be stored in a JSON array."
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "It seems whatever I say next will be added as an object to the list."
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "But no one has perfect short-term memory. My list of thoughts will be summarized and condensed over time, losing information in the process."
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "Fortunately I have long term memory!"
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "I can just perform a recall action, followed by the thing I want to remember. And then related thoughts just spill out!"
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "Sometimes they're random thoughts that don't really have to do with what I wanted to remember. But usually they're exactly what I need!"
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "Let's try it out!"
+ }
+ },
+ {
+ "action": "recall",
+ "args": {
+ "query": "what it is I want to do",
+ "thought": ""
+ }
+ },
+ {
+ "observation": "recall",
+ "content": "Here's what I want to do: Write a shell script 'hello.sh' that prints 'hello'.",
+ "extras": {
+ "memories": [],
+ "role": "assistant"
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "How am I going to get there though?"
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "Neat! And it looks like it's easy for me to use the command line too! I just have to perform a run action and include the command I want to run in the command argument. The command output just jumps into my head!"
+ }
+ },
+ {
+ "action": "run",
+ "args": {
+ "command": "echo \"hello world\"",
+ "background": false,
+ "thought": ""
+ }
+ },
+ {
+ "observation": "run",
+ "content": "hello world",
+ "extras": {
+ "command_id": 0,
+ "command": "",
+ "exit_code": 0
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "Cool! I bet I can write files too using the write action."
+ }
+ },
+ {
+ "action": "write",
+ "args": {
+ "path": "test.js",
+ "content": "echo \"console.log('hello world')\"",
+ "start": 0,
+ "end": -1,
+ "thought": ""
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": ""
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "I just created test.js. I'll try and run it now."
+ }
+ },
+ {
+ "action": "run",
+ "args": {
+ "command": "node test.js",
+ "background": false,
+ "thought": ""
+ }
+ },
+ {
+ "observation": "run",
+ "content": "hello world",
+ "extras": {
+ "command_id": 0,
+ "command": "",
+ "exit_code": 0
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "It works!"
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "I'm going to try reading it now using the read action."
+ }
+ },
+ {
+ "action": "read",
+ "args": {
+ "path": "test.js",
+ "start": 0,
+ "end": -1,
+ "thought": ""
+ }
+ },
+ {
+ "observation": "read",
+ "content": "console.log('hello world')",
+ "extras": {
+ "path": ""
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "Nice! I can read files too!"
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "And if I want to use the browser, I just need to use the browse action and include the url I want to visit in the url argument"
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "Let's try that..."
+ }
+ },
+ {
+ "action": "browse",
+ "args": {
+ "url": "google.com",
+ "thought": ""
+ }
+ },
+ {
+ "observation": "browse",
+ "content": "",
+ "extras": {
+ "url": "",
+ "status_code": 200,
+ "error": false
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "I can browse the web too!"
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "And once I have completed my task, I can use the finish action to stop working."
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "But I should only use the finish action when I'm absolutely certain that I've completed my task and have tested my work."
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "Very cool. Now to accomplish my task."
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "I'll need a strategy. And as I make progress, I'll need to keep refining that strategy. I'll need to set goals, and break them into sub-goals."
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "In between actions, I must always take some time to think, strategize, and set new goals. I should never take two actions in a row."
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "OK so my task is to Write a shell script 'hello.sh' that prints 'hello'.. I haven't made any progress yet. Where should I start?"
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "It seems like there might be an existing project here. I should probably start by running `ls` to see what's here."
+ }
+ },
+ {
+ "action": "run",
+ "args": {
+ "command": "ls",
+ "background": false,
+ "thought": ""
+ }
+ },
+ {
+ "observation": "run",
+ "content": "",
+ "extras": {
+ "command_id": -1,
+ "command": "ls",
+ "exit_code": 0
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "Since there are no existing files listed by the 'ls' command, I should create the 'hello.sh' script now."
+ }
+ },
+ {
+ "observation": "null",
+ "content": "",
+ "extras": {}
+ },
+ {
+ "action": "write",
+ "args": {
+ "path": "hello.sh",
+ "content": "#!/bin/bash\necho 'hello'",
+ "start": 0,
+ "end": -1,
+ "thought": ""
+ }
+ },
+ {
+ "observation": "write",
+ "content": "",
+ "extras": {
+ "path": "hello.sh"
+ }
+ },
+ {
+ "action": "run",
+ "args": {
+ "command": "bash hello.sh",
+ "background": false,
+ "thought": ""
+ }
+ },
+ {
+ "observation": "run",
+ "content": "hello",
+ "extras": {
+ "command_id": -1,
+ "command": "bash hello.sh",
+ "exit_code": 0
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "The script 'hello.sh' has been written and tested successfully. It prints 'hello' as required. I should now finalize my task."
+ }
+ },
+ {
+ "observation": "null",
+ "content": "",
+ "extras": {}
+ }
+]
+
+
+Your most recent thought is at the bottom of that monologue. Continue your train of thought.
+What is your next thought or action? Your response must be in JSON format.
+It must be an object, and it must contain two fields:
+* `action`, which is one of the actions below
+* `args`, which is a map of key-value pairs, specifying the arguments for that action
+
+Here are the possible actions:
+* `read` - reads the content of a file. Arguments:
+ * `path` - the path of the file to read
+* `write` - writes the content to a file. Arguments:
+ * `path` - the path of the file to write
+ * `content` - the content to write to the file
+* `run` - runs a command. Arguments:
+ * `command` - the command to run
+ * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true.
+* `kill` - kills a background command
+ * `id` - the ID of the background command to kill
+* `browse` - opens a web page. Arguments:
+ * `url` - the URL to open
+* `recall` - recalls a past memory. Arguments:
+ * `query` - the query to search for
+* `think` - make a plan, set a goal, or record your thoughts. Arguments:
+ * `thought` - the thought to record
+* `finish` - if you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working.
+
+
+
+You MUST take time to think in between read, write, run, browse, and recall actions.
+You should never act twice in a row without thinking. But if your last several
+actions are all "think" actions, you should consider taking a different action.
+
+Notes:
+* your environment is Debian Linux. You can install software with `apt`
+* your working directory will not change, even if you run `cd`. All commands will be run in the `/workspace` directory.
+* don't run interactive commands, or commands that don't return (e.g. `node server.js`). You may run commands in the background (e.g. `node server.js &`)
+
+What is your next thought or action? Again, you must reply with JSON, and only with JSON.
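The MonologueAgent prompt above requires every reply to be a JSON object with an `action` drawn from the listed set and an `args` map. A minimal validator for just that shape, assuming the reply has already been extracted as a raw JSON string (`validate_response` is hypothetical and does not check per-action argument names):

```python
import json

# Action names listed in the MonologueAgent prompt.
ALLOWED_ACTIONS = {"read", "write", "run", "kill", "browse", "recall", "think", "finish"}


def validate_response(raw: str) -> dict:
    """Parse a reply and enforce the two required top-level fields."""
    obj = json.loads(raw)
    if not isinstance(obj, dict):
        raise ValueError("reply must be a JSON object")
    if obj.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {obj.get('action')!r}")
    if not isinstance(obj.get("args"), dict):
        raise ValueError("'args' must be a map of key-value pairs")
    return obj


if __name__ == "__main__":
    # The final mock reply for this test is a bare finish action.
    print(validate_response('{"action": "finish", "args": {}}'))
```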
diff --git a/tests/integration/mock/MonologueAgent/test_write_simple_script/response_002.log b/tests/integration/mock/MonologueAgent/test_write_simple_script/response_002.log
index bd27926940..791926155e 100644
--- a/tests/integration/mock/MonologueAgent/test_write_simple_script/response_002.log
+++ b/tests/integration/mock/MonologueAgent/test_write_simple_script/response_002.log
@@ -1,6 +1,6 @@
{
"action": "think",
"args": {
- "thought": "Before I create the 'hello.sh' file, I need to think about the content. It should contain a shebang line to specify the shell interpreter and the command to print 'hello'."
+ "thought": "Since there are no existing files listed by the 'ls' command, I should create the 'hello.sh' script now."
}
}
diff --git a/tests/integration/mock/MonologueAgent/test_write_simple_script/response_004.log b/tests/integration/mock/MonologueAgent/test_write_simple_script/response_004.log
index 7055344e27..9cc89f2a5b 100644
--- a/tests/integration/mock/MonologueAgent/test_write_simple_script/response_004.log
+++ b/tests/integration/mock/MonologueAgent/test_write_simple_script/response_004.log
@@ -1,3 +1,4 @@
+```json
{
"action": "run",
"args": {
@@ -5,3 +6,4 @@
"background": false
}
}
+```
diff --git a/tests/integration/mock/MonologueAgent/test_write_simple_script/response_005.log b/tests/integration/mock/MonologueAgent/test_write_simple_script/response_005.log
index c6819f9911..18d32603ec 100644
--- a/tests/integration/mock/MonologueAgent/test_write_simple_script/response_005.log
+++ b/tests/integration/mock/MonologueAgent/test_write_simple_script/response_005.log
@@ -1,4 +1,8 @@
+```json
{
- "action": "finish",
- "args": {}
+ "action": "think",
+ "args": {
+ "thought": "The script 'hello.sh' has been written and tested successfully. It prints 'hello' as required. I should now finalize my task."
+ }
}
+```
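The updated `response_004.log` and `response_005.log` now wrap the JSON reply in a `json`-tagged markdown code fence, while `response_006.log` stays bare, so whatever parses these mock replies presumably accepts both forms. A minimal sketch of one tolerant parser, assuming any fence surrounds the whole reply; `parse_action_reply` is hypothetical, not OpenDevin's parser:

```python
import json
import re

TICKS = "`" * 3  # built programmatically so this block renders cleanly in markdown

# Optional fence around the whole reply, with or without a "json" info string.
_FENCED = re.compile(
    r"^\s*" + TICKS + r"(?:json)?[ \t]*\n(.*?)\n\s*" + TICKS + r"\s*$", re.DOTALL
)


def parse_action_reply(raw: str) -> dict:
    """Parse a JSON action reply, stripping a surrounding code fence if present."""
    match = _FENCED.match(raw)
    body = match.group(1) if match else raw
    return json.loads(body)


if __name__ == "__main__":
    fenced = TICKS + 'json\n{"action": "finish", "args": {}}\n' + TICKS
    print(parse_action_reply(fenced))
```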
diff --git a/tests/integration/mock/MonologueAgent/test_write_simple_script/response_006.log b/tests/integration/mock/MonologueAgent/test_write_simple_script/response_006.log
new file mode 100644
index 0000000000..c6819f9911
--- /dev/null
+++ b/tests/integration/mock/MonologueAgent/test_write_simple_script/response_006.log
@@ -0,0 +1,4 @@
+{
+ "action": "finish",
+ "args": {}
+}
diff --git a/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_002.log b/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_002.log
index 717229bfc3..81f0b95cea 100644
--- a/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_002.log
+++ b/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_002.log
@@ -58,7 +58,8 @@ ten actions--more happened before that.
"action": "modify_task",
"args": {
"id": "0",
- "state": "in_progress"
+ "state": "in_progress",
+ "thought": ""
}
}
]
diff --git a/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_003.log b/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_003.log
index 0f11b679c8..a6e7ef7520 100644
--- a/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_003.log
+++ b/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_003.log
@@ -58,24 +58,14 @@ ten actions--more happened before that.
"action": "modify_task",
"args": {
"id": "0",
- "state": "in_progress"
+ "state": "in_progress",
+ "thought": ""
}
},
{
- "action": "write",
+ "action": "think",
"args": {
- "path": "hello.sh",
- "content": "#!/bin/bash\necho 'hello'",
- "start": 0,
- "end": -1,
- "thoughts": ""
- }
- },
- {
- "observation": "write",
- "content": "",
- "extras": {
- "path": "hello.sh"
+ "thought": "I need to write a shell script called 'hello.sh' that contains the necessary command to print 'hello'. This will likely involve using echo or printf to output the text to the console when the script is run. I should start by creating and writing the content to the file."
}
}
]
@@ -119,4 +109,4 @@ actions are all `think` actions, you should consider taking a different action.
What is your next thought or action? Again, you must reply with JSON, and only with JSON.
-You just changed a file. You should think about how it affects your plan.
+Look at your last thought in the history above. What does it suggest? Don't think anymore--take action.
diff --git a/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_004.log b/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_004.log
index 7f0d9c655c..e51d7afeca 100644
--- a/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_004.log
+++ b/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_004.log
@@ -58,7 +58,14 @@ ten actions--more happened before that.
"action": "modify_task",
"args": {
"id": "0",
- "state": "in_progress"
+ "state": "in_progress",
+ "thought": ""
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "I need to write a shell script called 'hello.sh' that contains the necessary command to print 'hello'. This will likely involve using echo or printf to output the text to the console when the script is run. I should start by creating and writing the content to the file."
}
},
{
@@ -68,7 +75,7 @@ ten actions--more happened before that.
"content": "#!/bin/bash\necho 'hello'",
"start": 0,
"end": -1,
- "thoughts": ""
+ "thought": ""
}
},
{
@@ -77,22 +84,6 @@ ten actions--more happened before that.
"extras": {
"path": "hello.sh"
}
- },
- {
- "action": "run",
- "args": {
- "command": "bash hello.sh",
- "background": false
- }
- },
- {
- "observation": "run",
- "content": "hello",
- "extras": {
- "command_id": -1,
- "command": "bash hello.sh",
- "exit_code": 0
- }
}
]
@@ -135,4 +126,4 @@ actions are all `think` actions, you should consider taking a different action.
What is your next thought or action? Again, you must reply with JSON, and only with JSON.
-You should think about the command you just ran, what output it gave, and how that affects your plan.
+You just changed a file. You should think about how it affects your plan.
diff --git a/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_005.log b/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_005.log
new file mode 100644
index 0000000000..2ce5998c09
--- /dev/null
+++ b/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_005.log
@@ -0,0 +1,135 @@
+
+
+----------
+
+
+# Task
+You're a diligent software engineer AI. You can't see, draw, or interact with a
+browser, but you can read and write files, and you can run commands, and you can think.
+
+You've been given the following task:
+
+Write a shell script 'hello.sh' that prints 'hello'.
+
+## Plan
+As you complete this task, you're building a plan and keeping
+track of your progress. Here's a JSON representation of your plan:
+
+{
+ "id": "0",
+ "goal": "Write a shell script 'hello.sh' that prints 'hello'.",
+ "state": "in_progress",
+ "subtasks": []
+}
+
+
+You're currently working on this task:
+Write a shell script 'hello.sh' that prints 'hello'..
+If it's not achievable AND verifiable with a SINGLE action, you MUST break it down into subtasks NOW.
+
+You're responsible for managing this plan and the status of tasks in
+it, by using the `add_task` and `modify_task` actions described below.
+
+If the History below contradicts the state of any of these tasks, you
+MUST modify the task using the `modify_task` action described below.
+
+Be sure NOT to duplicate any tasks. Do NOT use the `add_task` action for
+a task that's already represented. Every task must be represented only once.
+
+Tasks that are sequential MUST be siblings. They must be added in order
+to their parent task.
+
+If you mark a task as 'completed', 'verified', or 'abandoned',
+all non-abandoned subtasks will be marked the same way.
+So before closing a task this way, you MUST not only be sure that it has
+been completed successfully--you must ALSO be sure that all its subtasks
+are ready to be marked the same way.
+
+If, and only if, ALL tasks have already been marked verified,
+you MUST respond with the `finish` action.
+
+## History
+Here is a recent history of actions you've taken in service of this plan,
+as well as observations you've made. This only includes the MOST RECENT
+ten actions--more happened before that.
+
+[
+ {
+ "action": "modify_task",
+ "args": {
+ "id": "0",
+ "state": "in_progress",
+ "thought": ""
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "I need to write a shell script called 'hello.sh' that contains the necessary command to print 'hello'. This will likely involve using echo or printf to output the text to the console when the script is run. I should start by creating and writing the content to the file."
+ }
+ },
+ {
+ "action": "write",
+ "args": {
+ "path": "hello.sh",
+ "content": "#!/bin/bash\necho 'hello'",
+ "start": 0,
+ "end": -1,
+ "thought": ""
+ }
+ },
+ {
+ "observation": "write",
+ "content": "",
+ "extras": {
+ "path": "hello.sh"
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "Now that the script 'hello.sh' has been written and contains the appropriate command to print 'hello', the next logical step is to ensure it works as expected. I should proceed to run the script to verify its output."
+ }
+ }
+]
+
+
+Your most recent action is at the bottom of that history.
+
+## Action
+What is your next thought or action? Your response must be in JSON format.
+
+It must be an object, and it must contain two fields:
+* `action`, which is one of the actions below
+* `args`, which is a map of key-value pairs, specifying the arguments for that action
+
+* `read` - reads the content of a file. Arguments:
+ * `path` - the path of the file to read
+* `write` - writes the content to a file. Arguments:
+ * `path` - the path of the file to write
+ * `content` - the content to write to the file
+* `run` - runs a command on the command line in a Linux shell. Arguments:
+ * `command` - the command to run
+ * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true.
+* `kill` - kills a background command
+ * `id` - the ID of the background command to kill
+* `browse` - opens a web page. Arguments:
+ * `url` - the URL to open
+* `think` - make a plan, set a goal, or record your thoughts. Arguments:
+ * `thought` - the thought to record
+* `add_task` - add a task to your plan. Arguments:
+ * `parent` - the ID of the parent task
+ * `goal` - the goal of the task
+ * `subtasks` - a list of subtasks, each of which is a map with a `goal` key.
+* `modify_task` - close a task. Arguments:
+ * `id` - the ID of the task to close
+ * `state` - set to 'in_progress' to start the task, 'completed' to finish it, 'verified' to assert that it was successful, 'abandoned' to give up on it permanently, or `open` to stop working on it for now.
+* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working.
+
+You MUST take time to think in between read, write, run, browse, and recall actions.
+You should never act twice in a row without thinking. But if your last several
+actions are all `think` actions, you should consider taking a different action.
+
+What is your next thought or action? Again, you must reply with JSON, and only with JSON.
+
+Look at your last thought in the history above. What does it suggest? Don't think anymore--take action.
diff --git a/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_006.log b/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_006.log
new file mode 100644
index 0000000000..bbde0f0a3f
--- /dev/null
+++ b/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_006.log
@@ -0,0 +1,152 @@
+
+
+----------
+
+
+# Task
+You're a diligent software engineer AI. You can't see, draw, or interact with a
+browser, but you can read and write files, and you can run commands, and you can think.
+
+You've been given the following task:
+
+Write a shell script 'hello.sh' that prints 'hello'.
+
+## Plan
+As you complete this task, you're building a plan and keeping
+track of your progress. Here's a JSON representation of your plan:
+
+{
+ "id": "0",
+ "goal": "Write a shell script 'hello.sh' that prints 'hello'.",
+ "state": "in_progress",
+ "subtasks": []
+}
+
+
+You're currently working on this task:
+Write a shell script 'hello.sh' that prints 'hello'..
+If it's not achievable AND verifiable with a SINGLE action, you MUST break it down into subtasks NOW.
+
+You're responsible for managing this plan and the status of tasks in
+it, by using the `add_task` and `modify_task` actions described below.
+
+If the History below contradicts the state of any of these tasks, you
+MUST modify the task using the `modify_task` action described below.
+
+Be sure NOT to duplicate any tasks. Do NOT use the `add_task` action for
+a task that's already represented. Every task must be represented only once.
+
+Tasks that are sequential MUST be siblings. They must be added in order
+to their parent task.
+
+If you mark a task as 'completed', 'verified', or 'abandoned',
+all non-abandoned subtasks will be marked the same way.
+So before closing a task this way, you MUST not only be sure that it has
+been completed successfully--you must ALSO be sure that all its subtasks
+are ready to be marked the same way.
+
+If, and only if, ALL tasks have already been marked verified,
+you MUST respond with the `finish` action.
+
+## History
+Here is a recent history of actions you've taken in service of this plan,
+as well as observations you've made. This only includes the MOST RECENT
+ten actions--more happened before that.
+
+[
+ {
+ "action": "modify_task",
+ "args": {
+ "id": "0",
+ "state": "in_progress",
+ "thought": ""
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "I need to write a shell script called 'hello.sh' that contains the necessary command to print 'hello'. This will likely involve using echo or printf to output the text to the console when the script is run. I should start by creating and writing the content to the file."
+ }
+ },
+ {
+ "action": "write",
+ "args": {
+ "path": "hello.sh",
+ "content": "#!/bin/bash\necho 'hello'",
+ "start": 0,
+ "end": -1,
+ "thought": ""
+ }
+ },
+ {
+ "observation": "write",
+ "content": "",
+ "extras": {
+ "path": "hello.sh"
+ }
+ },
+ {
+ "action": "think",
+ "args": {
+ "thought": "Now that the script 'hello.sh' has been written and contains the appropriate command to print 'hello', the next logical step is to ensure it works as expected. I should proceed to run the script to verify its output."
+ }
+ },
+ {
+ "action": "run",
+ "args": {
+ "command": "bash hello.sh",
+ "background": false,
+ "thought": ""
+ }
+ },
+ {
+ "observation": "run",
+ "content": "hello",
+ "extras": {
+ "command_id": -1,
+ "command": "bash hello.sh",
+ "exit_code": 0
+ }
+ }
+]
+
+
+Your most recent action is at the bottom of that history.
+
+## Action
+What is your next thought or action? Your response must be in JSON format.
+
+It must be an object, and it must contain two fields:
+* `action`, which is one of the actions below
+* `args`, which is a map of key-value pairs, specifying the arguments for that action
+
+* `read` - reads the content of a file. Arguments:
+ * `path` - the path of the file to read
+* `write` - writes the content to a file. Arguments:
+ * `path` - the path of the file to write
+ * `content` - the content to write to the file
+* `run` - runs a command on the command line in a Linux shell. Arguments:
+ * `command` - the command to run
+ * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true.
+* `kill` - kills a background command
+ * `id` - the ID of the background command to kill
+* `browse` - opens a web page. Arguments:
+ * `url` - the URL to open
+* `think` - make a plan, set a goal, or record your thoughts. Arguments:
+ * `thought` - the thought to record
+* `add_task` - add a task to your plan. Arguments:
+ * `parent` - the ID of the parent task
+ * `goal` - the goal of the task
+ * `subtasks` - a list of subtasks, each of which is a map with a `goal` key.
+* `modify_task` - close a task. Arguments:
+ * `id` - the ID of the task to close
+ * `state` - set to 'in_progress' to start the task, 'completed' to finish it, 'verified' to assert that it was successful, 'abandoned' to give up on it permanently, or `open` to stop working on it for now.
+* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working.
+
+You MUST take time to think in between read, write, run, browse, and recall actions.
+You should never act twice in a row without thinking. But if your last several
+actions are all `think` actions, you should consider taking a different action.
+
+What is your next thought or action? Again, you must reply with JSON, and only with JSON.
+
+You should think about the command you just ran, what output it gave, and how that affects your plan.
diff --git a/tests/integration/mock/PlannerAgent/test_write_simple_script/response_001.log b/tests/integration/mock/PlannerAgent/test_write_simple_script/response_001.log
index f4f1a41bf1..9e31d23c8a 100644
--- a/tests/integration/mock/PlannerAgent/test_write_simple_script/response_001.log
+++ b/tests/integration/mock/PlannerAgent/test_write_simple_script/response_001.log
@@ -1,3 +1,4 @@
+```json
{
"action": "modify_task",
"args": {
@@ -5,3 +6,4 @@
"state": "in_progress"
}
}
+```
diff --git a/tests/integration/mock/PlannerAgent/test_write_simple_script/response_002.log b/tests/integration/mock/PlannerAgent/test_write_simple_script/response_002.log
index 03bf823601..700cd1c18d 100644
--- a/tests/integration/mock/PlannerAgent/test_write_simple_script/response_002.log
+++ b/tests/integration/mock/PlannerAgent/test_write_simple_script/response_002.log
@@ -1,9 +1,8 @@
```json
{
- "action": "write",
+ "action": "think",
"args": {
- "path": "hello.sh",
- "content": "#!/bin/bash\necho 'hello'"
+ "thought": "I need to write a shell script called 'hello.sh' that contains the necessary command to print 'hello'. This will likely involve using echo or printf to output the text to the console when the script is run. I should start by creating and writing the content to the file."
}
}
```
diff --git a/tests/integration/mock/PlannerAgent/test_write_simple_script/response_003.log b/tests/integration/mock/PlannerAgent/test_write_simple_script/response_003.log
index 7055344e27..03bf823601 100644
--- a/tests/integration/mock/PlannerAgent/test_write_simple_script/response_003.log
+++ b/tests/integration/mock/PlannerAgent/test_write_simple_script/response_003.log
@@ -1,7 +1,9 @@
+```json
{
- "action": "run",
+ "action": "write",
"args": {
- "command": "bash hello.sh",
- "background": false
+ "path": "hello.sh",
+ "content": "#!/bin/bash\necho 'hello'"
}
}
+```
diff --git a/tests/integration/mock/PlannerAgent/test_write_simple_script/response_004.log b/tests/integration/mock/PlannerAgent/test_write_simple_script/response_004.log
index 6093c90df7..02ae256680 100644
--- a/tests/integration/mock/PlannerAgent/test_write_simple_script/response_004.log
+++ b/tests/integration/mock/PlannerAgent/test_write_simple_script/response_004.log
@@ -1,7 +1,6 @@
{
- "action": "modify_task",
+ "action": "think",
"args": {
- "id": "0",
- "state": "verified"
+ "thought": "Now that the script 'hello.sh' has been written and contains the appropriate command to print 'hello', the next logical step is to ensure it works as expected. I should proceed to run the script to verify its output."
}
}
diff --git a/tests/integration/mock/PlannerAgent/test_write_simple_script/response_005.log b/tests/integration/mock/PlannerAgent/test_write_simple_script/response_005.log
new file mode 100644
index 0000000000..9cc89f2a5b
--- /dev/null
+++ b/tests/integration/mock/PlannerAgent/test_write_simple_script/response_005.log
@@ -0,0 +1,9 @@
+```json
+{
+ "action": "run",
+ "args": {
+ "command": "bash hello.sh",
+ "background": false
+ }
+}
+```
diff --git a/tests/integration/mock/PlannerAgent/test_write_simple_script/response_006.log b/tests/integration/mock/PlannerAgent/test_write_simple_script/response_006.log
new file mode 100644
index 0000000000..90dec695c4
--- /dev/null
+++ b/tests/integration/mock/PlannerAgent/test_write_simple_script/response_006.log
@@ -0,0 +1,10 @@
+```json
+{
+ "action": "modify_task",
+ "args": {
+ "id": "0",
+ "state": "verified",
+ "thought": "The shell script 'hello.sh' has been run and verified to successfully output 'hello'. This completes the primary task requirement."
+ }
+}
+```
diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_002.log b/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_002.log
index 0ddf35a113..89793bca08 100644
--- a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_002.log
+++ b/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_002.log
@@ -66,8 +66,8 @@ args:
path: hello.sh
start: 0
end: -1
- thoughts: Thoughts:
- To begin solving the task, I need to create a new shell script file named 'hello.sh'. This file should contain a script that prints the message 'hello' when executed. I will start by writing the initial content of this script.
+ thought: Thoughts:
+ To solve the task of writing a shell script named 'hello.sh' that prints 'hello', I need to create a new file named 'hello.sh'. I will then write the necessary shell script code into this file.
Action:
diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_003.log b/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_003.log
index b3d5b905e3..320adaa09e 100644
--- a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_003.log
+++ b/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_003.log
@@ -66,8 +66,8 @@ args:
path: hello.sh
start: 0
end: -1
- thoughts: Thoughts:
- To begin solving the task, I need to create a new shell script file named 'hello.sh'. This file should contain a script that prints the message 'hello' when executed. I will start by writing the initial content of this script.
+ thought: Thoughts:
+ To solve the task of writing a shell script named 'hello.sh' that prints 'hello', I need to create a new file named 'hello.sh'. I will then write the necessary shell script code into this file.
Action:
@@ -84,8 +84,8 @@ args:
path: hello.sh
start: 0
end: -1
- thoughts: Thoughts:
- Since the file 'hello.sh' has been created, the next step is to write the shell script code that prints "hello". This involves adding a shebang line to specify the shell interpreter and a print command. I will add these lines to the 'hello.sh' file.
+ thought: Thoughts:
+ The task is to write a shell script named 'hello.sh' that prints 'hello'. I have already created the file 'hello.sh' in a previous step. Now, I need to write the shell script code into this file. The script should include the shebang line to specify the shell interpreter and a command to print 'hello'.
Action:
diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_004.log b/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_004.log
index 444383d84f..f93d78d595 100644
--- a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_004.log
+++ b/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_004.log
@@ -66,8 +66,8 @@ args:
path: hello.sh
start: 0
end: -1
- thoughts: Thoughts:
- To begin solving the task, I need to create a new shell script file named 'hello.sh'. This file should contain a script that prints the message 'hello' when executed. I will start by writing the initial content of this script.
+ thought: Thoughts:
+ To solve the task of writing a shell script named 'hello.sh' that prints 'hello', I need to create a new file named 'hello.sh'. I will then write the necessary shell script code into this file.
Action:
@@ -84,8 +84,8 @@ args:
path: hello.sh
start: 0
end: -1
- thoughts: Thoughts:
- Since the file 'hello.sh' has been created, the next step is to write the shell script code that prints "hello". This involves adding a shebang line to specify the shell interpreter and a print command. I will add these lines to the 'hello.sh' file.
+ thought: Thoughts:
+ The task is to write a shell script named 'hello.sh' that prints 'hello'. I have already created the file 'hello.sh' in a previous step. Now, I need to write the shell script code into this file. The script should include the shebang line to specify the shell interpreter and a command to print 'hello'.
Action:
@@ -97,19 +97,21 @@ extras:
Memory 2:
Previous Action:
-action: write
+action: read
args:
path: hello.sh
start: 0
end: -1
- thoughts: Thoughts:
- I have created the file 'hello.sh' but have not yet written the script content that prints "hello". The next step is to write the shell script code into 'hello.sh'. This will include a shebang line to specify the shell interpreter (bash in this case) and an echo command to print "hello".
+ thought: Thoughts:
+ I need to verify the contents of the 'hello.sh' script to ensure that it correctly contains the shell script code to print 'hello'. This involves checking if the shebang line is present and if the print command is correctly written. If the script is correct, I can then proceed to test it to confirm its functionality.
Action:
Output from Action:
-observation: write
-content:
+observation: read
+content: #!/bin/bash
+echo "hello"
+
extras:
path: hello.sh
======= End Actions =======
diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_005.log b/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_005.log
index a79e0c6a02..f7e8564066 100644
--- a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_005.log
+++ b/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_005.log
@@ -66,8 +66,8 @@ args:
path: hello.sh
start: 0
end: -1
- thoughts: Thoughts:
- To begin solving the task, I need to create a new shell script file named 'hello.sh'. This file should contain a script that prints the message 'hello' when executed. I will start by writing the initial content of this script.
+ thought: Thoughts:
+ To solve the task of writing a shell script named 'hello.sh' that prints 'hello', I need to create a new file named 'hello.sh'. I will then write the necessary shell script code into this file.
Action:
@@ -84,8 +84,8 @@ args:
path: hello.sh
start: 0
end: -1
- thoughts: Thoughts:
- Since the file 'hello.sh' has been created, the next step is to write the shell script code that prints "hello". This involves adding a shebang line to specify the shell interpreter and a print command. I will add these lines to the 'hello.sh' file.
+ thought: Thoughts:
+ The task is to write a shell script named 'hello.sh' that prints 'hello'. I have already created the file 'hello.sh' in a previous step. Now, I need to write the shell script code into this file. The script should include the shebang line to specify the shell interpreter and a command to print 'hello'.
Action:
@@ -97,31 +97,13 @@ extras:
Memory 2:
Previous Action:
-action: write
-args:
- path: hello.sh
- start: 0
- end: -1
- thoughts: Thoughts:
- I have created the file 'hello.sh' but have not yet written the script content that prints "hello". The next step is to write the shell script code into 'hello.sh'. This will include a shebang line to specify the shell interpreter (bash in this case) and an echo command to print "hello".
-
- Action:
-
-Output from Action:
-observation: write
-content:
-extras:
- path: hello.sh
-
-Memory 3:
-Previous Action:
action: read
args:
path: hello.sh
start: 0
end: -1
- thoughts: Thoughts:
- I have created the file 'hello.sh' but have not yet confirmed the content of the script. The next step is to read the content of 'hello.sh' to ensure that it contains the correct shell script code that prints "hello". This will help verify that the previous write actions were successful and the file is ready for execution.
+ thought: Thoughts:
+ I need to verify the contents of the 'hello.sh' script to ensure that it correctly contains the shell script code to print 'hello'. This involves checking if the shebang line is present and if the print command is correctly written. If the script is correct, I can then proceed to test it to confirm its functionality.
Action:
@@ -132,6 +114,22 @@ echo "hello"
extras:
path: hello.sh
+
+Memory 3:
+Previous Action:
+action: run
+args:
+ command: bash hello.sh
+ background: False
+ thought:
+
+Output from Action:
+observation: run
+content: hello
+extras:
+ command_id: -1
+ command: bash hello.sh
+ exit_code: 0
======= End Actions =======
Use these memories to provide additional context to the problem you are solving.
Remember that you have already completed these steps so you do not need to perform them again.
diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_006.log b/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_006.log
deleted file mode 100644
index 4a68e46a11..0000000000
--- a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_006.log
+++ /dev/null
@@ -1,198 +0,0 @@
-
-
-----------
-
-SYSTEM INFO:
-You am an autonomous coding agent, here to provide solutions for coding issues.
-You have been designed to assist you with a wide range of programming tasks, from code editing and debugging to testing and deployment.
-You have access to a variety of tools and commands that you can use to help you solve problems efficiently.
-
-INSTRUCTIONS:
-Now, you're going to solve this issue on your own. You can use any bash commands or custom commands you wish to complete your task. Edit all the files you need to and run any checks or tests that you want.
-Remember, YOU CAN ONLY ENTER ONE COMMAND AT A TIME. You should always wait for feedback after every command.
-When you're satisfied with all of the changes you've made, you can indicate that you are done by running the exit command.
-Note however that you cannot use any interactive session commands (e.g. python, vim, node) in this environment, but you can write scripts and run them. E.g. you can write a python script and then run it with `python .py`.
-
-NOTE ABOUT THE write COMMAND: Indentation really matters! When editing a file, make sure to insert appropriate indentation before each line!
-
-IMPORTANT TIPS:
-1. Reproduce the bug: Always start by trying to replicate the bug that the issue discusses. If the issue includes code for reproducing the bug, we recommend that you re-implement that in your environment and run it to ensure you can reproduce the bug. Then, start trying to fix it. When you think you've fixed the bug, re-run the bug reproduction script to make sure that the issue has indeed been resolved.
- If the bug reproduction script does not print anything when it successfully runs, we recommend adding a print("Script completed successfully, no errors.") command at the end of the file, so that you can be sure the script ran fine all the way through.
-2. Try different commands: If you run a command and it doesn't work, try running a different command. A command that did not work once will not work the second time unless you modify it.
-3. Navigate large files: If you open a file and need to get to an area around a specific line that is not in the first 100 lines, say line 583, you would use the 'read' command like this: 'read 583'. This is a much faster way to read through the file.
-4. Handle input files: If the bug reproduction script requires inputting/reading a specific file, such as 'buggy-input.png', and you'd like to understand how to input that file, conduct a search in the existing repository code to see whether someone else has already done that. Do this by running the command: 'search_dir "buggy-input.png"'. If that doesn't work, use the Linux 'find' command.
-5. Understand your context: Always make sure to look at the currently open file and the current working directory. The currently open file might be in a different directory than the working directory.
-6. Verify your edits: When editing files, it is easy to accidentally specify a wrong line number or to write code with incorrect indentation. Always check the code after you issue an edit to make sure that it reflects what you wanted to accomplish. If it didn't, issue another command to fix it.
-7. Thoroughly test your solution: After making any changes to fix a bug, be sure to thoroughly test your solution to ensure the bug has been resolved. Re-run the bug reproduction script and verify that the issue has been addressed.
-
-
-DOCUMENTATION:
-It is recommend that you use the commands provided for interacting with files and your directory because they have been specially built for you.
-They will make it much easier for you to look at files and make changes. Using these commands will help you be better at your task.
-You can open an file by using either the read or write operations.
-- If a file already exists you should read it before making any changes. Use the `edit` command to make changes once you have read it.
-- If you are creating a new file use the write command. Use the `edit` command to make changes once you have created the new file.
-
-Commands:
-exit - Executed when task is complete
-read [] [] - Shows a given file's contents starting from up to . Default: start_line = 0, end_line = -1. By default the whole file will be read.
-write [] [] - Modifies a by replacing the current lines between and with . Default start_line = 0 and end_line = -1. Calling this with no line args will replace the whole file.
-browse - Returns the text version of any url, this can be useful to look up documentation or finding issues on github
-scroll_up - Takes no arguments. This will scroll up and show you the 100 lines above your current lines
-scroll_down - Takes no arguments. This will scroll down and show you the 100 lines below your current lines
-edit - This will modify lines in the currently open file. use start_line and end_line to designate which lines to change and then write the multiline changes
-goto - This will take you directly to a line and show you the 100 lines below it.
- - You can use any bash command you need (cd, ls, rm, grep, dir, mv, wget, git, zip, etc.) with their arguments included
-pip install - You can use this to import python packages. Make sure you include the correct package name when using this command.
-ls - Use the ls command to view all the files in your current directory, this is a good starting point.
-NOT ALLOWED - You cannot use interactive commands like python or node
-Custom bash commands:
-None
-
-
-The following commands require an open file to be used: edit, scroll_up, scroll_down, goto
-To modify the current file use 'edit'. To move through the current file use 'goto' or 'scroll_up'/'scroll_down'
-when using write and edit do not surround the code with any "" just write the code.
-
-----------
-
-These are your past 4 actions:
-===== Previous Actions =====
-
-Memory 0:
-Previous Action:
-action: write
-args:
- path: hello.sh
- start: 0
- end: -1
- thoughts: Thoughts:
- Since the file 'hello.sh' has been created, the next step is to write the shell script code that prints "hello". This involves adding a shebang line to specify the shell interpreter and a print command. I will add these lines to the 'hello.sh' file.
-
- Action:
-
-Output from Action:
-observation: write
-content:
-extras:
- path: hello.sh
-
-Memory 1:
-Previous Action:
-action: write
-args:
- path: hello.sh
- start: 0
- end: -1
- thoughts: Thoughts:
- I have created the file 'hello.sh' but have not yet written the script content that prints "hello". The next step is to write the shell script code into 'hello.sh'. This will include a shebang line to specify the shell interpreter (bash in this case) and an echo command to print "hello".
-
- Action:
-
-Output from Action:
-observation: write
-content:
-extras:
- path: hello.sh
-
-Memory 2:
-Previous Action:
-action: read
-args:
- path: hello.sh
- start: 0
- end: -1
- thoughts: Thoughts:
- I have created the file 'hello.sh' but have not yet confirmed the content of the script. The next step is to read the content of 'hello.sh' to ensure that it contains the correct shell script code that prints "hello". This will help verify that the previous write actions were successful and the file is ready for execution.
-
- Action:
-
-Output from Action:
-observation: read
-content: #!/bin/bash
-echo "hello"
-
-extras:
- path: hello.sh
-
-Memory 3:
-Previous Action:
-action: run
-args:
- command: bash hello.sh
- background: False
-
-Output from Action:
-observation: run
-content: hello
-extras:
- command_id: -1
- command: bash hello.sh
- exit_code: 0
-======= End Actions =======
-Use these memories to provide additional context to the problem you are solving.
-Remember that you have already completed these steps so you do not need to perform them again.
-
-----------
-
-RESPONSE FORMAT:
-This is the format of the response you will make in order to solve the current issue.
-You will be given multiple iterations to complete this task so break it into steps and solve them one by one.
-
-Your output must contain the following:
-- First, thoughts about what your next action should be and plan it out.
- - You will have a memory of your thoughts so you can use this to remember things for the next step.
- - Use your thoughts to think about what you are currently doing, what you have done on prior steps and how that relates to solving the problem.
-- Second, create a piece of code that will execute your next action based on the thoughts you have.
- - Remember that you can only have one action for each thought, do not include multiple actions.
-
-Your code MUST be surrounded in triple back ticks EXACTLY like this:
-```
-
-```
-
-Notes:
-- Adhere to the format so that the program loop continues smoothly, it is very important to only give one command per output.
-- DO NOT give more than one command within the triple backticks. This will just throw an error and nothing will happen as a result.
-- Do not give multiple code blocks, if you do only the second one will be captured and run, this might give an error if the first one was necessary.
-- To execute multiple commands you should write them down in your thoughts section so you can remember it on the next step and execute them then.
-- The only commands you are not capable of executing are interactive commands like `python` or `node` by themselves.
-- If you think that you have completed the task that has been given to you based on your previous actions and outputs then use ``` exit ``` as the command to let the system know that you are done.
-- DO NOT make any copies of your previous memories those will be provided to you at each step, making copies just wastes time and energy. Think smarter not harder.
-- The write and edit commands requires proper indentation in the content section ex. `write hw.py def hello():
- print('Hello World')` this is how you would have to format your write command.
- - The white spaces matter as the code changes will be added to the code so they must have proper syntax.
-
-This is a template using the format described above
-Items in <> are suggestions for you, fill them out based on the context of the problem you are solving.
-
-[ FORMAT ]
-Thoughts:
-
-
-
-Action:
-```
-
-```
-[ END FORMAT ]
-
-Do not provide anything extra just your thought and action.
-
-You are currently trying to complete this task:
-Write a shell script 'hello.sh' that prints 'hello'.
-
-CURRENT WORKSPACE:
- Open File: hello.sh on line 0
- You can use these commands with the current file:
- Navigation: `scroll_up`, `scroll_down`, and `goto `
- Modification: `edit `
-
-
-Keep all of the guidelines above in mind when you are thinking and making code.
-Please come up with a thought and action based on your current task and latest steps.
-Make sure that you do not repeat the same actions, there will not be any changes in result if you do not changes anything.
-Be very strict about the formatting that you use and make sure you follow the guidelines.
-NEVER output multiple commands. ONLY take ONE STEP at a time.
-When you have completed your task run the "exit" command.
-Begin with your thought about the next step and then come up with an action to perform your thought.
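The prompt above asks the model for a `Thoughts:` section followed by a single fenced `Action:` command. The parser below is a minimal sketch of how a reply in that format could be split into a thought and a command. `parse_response` and `FENCE` are hypothetical names, and the real SWEAgent parser may behave differently. In particular, keeping only the last fenced block is an assumption modeled on the prompt's warning that only one block is captured.

```python
import re

# Hypothetical pattern: a command wrapped in a pair of triple backticks.
FENCE = re.compile(r'`{3}(.*?)`{3}', re.DOTALL)


def parse_response(response: str) -> tuple[str, str]:
    """Split a 'Thoughts: ... Action: <fenced cmd>' reply into (thought, command)."""
    head, sep, tail = response.partition('Action:')
    if not sep:
        raise ValueError('response has no Action: section')
    # Everything before 'Action:' (minus the label) is the thought.
    thought = head.replace('Thoughts:', '', 1).strip()
    blocks = FENCE.findall(tail)
    if not blocks:
        raise ValueError('response has no fenced command')
    # Assumption: when several blocks are present, only the last one is kept.
    return thought, blocks[-1].strip()


fence = '`' * 3
reply = f'Thoughts:\nThe script prints hello.\n\nAction:\n{fence}\nexit\n{fence}\n'
thought, command = parse_response(reply)
```

With the final mock response, this yields the thought text and the bare `exit` command.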
diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/response_001.log b/tests/integration/mock/SWEAgent/test_write_simple_script/response_001.log
index 757ced6842..c434807f55 100644
--- a/tests/integration/mock/SWEAgent/test_write_simple_script/response_001.log
+++ b/tests/integration/mock/SWEAgent/test_write_simple_script/response_001.log
@@ -1,8 +1,8 @@
Thoughts:
-To begin solving the task, I need to create a new shell script file named 'hello.sh'. This file should contain a script that prints the message 'hello' when executed. I will start by writing the initial content of this script.
+To solve the task of writing a shell script named 'hello.sh' that prints 'hello', I need to create a new file named 'hello.sh'. I will then write the necessary shell script code into this file.
Action:
```
write hello.sh #!/bin/bash
-echo 'hello'
+echo "hello"
```
diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/response_002.log b/tests/integration/mock/SWEAgent/test_write_simple_script/response_002.log
index a9b34f581c..1cc79f854b 100644
--- a/tests/integration/mock/SWEAgent/test_write_simple_script/response_002.log
+++ b/tests/integration/mock/SWEAgent/test_write_simple_script/response_002.log
@@ -1,5 +1,5 @@
Thoughts:
-Since the file 'hello.sh' has been created, the next step is to write the shell script code that prints "hello". This involves adding a shebang line to specify the shell interpreter and a print command. I will add these lines to the 'hello.sh' file.
+The task is to write a shell script named 'hello.sh' that prints 'hello'. I have already created the file 'hello.sh' in a previous step. Now, I need to write the shell script code into this file. The script should include the shebang line to specify the shell interpreter and a command to print 'hello'.
Action:
```
diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/response_003.log b/tests/integration/mock/SWEAgent/test_write_simple_script/response_003.log
index 0fa2854495..49592827e5 100644
--- a/tests/integration/mock/SWEAgent/test_write_simple_script/response_003.log
+++ b/tests/integration/mock/SWEAgent/test_write_simple_script/response_003.log
@@ -1,8 +1,7 @@
Thoughts:
-I have created the file 'hello.sh' but have not yet written the script content that prints "hello". The next step is to write the shell script code into 'hello.sh'. This will include a shebang line to specify the shell interpreter (bash in this case) and an echo command to print "hello".
+I need to verify the contents of the 'hello.sh' script to ensure that it correctly contains the shell script code to print 'hello'. This involves checking if the shebang line is present and if the print command is correctly written. If the script is correct, I can then proceed to test it to confirm its functionality.
Action:
```
-write hello.sh #!/bin/bash
-echo "hello"
+read hello.sh
```
diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/response_004.log b/tests/integration/mock/SWEAgent/test_write_simple_script/response_004.log
index fa6d159a3a..2092717d50 100644
--- a/tests/integration/mock/SWEAgent/test_write_simple_script/response_004.log
+++ b/tests/integration/mock/SWEAgent/test_write_simple_script/response_004.log
@@ -1,7 +1,7 @@
Thoughts:
-I have created the file 'hello.sh' but have not yet confirmed the content of the script. The next step is to read the content of 'hello.sh' to ensure that it contains the correct shell script code that prints "hello". This will help verify that the previous write actions were successful and the file is ready for execution.
+The shell script 'hello.sh' has been written and contains the necessary commands to print 'hello'. The next step is to test the script to ensure it functions as expected. This involves running the script and checking the output. If the script runs successfully and prints 'hello', then the task is completed.
Action:
```
-read hello.sh
+bash hello.sh
```
diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/response_005.log b/tests/integration/mock/SWEAgent/test_write_simple_script/response_005.log
index 5ff2634c95..7515501080 100644
--- a/tests/integration/mock/SWEAgent/test_write_simple_script/response_005.log
+++ b/tests/integration/mock/SWEAgent/test_write_simple_script/response_005.log
@@ -1,7 +1,7 @@
Thoughts:
-The shell script 'hello.sh' has been successfully created and contains the necessary code to print "hello". The next logical step is to test the script to ensure it functions as expected. This involves executing the script and observing the output. If the script runs correctly and prints "hello", then the task is completed successfully.
+The shell script 'hello.sh' has been successfully created and tested, and it correctly prints 'hello' as required. The script includes the shebang line and the echo command to print 'hello'. Since the script is functioning as expected and meets the task requirements, there are no further modifications or tests needed. It's time to conclude this task.
Action:
```
-bash hello.sh
+exit
```
diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/response_006.log b/tests/integration/mock/SWEAgent/test_write_simple_script/response_006.log
deleted file mode 100644
index bc1cfb2181..0000000000
--- a/tests/integration/mock/SWEAgent/test_write_simple_script/response_006.log
+++ /dev/null
@@ -1,7 +0,0 @@
-Thoughts:
-The shell script 'hello.sh' has been successfully created and tested, and it correctly prints "hello" when executed. Since the task was to write a shell script that prints 'hello' and this has been achieved as evidenced by the output from the previous run command, there are no further modifications or actions needed on the script. Therefore, I should conclude this task.
-
-Action:
-```
-exit
-```
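After this change the recorded run ends with `exit` in `response_005.log`, and `response_006.log` is deleted, so the agent is expected to finish in five LLM calls. The class below is a minimal sketch of how numbered response logs could be replayed in order by a mock LLM. `ReplayLLM` and its `completion` method are hypothetical and do not describe the project's actual mock harness.

```python
from pathlib import Path


class ReplayLLM:
    """Hypothetical mock LLM that returns recorded replies in file order."""

    def __init__(self, log_dir: Path):
        # Zero-padded names (response_001.log, response_002.log, ...) sort correctly.
        self._replies = [p.read_text() for p in sorted(log_dir.glob('response_*.log'))]
        self._index = 0

    def completion(self, prompt: str) -> str:
        # A request beyond the recorded replies means the agent took an extra step.
        if self._index >= len(self._replies):
            raise RuntimeError('agent asked for more responses than were recorded')
        reply = self._replies[self._index]
        self._index += 1
        return reply
```

Failing loudly on an extra request is useful here: if the agent still wanted a sixth step after `response_006.log` was deleted, the test would stop with a clear error.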
diff --git a/tests/integration/test_agent.py b/tests/integration/test_agent.py
index 137425f18e..5eea4ac5c0 100644
--- a/tests/integration/test_agent.py
+++ b/tests/integration/test_agent.py
@@ -1,13 +1,18 @@
-import os
import asyncio
+import os
import subprocess
import pytest
-from opendevin.main import main
+from opendevin.core.main import main
-@pytest.mark.skipif(os.environ.get('AGENT') == 'CodeActAgent', reason='CodeActAgent requires task to be in a special format')
+# skip if
+@pytest.mark.skipif(
+ os.getenv('AGENT') == 'CodeActAgent'
+ and os.getenv('SANDBOX_TYPE').lower() == 'exec',
+ reason='CodeActAgent does not support exec sandbox since exec sandbox is NOT stateful',
+)
def test_write_simple_script():
task = "Write a shell script 'hello.sh' that prints 'hello'."
asyncio.run(main(task))
@@ -20,4 +25,6 @@ def test_write_simple_script():
result = subprocess.run(['bash', script_path], capture_output=True, text=True)
# Verify the output from the script
- assert result.stdout.strip() == 'hello', f'Expected output "hello", but got "{result.stdout.strip()}"'
+ assert (
+ result.stdout.strip() == 'hello'
+ ), f'Expected output "hello", but got "{result.stdout.strip()}"'
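`os.getenv` returns `None` for an unset variable. Because `and` short-circuits, calling `.lower()` on that result directly raises `AttributeError` only when `AGENT` is `CodeActAgent` and `SANDBOX_TYPE` is unset. The function below is a minimal sketch of the skip condition with that case guarded. `should_skip_codeact` is a hypothetical helper and is not part of the test suite.

```python
import os


def should_skip_codeact(env=os.environ) -> bool:
    """Skip CodeActAgent only when the sandbox type is 'exec' (case-insensitive)."""
    # Default to '' so an unset SANDBOX_TYPE does not raise AttributeError.
    return (
        env.get('AGENT') == 'CodeActAgent'
        and env.get('SANDBOX_TYPE', '').lower() == 'exec'
    )
```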
diff --git a/tests/test_fileops.py b/tests/test_fileops.py
index e657ac7dc8..0a6c3d3f0c 100644
--- a/tests/test_fileops.py
+++ b/tests/test_fileops.py
@@ -1,26 +1,28 @@
-from opendevin import config
-from opendevin.schema import ConfigType
-from opendevin.action import fileop
from pathlib import Path
+
import pytest
+from opendevin import config
+from opendevin.events.action import files
+from opendevin.schema import ConfigType
+SANDBOX_PATH_PREFIX = '/workspace'
def test_resolve_path():
- assert fileop.resolve_path('test.txt', '/workspace') == Path(config.get(ConfigType.WORKSPACE_BASE)) / 'test.txt'
- assert fileop.resolve_path('subdir/test.txt', '/workspace') == \
+ assert files.resolve_path('test.txt', '/workspace') == Path(config.get(ConfigType.WORKSPACE_BASE)) / 'test.txt'
+ assert files.resolve_path('subdir/test.txt', '/workspace') == \
Path(config.get(ConfigType.WORKSPACE_BASE)) / 'subdir' / 'test.txt'
- assert fileop.resolve_path(Path(fileop.SANDBOX_PATH_PREFIX) / 'test.txt', '/workspace') == \
+ assert files.resolve_path(Path(SANDBOX_PATH_PREFIX) / 'test.txt', '/workspace') == \
Path(config.get(ConfigType.WORKSPACE_BASE)) / 'test.txt'
- assert fileop.resolve_path(Path(fileop.SANDBOX_PATH_PREFIX) / 'subdir' / 'test.txt',
+ assert files.resolve_path(Path(SANDBOX_PATH_PREFIX) / 'subdir' / 'test.txt',
'/workspace') == Path(config.get(ConfigType.WORKSPACE_BASE)) / 'subdir' / 'test.txt'
- assert fileop.resolve_path(Path(fileop.SANDBOX_PATH_PREFIX) / 'subdir' / '..' / 'test.txt',
+ assert files.resolve_path(Path(SANDBOX_PATH_PREFIX) / 'subdir' / '..' / 'test.txt',
'/workspace') == Path(config.get(ConfigType.WORKSPACE_BASE)) / 'test.txt'
with pytest.raises(PermissionError):
- fileop.resolve_path(Path(fileop.SANDBOX_PATH_PREFIX) / '..' / 'test.txt', '/workspace')
+ files.resolve_path(Path(SANDBOX_PATH_PREFIX) / '..' / 'test.txt', '/workspace')
with pytest.raises(PermissionError):
- fileop.resolve_path(Path('..') / 'test.txt', '/workspace')
+ files.resolve_path(Path('..') / 'test.txt', '/workspace')
with pytest.raises(PermissionError):
- fileop.resolve_path(Path('/') / 'test.txt', '/workspace')
- assert fileop.resolve_path('test.txt', '/workspace/test') == \
+ files.resolve_path(Path('/') / 'test.txt', '/workspace')
+ assert files.resolve_path('test.txt', '/workspace/test') == \
Path(config.get(ConfigType.WORKSPACE_BASE)) / 'test' / 'test.txt'
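The assertions above pin down how a path inside the sandbox maps to the host workspace. A relative path is joined to the sandbox working directory. `..` is collapsed. Anything that ends up outside `/workspace` raises `PermissionError`. The function below is a minimal sketch that satisfies those cases. It is not the `files.resolve_path` implementation, and it takes `workspace_base` as an explicit argument instead of reading `ConfigType.WORKSPACE_BASE` from config, which is an assumption made for self-containment.

```python
import posixpath
from pathlib import Path, PurePosixPath

SANDBOX_PATH_PREFIX = '/workspace'


def resolve_path(file_path, working_directory: str, workspace_base: str) -> Path:
    """Map a path seen inside the sandbox to the matching host path."""
    path = PurePosixPath(file_path)
    # Relative paths are interpreted against the sandbox working directory.
    if not path.is_absolute():
        path = PurePosixPath(working_directory) / path
    # Collapse '.' and '..' lexically, without touching the filesystem.
    path = PurePosixPath(posixpath.normpath(str(path)))
    prefix = PurePosixPath(SANDBOX_PATH_PREFIX)
    # Reject anything that escapes /workspace (e.g. /workspace/../test.txt).
    if path != prefix and prefix not in path.parents:
        raise PermissionError(f'File access not permitted: {file_path}')
    return Path(workspace_base) / path.relative_to(prefix)
```

Checking `path.parents` rather than comparing string prefixes also rejects sibling directories such as `/workspacefoo`.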
diff --git a/tests/unit/test_action_github.py b/tests/unit/test_action_github.py
index 570613ec51..a55a0643c9 100644
--- a/tests/unit/test_action_github.py
+++ b/tests/unit/test_action_github.py
@@ -1,17 +1,17 @@
-
-from opendevin import config
-from agenthub.dummy_agent.agent import DummyAgent
-from opendevin.action.github import GitHubPushAction, GitHubSendPRAction
-from opendevin.controller.agent_controller import AgentController
-from opendevin.llm.llm import LLM
-from opendevin.observation.error import AgentErrorObservation
-from opendevin.observation.message import AgentMessageObservation
-from opendevin.observation.run import CmdOutputObservation
-
-from opendevin.schema.config import ConfigType
-import pytest
from unittest.mock import MagicMock, call, patch
+import pytest
+
+from agenthub.dummy_agent.agent import DummyAgent
+from opendevin.controller.agent_controller import AgentController
+from opendevin.core import config
+from opendevin.core.schema.config import ConfigType
+from opendevin.events.action.github import GitHubPushAction, GitHubSendPRAction
+from opendevin.events.observation.commands import CmdOutputObservation
+from opendevin.events.observation.error import AgentErrorObservation
+from opendevin.events.observation.message import AgentMessageObservation
+from opendevin.llm.llm import LLM
+
@pytest.fixture
def agent_controller():
@@ -27,12 +27,16 @@ def agent_controller():
@patch.dict(config.config, {'GITHUB_TOKEN': 'fake_token'}, clear=True)
@patch('random.choices')
@patch('opendevin.controller.action_manager.ActionManager.run_command')
-async def test_run_push_successful(mock_run_command, mock_random_choices, agent_controller):
+async def test_run_push_successful(
+ mock_run_command, mock_random_choices, agent_controller
+):
# Setup mock for random.choices
mock_random_choices.return_value = ['a', 'b', 'c', 'd', 'e']
# Create a CmdOutputObservation instance for successful command execution
- successful_output = CmdOutputObservation(content='', command_id=1, command='', exit_code=0)
+ successful_output = CmdOutputObservation(
+ content='', command_id=1, command='', exit_code=0
+ )
# Setup the mock for run_command to return successful output
mock_run_command.return_value = successful_output
@@ -63,17 +67,13 @@ async def test_run_push_successful(mock_run_command, mock_random_choices, agent_
async def test_run_push_error_missing_token(
mock_run_command, mock_random_choices, agent_controller
):
-
# Run the method
push_action = GitHubPushAction(owner='owner', repo='repo', branch='branch')
result = await push_action.run(agent_controller)
# Verify the result is an error due to missing token
assert isinstance(result, AgentErrorObservation)
- assert (
- result.message
- == 'Oops. Something went wrong: GITHUB_TOKEN is not set'
- )
+ assert result.message == 'Oops. Something went wrong: GITHUB_TOKEN is not set'
@pytest.mark.asyncio
@@ -87,7 +87,15 @@ async def test_run_pull_request_created_successfully(mock_post, agent_controller
mock_post.return_value = mock_response
# Run the method
- pr_action = GitHubSendPRAction(owner='owner', repo='repo', title='title', head='head', head_repo='head_repo', base='base', body='body')
+ pr_action = GitHubSendPRAction(
+ owner='owner',
+ repo='repo',
+ title='title',
+ head='head',
+ head_repo='head_repo',
+ base='base',
+ body='body',
+ )
result = await pr_action.run(agent_controller)
# Verify the result is a success observation
@@ -95,6 +103,7 @@ async def test_run_pull_request_created_successfully(mock_post, agent_controller
assert 'Pull request created successfully' in result.content
assert 'https://github.com/example/pull/1' in result.content
+
@pytest.mark.asyncio
@patch('requests.post')
@patch.dict(config.config, {'GITHUB_TOKEN': 'fake_token'}, clear=True)
@@ -106,7 +115,15 @@ async def test_run_pull_request_creation_failed(mock_post, agent_controller):
mock_post.return_value = mock_response
# Run the method
- pr_action = GitHubSendPRAction(owner='owner', repo='repo', title='title', head='head', head_repo='head_repo', base='base', body='body')
+ pr_action = GitHubSendPRAction(
+ owner='owner',
+ repo='repo',
+ title='title',
+ head='head',
+ head_repo='head_repo',
+ base='base',
+ body='body',
+ )
result = await pr_action.run(agent_controller)
# Verify the result is an error observation
@@ -115,11 +132,19 @@ async def test_run_pull_request_creation_failed(mock_post, agent_controller):
assert 'Status code: 400' in result.content
assert 'Bad Request' in result.content
+
@pytest.mark.asyncio
async def test_run_error_missing_token(agent_controller):
-
# Run the method
- pr_action = GitHubSendPRAction(owner='owner', repo='repo', title='title', head='head', head_repo='head_repo', base='base', body='body')
+ pr_action = GitHubSendPRAction(
+ owner='owner',
+ repo='repo',
+ title='title',
+ head='head',
+ head_repo='head_repo',
+ base='base',
+ body='body',
+ )
result = await pr_action.run(agent_controller)
# Verify the result is an error due to missing token
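These tests rely on `patch.dict(config.config, {'GITHUB_TOKEN': 'fake_token'}, clear=True)`. The example below is a minimal sketch of what that does. The config dict and `github_token_error` are hypothetical stand-ins for `config.config` and the token check inside the GitHub actions. The error string is the one the missing-token test expects.

```python
from unittest.mock import patch

# Hypothetical stand-in for config.config.
config = {'GITHUB_TOKEN': None, 'LLM_MODEL': 'gpt-4'}


def github_token_error(cfg: dict):
    """Return the expected error text when no token is configured, else None."""
    if not cfg.get('GITHUB_TOKEN'):
        return 'Oops. Something went wrong: GITHUB_TOKEN is not set'
    return None


# clear=True empties the dict before applying the patch, so only the fake
# token is visible inside the block; the original contents return on exit.
with patch.dict(config, {'GITHUB_TOKEN': 'fake_token'}, clear=True):
    assert github_token_error(config) is None
    assert 'LLM_MODEL' not in config

assert config['LLM_MODEL'] == 'gpt-4'
```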
diff --git a/tests/unit/test_action_serialization.py b/tests/unit/test_action_serialization.py
index 05d383a399..2d699a95b6 100644
--- a/tests/unit/test_action_serialization.py
+++ b/tests/unit/test_action_serialization.py
@@ -1,17 +1,17 @@
-from opendevin.action import (
- action_from_dict,
+from opendevin.events.action import (
Action,
+ AddTaskAction,
+ AgentFinishAction,
+ AgentRecallAction,
AgentThinkAction,
+ BrowseURLAction,
CmdKillAction,
CmdRunAction,
- BrowseURLAction,
- GitHubPushAction,
FileReadAction,
FileWriteAction,
- AgentRecallAction,
- AgentFinishAction,
- AddTaskAction,
+ GitHubPushAction,
ModifyTaskAction,
+ action_from_dict,
)
@@ -39,7 +39,7 @@ def test_agent_think_action_serialization_deserialization():
def test_agent_recall_action_serialization_deserialization():
original_action_dict = {
'action': 'recall',
- 'args': {'query': 'Test query.'}
+ 'args': {'query': 'Test query.', 'thought': ''}
}
serialization_deserialization(original_action_dict, AgentRecallAction)
@@ -47,7 +47,7 @@ def test_agent_recall_action_serialization_deserialization():
def test_agent_finish_action_serialization_deserialization():
original_action_dict = {
'action': 'finish',
- 'args': {'outputs': {}},
+ 'args': {'outputs': {}, 'thought': ''}
}
serialization_deserialization(original_action_dict, AgentFinishAction)
@@ -55,7 +55,7 @@ def test_agent_finish_action_serialization_deserialization():
def test_cmd_kill_action_serialization_deserialization():
original_action_dict = {
'action': 'kill',
- 'args': {'id': '1337'}
+ 'args': {'id': '1337', 'thought': ''}
}
serialization_deserialization(original_action_dict, CmdKillAction)
@@ -63,7 +63,7 @@ def test_cmd_kill_action_serialization_deserialization():
def test_cmd_run_action_serialization_deserialization():
original_action_dict = {
'action': 'run',
- 'args': {'command': 'echo "Hello world"', 'background': True}
+ 'args': {'command': 'echo "Hello world"', 'background': True, 'thought': ''}
}
serialization_deserialization(original_action_dict, CmdRunAction)
@@ -71,7 +71,7 @@ def test_cmd_run_action_serialization_deserialization():
def test_browse_url_action_serialization_deserialization():
original_action_dict = {
'action': 'browse',
- 'args': {'url': 'https://www.example.com'}
+ 'args': {'thought': '', 'url': 'https://www.example.com'}
}
serialization_deserialization(original_action_dict, BrowseURLAction)
@@ -87,7 +87,7 @@ def test_github_push_action_serialization_deserialization():
def test_file_read_action_serialization_deserialization():
original_action_dict = {
'action': 'read',
- 'args': {'path': '/path/to/file.txt', 'start': 0, 'end': -1, 'thoughts': 'None'}
+ 'args': {'path': '/path/to/file.txt', 'start': 0, 'end': -1, 'thought': 'None'}
}
serialization_deserialization(original_action_dict, FileReadAction)
@@ -95,7 +95,7 @@ def test_file_read_action_serialization_deserialization():
def test_file_write_action_serialization_deserialization():
original_action_dict = {
'action': 'write',
- 'args': {'path': '/path/to/file.txt', 'content': 'Hello world', 'start': 0, 'end': 1, 'thoughts': 'None'}
+ 'args': {'path': '/path/to/file.txt', 'content': 'Hello world', 'start': 0, 'end': 1, 'thought': 'None'}
}
serialization_deserialization(original_action_dict, FileWriteAction)
@@ -103,7 +103,7 @@ def test_file_write_action_serialization_deserialization():
def test_add_task_action_serialization_deserialization():
original_action_dict = {
'action': 'add_task',
- 'args': {'parent': 'Test parent', 'goal': 'Test goal', 'subtasks': []}
+ 'args': {'parent': 'Test parent', 'goal': 'Test goal', 'subtasks': [], 'thought': ''}
}
serialization_deserialization(original_action_dict, AddTaskAction)
@@ -111,6 +111,6 @@ def test_add_task_action_serialization_deserialization():
def test_modify_task_action_serialization_deserialization():
original_action_dict = {
'action': 'modify_task',
- 'args': {'id': 1, 'state': 'Test state.'}
+ 'args': {'id': 1, 'state': 'Test state.', 'thought': ''}
}
serialization_deserialization(original_action_dict, ModifyTaskAction)
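Every expected `args` dict now carries a `thought` key, and `thoughts` is renamed to `thought` for file actions. The example below is a minimal sketch of why a round trip produces `'thought': ''` even when the input omits it: the field has a default and is always serialized. The dataclasses, `ACTION_TYPES`, and `action_to_dict` are hypothetical and simplified, not the classes in `opendevin.events.action`.

```python
from dataclasses import asdict, dataclass
from typing import ClassVar


@dataclass
class CmdRunAction:
    command: str
    background: bool = False
    thought: str = ''  # defaults to '' and is always serialized
    action: ClassVar[str] = 'run'


@dataclass
class CmdKillAction:
    id: str
    thought: str = ''
    action: ClassVar[str] = 'kill'


# Hypothetical registry keyed by the serialized 'action' field.
ACTION_TYPES = {cls.action: cls for cls in (CmdRunAction, CmdKillAction)}


def action_from_dict(data: dict):
    return ACTION_TYPES[data['action']](**data['args'])


def action_to_dict(action) -> dict:
    # asdict() skips ClassVar attributes, so 'action' is added explicitly.
    return {'action': action.action, 'args': asdict(action)}
```

For example, `action_to_dict(action_from_dict({'action': 'kill', 'args': {'id': '1337'}}))` gives back the kill action with `'thought': ''` added, matching the updated expectations.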
diff --git a/tests/unit/test_arg_parser.py b/tests/unit/test_arg_parser.py
index 6c140f795a..1b332c8669 100644
--- a/tests/unit/test_arg_parser.py
+++ b/tests/unit/test_arg_parser.py
@@ -1,7 +1,7 @@
-from opendevin.config import get_parser
-
import pytest
+from opendevin.core.config import get_parser
+
def test_help_message(capsys):
parser = get_parser()
@@ -35,8 +35,12 @@ options:
expected_lines = expected_help_message.strip().split('\n')
# Ensure both outputs have the same number of lines
- assert len(actual_lines) == len(expected_lines), 'The number of lines in the help message does not match.'
+ assert len(actual_lines) == len(
+ expected_lines
+ ), 'The number of lines in the help message does not match.'
# Compare each line
for actual, expected in zip(actual_lines, expected_lines):
- assert actual.strip() == expected.strip(), f"Expected '{expected}', got '{actual}'"
+ assert (
+ actual.strip() == expected.strip()
+ ), f"Expected '{expected}', got '{actual}'"
diff --git a/tests/unit/test_micro_agents.py b/tests/unit/test_micro_agents.py
new file mode 100644
index 0000000000..8533bc036c
--- /dev/null
+++ b/tests/unit/test_micro_agents.py
@@ -0,0 +1,70 @@
+import json
+import os
+from unittest.mock import MagicMock
+
+import yaml
+
+from agenthub.micro.registry import all_microagents
+from opendevin.controller.agent import Agent
+from opendevin.controller.state.plan import Plan
+from opendevin.controller.state.state import State
+
+
+def test_all_agents_are_loaded():
+ full_path = os.path.join('agenthub', 'micro')
+ agent_names = set()
+ for root, _, files in os.walk(full_path):
+ for file in files:
+ if file == 'agent.yaml':
+ file_path = os.path.join(root, file)
+ with open(file_path, 'r') as yaml_file:
+ data = yaml.safe_load(yaml_file)
+ agent_names.add(data['name'])
+ assert agent_names == set(all_microagents.keys())
+
+
+def test_coder_agent_with_summary():
+ """
+ Coder agent should render code summary as part of prompt
+ """
+ mock_llm = MagicMock()
+ content = json.dumps({'action': 'finish', 'args': {}})
+ mock_llm.completion.return_value = {'choices': [{'message': {'content': content}}]}
+
+ coder_agent = Agent.get_cls('CoderAgent')(llm=mock_llm)
+ assert coder_agent is not None
+ task = 'This is a dummy task'
+ plan = Plan(task)
+ summary = 'This is a dummy summary about this repo'
+ state = State(plan, inputs={'summary': summary})
+ coder_agent.step(state)
+
+ mock_llm.completion.assert_called_once()
+ _, kwargs = mock_llm.completion.call_args
+ prompt = kwargs['messages'][0]['content']
+ assert task in prompt
+ assert "Here's a summary of the codebase, as it relates to this task" in prompt
+ assert summary in prompt
+
+
+def test_coder_agent_without_summary():
+ """
+ When there's no codebase_summary available, there shouldn't be any prompt
+ about 'code summary'
+ """
+ mock_llm = MagicMock()
+ content = json.dumps({'action': 'finish', 'args': {}})
+ mock_llm.completion.return_value = {'choices': [{'message': {'content': content}}]}
+
+ coder_agent = Agent.get_cls('CoderAgent')(llm=mock_llm)
+ assert coder_agent is not None
+ task = 'This is a dummy task'
+ plan = Plan(task)
+ state = State(plan)
+ coder_agent.step(state)
+
+ mock_llm.completion.assert_called_once()
+ _, kwargs = mock_llm.completion.call_args
+ prompt = kwargs['messages'][0]['content']
+ assert task in prompt
+ assert "Here's a summary of the codebase, as it relates to this task" not in prompt
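The two CoderAgent tests check that the summary section appears in the prompt only when `inputs` contains a `summary`. The function below is a minimal sketch of that conditional rendering. `render_prompt` and its template text are hypothetical. Only the header string is taken from the tests.

```python
from typing import Optional

SUMMARY_HEADER = "Here's a summary of the codebase, as it relates to this task"


def render_prompt(task: str, inputs: Optional[dict] = None) -> str:
    """Build a prompt that mentions the code summary only when one is given."""
    parts = [f'Your task is:\n{task}']  # hypothetical template text
    summary = (inputs or {}).get('summary')
    if summary:
        parts.append(f'{SUMMARY_HEADER}:\n{summary}')
    return '\n\n'.join(parts)
```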
diff --git a/tests/unit/test_observation_serialization.py b/tests/unit/test_observation_serialization.py
index e75efc0a14..4159f3c13d 100644
--- a/tests/unit/test_observation_serialization.py
+++ b/tests/unit/test_observation_serialization.py
@@ -1,4 +1,8 @@
-from opendevin.observation import observation_from_dict, Observation, CmdOutputObservation
+from opendevin.events.observation import (
+ CmdOutputObservation,
+ Observation,
+ observation_from_dict,
+)
def test_observation_serialization_deserialization():
diff --git a/tests/unit/test_sandbox.py b/tests/unit/test_sandbox.py
new file mode 100644
index 0000000000..4ab00b9454
--- /dev/null
+++ b/tests/unit/test_sandbox.py
@@ -0,0 +1,117 @@
+import pathlib
+import tempfile
+from unittest.mock import patch
+
+import pytest
+
+from opendevin.core import config
+from opendevin.runtime.docker.ssh_box import DockerSSHBox
+
+
+@pytest.fixture
+def temp_dir():
+ # get a temporary directory
+ with tempfile.TemporaryDirectory() as temp_dir:
+ pathlib.Path().mkdir(parents=True, exist_ok=True)
+ yield temp_dir
+
+
+def test_ssh_box_run_as_devin(temp_dir):
+ # run the SSH sandbox as the 'opendevin' user with temp_dir as the workspace
+ with patch.dict(
+ config.config,
+ {
+ config.ConfigType.WORKSPACE_BASE: temp_dir,
+ config.ConfigType.RUN_AS_DEVIN: 'true',
+ config.ConfigType.SANDBOX_TYPE: 'ssh',
+ },
+ clear=True,
+ ):
+ ssh_box = DockerSSHBox()
+
+ # test the ssh box
+ exit_code, output = ssh_box.execute('ls -l')
+ assert exit_code == 0, 'The exit code should be 0.'
+ assert output.strip() == 'total 0'
+
+ exit_code, output = ssh_box.execute('mkdir test')
+ assert exit_code == 0, 'The exit code should be 0.'
+ assert output.strip() == ''
+
+ exit_code, output = ssh_box.execute('ls -l')
+ assert exit_code == 0, 'The exit code should be 0.'
+ assert 'opendevin' in output, "The output should contain username 'opendevin'"
+ assert 'test' in output, 'The output should contain the test directory'
+
+ exit_code, output = ssh_box.execute('touch test/foo.txt')
+ assert exit_code == 0, 'The exit code should be 0.'
+ assert output.strip() == ''
+
+ exit_code, output = ssh_box.execute('ls -l test')
+ assert exit_code == 0, 'The exit code should be 0.'
+ assert 'foo.txt' in output, 'The output should contain the foo.txt file'
+
+
+def test_ssh_box_multi_line_cmd_run_as_devin(temp_dir):
+ # run the SSH sandbox as the 'opendevin' user with temp_dir as the workspace
+ with patch.dict(
+ config.config,
+ {
+ config.ConfigType.WORKSPACE_BASE: temp_dir,
+ config.ConfigType.RUN_AS_DEVIN: 'true',
+ config.ConfigType.SANDBOX_TYPE: 'ssh',
+ },
+ clear=True,
+ ):
+ ssh_box = DockerSSHBox()
+
+ # test the ssh box
+ exit_code, output = ssh_box.execute('pwd\nls -l')
+ assert exit_code == 0, 'The exit code should be 0.'
+ expected_lines = ['/workspacels -l', 'total 0']
+ assert output.strip().splitlines() == expected_lines
+
+
+def test_ssh_box_stateful_cmd_run_as_devin(temp_dir):
+ # run the SSH sandbox as the 'opendevin' user with temp_dir as the workspace
+ with patch.dict(
+ config.config,
+ {
+ config.ConfigType.WORKSPACE_BASE: temp_dir,
+ config.ConfigType.RUN_AS_DEVIN: 'true',
+ config.ConfigType.SANDBOX_TYPE: 'ssh',
+ },
+ clear=True,
+ ):
+ ssh_box = DockerSSHBox()
+
+ # test the ssh box
+ exit_code, output = ssh_box.execute('mkdir test')
+ assert exit_code == 0, 'The exit code should be 0.'
+ assert output.strip() == ''
+
+ exit_code, output = ssh_box.execute('cd test')
+ assert exit_code == 0, 'The exit code should be 0.'
+ assert output.strip() == ''
+
+ exit_code, output = ssh_box.execute('pwd')
+ assert exit_code == 0, 'The exit code should be 0.'
+ assert output.strip() == '/workspace/test'
+
+
+def test_ssh_box_failed_cmd_run_as_devin(temp_dir):
+ # run the SSH sandbox as the 'opendevin' user with temp_dir as the workspace
+ with patch.dict(
+ config.config,
+ {
+ config.ConfigType.WORKSPACE_BASE: temp_dir,
+ config.ConfigType.RUN_AS_DEVIN: 'true',
+ config.ConfigType.SANDBOX_TYPE: 'ssh',
+ },
+ clear=True,
+ ):
+ ssh_box = DockerSSHBox()
+
+ # test the ssh box with a command that fails
+ exit_code, output = ssh_box.execute('non_existing_command')
+ assert exit_code != 0, 'The exit code should not be 0 for a failed command.'
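`test_ssh_box_stateful_cmd_run_as_devin` expects `cd test` to persist into the next `pwd`. The skip reason in `test_agent.py` says the exec sandbox is not stateful. The functions below are a minimal local sketch of that difference using plain `bash` subprocesses. The assumption that an exec-style sandbox starts a fresh process per command comes from that skip reason. `run_stateless` and `run_stateful` are hypothetical helpers, not `DockerSSHBox` methods.

```python
import subprocess
import tempfile


def run_stateless(commands: list[str], cwd: str) -> list[str]:
    """Each command gets a fresh bash process, so 'cd' is forgotten."""
    outputs = []
    for cmd in commands:
        result = subprocess.run(['bash', '-c', cmd], cwd=cwd, capture_output=True, text=True)
        outputs.append(result.stdout.strip())
    return outputs


def run_stateful(commands: list[str], cwd: str) -> str:
    """All commands share one bash process, so 'cd' persists between them."""
    script = '\n'.join(commands) + '\n'
    result = subprocess.run(['bash'], input=script, cwd=cwd, capture_output=True, text=True)
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as workdir:
    stateless = run_stateless(['mkdir test', 'cd test', 'pwd'], workdir)
    stateful = run_stateful(['cd test', 'pwd'], workdir)
```

In the stateless run, the final `pwd` still prints the starting directory. In the shared session, it prints the `test` subdirectory.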