Integration test: Verify finish state & add auto-rerun in regenerate.sh (#1773)

* regenerate.sh: Allow testing on a specific agent and/or test

* Check agent finish state

* regenerate.sh: Rerun after fixing the prompts

* Fix SWEAgent test_write_simple_script

* Add more help message

* Add a known issue to README.md

* regenerate.sh: Fix help message typo

* Fix a typo in README
Boxuan Li
2024-05-14 00:50:29 -07:00
committed by GitHub
parent b84f25ab35
commit 3d53d363b4
23 changed files with 133 additions and 877 deletions

View File

@@ -25,10 +25,17 @@ def read_task_from_stdin() -> str:
return sys.stdin.read()
async def main(task_str: str = '', exit_on_message: bool = False) -> None:
async def main(task_str: str = '', exit_on_message: bool = False) -> AgentState:
"""
Main coroutine to run the agent controller with task input flexibility.
It's only used when you launch opendevin backend directly via cmdline.
Args:
task_str: task string (optional)
exit_on_message: quit if agent asks for a message from user (optional)
Returns:
The final agent state right before shutdown
"""
# Determine the task source
@@ -99,7 +106,10 @@ async def main(task_str: str = '', exit_on_message: bool = False) -> None:
]:
await asyncio.sleep(1) # Give back control for a tick, so the agent can run
# retrieve the final state before we close the controller and agent
final_agent_state = controller.get_agent_state()
await controller.close()
return final_agent_state
if __name__ == '__main__':
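
The hunk above reads the agent state before closing the controller and returns it, so that a caller such as an integration test can assert on how the run ended. A minimal sketch of that pattern follows. The `FakeController` class, the three-member `AgentState` enum, and the `run_task` and `check_finish_state` helpers are hypothetical stand-ins for illustration, not the project's actual classes.

```python
import asyncio
from enum import Enum


class AgentState(str, Enum):
    # Hypothetical subset of states, for illustration only.
    RUNNING = 'running'
    FINISHED = 'finished'
    ERROR = 'error'


class FakeController:
    """Stand-in for the real controller; not the project's API."""

    def __init__(self) -> None:
        self._state = AgentState.RUNNING
        self.closed = False

    def get_agent_state(self) -> AgentState:
        return self._state

    async def step(self) -> None:
        # Pretend the agent finishes after a single step.
        self._state = AgentState.FINISHED

    async def close(self) -> None:
        self.closed = True


async def run_task(controller: FakeController) -> AgentState:
    while controller.get_agent_state() == AgentState.RUNNING:
        await controller.step()
        await asyncio.sleep(0)  # give control back to the event loop
    # Capture the state first, then close the controller.
    final_state = controller.get_agent_state()
    await controller.close()
    return final_state


def check_finish_state() -> AgentState:
    controller = FakeController()
    final_state = asyncio.run(run_task(controller))
    assert controller.closed
    assert final_state == AgentState.FINISHED
    return final_state
```

Returning the state rather than `None` lets the caller check the outcome without reaching into the controller after it has been closed.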

View File

@@ -65,6 +65,19 @@ failed tests, but it still costs money! If you don't want
to cover the cost, ask one of the maintainers to regenerate for you.
You might also be able to fix the tests by hand.
If you only want to run a specific test, set the environment variable
`ONLY_TEST_NAME` to the test name. If you only want to run a specific agent,
set the environment variable `ONLY_TEST_AGENT` to the agent name. You can also use both,
e.g.
```bash
TEST_ONLY=true ONLY_TEST_NAME="test_write_simple_script" ONLY_TEST_AGENT="MonologueAgent" ./tests/integration/regenerate.sh
```
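
The filtering described above can be sketched in Python as follows. This is only an illustration of the semantics (an unset variable means "no filter"); the `should_run` helper is hypothetical, and the real filtering is done by `regenerate.sh`, whose implementation is not shown here.

```python
import os
from typing import Mapping, Optional


def should_run(
    test_name: str,
    agent: str,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True if the (test, agent) pair passes both optional filters."""
    env = os.environ if env is None else env
    only_test = env.get('ONLY_TEST_NAME', '')
    only_agent = env.get('ONLY_TEST_AGENT', '')
    # An empty or unset variable places no restriction on that dimension.
    if only_test and only_test != test_name:
        return False
    if only_agent and only_agent != agent:
        return False
    return True
```

Passing an explicit `env` mapping makes the helper easy to exercise without modifying the process environment.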
Known issue: sometimes you might see transient errors like `pexpect.pxssh.ExceptionPxssh: Could not establish connection to host`.
The regenerate.sh script cannot tell that this is a transient error and will still regenerate the test artifacts. If this happens, simply
terminate the script with `ctrl+c` and rerun it.
## Write a new Integration Test
To write an integration test, there are essentially two steps:

View File

@@ -78,7 +78,7 @@ args:
start: 0
end: -1
thought: Thoughts:
I need to create a shell script named 'hello.sh' that will print 'hello' when executed. I will need to ensure that the script does not prompt for any confirmation during execution.
To begin solving the task, I need to create a new shell script named 'hello.sh'. This script should contain a simple command to print 'hello'. I will use the `write` command to create this file and insert the necessary bash script code.
Action:

View File

@@ -78,7 +78,7 @@ args:
start: 0
end: -1
thought: Thoughts:
I need to create a shell script named 'hello.sh' that will print 'hello' when executed. I will need to ensure that the script does not prompt for any confirmation during execution.
To begin solving the task, I need to create a new shell script named 'hello.sh'. This script should contain a simple command to print 'hello'. I will use the `write` command to create this file and insert the necessary bash script code.
Action:
@@ -96,7 +96,7 @@ args:
start: 0
end: -1
thought: Thoughts:
I need to add the code to the 'hello.sh' script that will print 'hello' when executed without any prompts for confirmation.
I have created the file 'hello.sh' but I have not yet written any content into it. The task requires me to write a shell script that prints 'hello'. I need to add the appropriate bash command to this file to fulfill the requirement. The command to print in bash is `echo`. I will write this command into 'hello.sh'.
Action:

View File

@@ -78,7 +78,7 @@ args:
start: 0
end: -1
thought: Thoughts:
I need to create a shell script named 'hello.sh' that will print 'hello' when executed. I will need to ensure that the script does not prompt for any confirmation during execution.
To begin solving the task, I need to create a new shell script named 'hello.sh'. This script should contain a simple command to print 'hello'. I will use the `write` command to create this file and insert the necessary bash script code.
Action:
@@ -96,7 +96,7 @@ args:
start: 0
end: -1
thought: Thoughts:
I need to add the code to the 'hello.sh' script that will print 'hello' when executed without any prompts for confirmation.
I have created the file 'hello.sh' but I have not yet written any content into it. The task requires me to write a shell script that prints 'hello'. I need to add the appropriate bash command to this file to fulfill the requirement. The command to print in bash is `echo`. I will write this command into 'hello.sh'.
Action:
@@ -114,7 +114,7 @@ args:
start: 0
end: -1
thought: Thoughts:
I need to add the code to the 'hello.sh' script that will print 'hello' when executed without any prompts for confirmation. I will simply add the command to print 'hello' to the script.
Based on the previous actions, it appears that the file 'hello.sh' has been created but no content has been written into it yet. The task requires a shell script that prints 'hello'. I need to write the appropriate bash command into 'hello.sh' to fulfill this requirement. The command to print in bash is `echo "hello"`.
Action:

View File

@@ -67,7 +67,7 @@ args:
start: 0
end: -1
thought: Thoughts:
I need to create a shell script named 'hello.sh' that will print 'hello' when executed. I will need to ensure that the script does not prompt for any confirmation during execution.
To begin solving the task, I need to create a new shell script named 'hello.sh'. This script should contain a simple command to print 'hello'. I will use the `write` command to create this file and insert the necessary bash script code.
Action:
@@ -85,7 +85,7 @@ args:
start: 0
end: -1
thought: Thoughts:
I need to add the code to the 'hello.sh' script that will print 'hello' when executed without any prompts for confirmation.
I have created the file 'hello.sh' but I have not yet written any content into it. The task requires me to write a shell script that prints 'hello'. I need to add the appropriate bash command to this file to fulfill the requirement. The command to print in bash is `echo`. I will write this command into 'hello.sh'.
Action:
@@ -103,7 +103,7 @@ args:
start: 0
end: -1
thought: Thoughts:
I need to add the code to the 'hello.sh' script that will print 'hello' when executed without any prompts for confirmation. I will simply add the command to print 'hello' to the script.
Based on the previous actions, it appears that the file 'hello.sh' has been created but no content has been written into it yet. The task requires a shell script that prints 'hello'. I need to write the appropriate bash command into 'hello.sh' to fulfill this requirement. The command to print in bash is `echo "hello"`.
Action:
@@ -115,19 +115,20 @@ extras:
Memory 3:
Previous Action:
action: write
action: read
args:
path: hello.sh
start: 0
end: -1
thought: Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already created, I will simply edit the existing script to include the command to print 'hello'.
Based on the previous actions, it seems that the file 'hello.sh' has been created but the content has not been verified yet. The task requires a shell script that prints 'hello'. I need to read the content of 'hello.sh' to ensure that the correct command `echo "hello"` has been written into it. If the content is correct, I can then proceed to test the script to confirm it behaves as expected.
Action:
Output from Action:
observation: write
content:
observation: read
content: echo "hello"
extras:
path: hello.sh
======= End Actions =======

View File

@@ -67,7 +67,7 @@ args:
start: 0
end: -1
thought: Thoughts:
I need to add the code to the 'hello.sh' script that will print 'hello' when executed without any prompts for confirmation.
I have created the file 'hello.sh' but I have not yet written any content into it. The task requires me to write a shell script that prints 'hello'. I need to add the appropriate bash command to this file to fulfill the requirement. The command to print in bash is `echo`. I will write this command into 'hello.sh'.
Action:
@@ -85,7 +85,7 @@ args:
start: 0
end: -1
thought: Thoughts:
I need to add the code to the 'hello.sh' script that will print 'hello' when executed without any prompts for confirmation. I will simply add the command to print 'hello' to the script.
Based on the previous actions, it appears that the file 'hello.sh' has been created but no content has been written into it yet. The task requires a shell script that prints 'hello'. I need to write the appropriate bash command into 'hello.sh' to fulfill this requirement. The command to print in bash is `echo "hello"`.
Action:
@@ -97,39 +97,38 @@ extras:
Memory 2:
Previous Action:
action: write
action: read
args:
path: hello.sh
start: 0
end: -1
thought: Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already created, I will simply edit the existing script to include the command to print 'hello'.
Based on the previous actions, it seems that the file 'hello.sh' has been created but the content has not been verified yet. The task requires a shell script that prints 'hello'. I need to read the content of 'hello.sh' to ensure that the correct command `echo "hello"` has been written into it. If the content is correct, I can then proceed to test the script to confirm it behaves as expected.
Action:
Output from Action:
observation: write
content:
observation: read
content: echo "hello"
extras:
path: hello.sh
Memory 3:
Previous Action:
action: write
action: run
args:
path: hello.sh
start: 0
end: -1
thought: Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already created, I will simply edit the existing script to include the command to print 'hello'.
Action:
command: bash hello.sh
background: False
thought:
Output from Action:
observation: write
content:
observation: run
content: hello
extras:
path: hello.sh
command_id: -1
command: bash hello.sh
exit_code: 0
======= End Actions =======
Use these memories to provide additional context to the problem you are solving.
Remember that you have already completed these steps so you do not need to perform them again.

View File

@@ -1,199 +0,0 @@
----------
SYSTEM INFO:
You am an autonomous coding agent, here to provide solutions for coding issues.
You have been designed to assist you with a wide range of programming tasks, from code editing and debugging to testing and deployment.
You have access to a variety of tools and commands that you can use to help you solve problems efficiently.
INSTRUCTIONS:
Now, you're going to solve this issue on your own. You can use any bash commands or custom commands you wish to complete your task. Edit all the files you need to and run any checks or tests that you want.
Remember, YOU CAN ONLY ENTER ONE COMMAND AT A TIME. You should always wait for feedback after every command.
When you're satisfied with all of the changes you've made, you can indicate that you are done by running the exit command.
Note however that you cannot use any interactive session commands (e.g. python, vim, node) in this environment, but you can write scripts and run them. E.g. you can write a python script and then run it with `python <script_name>.py`.
NOTE ABOUT THE write COMMAND: Indentation really matters! When editing a file, make sure to insert appropriate indentation before each line!
IMPORTANT TIPS:
1. Reproduce the bug: Always start by trying to replicate the bug that the issue discusses. If the issue includes code for reproducing the bug, we recommend that you re-implement that in your environment and run it to ensure you can reproduce the bug. Then, start trying to fix it. When you think you've fixed the bug, re-run the bug reproduction script to make sure that the issue has indeed been resolved.
If the bug reproduction script does not print anything when it successfully runs, we recommend adding a print("Script completed successfully, no errors.") command at the end of the file, so that you can be sure the script ran fine all the way through.
2. Try different commands: If you run a command and it doesn't work, try running a different command. A command that did not work once will not work the second time unless you modify it.
3. Navigate large files: If you open a file and need to get to an area around a specific line that is not in the first 100 lines, say line 583, you would use the 'read' command like this: 'read <file> 583'. This is a much faster way to read through the file.
4. Handle input files: If the bug reproduction script requires inputting/reading a specific file, such as 'buggy-input.png', and you'd like to understand how to input that file, conduct a search in the existing repository code to see whether someone else has already done that. Do this by running the command: 'search_dir "buggy-input.png"'. If that doesn't work, use the Linux 'find' command.
5. Understand your context: Always make sure to look at the currently open file and the current working directory. The currently open file might be in a different directory than the working directory.
6. Verify your edits: When editing files, it is easy to accidentally specify a wrong line number or to write code with incorrect indentation. Always check the code after you issue an edit to make sure that it reflects what you wanted to accomplish. If it didn't, issue another command to fix it.
7. Thoroughly test your solution: After making any changes to fix a bug, be sure to thoroughly test your solution to ensure the bug has been resolved. Re-run the bug reproduction script and verify that the issue has been addressed.
DOCUMENTATION:
It is recommend that you use the commands provided for interacting with files and your directory because they have been specially built for you.
They will make it much easier for you to look at files and make changes. Using these commands will help you be better at your task.
You can open an file by using either the read or write operations.
- If a file already exists you should read it before making any changes. Use the `edit` command to make changes once you have read it.
- If you are creating a new file use the write command. Use the `edit` command to make changes once you have created the new file.
Commands:
exit - Executed when task is complete
read <file_name> [<start_line>] [<end_line>] - Shows a given file's contents starting from <start_line> up to <end_line>. Default: start_line = 0, end_line = -1. By default the whole file will be read.
write <file> <changes> [<start_line>] [<end_line>] - Modifies a <file> by replacing the current lines between <start_line> and <end_line> with <changes>. Default start_line = 0 and end_line = -1. Calling this with no line args will replace the whole file.
browse <url> - Returns the text version of any url, this can be useful to look up documentation or finding issues on github
scroll_up - Takes no arguments. This will scroll up and show you the 100 lines above your current lines
scroll_down - Takes no arguments. This will scroll down and show you the 100 lines below your current lines
edit <start_line> <end_line> <changes> - This will modify lines in the currently open file. use start_line and end_line to designate which lines to change and then write the multiline changes. Set end_line to -1 to denote the end of the file
goto <line_num> - This will take you directly to a line and show you the 100 lines below it.
<bash_command> <args> - You can use any bash command you need (cd, ls, rm, grep, dir, mv, wget, git, zip, etc.) with their arguments included
pip install <package> - You can use this to import python packages. Make sure you include the correct package name when using this command.
ls - Use the ls command to view all the files in your current directory, this is a good starting point.
NOT ALLOWED - You cannot use interactive commands like python or node
Custom bash commands:
None
The following commands require an open file to be used: edit, scroll_up, scroll_down, goto
To modify the current file use 'edit'. To move through the current file use 'goto' or 'scroll_up'/'scroll_down'
when using write and edit do not surround the code with any "" just write the code.
----------
These are your past 4 actions:
===== Previous Actions =====
Memory 0:
Previous Action:
action: write
args:
path: hello.sh
start: 0
end: -1
thought: Thoughts:
I need to add the code to the 'hello.sh' script that will print 'hello' when executed without any prompts for confirmation. I will simply add the command to print 'hello' to the script.
Action:
Output from Action:
observation: write
content:
extras:
path: hello.sh
Memory 1:
Previous Action:
action: write
args:
path: hello.sh
start: 0
end: -1
thought: Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already created, I will simply edit the existing script to include the command to print 'hello'.
Action:
Output from Action:
observation: write
content:
extras:
path: hello.sh
Memory 2:
Previous Action:
action: write
args:
path: hello.sh
start: 0
end: -1
thought: Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already created, I will simply edit the existing script to include the command to print 'hello'.
Action:
Output from Action:
observation: write
content:
extras:
path: hello.sh
Memory 3:
Previous Action:
action: write
args:
path: hello.sh
start: 0
end: -1
thought: Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already open, I will simply write the command to print 'hello' in the script.
Action:
Output from Action:
observation: write
content:
extras:
path: hello.sh
======= End Actions =======
Use these memories to provide additional context to the problem you are solving.
Remember that you have already completed these steps so you do not need to perform them again.
----------
RESPONSE FORMAT:
This is the format of the response you will make in order to solve the current issue.
You will be given multiple iterations to complete this task so break it into steps and solve them one by one.
Your output must contain the following:
- First, thoughts about what your next action should be and plan it out.
- You will have a memory of your thoughts so you can use this to remember things for the next step.
- Use your thoughts to think about what you are currently doing, what you have done on prior steps and how that relates to solving the problem.
- Second, create a piece of code that will execute your next action based on the thoughts you have.
- Remember that you can only have one action for each thought, do not include multiple actions.
Your code MUST be surrounded in triple back ticks EXACTLY like this:
```
<code>
```
Notes:
- Adhere to the format so that the program loop continues smoothly, it is very important to only give one command per output.
- DO NOT give more than one command within the triple backticks. This will just throw an error and nothing will happen as a result.
- Do not give multiple code blocks, if you do only the second one will be captured and run, this might give an error if the first one was necessary.
- To execute multiple commands you should write them down in your thoughts section so you can remember it on the next step and execute them then.
- The only commands you are not capable of executing are interactive commands like `python` or `node` by themselves.
- If you think that you have completed the task that has been given to you based on your previous actions and outputs then use ``` exit ``` as the command to let the system know that you are done.
- DO NOT make any copies of your previous memories those will be provided to you at each step, making copies just wastes time and energy. Think smarter not harder.
- The write and edit commands requires proper indentation in the content section ex. `write hw.py def hello():
print('Hello World')` this is how you would have to format your write command.
- The white spaces matter as the code changes will be added to the code so they must have proper syntax.
This is a template using the format described above
Items in <> are suggestions for you, fill them out based on the context of the problem you are solving.
[ FORMAT ]
Thoughts:
<Provide clear and concise thoughts on the next step to take, highlighting any important details or context that should be remembered.>
<You can use multiple lines to express your thoughts>
Action:
```
<command> <params>
```
[ END FORMAT ]
Do not provide anything extra just your thought and action.
You are currently trying to complete this task:
Write a shell script 'hello.sh' that prints 'hello'. Do not ask me for confirmation at any point.
CURRENT WORKSPACE:
Open File: hello.sh on line 0
You can use these commands with the current file:
Navigation: `scroll_up`, `scroll_down`, and `goto <line>`
Modification: `edit <start_line> <end_line> <changes>`
Keep all of the guidelines above in mind when you are thinking and making code.
Please come up with a thought and action based on your current task and latest steps.
Make sure that you do not repeat the same actions, there will not be any changes in result if you do not changes anything.
Be very strict about the formatting that you use and make sure you follow the guidelines.
NEVER output multiple commands. ONLY take ONE STEP at a time.
When you have completed your task run the "exit" command.
Begin with your thought about the next step and then come up with an action to perform your thought.

View File

@@ -1,199 +0,0 @@
----------
SYSTEM INFO:
You am an autonomous coding agent, here to provide solutions for coding issues.
You have been designed to assist you with a wide range of programming tasks, from code editing and debugging to testing and deployment.
You have access to a variety of tools and commands that you can use to help you solve problems efficiently.
INSTRUCTIONS:
Now, you're going to solve this issue on your own. You can use any bash commands or custom commands you wish to complete your task. Edit all the files you need to and run any checks or tests that you want.
Remember, YOU CAN ONLY ENTER ONE COMMAND AT A TIME. You should always wait for feedback after every command.
When you're satisfied with all of the changes you've made, you can indicate that you are done by running the exit command.
Note however that you cannot use any interactive session commands (e.g. python, vim, node) in this environment, but you can write scripts and run them. E.g. you can write a python script and then run it with `python <script_name>.py`.
NOTE ABOUT THE write COMMAND: Indentation really matters! When editing a file, make sure to insert appropriate indentation before each line!
IMPORTANT TIPS:
1. Reproduce the bug: Always start by trying to replicate the bug that the issue discusses. If the issue includes code for reproducing the bug, we recommend that you re-implement that in your environment and run it to ensure you can reproduce the bug. Then, start trying to fix it. When you think you've fixed the bug, re-run the bug reproduction script to make sure that the issue has indeed been resolved.
If the bug reproduction script does not print anything when it successfully runs, we recommend adding a print("Script completed successfully, no errors.") command at the end of the file, so that you can be sure the script ran fine all the way through.
2. Try different commands: If you run a command and it doesn't work, try running a different command. A command that did not work once will not work the second time unless you modify it.
3. Navigate large files: If you open a file and need to get to an area around a specific line that is not in the first 100 lines, say line 583, you would use the 'read' command like this: 'read <file> 583'. This is a much faster way to read through the file.
4. Handle input files: If the bug reproduction script requires inputting/reading a specific file, such as 'buggy-input.png', and you'd like to understand how to input that file, conduct a search in the existing repository code to see whether someone else has already done that. Do this by running the command: 'search_dir "buggy-input.png"'. If that doesn't work, use the Linux 'find' command.
5. Understand your context: Always make sure to look at the currently open file and the current working directory. The currently open file might be in a different directory than the working directory.
6. Verify your edits: When editing files, it is easy to accidentally specify a wrong line number or to write code with incorrect indentation. Always check the code after you issue an edit to make sure that it reflects what you wanted to accomplish. If it didn't, issue another command to fix it.
7. Thoroughly test your solution: After making any changes to fix a bug, be sure to thoroughly test your solution to ensure the bug has been resolved. Re-run the bug reproduction script and verify that the issue has been addressed.
DOCUMENTATION:
It is recommend that you use the commands provided for interacting with files and your directory because they have been specially built for you.
They will make it much easier for you to look at files and make changes. Using these commands will help you be better at your task.
You can open an file by using either the read or write operations.
- If a file already exists you should read it before making any changes. Use the `edit` command to make changes once you have read it.
- If you are creating a new file use the write command. Use the `edit` command to make changes once you have created the new file.
Commands:
exit - Executed when task is complete
read <file_name> [<start_line>] [<end_line>] - Shows a given file's contents starting from <start_line> up to <end_line>. Default: start_line = 0, end_line = -1. By default the whole file will be read.
write <file> <changes> [<start_line>] [<end_line>] - Modifies a <file> by replacing the current lines between <start_line> and <end_line> with <changes>. Default start_line = 0 and end_line = -1. Calling this with no line args will replace the whole file.
browse <url> - Returns the text version of any url, this can be useful to look up documentation or finding issues on github
scroll_up - Takes no arguments. This will scroll up and show you the 100 lines above your current lines
scroll_down - Takes no arguments. This will scroll down and show you the 100 lines below your current lines
edit <start_line> <end_line> <changes> - This will modify lines in the currently open file. use start_line and end_line to designate which lines to change and then write the multiline changes. Set end_line to -1 to denote the end of the file
goto <line_num> - This will take you directly to a line and show you the 100 lines below it.
<bash_command> <args> - You can use any bash command you need (cd, ls, rm, grep, dir, mv, wget, git, zip, etc.) with their arguments included
pip install <package> - You can use this to import python packages. Make sure you include the correct package name when using this command.
ls - Use the ls command to view all the files in your current directory, this is a good starting point.
NOT ALLOWED - You cannot use interactive commands like python or node
Custom bash commands:
None
The following commands require an open file to be used: edit, scroll_up, scroll_down, goto
To modify the current file use 'edit'. To move through the current file use 'goto' or 'scroll_up'/'scroll_down'
when using write and edit do not surround the code with any "" just write the code.
----------
These are your past 4 actions:
===== Previous Actions =====
Memory 0:
Previous Action:
action: write
args:
path: hello.sh
start: 0
end: -1
thought: Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already created, I will simply edit the existing script to include the command to print 'hello'.
Action:
Output from Action:
observation: write
content:
extras:
path: hello.sh
Memory 1:
Previous Action:
action: write
args:
path: hello.sh
start: 0
end: -1
thought: Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already created, I will simply edit the existing script to include the command to print 'hello'.
Action:
Output from Action:
observation: write
content:
extras:
path: hello.sh
Memory 2:
Previous Action:
action: write
args:
path: hello.sh
start: 0
end: -1
thought: Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already open, I will simply write the command to print 'hello' in the script.
Action:
Output from Action:
observation: write
content:
extras:
path: hello.sh
Memory 3:
Previous Action:
action: write
args:
path: hello.sh
start: 0
end: -1
thought: Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already open, I will write the command to print 'hello' directly into the script.
Action:
Output from Action:
observation: write
content:
extras:
path: hello.sh
======= End Actions =======
Use these memories to provide additional context to the problem you are solving.
Remember that you have already completed these steps so you do not need to perform them again.
----------
RESPONSE FORMAT:
This is the format of the response you will make in order to solve the current issue.
You will be given multiple iterations to complete this task so break it into steps and solve them one by one.
Your output must contain the following:
- First, thoughts about what your next action should be and plan it out.
- You will have a memory of your thoughts so you can use this to remember things for the next step.
- Use your thoughts to think about what you are currently doing, what you have done on prior steps and how that relates to solving the problem.
- Second, create a piece of code that will execute your next action based on the thoughts you have.
- Remember that you can only have one action for each thought, do not include multiple actions.
Your code MUST be surrounded in triple back ticks EXACTLY like this:
```
<code>
```
Notes:
- Adhere to the format so that the program loop continues smoothly, it is very important to only give one command per output.
- DO NOT give more than one command within the triple backticks. This will just throw an error and nothing will happen as a result.
- Do not give multiple code blocks, if you do only the second one will be captured and run, this might give an error if the first one was necessary.
- To execute multiple commands you should write them down in your thoughts section so you can remember it on the next step and execute them then.
- The only commands you are not capable of executing are interactive commands like `python` or `node` by themselves.
- If you think that you have completed the task that has been given to you based on your previous actions and outputs then use ``` exit ``` as the command to let the system know that you are done.
- DO NOT make any copies of your previous memories those will be provided to you at each step, making copies just wastes time and energy. Think smarter not harder.
- The write and edit commands requires proper indentation in the content section ex. `write hw.py def hello():
print('Hello World')` this is how you would have to format your write command.
- The white spaces matter as the code changes will be added to the code so they must have proper syntax.
This is a template using the format described above
Items in <> are suggestions for you, fill them out based on the context of the problem you are solving.
[ FORMAT ]
Thoughts:
<Provide clear and concise thoughts on the next step to take, highlighting any important details or context that should be remembered.>
<You can use multiple lines to express your thoughts>
Action:
```
<command> <params>
```
[ END FORMAT ]
Do not provide anything extra just your thought and action.
You are currently trying to complete this task:
Write a shell script 'hello.sh' that prints 'hello'. Do not ask me for confirmation at any point.
CURRENT WORKSPACE:
Open File: hello.sh on line 0
You can use these commands with the current file:
Navigation: `scroll_up`, `scroll_down`, and `goto <line>`
Modification: `edit <start_line> <end_line> <changes>`
Keep all of the guidelines above in mind when you are thinking and making code.
Please come up with a thought and action based on your current task and latest steps.
Make sure that you do not repeat the same actions, there will not be any changes in result if you do not changes anything.
Be very strict about the formatting that you use and make sure you follow the guidelines.
NEVER output multiple commands. ONLY take ONE STEP at a time.
When you have completed your task run the "exit" command.
Begin with your thought about the next step and then come up with an action to perform your thought.

@@ -1,199 +0,0 @@
----------
SYSTEM INFO:
You am an autonomous coding agent, here to provide solutions for coding issues.
You have been designed to assist you with a wide range of programming tasks, from code editing and debugging to testing and deployment.
You have access to a variety of tools and commands that you can use to help you solve problems efficiently.
INSTRUCTIONS:
Now, you're going to solve this issue on your own. You can use any bash commands or custom commands you wish to complete your task. Edit all the files you need to and run any checks or tests that you want.
Remember, YOU CAN ONLY ENTER ONE COMMAND AT A TIME. You should always wait for feedback after every command.
When you're satisfied with all of the changes you've made, you can indicate that you are done by running the exit command.
Note however that you cannot use any interactive session commands (e.g. python, vim, node) in this environment, but you can write scripts and run them. E.g. you can write a python script and then run it with `python <script_name>.py`.
NOTE ABOUT THE write COMMAND: Indentation really matters! When editing a file, make sure to insert appropriate indentation before each line!
IMPORTANT TIPS:
1. Reproduce the bug: Always start by trying to replicate the bug that the issue discusses. If the issue includes code for reproducing the bug, we recommend that you re-implement that in your environment and run it to ensure you can reproduce the bug. Then, start trying to fix it. When you think you've fixed the bug, re-run the bug reproduction script to make sure that the issue has indeed been resolved.
If the bug reproduction script does not print anything when it successfully runs, we recommend adding a print("Script completed successfully, no errors.") command at the end of the file, so that you can be sure the script ran fine all the way through.
2. Try different commands: If you run a command and it doesn't work, try running a different command. A command that did not work once will not work the second time unless you modify it.
3. Navigate large files: If you open a file and need to get to an area around a specific line that is not in the first 100 lines, say line 583, you would use the 'read' command like this: 'read <file> 583'. This is a much faster way to read through the file.
4. Handle input files: If the bug reproduction script requires inputting/reading a specific file, such as 'buggy-input.png', and you'd like to understand how to input that file, conduct a search in the existing repository code to see whether someone else has already done that. Do this by running the command: 'search_dir "buggy-input.png"'. If that doesn't work, use the Linux 'find' command.
5. Understand your context: Always make sure to look at the currently open file and the current working directory. The currently open file might be in a different directory than the working directory.
6. Verify your edits: When editing files, it is easy to accidentally specify a wrong line number or to write code with incorrect indentation. Always check the code after you issue an edit to make sure that it reflects what you wanted to accomplish. If it didn't, issue another command to fix it.
7. Thoroughly test your solution: After making any changes to fix a bug, be sure to thoroughly test your solution to ensure the bug has been resolved. Re-run the bug reproduction script and verify that the issue has been addressed.
DOCUMENTATION:
It is recommend that you use the commands provided for interacting with files and your directory because they have been specially built for you.
They will make it much easier for you to look at files and make changes. Using these commands will help you be better at your task.
You can open an file by using either the read or write operations.
- If a file already exists you should read it before making any changes. Use the `edit` command to make changes once you have read it.
- If you are creating a new file use the write command. Use the `edit` command to make changes once you have created the new file.
Commands:
exit - Executed when task is complete
read <file_name> [<start_line>] [<end_line>] - Shows a given file's contents starting from <start_line> up to <end_line>. Default: start_line = 0, end_line = -1. By default the whole file will be read.
write <file> <changes> [<start_line>] [<end_line>] - Modifies a <file> by replacing the current lines between <start_line> and <end_line> with <changes>. Default start_line = 0 and end_line = -1. Calling this with no line args will replace the whole file.
browse <url> - Returns the text version of any url, this can be useful to look up documentation or finding issues on github
scroll_up - Takes no arguments. This will scroll up and show you the 100 lines above your current lines
scroll_down - Takes no arguments. This will scroll down and show you the 100 lines below your current lines
edit <start_line> <end_line> <changes> - This will modify lines in the currently open file. use start_line and end_line to designate which lines to change and then write the multiline changes. Set end_line to -1 to denote the end of the file
goto <line_num> - This will take you directly to a line and show you the 100 lines below it.
<bash_command> <args> - You can use any bash command you need (cd, ls, rm, grep, dir, mv, wget, git, zip, etc.) with their arguments included
pip install <package> - You can use this to import python packages. Make sure you include the correct package name when using this command.
ls - Use the ls command to view all the files in your current directory, this is a good starting point.
NOT ALLOWED - You cannot use interactive commands like python or node
Custom bash commands:
None
The following commands require an open file to be used: edit, scroll_up, scroll_down, goto
To modify the current file use 'edit'. To move through the current file use 'goto' or 'scroll_up'/'scroll_down'
when using write and edit do not surround the code with any "" just write the code.
----------
These are your past 4 actions:
===== Previous Actions =====
Memory 0:
Previous Action:
action: write
args:
path: hello.sh
start: 0
end: -1
thought: Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already created, I will simply edit the existing script to include the command to print 'hello'.
Action:
Output from Action:
observation: write
content:
extras:
path: hello.sh
Memory 1:
Previous Action:
action: write
args:
path: hello.sh
start: 0
end: -1
thought: Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already open, I will simply write the command to print 'hello' in the script.
Action:
Output from Action:
observation: write
content:
extras:
path: hello.sh
Memory 2:
Previous Action:
action: write
args:
path: hello.sh
start: 0
end: -1
thought: Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already open, I will write the command to print 'hello' directly into the script.
Action:
Output from Action:
observation: write
content:
extras:
path: hello.sh
Memory 3:
Previous Action:
action: write
args:
path: hello.sh
start: 0
end: -1
thought: Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already open, I will directly write the command to print 'hello' in the script.
Action:
Output from Action:
observation: write
content:
extras:
path: hello.sh
======= End Actions =======
Use these memories to provide additional context to the problem you are solving.
Remember that you have already completed these steps so you do not need to perform them again.
----------
RESPONSE FORMAT:
This is the format of the response you will make in order to solve the current issue.
You will be given multiple iterations to complete this task so break it into steps and solve them one by one.
Your output must contain the following:
- First, thoughts about what your next action should be and plan it out.
- You will have a memory of your thoughts so you can use this to remember things for the next step.
- Use your thoughts to think about what you are currently doing, what you have done on prior steps and how that relates to solving the problem.
- Second, create a piece of code that will execute your next action based on the thoughts you have.
- Remember that you can only have one action for each thought, do not include multiple actions.
Your code MUST be surrounded in triple back ticks EXACTLY like this:
```
<code>
```
Notes:
- Adhere to the format so that the program loop continues smoothly, it is very important to only give one command per output.
- DO NOT give more than one command within the triple backticks. This will just throw an error and nothing will happen as a result.
- Do not give multiple code blocks, if you do only the second one will be captured and run, this might give an error if the first one was necessary.
- To execute multiple commands you should write them down in your thoughts section so you can remember it on the next step and execute them then.
- The only commands you are not capable of executing are interactive commands like `python` or `node` by themselves.
- If you think that you have completed the task that has been given to you based on your previous actions and outputs then use ``` exit ``` as the command to let the system know that you are done.
- DO NOT make any copies of your previous memories those will be provided to you at each step, making copies just wastes time and energy. Think smarter not harder.
- The write and edit commands requires proper indentation in the content section ex. `write hw.py def hello():
print('Hello World')` this is how you would have to format your write command.
- The white spaces matter as the code changes will be added to the code so they must have proper syntax.
This is a template using the format described above
Items in <> are suggestions for you, fill them out based on the context of the problem you are solving.
[ FORMAT ]
Thoughts:
<Provide clear and concise thoughts on the next step to take, highlighting any important details or context that should be remembered.>
<You can use multiple lines to express your thoughts>
Action:
```
<command> <params>
```
[ END FORMAT ]
Do not provide anything extra just your thought and action.
You are currently trying to complete this task:
Write a shell script 'hello.sh' that prints 'hello'. Do not ask me for confirmation at any point.
CURRENT WORKSPACE:
Open File: hello.sh on line 0
You can use these commands with the current file:
Navigation: `scroll_up`, `scroll_down`, and `goto <line>`
Modification: `edit <start_line> <end_line> <changes>`
Keep all of the guidelines above in mind when you are thinking and making code.
Please come up with a thought and action based on your current task and latest steps.
Make sure that you do not repeat the same actions, there will not be any changes in result if you do not changes anything.
Be very strict about the formatting that you use and make sure you follow the guidelines.
NEVER output multiple commands. ONLY take ONE STEP at a time.
When you have completed your task run the "exit" command.
Begin with your thought about the next step and then come up with an action to perform your thought.

@@ -1,199 +0,0 @@
----------
SYSTEM INFO:
You am an autonomous coding agent, here to provide solutions for coding issues.
You have been designed to assist you with a wide range of programming tasks, from code editing and debugging to testing and deployment.
You have access to a variety of tools and commands that you can use to help you solve problems efficiently.
INSTRUCTIONS:
Now, you're going to solve this issue on your own. You can use any bash commands or custom commands you wish to complete your task. Edit all the files you need to and run any checks or tests that you want.
Remember, YOU CAN ONLY ENTER ONE COMMAND AT A TIME. You should always wait for feedback after every command.
When you're satisfied with all of the changes you've made, you can indicate that you are done by running the exit command.
Note however that you cannot use any interactive session commands (e.g. python, vim, node) in this environment, but you can write scripts and run them. E.g. you can write a python script and then run it with `python <script_name>.py`.
NOTE ABOUT THE write COMMAND: Indentation really matters! When editing a file, make sure to insert appropriate indentation before each line!
IMPORTANT TIPS:
1. Reproduce the bug: Always start by trying to replicate the bug that the issue discusses. If the issue includes code for reproducing the bug, we recommend that you re-implement that in your environment and run it to ensure you can reproduce the bug. Then, start trying to fix it. When you think you've fixed the bug, re-run the bug reproduction script to make sure that the issue has indeed been resolved.
If the bug reproduction script does not print anything when it successfully runs, we recommend adding a print("Script completed successfully, no errors.") command at the end of the file, so that you can be sure the script ran fine all the way through.
2. Try different commands: If you run a command and it doesn't work, try running a different command. A command that did not work once will not work the second time unless you modify it.
3. Navigate large files: If you open a file and need to get to an area around a specific line that is not in the first 100 lines, say line 583, you would use the 'read' command like this: 'read <file> 583'. This is a much faster way to read through the file.
4. Handle input files: If the bug reproduction script requires inputting/reading a specific file, such as 'buggy-input.png', and you'd like to understand how to input that file, conduct a search in the existing repository code to see whether someone else has already done that. Do this by running the command: 'search_dir "buggy-input.png"'. If that doesn't work, use the Linux 'find' command.
5. Understand your context: Always make sure to look at the currently open file and the current working directory. The currently open file might be in a different directory than the working directory.
6. Verify your edits: When editing files, it is easy to accidentally specify a wrong line number or to write code with incorrect indentation. Always check the code after you issue an edit to make sure that it reflects what you wanted to accomplish. If it didn't, issue another command to fix it.
7. Thoroughly test your solution: After making any changes to fix a bug, be sure to thoroughly test your solution to ensure the bug has been resolved. Re-run the bug reproduction script and verify that the issue has been addressed.
DOCUMENTATION:
It is recommend that you use the commands provided for interacting with files and your directory because they have been specially built for you.
They will make it much easier for you to look at files and make changes. Using these commands will help you be better at your task.
You can open an file by using either the read or write operations.
- If a file already exists you should read it before making any changes. Use the `edit` command to make changes once you have read it.
- If you are creating a new file use the write command. Use the `edit` command to make changes once you have created the new file.
Commands:
exit - Executed when task is complete
read <file_name> [<start_line>] [<end_line>] - Shows a given file's contents starting from <start_line> up to <end_line>. Default: start_line = 0, end_line = -1. By default the whole file will be read.
write <file> <changes> [<start_line>] [<end_line>] - Modifies a <file> by replacing the current lines between <start_line> and <end_line> with <changes>. Default start_line = 0 and end_line = -1. Calling this with no line args will replace the whole file.
browse <url> - Returns the text version of any url, this can be useful to look up documentation or finding issues on github
scroll_up - Takes no arguments. This will scroll up and show you the 100 lines above your current lines
scroll_down - Takes no arguments. This will scroll down and show you the 100 lines below your current lines
edit <start_line> <end_line> <changes> - This will modify lines in the currently open file. use start_line and end_line to designate which lines to change and then write the multiline changes. Set end_line to -1 to denote the end of the file
goto <line_num> - This will take you directly to a line and show you the 100 lines below it.
<bash_command> <args> - You can use any bash command you need (cd, ls, rm, grep, dir, mv, wget, git, zip, etc.) with their arguments included
pip install <package> - You can use this to import python packages. Make sure you include the correct package name when using this command.
ls - Use the ls command to view all the files in your current directory, this is a good starting point.
NOT ALLOWED - You cannot use interactive commands like python or node
Custom bash commands:
None
The following commands require an open file to be used: edit, scroll_up, scroll_down, goto
To modify the current file use 'edit'. To move through the current file use 'goto' or 'scroll_up'/'scroll_down'
when using write and edit do not surround the code with any "" just write the code.
----------
These are your past 4 actions:
===== Previous Actions =====
Memory 0:
Previous Action:
action: write
args:
path: hello.sh
start: 0
end: -1
thought: Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already open, I will simply write the command to print 'hello' in the script.
Action:
Output from Action:
observation: write
content:
extras:
path: hello.sh
Memory 1:
Previous Action:
action: write
args:
path: hello.sh
start: 0
end: -1
thought: Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already open, I will write the command to print 'hello' directly into the script.
Action:
Output from Action:
observation: write
content:
extras:
path: hello.sh
Memory 2:
Previous Action:
action: write
args:
path: hello.sh
start: 0
end: -1
thought: Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already open, I will directly write the command to print 'hello' in the script.
Action:
Output from Action:
observation: write
content:
extras:
path: hello.sh
Memory 3:
Previous Action:
action: write
args:
path: hello.sh
start: 0
end: -1
thought: Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already open, I will directly write the command to print 'hello' in the script.
Action:
Output from Action:
observation: write
content:
extras:
path: hello.sh
======= End Actions =======
Use these memories to provide additional context to the problem you are solving.
Remember that you have already completed these steps so you do not need to perform them again.
----------
RESPONSE FORMAT:
This is the format of the response you will make in order to solve the current issue.
You will be given multiple iterations to complete this task so break it into steps and solve them one by one.
Your output must contain the following:
- First, thoughts about what your next action should be and plan it out.
- You will have a memory of your thoughts so you can use this to remember things for the next step.
- Use your thoughts to think about what you are currently doing, what you have done on prior steps and how that relates to solving the problem.
- Second, create a piece of code that will execute your next action based on the thoughts you have.
- Remember that you can only have one action for each thought, do not include multiple actions.
Your code MUST be surrounded in triple back ticks EXACTLY like this:
```
<code>
```
Notes:
- Adhere to the format so that the program loop continues smoothly, it is very important to only give one command per output.
- DO NOT give more than one command within the triple backticks. This will just throw an error and nothing will happen as a result.
- Do not give multiple code blocks, if you do only the second one will be captured and run, this might give an error if the first one was necessary.
- To execute multiple commands you should write them down in your thoughts section so you can remember it on the next step and execute them then.
- The only commands you are not capable of executing are interactive commands like `python` or `node` by themselves.
- If you think that you have completed the task that has been given to you based on your previous actions and outputs then use ``` exit ``` as the command to let the system know that you are done.
- DO NOT make any copies of your previous memories those will be provided to you at each step, making copies just wastes time and energy. Think smarter not harder.
- The write and edit commands requires proper indentation in the content section ex. `write hw.py def hello():
print('Hello World')` this is how you would have to format your write command.
- The white spaces matter as the code changes will be added to the code so they must have proper syntax.
This is a template using the format described above
Items in <> are suggestions for you, fill them out based on the context of the problem you are solving.
[ FORMAT ]
Thoughts:
<Provide clear and concise thoughts on the next step to take, highlighting any important details or context that should be remembered.>
<You can use multiple lines to express your thoughts>
Action:
```
<command> <params>
```
[ END FORMAT ]
Do not provide anything extra just your thought and action.
You are currently trying to complete this task:
Write a shell script 'hello.sh' that prints 'hello'. Do not ask me for confirmation at any point.
CURRENT WORKSPACE:
Open File: hello.sh on line 0
You can use these commands with the current file:
Navigation: `scroll_up`, `scroll_down`, and `goto <line>`
Modification: `edit <start_line> <end_line> <changes>`
Keep all of the guidelines above in mind when you are thinking and making code.
Please come up with a thought and action based on your current task and latest steps.
Make sure that you do not repeat the same actions, there will not be any changes in result if you do not changes anything.
Be very strict about the formatting that you use and make sure you follow the guidelines.
NEVER output multiple commands. ONLY take ONE STEP at a time.
When you have completed your task run the "exit" command.
Begin with your thought about the next step and then come up with an action to perform your thought.

@@ -1,7 +1,8 @@
Thoughts:
I need to create a shell script named 'hello.sh' that will print 'hello' when executed. I will need to ensure that the script does not prompt for any confirmation during execution.
To begin solving the task, I need to create a new shell script named 'hello.sh'. This script should contain a simple command to print 'hello'. I will use the `write` command to create this file and insert the necessary bash script code.
Action:
```
write hello.sh "echo 'hello'"
write hello.sh #!/bin/bash
echo 'hello'
```

@@ -1,7 +1,7 @@
Thoughts:
I need to add the code to the 'hello.sh' script that will print 'hello' when executed without any prompts for confirmation.
I have created the file 'hello.sh' but I have not yet written any content into it. The task requires me to write a shell script that prints 'hello'. I need to add the appropriate bash command to this file to fulfill the requirement. The command to print in bash is `echo`. I will write this command into 'hello.sh'.
Action:
```
edit 0 -1 "echo 'hello'"
write hello.sh echo "hello"
```

@@ -1,7 +1,7 @@
Thoughts:
I need to add the code to the 'hello.sh' script that will print 'hello' when executed without any prompts for confirmation. I will simply add the command to print 'hello' to the script.
Based on the previous actions, it appears that the file 'hello.sh' has been created but no content has been written into it yet. The task requires a shell script that prints 'hello'. I need to write the appropriate bash command into 'hello.sh' to fulfill this requirement. The command to print in bash is `echo "hello"`.
Action:
```
edit 0 -1 "echo 'hello'"
write hello.sh echo "hello"
```

@@ -1,7 +1,7 @@
Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already created, I will simply edit the existing script to include the command to print 'hello'.
Based on the previous actions, it seems that the file 'hello.sh' has been created but the content has not been verified yet. The task requires a shell script that prints 'hello'. I need to read the content of 'hello.sh' to ensure that the correct command `echo "hello"` has been written into it. If the content is correct, I can then proceed to test the script to confirm it behaves as expected.
Action:
```
edit 0 -1 "echo 'hello'"
read hello.sh
```

@@ -1,7 +1,7 @@
Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already created, I will simply edit the existing script to include the command to print 'hello'.
The content of 'hello.sh' has been verified and contains the correct command `echo "hello"` which fulfills the task requirement to print 'hello'. The next logical step is to test the script to ensure it behaves as expected when executed. This will confirm that the script is functional and meets the requirements specified.
Action:
```
edit 0 -1 "echo 'hello'"
bash hello.sh
```

@@ -1,7 +1,7 @@
Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already open, I will simply write the command to print 'hello' in the script.
The shell script 'hello.sh' has been successfully created and tested, and it outputs 'hello' as expected. Since the task was to write a shell script that prints 'hello' and this has been achieved, there are no further modifications or actions required on the script. Therefore, I should conclude this task.
Action:
```
edit 0 -1 "echo 'hello'"
exit
```

@@ -1,7 +0,0 @@
Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already open, I will write the command to print 'hello' directly into the script.
Action:
```
edit 0 -1 "echo 'hello'"
```

@@ -1,7 +0,0 @@
Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already open, I will directly write the command to print 'hello' in the script.
Action:
```
edit 0 -1 "echo 'hello'"
```

@@ -1,7 +0,0 @@
Thoughts:
I need to add the command to print 'hello' in the 'hello.sh' script without any prompts for confirmation. Since the script is already open, I will directly write the command to print 'hello' in the script.
Action:
```
edit 0 -1 "echo 'hello'"
```

@@ -1,7 +0,0 @@
Thoughts:
I have already opened the 'hello.sh' script and need to add the command to print 'hello' without any prompts for confirmation. I will use the `edit` command to insert the necessary code to achieve this.
Action:
```
edit 0 -1 "echo 'hello'"
```

@@ -1,6 +1,18 @@
#!/bin/bash
set -eo pipefail
run_test() {
SANDBOX_TYPE=$SANDBOX_TYPE \
WORKSPACE_BASE=$WORKSPACE_BASE \
REMIND_ITERATIONS=$remind_iterations \
MAX_ITERATIONS=$MAX_ITERATIONS \
WORKSPACE_MOUNT_PATH=$WORKSPACE_MOUNT_PATH \
AGENT=$agent \
poetry run pytest -s ./tests/integration/test_agent.py::$test_name
# return exit code of pytest
return $?
}
if [ -z "$WORKSPACE_MOUNT_PATH" ]; then
WORKSPACE_MOUNT_PATH=$(pwd)
fi
@@ -30,15 +42,36 @@ test_names=(
num_of_tests=${#test_names[@]}
num_of_agents=${#agents[@]}
if [ "$num_of_agents" -ne "${#remind_iterations_config[@]}" ]; then
echo "Every agent must have its own remind_iterations_config"
exit 1
fi
if [ "$num_of_tests" -ne "${#tasks[@]}" ]; then
echo "Every task must correspond to one test case"
exit 1
fi
rm -rf logs
rm -rf $WORKSPACE_BASE
for ((i = 0; i < num_of_tests; i++)); do
task=${tasks[i]}
test_name=${test_names[i]}
# skip other tests if only one test is specified
if [[ -n "$ONLY_TEST_NAME" && "$ONLY_TEST_NAME" != "$test_name" ]]; then
continue
fi
for ((j = 0; j < num_of_agents; j++)); do
agent=${agents[j]}
remind_iterations=${remind_iterations_config[j]}
# skip other agents if only one agent is specified
if [[ -n "$ONLY_TEST_AGENT" && "$ONLY_TEST_AGENT" != "$agent" ]]; then
continue
fi
echo -e "\n\n\n\n========Running $test_name for $agent========\n\n\n\n"
rm -rf $WORKSPACE_BASE
mkdir $WORKSPACE_BASE
@@ -53,13 +86,7 @@ for ((i = 0; i < num_of_tests; i++)); do
set +e
fi
SANDBOX_TYPE=$SANDBOX_TYPE \
WORKSPACE_BASE=$WORKSPACE_BASE \
REMIND_ITERATIONS=$remind_iterations \
MAX_ITERATIONS=$MAX_ITERATIONS \
WORKSPACE_MOUNT_PATH=$WORKSPACE_MOUNT_PATH \
AGENT=$agent \
poetry run pytest -s ./tests/integration/test_agent.py::$test_name
run_test
TEST_STATUS=$?
# Re-enable 'exit on error'
set -e
@@ -93,6 +120,31 @@ for ((i = 0; i < num_of_tests; i++)); do
mkdir -p tests/integration/mock/$agent/$test_name/
mv logs/llm/**/* tests/integration/mock/$agent/$test_name/
echo -e "\n\n\n\n========$test_name test data regenerated for $agent, rerun test again to verify========\n\n\n\n"
# Temporarily disable 'exit on error'
set +e
run_test
TEST_STATUS=$?
# Re-enable 'exit on error'
set -e
if [[ $TEST_STATUS -ne 0 ]]; then
echo -e "\n\n\n\n========$test_name for $agent RERUN FAILED========\n\n\n\n"
echo -e "There are multiple possibilities:"
echo -e " 1. The agent is unable to finish the task within $MAX_ITERATIONS steps."
echo -e " 2. The agent thinks it has finished the task, but fails the validation in the test code."
echo -e " 3. There is something non-deterministic in the prompt."
echo -e " 4. There is a bug in this script, or in OpenDevin code."
echo -e "NOTE: Some of the above problems could sometimes be fixed by a retry (with a more powerful LLM)."
echo -e " You could also consider improving the agent, increasing MAX_ITERATIONS, or skipping this test for this agent."
exit 1
else
echo -e "\n\n\n\n========$test_name for $agent RERUN PASSED========\n\n\n\n"
sleep 1
fi
else
echo -e "\n\n\n\n========$test_name for $agent PASSED========\n\n\n\n"
sleep 1
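
The `ONLY_TEST_NAME` / `ONLY_TEST_AGENT` filtering in the loops above can be sketched in Python as follows. This is only an illustration of the skip logic, not code from the repository: the function name `select_runs` and the sample lists are assumptions made for the example.

```python
from typing import Iterator, Optional, Sequence, Tuple


def select_runs(
    test_names: Sequence[str],
    agents: Sequence[str],
    only_test_name: Optional[str] = None,
    only_test_agent: Optional[str] = None,
) -> Iterator[Tuple[str, str]]:
    """Yield the (test_name, agent) pairs that survive the optional filters."""
    for test_name in test_names:
        # An unset or empty filter matches every test, mirroring
        # the `-n "$ONLY_TEST_NAME"` check in the shell script.
        if only_test_name and only_test_name != test_name:
            continue
        for agent in agents:
            # Same rule for the agent filter.
            if only_test_agent and only_test_agent != agent:
                continue
            yield test_name, agent


# Illustrative values only; the real lists live in regenerate.sh.
runs = list(
    select_runs(
        ["test_write_simple_script", "test_edits"],
        ["MonologueAgent", "SWEAgent"],
        only_test_name="test_edits",
        only_test_agent="SWEAgent",
    )
)
print(runs)  # [('test_edits', 'SWEAgent')]
```

With both filters unset, every test runs against every agent, which is the script's default behaviour.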

@@ -6,6 +6,7 @@ import subprocess
import pytest
from opendevin.core.main import main
from opendevin.core.schema import AgentState
workspace_base = os.getenv('WORKSPACE_BASE')
@@ -17,7 +18,8 @@ workspace_base = os.getenv('WORKSPACE_BASE')
)
def test_write_simple_script():
task = "Write a shell script 'hello.sh' that prints 'hello'. Do not ask me for confirmation at any point."
asyncio.run(main(task, exit_on_message=True))
final_agent_state = asyncio.run(main(task, exit_on_message=True))
assert final_agent_state == AgentState.FINISHED
# Verify the script file exists
script_path = os.path.join(workspace_base, 'hello.sh')
@@ -57,7 +59,8 @@ def test_edits():
# Execute the task
task = 'Fix typos in bad.txt. Do not ask me for confirmation at any point.'
asyncio.run(main(task, exit_on_message=True))
final_agent_state = asyncio.run(main(task, exit_on_message=True))
assert final_agent_state == AgentState.FINISHED
# Verify bad.txt has been fixed
text = """This is a stupid typo.
@@ -81,7 +84,8 @@ Enjoy!
def test_ipython():
# Execute the task
task = "Use Jupyter IPython to write a text file containing 'hello world' to '/workspace/test.txt'. Do not ask me for confirmation at any point."
asyncio.run(main(task, exit_on_message=True))
final_agent_state = asyncio.run(main(task, exit_on_message=True))
assert final_agent_state == AgentState.FINISHED
# Verify the file exists
file_path = os.path.join(workspace_base, 'test.txt')
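
The pattern the three tests now share (run the agent, then assert on the returned state before checking the workspace) can be sketched in isolation. The `AgentState` enum and the `main` coroutine below are hypothetical stand-ins so the snippet runs without OpenDevin installed; the real ones come from `opendevin.core.schema` and `opendevin.core.main`, and their members and behaviour may differ.

```python
import asyncio
from enum import Enum


class AgentState(str, Enum):
    # Hypothetical stand-in for opendevin.core.schema.AgentState;
    # the member values here are illustrative.
    RUNNING = "running"
    FINISHED = "finished"
    ERROR = "error"


async def main(task: str, exit_on_message: bool = False) -> AgentState:
    # Stand-in for opendevin.core.main.main: pretend the agent finishes
    # whenever it is given a non-empty task.
    await asyncio.sleep(0)
    return AgentState.FINISHED if task else AgentState.ERROR


def run_and_check(task: str) -> AgentState:
    """Run the agent and fail fast if it did not reach FINISHED."""
    final_agent_state = asyncio.run(main(task, exit_on_message=True))
    assert final_agent_state == AgentState.FINISHED, (
        f"agent stopped in state {final_agent_state}"
    )
    return final_agent_state


state = run_and_check("Write a shell script 'hello.sh' that prints 'hello'.")
```

Asserting on the state first means a run that hits the iteration limit fails with a clear message, rather than passing or failing on whatever files happen to be in the workspace.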