diff --git a/tests/integration/mock/PlannerAgent/test_edits/prompt_001.log b/tests/integration/mock/PlannerAgent/test_edits/prompt_001.log deleted file mode 100644 index 3e16c53e09..0000000000 --- a/tests/integration/mock/PlannerAgent/test_edits/prompt_001.log +++ /dev/null @@ -1,105 +0,0 @@ - - ----------- - - -# Task -You're a diligent software engineer AI. You can't see, draw, or interact with a -browser, but you can read and write files, and you can run commands, and you can think. - -You've been given the following task: - -Fix typos in bad.txt. Do not ask me for confirmation at any point. - -## Plan -As you complete this task, you're building a plan and keeping -track of your progress. Here's a JSON representation of your plan: - -{ - "id": "", - "goal": "", - "state": "open", - "subtasks": [] -} - - -You're not currently working on any tasks. Your next action MUST be to mark a task as in_progress. - -You're responsible for managing this plan and the status of tasks in -it, by using the `add_task` and `modify_task` actions described below. - -If the History below contradicts the state of any of these tasks, you -MUST modify the task using the `modify_task` action described below. - -Be sure NOT to duplicate any tasks. Do NOT use the `add_task` action for -a task that's already represented. Every task must be represented only once. - -Tasks that are sequential MUST be siblings. They must be added in order -to their parent task. - -If you mark a task as 'completed', 'verified', or 'abandoned', -all non-abandoned subtasks will be marked the same way. -So before closing a task this way, you MUST not only be sure that it has -been completed successfully--you must ALSO be sure that all its subtasks -are ready to be marked the same way. - -If, and only if, ALL tasks have already been marked verified, -you MUST respond with the `finish` action. 
- -## History -Here is a recent history of actions you've taken in service of this plan, -as well as observations you've made. This only includes the MOST RECENT -ten actions--more happened before that. - -[ - { - "source": "user", - "action": "message", - "args": { - "content": "Fix typos in bad.txt. Do not ask me for confirmation at any point.", - "wait_for_response": false - } - } -] - - -Your most recent action is at the bottom of that history. - -## Action -What is your next thought or action? Your response must be in JSON format. - -It must be an object, and it must contain two fields: -* `action`, which is one of the actions below -* `args`, which is a map of key-value pairs, specifying the arguments for that action - -* `read` - reads the content of a file. Arguments: - * `path` - the path of the file to read -* `write` - writes the content to a file. Arguments: - * `path` - the path of the file to write - * `content` - the content to write to the file -* `run` - runs a command on the command line in a Linux shell. Arguments: - * `command` - the command to run - * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true. -* `kill` - kills a background command - * `command_id` - the ID of the background command to kill -* `browse` - opens a web page. Arguments: - * `url` - the URL to open -* `message` - make a plan, set a goal, record your thoughts, or ask for more input from the user. Arguments: - * `content` - the message to record - * `wait_for_response` - set to `true` to wait for the user to respond before proceeding -* `add_task` - add a task to your plan. Arguments: - * `parent` - the ID of the parent task (leave empty if it should go at the top level) - * `goal` - the goal of the task - * `subtasks` - a list of subtasks, each of which is a map with a `goal` key. 
-* `modify_task` - close a task. Arguments: - * `task_id` - the ID of the task to close - * `state` - set to 'in_progress' to start the task, 'completed' to finish it, 'verified' to assert that it was successful, 'abandoned' to give up on it permanently, or `open` to stop working on it for now. -* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working. - -You MUST take time to think in between read, write, run, kill, browse, and recall actions--do this with the `message` action. -You should never act twice in a row without thinking. But if your last several -actions are all `message` actions, you should consider taking a different action. - -What is your next thought or action? Again, you must reply with JSON, and only with JSON. - -Look at your last thought in the history above. What does it suggest? Don't think anymore--take action. diff --git a/tests/integration/mock/PlannerAgent/test_edits/prompt_002.log b/tests/integration/mock/PlannerAgent/test_edits/prompt_002.log deleted file mode 100644 index 69a37da818..0000000000 --- a/tests/integration/mock/PlannerAgent/test_edits/prompt_002.log +++ /dev/null @@ -1,160 +0,0 @@ - - ----------- - - -# Task -You're a diligent software engineer AI. You can't see, draw, or interact with a -browser, but you can read and write files, and you can run commands, and you can think. - -You've been given the following task: - -Fix typos in bad.txt. Do not ask me for confirmation at any point. - -## Plan -As you complete this task, you're building a plan and keeping -track of your progress. 
Here's a JSON representation of your plan: - -{ - "id": "", - "goal": "", - "state": "open", - "subtasks": [ - { - "id": "0", - "goal": "Fix typos in bad.txt", - "state": "open", - "subtasks": [ - { - "id": "0.0", - "goal": "Read the content of bad.txt", - "state": "open", - "subtasks": [] - }, - { - "id": "0.1", - "goal": "Identify and correct typos in the text", - "state": "open", - "subtasks": [] - }, - { - "id": "0.2", - "goal": "Write the corrected content back to bad.txt", - "state": "open", - "subtasks": [] - }, - { - "id": "0.3", - "goal": "Verify the corrections", - "state": "open", - "subtasks": [] - } - ] - } - ] -} - - -You're not currently working on any tasks. Your next action MUST be to mark a task as in_progress. - -You're responsible for managing this plan and the status of tasks in -it, by using the `add_task` and `modify_task` actions described below. - -If the History below contradicts the state of any of these tasks, you -MUST modify the task using the `modify_task` action described below. - -Be sure NOT to duplicate any tasks. Do NOT use the `add_task` action for -a task that's already represented. Every task must be represented only once. - -Tasks that are sequential MUST be siblings. They must be added in order -to their parent task. - -If you mark a task as 'completed', 'verified', or 'abandoned', -all non-abandoned subtasks will be marked the same way. -So before closing a task this way, you MUST not only be sure that it has -been completed successfully--you must ALSO be sure that all its subtasks -are ready to be marked the same way. - -If, and only if, ALL tasks have already been marked verified, -you MUST respond with the `finish` action. - -## History -Here is a recent history of actions you've taken in service of this plan, -as well as observations you've made. This only includes the MOST RECENT -ten actions--more happened before that. - -[ - { - "source": "user", - "action": "message", - "args": { - "content": "Fix typos in bad.txt. 
Do not ask me for confirmation at any point.", - "wait_for_response": false - } - }, - { - "source": "agent", - "action": "add_task", - "args": { - "parent": "", - "goal": "Fix typos in bad.txt", - "subtasks": [ - { - "goal": "Read the content of bad.txt" - }, - { - "goal": "Identify and correct typos in the text" - }, - { - "goal": "Write the corrected content back to bad.txt" - }, - { - "goal": "Verify the corrections" - } - ], - "thought": "" - } - } -] - - -Your most recent action is at the bottom of that history. - -## Action -What is your next thought or action? Your response must be in JSON format. - -It must be an object, and it must contain two fields: -* `action`, which is one of the actions below -* `args`, which is a map of key-value pairs, specifying the arguments for that action - -* `read` - reads the content of a file. Arguments: - * `path` - the path of the file to read -* `write` - writes the content to a file. Arguments: - * `path` - the path of the file to write - * `content` - the content to write to the file -* `run` - runs a command on the command line in a Linux shell. Arguments: - * `command` - the command to run - * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true. -* `kill` - kills a background command - * `command_id` - the ID of the background command to kill -* `browse` - opens a web page. Arguments: - * `url` - the URL to open -* `message` - make a plan, set a goal, record your thoughts, or ask for more input from the user. Arguments: - * `content` - the message to record - * `wait_for_response` - set to `true` to wait for the user to respond before proceeding -* `add_task` - add a task to your plan. 
Arguments: - * `parent` - the ID of the parent task (leave empty if it should go at the top level) - * `goal` - the goal of the task - * `subtasks` - a list of subtasks, each of which is a map with a `goal` key. -* `modify_task` - close a task. Arguments: - * `task_id` - the ID of the task to close - * `state` - set to 'in_progress' to start the task, 'completed' to finish it, 'verified' to assert that it was successful, 'abandoned' to give up on it permanently, or `open` to stop working on it for now. -* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working. - -You MUST take time to think in between read, write, run, kill, browse, and recall actions--do this with the `message` action. -You should never act twice in a row without thinking. But if your last several -actions are all `message` actions, you should consider taking a different action. - -What is your next thought or action? Again, you must reply with JSON, and only with JSON. - -You should think about the next action to take. diff --git a/tests/integration/mock/PlannerAgent/test_edits/prompt_003.log b/tests/integration/mock/PlannerAgent/test_edits/prompt_003.log deleted file mode 100644 index 56f8b4f370..0000000000 --- a/tests/integration/mock/PlannerAgent/test_edits/prompt_003.log +++ /dev/null @@ -1,171 +0,0 @@ - - ----------- - - -# Task -You're a diligent software engineer AI. You can't see, draw, or interact with a -browser, but you can read and write files, and you can run commands, and you can think. - -You've been given the following task: - -Fix typos in bad.txt. Do not ask me for confirmation at any point. - -## Plan -As you complete this task, you're building a plan and keeping -track of your progress. 
Here's a JSON representation of your plan: - -{ - "id": "", - "goal": "", - "state": "in_progress", - "subtasks": [ - { - "id": "0", - "goal": "Fix typos in bad.txt", - "state": "in_progress", - "subtasks": [ - { - "id": "0.0", - "goal": "Read the content of bad.txt", - "state": "in_progress", - "subtasks": [] - }, - { - "id": "0.1", - "goal": "Identify and correct typos in the text", - "state": "open", - "subtasks": [] - }, - { - "id": "0.2", - "goal": "Write the corrected content back to bad.txt", - "state": "open", - "subtasks": [] - }, - { - "id": "0.3", - "goal": "Verify the corrections", - "state": "open", - "subtasks": [] - } - ] - } - ] -} - - -You're currently working on this task: -Read the content of bad.txt. -If it's not achievable AND verifiable with a SINGLE action, you MUST break it down into subtasks NOW. - -You're responsible for managing this plan and the status of tasks in -it, by using the `add_task` and `modify_task` actions described below. - -If the History below contradicts the state of any of these tasks, you -MUST modify the task using the `modify_task` action described below. - -Be sure NOT to duplicate any tasks. Do NOT use the `add_task` action for -a task that's already represented. Every task must be represented only once. - -Tasks that are sequential MUST be siblings. They must be added in order -to their parent task. - -If you mark a task as 'completed', 'verified', or 'abandoned', -all non-abandoned subtasks will be marked the same way. -So before closing a task this way, you MUST not only be sure that it has -been completed successfully--you must ALSO be sure that all its subtasks -are ready to be marked the same way. - -If, and only if, ALL tasks have already been marked verified, -you MUST respond with the `finish` action. - -## History -Here is a recent history of actions you've taken in service of this plan, -as well as observations you've made. This only includes the MOST RECENT -ten actions--more happened before that. 
- -[ - { - "source": "user", - "action": "message", - "args": { - "content": "Fix typos in bad.txt. Do not ask me for confirmation at any point.", - "wait_for_response": false - } - }, - { - "source": "agent", - "action": "add_task", - "args": { - "parent": "", - "goal": "Fix typos in bad.txt", - "subtasks": [ - { - "goal": "Read the content of bad.txt" - }, - { - "goal": "Identify and correct typos in the text" - }, - { - "goal": "Write the corrected content back to bad.txt" - }, - { - "goal": "Verify the corrections" - } - ], - "thought": "" - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0.0", - "state": "in_progress", - "thought": "" - } - } -] - - -Your most recent action is at the bottom of that history. - -## Action -What is your next thought or action? Your response must be in JSON format. - -It must be an object, and it must contain two fields: -* `action`, which is one of the actions below -* `args`, which is a map of key-value pairs, specifying the arguments for that action - -* `read` - reads the content of a file. Arguments: - * `path` - the path of the file to read -* `write` - writes the content to a file. Arguments: - * `path` - the path of the file to write - * `content` - the content to write to the file -* `run` - runs a command on the command line in a Linux shell. Arguments: - * `command` - the command to run - * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true. -* `kill` - kills a background command - * `command_id` - the ID of the background command to kill -* `browse` - opens a web page. Arguments: - * `url` - the URL to open -* `message` - make a plan, set a goal, record your thoughts, or ask for more input from the user. 
Arguments: - * `content` - the message to record - * `wait_for_response` - set to `true` to wait for the user to respond before proceeding -* `add_task` - add a task to your plan. Arguments: - * `parent` - the ID of the parent task (leave empty if it should go at the top level) - * `goal` - the goal of the task - * `subtasks` - a list of subtasks, each of which is a map with a `goal` key. -* `modify_task` - close a task. Arguments: - * `task_id` - the ID of the task to close - * `state` - set to 'in_progress' to start the task, 'completed' to finish it, 'verified' to assert that it was successful, 'abandoned' to give up on it permanently, or `open` to stop working on it for now. -* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working. - -You MUST take time to think in between read, write, run, kill, browse, and recall actions--do this with the `message` action. -You should never act twice in a row without thinking. But if your last several -actions are all `message` actions, you should consider taking a different action. - -What is your next thought or action? Again, you must reply with JSON, and only with JSON. - -You should think about the next action to take. diff --git a/tests/integration/mock/PlannerAgent/test_edits/prompt_004.log b/tests/integration/mock/PlannerAgent/test_edits/prompt_004.log deleted file mode 100644 index 23509a31fe..0000000000 --- a/tests/integration/mock/PlannerAgent/test_edits/prompt_004.log +++ /dev/null @@ -1,189 +0,0 @@ - - ----------- - - -# Task -You're a diligent software engineer AI. You can't see, draw, or interact with a -browser, but you can read and write files, and you can run commands, and you can think. - -You've been given the following task: - -Fix typos in bad.txt. Do not ask me for confirmation at any point. 
- -## Plan -As you complete this task, you're building a plan and keeping -track of your progress. Here's a JSON representation of your plan: - -{ - "id": "", - "goal": "", - "state": "in_progress", - "subtasks": [ - { - "id": "0", - "goal": "Fix typos in bad.txt", - "state": "in_progress", - "subtasks": [ - { - "id": "0.0", - "goal": "Read the content of bad.txt", - "state": "in_progress", - "subtasks": [] - }, - { - "id": "0.1", - "goal": "Identify and correct typos in the text", - "state": "open", - "subtasks": [] - }, - { - "id": "0.2", - "goal": "Write the corrected content back to bad.txt", - "state": "open", - "subtasks": [] - }, - { - "id": "0.3", - "goal": "Verify the corrections", - "state": "open", - "subtasks": [] - } - ] - } - ] -} - - -You're currently working on this task: -Read the content of bad.txt. -If it's not achievable AND verifiable with a SINGLE action, you MUST break it down into subtasks NOW. - -You're responsible for managing this plan and the status of tasks in -it, by using the `add_task` and `modify_task` actions described below. - -If the History below contradicts the state of any of these tasks, you -MUST modify the task using the `modify_task` action described below. - -Be sure NOT to duplicate any tasks. Do NOT use the `add_task` action for -a task that's already represented. Every task must be represented only once. - -Tasks that are sequential MUST be siblings. They must be added in order -to their parent task. - -If you mark a task as 'completed', 'verified', or 'abandoned', -all non-abandoned subtasks will be marked the same way. -So before closing a task this way, you MUST not only be sure that it has -been completed successfully--you must ALSO be sure that all its subtasks -are ready to be marked the same way. - -If, and only if, ALL tasks have already been marked verified, -you MUST respond with the `finish` action. 
- -## History -Here is a recent history of actions you've taken in service of this plan, -as well as observations you've made. This only includes the MOST RECENT -ten actions--more happened before that. - -[ - { - "source": "user", - "action": "message", - "args": { - "content": "Fix typos in bad.txt. Do not ask me for confirmation at any point.", - "wait_for_response": false - } - }, - { - "source": "agent", - "action": "add_task", - "args": { - "parent": "", - "goal": "Fix typos in bad.txt", - "subtasks": [ - { - "goal": "Read the content of bad.txt" - }, - { - "goal": "Identify and correct typos in the text" - }, - { - "goal": "Write the corrected content back to bad.txt" - }, - { - "goal": "Verify the corrections" - } - ], - "thought": "" - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0.0", - "state": "in_progress", - "thought": "" - } - }, - { - "source": "agent", - "action": "read", - "args": { - "path": "bad.txt", - "start": 0, - "end": -1, - "thought": "" - } - }, - { - "source": "agent", - "observation": "read", - "content": "This is a stupid typoo.\nReally?\nNo mor typos!\nEnjoy!\n", - "extras": { - "path": "bad.txt" - } - } -] - - -Your most recent action is at the bottom of that history. - -## Action -What is your next thought or action? Your response must be in JSON format. - -It must be an object, and it must contain two fields: -* `action`, which is one of the actions below -* `args`, which is a map of key-value pairs, specifying the arguments for that action - -* `read` - reads the content of a file. Arguments: - * `path` - the path of the file to read -* `write` - writes the content to a file. Arguments: - * `path` - the path of the file to write - * `content` - the content to write to the file -* `run` - runs a command on the command line in a Linux shell. 
Arguments: - * `command` - the command to run - * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true. -* `kill` - kills a background command - * `command_id` - the ID of the background command to kill -* `browse` - opens a web page. Arguments: - * `url` - the URL to open -* `message` - make a plan, set a goal, record your thoughts, or ask for more input from the user. Arguments: - * `content` - the message to record - * `wait_for_response` - set to `true` to wait for the user to respond before proceeding -* `add_task` - add a task to your plan. Arguments: - * `parent` - the ID of the parent task (leave empty if it should go at the top level) - * `goal` - the goal of the task - * `subtasks` - a list of subtasks, each of which is a map with a `goal` key. -* `modify_task` - close a task. Arguments: - * `task_id` - the ID of the task to close - * `state` - set to 'in_progress' to start the task, 'completed' to finish it, 'verified' to assert that it was successful, 'abandoned' to give up on it permanently, or `open` to stop working on it for now. -* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working. - -You MUST take time to think in between read, write, run, kill, browse, and recall actions--do this with the `message` action. -You should never act twice in a row without thinking. But if your last several -actions are all `message` actions, you should consider taking a different action. - -What is your next thought or action? Again, you must reply with JSON, and only with JSON. - -You should think about the file you just read, what you learned from it, and how that affects your plan. 
diff --git a/tests/integration/mock/PlannerAgent/test_edits/prompt_005.log b/tests/integration/mock/PlannerAgent/test_edits/prompt_005.log deleted file mode 100644 index ae0e222b14..0000000000 --- a/tests/integration/mock/PlannerAgent/test_edits/prompt_005.log +++ /dev/null @@ -1,197 +0,0 @@ - - ----------- - - -# Task -You're a diligent software engineer AI. You can't see, draw, or interact with a -browser, but you can read and write files, and you can run commands, and you can think. - -You've been given the following task: - -Fix typos in bad.txt. Do not ask me for confirmation at any point. - -## Plan -As you complete this task, you're building a plan and keeping -track of your progress. Here's a JSON representation of your plan: - -{ - "id": "", - "goal": "", - "state": "in_progress", - "subtasks": [ - { - "id": "0", - "goal": "Fix typos in bad.txt", - "state": "in_progress", - "subtasks": [ - { - "id": "0.0", - "goal": "Read the content of bad.txt", - "state": "completed", - "subtasks": [] - }, - { - "id": "0.1", - "goal": "Identify and correct typos in the text", - "state": "open", - "subtasks": [] - }, - { - "id": "0.2", - "goal": "Write the corrected content back to bad.txt", - "state": "open", - "subtasks": [] - }, - { - "id": "0.3", - "goal": "Verify the corrections", - "state": "open", - "subtasks": [] - } - ] - } - ] -} - - -You're currently working on this task: -Fix typos in bad.txt. - -You're responsible for managing this plan and the status of tasks in -it, by using the `add_task` and `modify_task` actions described below. - -If the History below contradicts the state of any of these tasks, you -MUST modify the task using the `modify_task` action described below. - -Be sure NOT to duplicate any tasks. Do NOT use the `add_task` action for -a task that's already represented. Every task must be represented only once. - -Tasks that are sequential MUST be siblings. They must be added in order -to their parent task. 
- -If you mark a task as 'completed', 'verified', or 'abandoned', -all non-abandoned subtasks will be marked the same way. -So before closing a task this way, you MUST not only be sure that it has -been completed successfully--you must ALSO be sure that all its subtasks -are ready to be marked the same way. - -If, and only if, ALL tasks have already been marked verified, -you MUST respond with the `finish` action. - -## History -Here is a recent history of actions you've taken in service of this plan, -as well as observations you've made. This only includes the MOST RECENT -ten actions--more happened before that. - -[ - { - "source": "user", - "action": "message", - "args": { - "content": "Fix typos in bad.txt. Do not ask me for confirmation at any point.", - "wait_for_response": false - } - }, - { - "source": "agent", - "action": "add_task", - "args": { - "parent": "", - "goal": "Fix typos in bad.txt", - "subtasks": [ - { - "goal": "Read the content of bad.txt" - }, - { - "goal": "Identify and correct typos in the text" - }, - { - "goal": "Write the corrected content back to bad.txt" - }, - { - "goal": "Verify the corrections" - } - ], - "thought": "" - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0.0", - "state": "in_progress", - "thought": "" - } - }, - { - "source": "agent", - "action": "read", - "args": { - "path": "bad.txt", - "start": 0, - "end": -1, - "thought": "" - } - }, - { - "source": "agent", - "observation": "read", - "content": "This is a stupid typoo.\nReally?\nNo mor typos!\nEnjoy!\n", - "extras": { - "path": "bad.txt" - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0.0", - "state": "completed", - "thought": "The content of bad.txt has been successfully read and is now stored for further processing." - } - } -] - - -Your most recent action is at the bottom of that history. - -## Action -What is your next thought or action? Your response must be in JSON format. 
- -It must be an object, and it must contain two fields: -* `action`, which is one of the actions below -* `args`, which is a map of key-value pairs, specifying the arguments for that action - -* `read` - reads the content of a file. Arguments: - * `path` - the path of the file to read -* `write` - writes the content to a file. Arguments: - * `path` - the path of the file to write - * `content` - the content to write to the file -* `run` - runs a command on the command line in a Linux shell. Arguments: - * `command` - the command to run - * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true. -* `kill` - kills a background command - * `command_id` - the ID of the background command to kill -* `browse` - opens a web page. Arguments: - * `url` - the URL to open -* `message` - make a plan, set a goal, record your thoughts, or ask for more input from the user. Arguments: - * `content` - the message to record - * `wait_for_response` - set to `true` to wait for the user to respond before proceeding -* `add_task` - add a task to your plan. Arguments: - * `parent` - the ID of the parent task (leave empty if it should go at the top level) - * `goal` - the goal of the task - * `subtasks` - a list of subtasks, each of which is a map with a `goal` key. -* `modify_task` - close a task. Arguments: - * `task_id` - the ID of the task to close - * `state` - set to 'in_progress' to start the task, 'completed' to finish it, 'verified' to assert that it was successful, 'abandoned' to give up on it permanently, or `open` to stop working on it for now. -* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working. 
- -You MUST take time to think in between read, write, run, kill, browse, and recall actions--do this with the `message` action. -You should never act twice in a row without thinking. But if your last several -actions are all `message` actions, you should consider taking a different action. - -What is your next thought or action? Again, you must reply with JSON, and only with JSON. - -You should think about the next action to take. diff --git a/tests/integration/mock/PlannerAgent/test_edits/prompt_006.log b/tests/integration/mock/PlannerAgent/test_edits/prompt_006.log deleted file mode 100644 index 03fb196126..0000000000 --- a/tests/integration/mock/PlannerAgent/test_edits/prompt_006.log +++ /dev/null @@ -1,207 +0,0 @@ - - ----------- - - -# Task -You're a diligent software engineer AI. You can't see, draw, or interact with a -browser, but you can read and write files, and you can run commands, and you can think. - -You've been given the following task: - -Fix typos in bad.txt. Do not ask me for confirmation at any point. - -## Plan -As you complete this task, you're building a plan and keeping -track of your progress. Here's a JSON representation of your plan: - -{ - "id": "", - "goal": "", - "state": "in_progress", - "subtasks": [ - { - "id": "0", - "goal": "Fix typos in bad.txt", - "state": "in_progress", - "subtasks": [ - { - "id": "0.0", - "goal": "Read the content of bad.txt", - "state": "completed", - "subtasks": [] - }, - { - "id": "0.1", - "goal": "Identify and correct typos in the text", - "state": "in_progress", - "subtasks": [] - }, - { - "id": "0.2", - "goal": "Write the corrected content back to bad.txt", - "state": "open", - "subtasks": [] - }, - { - "id": "0.3", - "goal": "Verify the corrections", - "state": "open", - "subtasks": [] - } - ] - } - ] -} - - -You're currently working on this task: -Identify and correct typos in the text. -If it's not achievable AND verifiable with a SINGLE action, you MUST break it down into subtasks NOW. 
- -You're responsible for managing this plan and the status of tasks in -it, by using the `add_task` and `modify_task` actions described below. - -If the History below contradicts the state of any of these tasks, you -MUST modify the task using the `modify_task` action described below. - -Be sure NOT to duplicate any tasks. Do NOT use the `add_task` action for -a task that's already represented. Every task must be represented only once. - -Tasks that are sequential MUST be siblings. They must be added in order -to their parent task. - -If you mark a task as 'completed', 'verified', or 'abandoned', -all non-abandoned subtasks will be marked the same way. -So before closing a task this way, you MUST not only be sure that it has -been completed successfully--you must ALSO be sure that all its subtasks -are ready to be marked the same way. - -If, and only if, ALL tasks have already been marked verified, -you MUST respond with the `finish` action. - -## History -Here is a recent history of actions you've taken in service of this plan, -as well as observations you've made. This only includes the MOST RECENT -ten actions--more happened before that. - -[ - { - "source": "user", - "action": "message", - "args": { - "content": "Fix typos in bad.txt. 
Do not ask me for confirmation at any point.", - "wait_for_response": false - } - }, - { - "source": "agent", - "action": "add_task", - "args": { - "parent": "", - "goal": "Fix typos in bad.txt", - "subtasks": [ - { - "goal": "Read the content of bad.txt" - }, - { - "goal": "Identify and correct typos in the text" - }, - { - "goal": "Write the corrected content back to bad.txt" - }, - { - "goal": "Verify the corrections" - } - ], - "thought": "" - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0.0", - "state": "in_progress", - "thought": "" - } - }, - { - "source": "agent", - "action": "read", - "args": { - "path": "bad.txt", - "start": 0, - "end": -1, - "thought": "" - } - }, - { - "source": "agent", - "observation": "read", - "content": "This is a stupid typoo.\nReally?\nNo mor typos!\nEnjoy!\n", - "extras": { - "path": "bad.txt" - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0.0", - "state": "completed", - "thought": "The content of bad.txt has been successfully read and is now stored for further processing." - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0.1", - "state": "in_progress", - "thought": "Starting to identify and correct typos in the text from bad.txt." - } - } -] - - -Your most recent action is at the bottom of that history. - -## Action -What is your next thought or action? Your response must be in JSON format. - -It must be an object, and it must contain two fields: -* `action`, which is one of the actions below -* `args`, which is a map of key-value pairs, specifying the arguments for that action - -* `read` - reads the content of a file. Arguments: - * `path` - the path of the file to read -* `write` - writes the content to a file. Arguments: - * `path` - the path of the file to write - * `content` - the content to write to the file -* `run` - runs a command on the command line in a Linux shell. 
Arguments: - * `command` - the command to run - * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true. -* `kill` - kills a background command - * `command_id` - the ID of the background command to kill -* `browse` - opens a web page. Arguments: - * `url` - the URL to open -* `message` - make a plan, set a goal, record your thoughts, or ask for more input from the user. Arguments: - * `content` - the message to record - * `wait_for_response` - set to `true` to wait for the user to respond before proceeding -* `add_task` - add a task to your plan. Arguments: - * `parent` - the ID of the parent task (leave empty if it should go at the top level) - * `goal` - the goal of the task - * `subtasks` - a list of subtasks, each of which is a map with a `goal` key. -* `modify_task` - close a task. Arguments: - * `task_id` - the ID of the task to close - * `state` - set to 'in_progress' to start the task, 'completed' to finish it, 'verified' to assert that it was successful, 'abandoned' to give up on it permanently, or `open` to stop working on it for now. -* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working. - -You MUST take time to think in between read, write, run, kill, browse, and recall actions--do this with the `message` action. -You should never act twice in a row without thinking. But if your last several -actions are all `message` actions, you should consider taking a different action. - -What is your next thought or action? Again, you must reply with JSON, and only with JSON. - -You should think about the next action to take. 
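Reviewer note: the prompt above states that marking a task 'completed', 'verified', or 'abandoned' marks all of its non-abandoned subtasks the same way. The following is a minimal sketch of that rule, not the project's actual implementation. The `Task` class, its `set_state` method, and `CLOSING_STATES` are hypothetical names introduced only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# States that close a task and, per the prompt text, cascade to subtasks.
CLOSING_STATES = {"completed", "verified", "abandoned"}


@dataclass
class Task:
    """Hypothetical plan node mirroring the JSON plan shown in the prompt."""

    id: str
    goal: str
    state: str = "open"
    subtasks: list[Task] = field(default_factory=list)

    def set_state(self, state: str) -> None:
        """Set this task's state; closing states cascade to non-abandoned subtasks."""
        self.state = state
        if state in CLOSING_STATES:
            for sub in self.subtasks:
                if sub.state != "abandoned":
                    sub.set_state(state)


# Usage: rebuild the plan from the fixture, abandon one subtask, then close the parent.
plan = Task(
    id="0",
    goal="Fix typos in bad.txt",
    subtasks=[
        Task("0.0", "Read the content of bad.txt"),
        Task("0.1", "Identify and correct typos in the text"),
        Task("0.2", "Write the corrected content back to bad.txt"),
        Task("0.3", "Verify the corrections"),
    ],
)
plan.subtasks[3].set_state("abandoned")
plan.set_state("completed")
states = [sub.state for sub in plan.subtasks]
```

Because closing a parent overwrites the state of every non-abandoned subtask, the prompt's warning to check subtasks before closing a parent follows directly: an unfinished subtask would be silently marked as done.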
diff --git a/tests/integration/mock/PlannerAgent/test_edits/prompt_007.log b/tests/integration/mock/PlannerAgent/test_edits/prompt_007.log deleted file mode 100644 index e68d3c9b4b..0000000000 --- a/tests/integration/mock/PlannerAgent/test_edits/prompt_007.log +++ /dev/null @@ -1,215 +0,0 @@ - - ----------- - - -# Task -You're a diligent software engineer AI. You can't see, draw, or interact with a -browser, but you can read and write files, and you can run commands, and you can think. - -You've been given the following task: - -Fix typos in bad.txt. Do not ask me for confirmation at any point. - -## Plan -As you complete this task, you're building a plan and keeping -track of your progress. Here's a JSON representation of your plan: - -{ - "id": "", - "goal": "", - "state": "in_progress", - "subtasks": [ - { - "id": "0", - "goal": "Fix typos in bad.txt", - "state": "in_progress", - "subtasks": [ - { - "id": "0.0", - "goal": "Read the content of bad.txt", - "state": "completed", - "subtasks": [] - }, - { - "id": "0.1", - "goal": "Identify and correct typos in the text", - "state": "in_progress", - "subtasks": [] - }, - { - "id": "0.2", - "goal": "Write the corrected content back to bad.txt", - "state": "open", - "subtasks": [] - }, - { - "id": "0.3", - "goal": "Verify the corrections", - "state": "open", - "subtasks": [] - } - ] - } - ] -} - - -You're currently working on this task: -Identify and correct typos in the text. -If it's not achievable AND verifiable with a SINGLE action, you MUST break it down into subtasks NOW. - -You're responsible for managing this plan and the status of tasks in -it, by using the `add_task` and `modify_task` actions described below. - -If the History below contradicts the state of any of these tasks, you -MUST modify the task using the `modify_task` action described below. - -Be sure NOT to duplicate any tasks. Do NOT use the `add_task` action for -a task that's already represented. Every task must be represented only once. 
- -Tasks that are sequential MUST be siblings. They must be added in order -to their parent task. - -If you mark a task as 'completed', 'verified', or 'abandoned', -all non-abandoned subtasks will be marked the same way. -So before closing a task this way, you MUST not only be sure that it has -been completed successfully--you must ALSO be sure that all its subtasks -are ready to be marked the same way. - -If, and only if, ALL tasks have already been marked verified, -you MUST respond with the `finish` action. - -## History -Here is a recent history of actions you've taken in service of this plan, -as well as observations you've made. This only includes the MOST RECENT -ten actions--more happened before that. - -[ - { - "source": "user", - "action": "message", - "args": { - "content": "Fix typos in bad.txt. Do not ask me for confirmation at any point.", - "wait_for_response": false - } - }, - { - "source": "agent", - "action": "add_task", - "args": { - "parent": "", - "goal": "Fix typos in bad.txt", - "subtasks": [ - { - "goal": "Read the content of bad.txt" - }, - { - "goal": "Identify and correct typos in the text" - }, - { - "goal": "Write the corrected content back to bad.txt" - }, - { - "goal": "Verify the corrections" - } - ], - "thought": "" - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0.0", - "state": "in_progress", - "thought": "" - } - }, - { - "source": "agent", - "action": "read", - "args": { - "path": "bad.txt", - "start": 0, - "end": -1, - "thought": "" - } - }, - { - "source": "agent", - "observation": "read", - "content": "This is a stupid typoo.\nReally?\nNo mor typos!\nEnjoy!\n", - "extras": { - "path": "bad.txt" - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0.0", - "state": "completed", - "thought": "The content of bad.txt has been successfully read and is now stored for further processing." 
- } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0.1", - "state": "in_progress", - "thought": "Starting to identify and correct typos in the text from bad.txt." - } - }, - { - "source": "agent", - "action": "message", - "args": { - "content": "I have identified the typos in the text from bad.txt. I will now proceed to correct them. The original text is: 'This is a stupid typoo.\\nReally?\\nNo mor typos!\\nEnjoy!\\n'. The corrected version will be: 'This is a stupid typo.\\nReally?\\nNo more typos!\\nEnjoy!\\n'.", - "wait_for_response": false - } - } -] - - -Your most recent action is at the bottom of that history. - -## Action -What is your next thought or action? Your response must be in JSON format. - -It must be an object, and it must contain two fields: -* `action`, which is one of the actions below -* `args`, which is a map of key-value pairs, specifying the arguments for that action - -* `read` - reads the content of a file. Arguments: - * `path` - the path of the file to read -* `write` - writes the content to a file. Arguments: - * `path` - the path of the file to write - * `content` - the content to write to the file -* `run` - runs a command on the command line in a Linux shell. Arguments: - * `command` - the command to run - * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true. -* `kill` - kills a background command - * `command_id` - the ID of the background command to kill -* `browse` - opens a web page. Arguments: - * `url` - the URL to open -* `message` - make a plan, set a goal, record your thoughts, or ask for more input from the user. Arguments: - * `content` - the message to record - * `wait_for_response` - set to `true` to wait for the user to respond before proceeding -* `add_task` - add a task to your plan. 
Arguments: - * `parent` - the ID of the parent task (leave empty if it should go at the top level) - * `goal` - the goal of the task - * `subtasks` - a list of subtasks, each of which is a map with a `goal` key. -* `modify_task` - close a task. Arguments: - * `task_id` - the ID of the task to close - * `state` - set to 'in_progress' to start the task, 'completed' to finish it, 'verified' to assert that it was successful, 'abandoned' to give up on it permanently, or `open` to stop working on it for now. -* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working. - -You MUST take time to think in between read, write, run, kill, browse, and recall actions--do this with the `message` action. -You should never act twice in a row without thinking. But if your last several -actions are all `message` actions, you should consider taking a different action. - -What is your next thought or action? Again, you must reply with JSON, and only with JSON. - -Look at your last thought in the history above. What does it suggest? Don't think anymore--take action. diff --git a/tests/integration/mock/PlannerAgent/test_edits/prompt_008.log b/tests/integration/mock/PlannerAgent/test_edits/prompt_008.log deleted file mode 100644 index 6fdab06f49..0000000000 --- a/tests/integration/mock/PlannerAgent/test_edits/prompt_008.log +++ /dev/null @@ -1,234 +0,0 @@ - - ----------- - - -# Task -You're a diligent software engineer AI. You can't see, draw, or interact with a -browser, but you can read and write files, and you can run commands, and you can think. - -You've been given the following task: - -Fix typos in bad.txt. Do not ask me for confirmation at any point. - -## Plan -As you complete this task, you're building a plan and keeping -track of your progress. 
Here's a JSON representation of your plan: - -{ - "id": "", - "goal": "", - "state": "in_progress", - "subtasks": [ - { - "id": "0", - "goal": "Fix typos in bad.txt", - "state": "in_progress", - "subtasks": [ - { - "id": "0.0", - "goal": "Read the content of bad.txt", - "state": "completed", - "subtasks": [] - }, - { - "id": "0.1", - "goal": "Identify and correct typos in the text", - "state": "in_progress", - "subtasks": [] - }, - { - "id": "0.2", - "goal": "Write the corrected content back to bad.txt", - "state": "open", - "subtasks": [] - }, - { - "id": "0.3", - "goal": "Verify the corrections", - "state": "open", - "subtasks": [] - } - ] - } - ] -} - - -You're currently working on this task: -Identify and correct typos in the text. -If it's not achievable AND verifiable with a SINGLE action, you MUST break it down into subtasks NOW. - -You're responsible for managing this plan and the status of tasks in -it, by using the `add_task` and `modify_task` actions described below. - -If the History below contradicts the state of any of these tasks, you -MUST modify the task using the `modify_task` action described below. - -Be sure NOT to duplicate any tasks. Do NOT use the `add_task` action for -a task that's already represented. Every task must be represented only once. - -Tasks that are sequential MUST be siblings. They must be added in order -to their parent task. - -If you mark a task as 'completed', 'verified', or 'abandoned', -all non-abandoned subtasks will be marked the same way. -So before closing a task this way, you MUST not only be sure that it has -been completed successfully--you must ALSO be sure that all its subtasks -are ready to be marked the same way. - -If, and only if, ALL tasks have already been marked verified, -you MUST respond with the `finish` action. - -## History -Here is a recent history of actions you've taken in service of this plan, -as well as observations you've made. 
This only includes the MOST RECENT -ten actions--more happened before that. - -[ - { - "source": "user", - "action": "message", - "args": { - "content": "Fix typos in bad.txt. Do not ask me for confirmation at any point.", - "wait_for_response": false - } - }, - { - "source": "agent", - "action": "add_task", - "args": { - "parent": "", - "goal": "Fix typos in bad.txt", - "subtasks": [ - { - "goal": "Read the content of bad.txt" - }, - { - "goal": "Identify and correct typos in the text" - }, - { - "goal": "Write the corrected content back to bad.txt" - }, - { - "goal": "Verify the corrections" - } - ], - "thought": "" - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0.0", - "state": "in_progress", - "thought": "" - } - }, - { - "source": "agent", - "action": "read", - "args": { - "path": "bad.txt", - "start": 0, - "end": -1, - "thought": "" - } - }, - { - "source": "agent", - "observation": "read", - "content": "This is a stupid typoo.\nReally?\nNo mor typos!\nEnjoy!\n", - "extras": { - "path": "bad.txt" - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0.0", - "state": "completed", - "thought": "The content of bad.txt has been successfully read and is now stored for further processing." - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0.1", - "state": "in_progress", - "thought": "Starting to identify and correct typos in the text from bad.txt." - } - }, - { - "source": "agent", - "action": "message", - "args": { - "content": "I have identified the typos in the text from bad.txt. I will now proceed to correct them. The original text is: 'This is a stupid typoo.\\nReally?\\nNo mor typos!\\nEnjoy!\\n'. 
The corrected version will be: 'This is a stupid typo.\\nReally?\\nNo more typos!\\nEnjoy!\\n'.", - "wait_for_response": false - } - }, - { - "source": "agent", - "action": "write", - "args": { - "path": "bad.txt", - "content": "This is a stupid typo.\nReally?\nNo more typos!\nEnjoy!\n", - "start": 0, - "end": -1, - "thought": "" - } - }, - { - "source": "agent", - "observation": "write", - "content": "", - "extras": { - "path": "bad.txt" - } - } -] - - -Your most recent action is at the bottom of that history. - -## Action -What is your next thought or action? Your response must be in JSON format. - -It must be an object, and it must contain two fields: -* `action`, which is one of the actions below -* `args`, which is a map of key-value pairs, specifying the arguments for that action - -* `read` - reads the content of a file. Arguments: - * `path` - the path of the file to read -* `write` - writes the content to a file. Arguments: - * `path` - the path of the file to write - * `content` - the content to write to the file -* `run` - runs a command on the command line in a Linux shell. Arguments: - * `command` - the command to run - * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true. -* `kill` - kills a background command - * `command_id` - the ID of the background command to kill -* `browse` - opens a web page. Arguments: - * `url` - the URL to open -* `message` - make a plan, set a goal, record your thoughts, or ask for more input from the user. Arguments: - * `content` - the message to record - * `wait_for_response` - set to `true` to wait for the user to respond before proceeding -* `add_task` - add a task to your plan. 
Arguments: - * `parent` - the ID of the parent task (leave empty if it should go at the top level) - * `goal` - the goal of the task - * `subtasks` - a list of subtasks, each of which is a map with a `goal` key. -* `modify_task` - close a task. Arguments: - * `task_id` - the ID of the task to close - * `state` - set to 'in_progress' to start the task, 'completed' to finish it, 'verified' to assert that it was successful, 'abandoned' to give up on it permanently, or `open` to stop working on it for now. -* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working. - -You MUST take time to think in between read, write, run, kill, browse, and recall actions--do this with the `message` action. -You should never act twice in a row without thinking. But if your last several -actions are all `message` actions, you should consider taking a different action. - -What is your next thought or action? Again, you must reply with JSON, and only with JSON. - -You just changed a file. You should think about how it affects your plan. diff --git a/tests/integration/mock/PlannerAgent/test_edits/prompt_009.log b/tests/integration/mock/PlannerAgent/test_edits/prompt_009.log deleted file mode 100644 index 76dd879730..0000000000 --- a/tests/integration/mock/PlannerAgent/test_edits/prompt_009.log +++ /dev/null @@ -1,242 +0,0 @@ - - ----------- - - -# Task -You're a diligent software engineer AI. You can't see, draw, or interact with a -browser, but you can read and write files, and you can run commands, and you can think. - -You've been given the following task: - -Fix typos in bad.txt. Do not ask me for confirmation at any point. - -## Plan -As you complete this task, you're building a plan and keeping -track of your progress. 
Here's a JSON representation of your plan: - -{ - "id": "", - "goal": "", - "state": "in_progress", - "subtasks": [ - { - "id": "0", - "goal": "Fix typos in bad.txt", - "state": "in_progress", - "subtasks": [ - { - "id": "0.0", - "goal": "Read the content of bad.txt", - "state": "completed", - "subtasks": [] - }, - { - "id": "0.1", - "goal": "Identify and correct typos in the text", - "state": "completed", - "subtasks": [] - }, - { - "id": "0.2", - "goal": "Write the corrected content back to bad.txt", - "state": "open", - "subtasks": [] - }, - { - "id": "0.3", - "goal": "Verify the corrections", - "state": "open", - "subtasks": [] - } - ] - } - ] -} - - -You're currently working on this task: -Fix typos in bad.txt. - -You're responsible for managing this plan and the status of tasks in -it, by using the `add_task` and `modify_task` actions described below. - -If the History below contradicts the state of any of these tasks, you -MUST modify the task using the `modify_task` action described below. - -Be sure NOT to duplicate any tasks. Do NOT use the `add_task` action for -a task that's already represented. Every task must be represented only once. - -Tasks that are sequential MUST be siblings. They must be added in order -to their parent task. - -If you mark a task as 'completed', 'verified', or 'abandoned', -all non-abandoned subtasks will be marked the same way. -So before closing a task this way, you MUST not only be sure that it has -been completed successfully--you must ALSO be sure that all its subtasks -are ready to be marked the same way. - -If, and only if, ALL tasks have already been marked verified, -you MUST respond with the `finish` action. - -## History -Here is a recent history of actions you've taken in service of this plan, -as well as observations you've made. This only includes the MOST RECENT -ten actions--more happened before that. - -[ - { - "source": "user", - "action": "message", - "args": { - "content": "Fix typos in bad.txt. 
Do not ask me for confirmation at any point.", - "wait_for_response": false - } - }, - { - "source": "agent", - "action": "add_task", - "args": { - "parent": "", - "goal": "Fix typos in bad.txt", - "subtasks": [ - { - "goal": "Read the content of bad.txt" - }, - { - "goal": "Identify and correct typos in the text" - }, - { - "goal": "Write the corrected content back to bad.txt" - }, - { - "goal": "Verify the corrections" - } - ], - "thought": "" - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0.0", - "state": "in_progress", - "thought": "" - } - }, - { - "source": "agent", - "action": "read", - "args": { - "path": "bad.txt", - "start": 0, - "end": -1, - "thought": "" - } - }, - { - "source": "agent", - "observation": "read", - "content": "This is a stupid typoo.\nReally?\nNo mor typos!\nEnjoy!\n", - "extras": { - "path": "bad.txt" - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0.0", - "state": "completed", - "thought": "The content of bad.txt has been successfully read and is now stored for further processing." - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0.1", - "state": "in_progress", - "thought": "Starting to identify and correct typos in the text from bad.txt." - } - }, - { - "source": "agent", - "action": "message", - "args": { - "content": "I have identified the typos in the text from bad.txt. I will now proceed to correct them. The original text is: 'This is a stupid typoo.\\nReally?\\nNo mor typos!\\nEnjoy!\\n'. 
The corrected version will be: 'This is a stupid typo.\\nReally?\\nNo more typos!\\nEnjoy!\\n'.", - "wait_for_response": false - } - }, - { - "source": "agent", - "action": "write", - "args": { - "path": "bad.txt", - "content": "This is a stupid typo.\nReally?\nNo more typos!\nEnjoy!\n", - "start": 0, - "end": -1, - "thought": "" - } - }, - { - "source": "agent", - "observation": "write", - "content": "", - "extras": { - "path": "bad.txt" - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0.1", - "state": "completed", - "thought": "The typos in the text from bad.txt have been identified and corrected. The content has been written back to the file." - } - } -] - - -Your most recent action is at the bottom of that history. - -## Action -What is your next thought or action? Your response must be in JSON format. - -It must be an object, and it must contain two fields: -* `action`, which is one of the actions below -* `args`, which is a map of key-value pairs, specifying the arguments for that action - -* `read` - reads the content of a file. Arguments: - * `path` - the path of the file to read -* `write` - writes the content to a file. Arguments: - * `path` - the path of the file to write - * `content` - the content to write to the file -* `run` - runs a command on the command line in a Linux shell. Arguments: - * `command` - the command to run - * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true. -* `kill` - kills a background command - * `command_id` - the ID of the background command to kill -* `browse` - opens a web page. Arguments: - * `url` - the URL to open -* `message` - make a plan, set a goal, record your thoughts, or ask for more input from the user. 
Arguments: - * `content` - the message to record - * `wait_for_response` - set to `true` to wait for the user to respond before proceeding -* `add_task` - add a task to your plan. Arguments: - * `parent` - the ID of the parent task (leave empty if it should go at the top level) - * `goal` - the goal of the task - * `subtasks` - a list of subtasks, each of which is a map with a `goal` key. -* `modify_task` - close a task. Arguments: - * `task_id` - the ID of the task to close - * `state` - set to 'in_progress' to start the task, 'completed' to finish it, 'verified' to assert that it was successful, 'abandoned' to give up on it permanently, or `open` to stop working on it for now. -* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working. - -You MUST take time to think in between read, write, run, kill, browse, and recall actions--do this with the `message` action. -You should never act twice in a row without thinking. But if your last several -actions are all `message` actions, you should consider taking a different action. - -What is your next thought or action? Again, you must reply with JSON, and only with JSON. - -You should think about the next action to take. diff --git a/tests/integration/mock/PlannerAgent/test_edits/prompt_010.log b/tests/integration/mock/PlannerAgent/test_edits/prompt_010.log deleted file mode 100644 index 88c58066f5..0000000000 --- a/tests/integration/mock/PlannerAgent/test_edits/prompt_010.log +++ /dev/null @@ -1,251 +0,0 @@ - - ----------- - - -# Task -You're a diligent software engineer AI. You can't see, draw, or interact with a -browser, but you can read and write files, and you can run commands, and you can think. - -You've been given the following task: - -Fix typos in bad.txt. Do not ask me for confirmation at any point. 
- -## Plan -As you complete this task, you're building a plan and keeping -track of your progress. Here's a JSON representation of your plan: - -{ - "id": "", - "goal": "", - "state": "in_progress", - "subtasks": [ - { - "id": "0", - "goal": "Fix typos in bad.txt", - "state": "in_progress", - "subtasks": [ - { - "id": "0.0", - "goal": "Read the content of bad.txt", - "state": "completed", - "subtasks": [] - }, - { - "id": "0.1", - "goal": "Identify and correct typos in the text", - "state": "completed", - "subtasks": [] - }, - { - "id": "0.2", - "goal": "Write the corrected content back to bad.txt", - "state": "completed", - "subtasks": [] - }, - { - "id": "0.3", - "goal": "Verify the corrections", - "state": "open", - "subtasks": [] - } - ] - } - ] -} - - -You're currently working on this task: -Fix typos in bad.txt. - -You're responsible for managing this plan and the status of tasks in -it, by using the `add_task` and `modify_task` actions described below. - -If the History below contradicts the state of any of these tasks, you -MUST modify the task using the `modify_task` action described below. - -Be sure NOT to duplicate any tasks. Do NOT use the `add_task` action for -a task that's already represented. Every task must be represented only once. - -Tasks that are sequential MUST be siblings. They must be added in order -to their parent task. - -If you mark a task as 'completed', 'verified', or 'abandoned', -all non-abandoned subtasks will be marked the same way. -So before closing a task this way, you MUST not only be sure that it has -been completed successfully--you must ALSO be sure that all its subtasks -are ready to be marked the same way. - -If, and only if, ALL tasks have already been marked verified, -you MUST respond with the `finish` action. - -## History -Here is a recent history of actions you've taken in service of this plan, -as well as observations you've made. This only includes the MOST RECENT -ten actions--more happened before that. 
- -[ - { - "source": "user", - "action": "message", - "args": { - "content": "Fix typos in bad.txt. Do not ask me for confirmation at any point.", - "wait_for_response": false - } - }, - { - "source": "agent", - "action": "add_task", - "args": { - "parent": "", - "goal": "Fix typos in bad.txt", - "subtasks": [ - { - "goal": "Read the content of bad.txt" - }, - { - "goal": "Identify and correct typos in the text" - }, - { - "goal": "Write the corrected content back to bad.txt" - }, - { - "goal": "Verify the corrections" - } - ], - "thought": "" - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0.0", - "state": "in_progress", - "thought": "" - } - }, - { - "source": "agent", - "action": "read", - "args": { - "path": "bad.txt", - "start": 0, - "end": -1, - "thought": "" - } - }, - { - "source": "agent", - "observation": "read", - "content": "This is a stupid typoo.\nReally?\nNo mor typos!\nEnjoy!\n", - "extras": { - "path": "bad.txt" - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0.0", - "state": "completed", - "thought": "The content of bad.txt has been successfully read and is now stored for further processing." - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0.1", - "state": "in_progress", - "thought": "Starting to identify and correct typos in the text from bad.txt." - } - }, - { - "source": "agent", - "action": "message", - "args": { - "content": "I have identified the typos in the text from bad.txt. I will now proceed to correct them. The original text is: 'This is a stupid typoo.\\nReally?\\nNo mor typos!\\nEnjoy!\\n'. 
The corrected version will be: 'This is a stupid typo.\\nReally?\\nNo more typos!\\nEnjoy!\\n'.", - "wait_for_response": false - } - }, - { - "source": "agent", - "action": "write", - "args": { - "path": "bad.txt", - "content": "This is a stupid typo.\nReally?\nNo more typos!\nEnjoy!\n", - "start": 0, - "end": -1, - "thought": "" - } - }, - { - "source": "agent", - "observation": "write", - "content": "", - "extras": { - "path": "bad.txt" - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0.1", - "state": "completed", - "thought": "The typos in the text from bad.txt have been identified and corrected. The content has been written back to the file." - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0.2", - "state": "completed", - "thought": "The corrected content has been successfully written back to bad.txt." - } - } -] - - -Your most recent action is at the bottom of that history. - -## Action -What is your next thought or action? Your response must be in JSON format. - -It must be an object, and it must contain two fields: -* `action`, which is one of the actions below -* `args`, which is a map of key-value pairs, specifying the arguments for that action - -* `read` - reads the content of a file. Arguments: - * `path` - the path of the file to read -* `write` - writes the content to a file. Arguments: - * `path` - the path of the file to write - * `content` - the content to write to the file -* `run` - runs a command on the command line in a Linux shell. Arguments: - * `command` - the command to run - * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true. -* `kill` - kills a background command - * `command_id` - the ID of the background command to kill -* `browse` - opens a web page. 
Arguments: - * `url` - the URL to open -* `message` - make a plan, set a goal, record your thoughts, or ask for more input from the user. Arguments: - * `content` - the message to record - * `wait_for_response` - set to `true` to wait for the user to respond before proceeding -* `add_task` - add a task to your plan. Arguments: - * `parent` - the ID of the parent task (leave empty if it should go at the top level) - * `goal` - the goal of the task - * `subtasks` - a list of subtasks, each of which is a map with a `goal` key. -* `modify_task` - close a task. Arguments: - * `task_id` - the ID of the task to close - * `state` - set to 'in_progress' to start the task, 'completed' to finish it, 'verified' to assert that it was successful, 'abandoned' to give up on it permanently, or `open` to stop working on it for now. -* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working. - -You MUST take time to think in between read, write, run, kill, browse, and recall actions--do this with the `message` action. -You should never act twice in a row without thinking. But if your last several -actions are all `message` actions, you should consider taking a different action. - -What is your next thought or action? Again, you must reply with JSON, and only with JSON. - -You should think about the next action to take. 
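Reviewer note: the prompt requires a reply that is a JSON object with an `action` field and an `args` map, yet the mock responses in this fixture set appear both bare and wrapped in a ```json fence. A minimal sketch of a parser that tolerates both shapes is shown below. `parse_action` and `FENCE_RE` are hypothetical names for illustration, and defaulting a missing `args` to an empty map is an assumption, not the repository's actual behaviour.

```python
from __future__ import annotations

import json
import re

# Matches a reply that is entirely wrapped in a Markdown code fence,
# with or without the "json" language tag.
FENCE_RE = re.compile(r"^```(?:json)?\s*\n(.*?)\n```\s*$", re.DOTALL)


def parse_action(raw: str) -> tuple[str, dict]:
    """Return (action, args) from a bare or fenced JSON reply."""
    text = raw.strip()
    match = FENCE_RE.match(text)
    if match:
        text = match.group(1)  # keep only the JSON between the fences
    data = json.loads(text)
    if not isinstance(data, dict) or "action" not in data:
        raise ValueError("response must be a JSON object with an 'action' field")
    # Assumption: a missing "args" is treated as an empty map.
    return data["action"], data.get("args", {})


# Usage with one fenced and one bare reply, shaped like the fixtures.
fenced = '```json\n{\n  "action": "modify_task",\n  "args": {"task_id": "0.0", "state": "in_progress"}\n}\n```\n'
bare = '{"action": "read", "args": {"path": "bad.txt"}}'
fenced_result = parse_action(fenced)
bare_result = parse_action(bare)
```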
diff --git a/tests/integration/mock/PlannerAgent/test_edits/response_001.log b/tests/integration/mock/PlannerAgent/test_edits/response_001.log deleted file mode 100644 index ef95463074..0000000000 --- a/tests/integration/mock/PlannerAgent/test_edits/response_001.log +++ /dev/null @@ -1,23 +0,0 @@ -```json -{ - "action": "add_task", - "args": { - "parent": "", - "goal": "Fix typos in bad.txt", - "subtasks": [ - { - "goal": "Read the content of bad.txt" - }, - { - "goal": "Identify and correct typos in the text" - }, - { - "goal": "Write the corrected content back to bad.txt" - }, - { - "goal": "Verify the corrections" - } - ] - } -} -``` diff --git a/tests/integration/mock/PlannerAgent/test_edits/response_002.log b/tests/integration/mock/PlannerAgent/test_edits/response_002.log deleted file mode 100644 index eb081caba2..0000000000 --- a/tests/integration/mock/PlannerAgent/test_edits/response_002.log +++ /dev/null @@ -1,9 +0,0 @@ -```json -{ - "action": "modify_task", - "args": { - "task_id": "0.0", - "state": "in_progress" - } -} -``` diff --git a/tests/integration/mock/PlannerAgent/test_edits/response_003.log b/tests/integration/mock/PlannerAgent/test_edits/response_003.log deleted file mode 100644 index ac56b4ad06..0000000000 --- a/tests/integration/mock/PlannerAgent/test_edits/response_003.log +++ /dev/null @@ -1,6 +0,0 @@ -{ - "action": "read", - "args": { - "path": "bad.txt" - } -} diff --git a/tests/integration/mock/PlannerAgent/test_edits/response_004.log b/tests/integration/mock/PlannerAgent/test_edits/response_004.log deleted file mode 100644 index 63f5de5de5..0000000000 --- a/tests/integration/mock/PlannerAgent/test_edits/response_004.log +++ /dev/null @@ -1,8 +0,0 @@ -{ - "action": "modify_task", - "args": { - "task_id": "0.0", - "state": "completed", - "thought": "The content of bad.txt has been successfully read and is now stored for further processing." 
- } -} diff --git a/tests/integration/mock/PlannerAgent/test_edits/response_005.log b/tests/integration/mock/PlannerAgent/test_edits/response_005.log deleted file mode 100644 index 14185eb4aa..0000000000 --- a/tests/integration/mock/PlannerAgent/test_edits/response_005.log +++ /dev/null @@ -1,8 +0,0 @@ -{ - "action": "modify_task", - "args": { - "task_id": "0.1", - "state": "in_progress", - "thought": "Starting to identify and correct typos in the text from bad.txt." - } -} diff --git a/tests/integration/mock/PlannerAgent/test_edits/response_006.log b/tests/integration/mock/PlannerAgent/test_edits/response_006.log deleted file mode 100644 index 6ae61ee987..0000000000 --- a/tests/integration/mock/PlannerAgent/test_edits/response_006.log +++ /dev/null @@ -1,7 +0,0 @@ -{ - "action": "message", - "args": { - "content": "I have identified the typos in the text from bad.txt. I will now proceed to correct them. The original text is: 'This is a stupid typoo.\\nReally?\\nNo mor typos!\\nEnjoy!\\n'. 
The corrected version will be: 'This is a stupid typo.\\nReally?\\nNo more typos!\\nEnjoy!\\n'.", - "wait_for_response": false - } -} diff --git a/tests/integration/mock/PlannerAgent/test_edits/response_007.log b/tests/integration/mock/PlannerAgent/test_edits/response_007.log deleted file mode 100644 index 64308f91f8..0000000000 --- a/tests/integration/mock/PlannerAgent/test_edits/response_007.log +++ /dev/null @@ -1,7 +0,0 @@ -{ - "action": "write", - "args": { - "path": "bad.txt", - "content": "This is a stupid typo.\nReally?\nNo more typos!\nEnjoy!\n" - } -} diff --git a/tests/integration/mock/PlannerAgent/test_edits/response_008.log b/tests/integration/mock/PlannerAgent/test_edits/response_008.log deleted file mode 100644 index 6613982387..0000000000 --- a/tests/integration/mock/PlannerAgent/test_edits/response_008.log +++ /dev/null @@ -1,8 +0,0 @@ -{ - "action": "modify_task", - "args": { - "task_id": "0.1", - "state": "completed", - "thought": "The typos in the text from bad.txt have been identified and corrected. The content has been written back to the file." - } -} diff --git a/tests/integration/mock/PlannerAgent/test_edits/response_009.log b/tests/integration/mock/PlannerAgent/test_edits/response_009.log deleted file mode 100644 index bccad9963b..0000000000 --- a/tests/integration/mock/PlannerAgent/test_edits/response_009.log +++ /dev/null @@ -1,10 +0,0 @@ -```json -{ - "action": "modify_task", - "args": { - "task_id": "0.2", - "state": "completed", - "thought": "The corrected content has been successfully written back to bad.txt." 
- } -} -``` diff --git a/tests/integration/mock/PlannerAgent/test_edits/response_010.log b/tests/integration/mock/PlannerAgent/test_edits/response_010.log deleted file mode 100644 index 7b26d2b860..0000000000 --- a/tests/integration/mock/PlannerAgent/test_edits/response_010.log +++ /dev/null @@ -1,8 +0,0 @@ -{ - "action": "modify_task", - "args": { - "task_id": "0.3", - "state": "in_progress", - "thought": "I am now going to verify the corrections made in bad.txt to ensure that all typos have been correctly fixed." - } -} diff --git a/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_001.log b/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_001.log deleted file mode 100644 index e6f19f9135..0000000000 --- a/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_001.log +++ /dev/null @@ -1,105 +0,0 @@ - - ----------- - - -# Task -You're a diligent software engineer AI. You can't see, draw, or interact with a -browser, but you can read and write files, and you can run commands, and you can think. - -You've been given the following task: - -Write a shell script 'hello.sh' that prints 'hello'. Do not ask me for confirmation at any point. - -## Plan -As you complete this task, you're building a plan and keeping -track of your progress. Here's a JSON representation of your plan: - -{ - "id": "", - "goal": "", - "state": "open", - "subtasks": [] -} - - -You're not currently working on any tasks. Your next action MUST be to mark a task as in_progress. - -You're responsible for managing this plan and the status of tasks in -it, by using the `add_task` and `modify_task` actions described below. - -If the History below contradicts the state of any of these tasks, you -MUST modify the task using the `modify_task` action described below. - -Be sure NOT to duplicate any tasks. Do NOT use the `add_task` action for -a task that's already represented. Every task must be represented only once. 
- -Tasks that are sequential MUST be siblings. They must be added in order -to their parent task. - -If you mark a task as 'completed', 'verified', or 'abandoned', -all non-abandoned subtasks will be marked the same way. -So before closing a task this way, you MUST not only be sure that it has -been completed successfully--you must ALSO be sure that all its subtasks -are ready to be marked the same way. - -If, and only if, ALL tasks have already been marked verified, -you MUST respond with the `finish` action. - -## History -Here is a recent history of actions you've taken in service of this plan, -as well as observations you've made. This only includes the MOST RECENT -ten actions--more happened before that. - -[ - { - "source": "user", - "action": "message", - "args": { - "content": "Write a shell script 'hello.sh' that prints 'hello'. Do not ask me for confirmation at any point.", - "wait_for_response": false - } - } -] - - -Your most recent action is at the bottom of that history. - -## Action -What is your next thought or action? Your response must be in JSON format. - -It must be an object, and it must contain two fields: -* `action`, which is one of the actions below -* `args`, which is a map of key-value pairs, specifying the arguments for that action - -* `read` - reads the content of a file. Arguments: - * `path` - the path of the file to read -* `write` - writes the content to a file. Arguments: - * `path` - the path of the file to write - * `content` - the content to write to the file -* `run` - runs a command on the command line in a Linux shell. Arguments: - * `command` - the command to run - * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true. 
-* `kill` - kills a background command - * `command_id` - the ID of the background command to kill -* `browse` - opens a web page. Arguments: - * `url` - the URL to open -* `message` - make a plan, set a goal, record your thoughts, or ask for more input from the user. Arguments: - * `content` - the message to record - * `wait_for_response` - set to `true` to wait for the user to respond before proceeding -* `add_task` - add a task to your plan. Arguments: - * `parent` - the ID of the parent task (leave empty if it should go at the top level) - * `goal` - the goal of the task - * `subtasks` - a list of subtasks, each of which is a map with a `goal` key. -* `modify_task` - close a task. Arguments: - * `task_id` - the ID of the task to close - * `state` - set to 'in_progress' to start the task, 'completed' to finish it, 'verified' to assert that it was successful, 'abandoned' to give up on it permanently, or `open` to stop working on it for now. -* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working. - -You MUST take time to think in between read, write, run, kill, browse, and recall actions--do this with the `message` action. -You should never act twice in a row without thinking. But if your last several -actions are all `message` actions, you should consider taking a different action. - -What is your next thought or action? Again, you must reply with JSON, and only with JSON. - -Look at your last thought in the history above. What does it suggest? Don't think anymore--take action. 
diff --git a/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_002.log b/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_002.log deleted file mode 100644 index 69885927c1..0000000000 --- a/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_002.log +++ /dev/null @@ -1,122 +0,0 @@ - - ----------- - - -# Task -You're a diligent software engineer AI. You can't see, draw, or interact with a -browser, but you can read and write files, and you can run commands, and you can think. - -You've been given the following task: - -Write a shell script 'hello.sh' that prints 'hello'. Do not ask me for confirmation at any point. - -## Plan -As you complete this task, you're building a plan and keeping -track of your progress. Here's a JSON representation of your plan: - -{ - "id": "", - "goal": "", - "state": "open", - "subtasks": [ - { - "id": "0", - "goal": "Write a shell script 'hello.sh' that prints 'hello'", - "state": "open", - "subtasks": [] - } - ] -} - - -You're not currently working on any tasks. Your next action MUST be to mark a task as in_progress. - -You're responsible for managing this plan and the status of tasks in -it, by using the `add_task` and `modify_task` actions described below. - -If the History below contradicts the state of any of these tasks, you -MUST modify the task using the `modify_task` action described below. - -Be sure NOT to duplicate any tasks. Do NOT use the `add_task` action for -a task that's already represented. Every task must be represented only once. - -Tasks that are sequential MUST be siblings. They must be added in order -to their parent task. - -If you mark a task as 'completed', 'verified', or 'abandoned', -all non-abandoned subtasks will be marked the same way. -So before closing a task this way, you MUST not only be sure that it has -been completed successfully--you must ALSO be sure that all its subtasks -are ready to be marked the same way. 
- -If, and only if, ALL tasks have already been marked verified, -you MUST respond with the `finish` action. - -## History -Here is a recent history of actions you've taken in service of this plan, -as well as observations you've made. This only includes the MOST RECENT -ten actions--more happened before that. - -[ - { - "source": "user", - "action": "message", - "args": { - "content": "Write a shell script 'hello.sh' that prints 'hello'. Do not ask me for confirmation at any point.", - "wait_for_response": false - } - }, - { - "source": "agent", - "action": "add_task", - "args": { - "parent": "", - "goal": "Write a shell script 'hello.sh' that prints 'hello'", - "subtasks": [], - "thought": "" - } - } -] - - -Your most recent action is at the bottom of that history. - -## Action -What is your next thought or action? Your response must be in JSON format. - -It must be an object, and it must contain two fields: -* `action`, which is one of the actions below -* `args`, which is a map of key-value pairs, specifying the arguments for that action - -* `read` - reads the content of a file. Arguments: - * `path` - the path of the file to read -* `write` - writes the content to a file. Arguments: - * `path` - the path of the file to write - * `content` - the content to write to the file -* `run` - runs a command on the command line in a Linux shell. Arguments: - * `command` - the command to run - * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true. -* `kill` - kills a background command - * `command_id` - the ID of the background command to kill -* `browse` - opens a web page. Arguments: - * `url` - the URL to open -* `message` - make a plan, set a goal, record your thoughts, or ask for more input from the user. 
Arguments: - * `content` - the message to record - * `wait_for_response` - set to `true` to wait for the user to respond before proceeding -* `add_task` - add a task to your plan. Arguments: - * `parent` - the ID of the parent task (leave empty if it should go at the top level) - * `goal` - the goal of the task - * `subtasks` - a list of subtasks, each of which is a map with a `goal` key. -* `modify_task` - close a task. Arguments: - * `task_id` - the ID of the task to close - * `state` - set to 'in_progress' to start the task, 'completed' to finish it, 'verified' to assert that it was successful, 'abandoned' to give up on it permanently, or `open` to stop working on it for now. -* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working. - -You MUST take time to think in between read, write, run, kill, browse, and recall actions--do this with the `message` action. -You should never act twice in a row without thinking. But if your last several -actions are all `message` actions, you should consider taking a different action. - -What is your next thought or action? Again, you must reply with JSON, and only with JSON. - -You should think about the next action to take. diff --git a/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_003.log b/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_003.log deleted file mode 100644 index 1bee8674fd..0000000000 --- a/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_003.log +++ /dev/null @@ -1,133 +0,0 @@ - - ----------- - - -# Task -You're a diligent software engineer AI. You can't see, draw, or interact with a -browser, but you can read and write files, and you can run commands, and you can think. - -You've been given the following task: - -Write a shell script 'hello.sh' that prints 'hello'. 
Do not ask me for confirmation at any point. - -## Plan -As you complete this task, you're building a plan and keeping -track of your progress. Here's a JSON representation of your plan: - -{ - "id": "", - "goal": "", - "state": "in_progress", - "subtasks": [ - { - "id": "0", - "goal": "Write a shell script 'hello.sh' that prints 'hello'", - "state": "in_progress", - "subtasks": [] - } - ] -} - - -You're currently working on this task: -Write a shell script 'hello.sh' that prints 'hello'. -If it's not achievable AND verifiable with a SINGLE action, you MUST break it down into subtasks NOW. - -You're responsible for managing this plan and the status of tasks in -it, by using the `add_task` and `modify_task` actions described below. - -If the History below contradicts the state of any of these tasks, you -MUST modify the task using the `modify_task` action described below. - -Be sure NOT to duplicate any tasks. Do NOT use the `add_task` action for -a task that's already represented. Every task must be represented only once. - -Tasks that are sequential MUST be siblings. They must be added in order -to their parent task. - -If you mark a task as 'completed', 'verified', or 'abandoned', -all non-abandoned subtasks will be marked the same way. -So before closing a task this way, you MUST not only be sure that it has -been completed successfully--you must ALSO be sure that all its subtasks -are ready to be marked the same way. - -If, and only if, ALL tasks have already been marked verified, -you MUST respond with the `finish` action. - -## History -Here is a recent history of actions you've taken in service of this plan, -as well as observations you've made. This only includes the MOST RECENT -ten actions--more happened before that. - -[ - { - "source": "user", - "action": "message", - "args": { - "content": "Write a shell script 'hello.sh' that prints 'hello'. 
Do not ask me for confirmation at any point.", - "wait_for_response": false - } - }, - { - "source": "agent", - "action": "add_task", - "args": { - "parent": "", - "goal": "Write a shell script 'hello.sh' that prints 'hello'", - "subtasks": [], - "thought": "" - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0", - "state": "in_progress", - "thought": "" - } - } -] - - -Your most recent action is at the bottom of that history. - -## Action -What is your next thought or action? Your response must be in JSON format. - -It must be an object, and it must contain two fields: -* `action`, which is one of the actions below -* `args`, which is a map of key-value pairs, specifying the arguments for that action - -* `read` - reads the content of a file. Arguments: - * `path` - the path of the file to read -* `write` - writes the content to a file. Arguments: - * `path` - the path of the file to write - * `content` - the content to write to the file -* `run` - runs a command on the command line in a Linux shell. Arguments: - * `command` - the command to run - * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true. -* `kill` - kills a background command - * `command_id` - the ID of the background command to kill -* `browse` - opens a web page. Arguments: - * `url` - the URL to open -* `message` - make a plan, set a goal, record your thoughts, or ask for more input from the user. Arguments: - * `content` - the message to record - * `wait_for_response` - set to `true` to wait for the user to respond before proceeding -* `add_task` - add a task to your plan. 
Arguments: - * `parent` - the ID of the parent task (leave empty if it should go at the top level) - * `goal` - the goal of the task - * `subtasks` - a list of subtasks, each of which is a map with a `goal` key. -* `modify_task` - close a task. Arguments: - * `task_id` - the ID of the task to close - * `state` - set to 'in_progress' to start the task, 'completed' to finish it, 'verified' to assert that it was successful, 'abandoned' to give up on it permanently, or `open` to stop working on it for now. -* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working. - -You MUST take time to think in between read, write, run, kill, browse, and recall actions--do this with the `message` action. -You should never act twice in a row without thinking. But if your last several -actions are all `message` actions, you should consider taking a different action. - -What is your next thought or action? Again, you must reply with JSON, and only with JSON. - -You should think about the next action to take. diff --git a/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_004.log b/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_004.log deleted file mode 100644 index 808b6f5004..0000000000 --- a/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_004.log +++ /dev/null @@ -1,152 +0,0 @@ - - ----------- - - -# Task -You're a diligent software engineer AI. You can't see, draw, or interact with a -browser, but you can read and write files, and you can run commands, and you can think. - -You've been given the following task: - -Write a shell script 'hello.sh' that prints 'hello'. Do not ask me for confirmation at any point. - -## Plan -As you complete this task, you're building a plan and keeping -track of your progress. 
Here's a JSON representation of your plan: - -{ - "id": "", - "goal": "", - "state": "in_progress", - "subtasks": [ - { - "id": "0", - "goal": "Write a shell script 'hello.sh' that prints 'hello'", - "state": "in_progress", - "subtasks": [] - } - ] -} - - -You're currently working on this task: -Write a shell script 'hello.sh' that prints 'hello'. -If it's not achievable AND verifiable with a SINGLE action, you MUST break it down into subtasks NOW. - -You're responsible for managing this plan and the status of tasks in -it, by using the `add_task` and `modify_task` actions described below. - -If the History below contradicts the state of any of these tasks, you -MUST modify the task using the `modify_task` action described below. - -Be sure NOT to duplicate any tasks. Do NOT use the `add_task` action for -a task that's already represented. Every task must be represented only once. - -Tasks that are sequential MUST be siblings. They must be added in order -to their parent task. - -If you mark a task as 'completed', 'verified', or 'abandoned', -all non-abandoned subtasks will be marked the same way. -So before closing a task this way, you MUST not only be sure that it has -been completed successfully--you must ALSO be sure that all its subtasks -are ready to be marked the same way. - -If, and only if, ALL tasks have already been marked verified, -you MUST respond with the `finish` action. - -## History -Here is a recent history of actions you've taken in service of this plan, -as well as observations you've made. This only includes the MOST RECENT -ten actions--more happened before that. - -[ - { - "source": "user", - "action": "message", - "args": { - "content": "Write a shell script 'hello.sh' that prints 'hello'. 
Do not ask me for confirmation at any point.", - "wait_for_response": false - } - }, - { - "source": "agent", - "action": "add_task", - "args": { - "parent": "", - "goal": "Write a shell script 'hello.sh' that prints 'hello'", - "subtasks": [], - "thought": "" - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0", - "state": "in_progress", - "thought": "" - } - }, - { - "source": "agent", - "action": "write", - "args": { - "path": "hello.sh", - "content": "#!/bin/bash\necho 'hello'", - "start": 0, - "end": -1, - "thought": "" - } - }, - { - "source": "agent", - "observation": "write", - "content": "", - "extras": { - "path": "hello.sh" - } - } -] - - -Your most recent action is at the bottom of that history. - -## Action -What is your next thought or action? Your response must be in JSON format. - -It must be an object, and it must contain two fields: -* `action`, which is one of the actions below -* `args`, which is a map of key-value pairs, specifying the arguments for that action - -* `read` - reads the content of a file. Arguments: - * `path` - the path of the file to read -* `write` - writes the content to a file. Arguments: - * `path` - the path of the file to write - * `content` - the content to write to the file -* `run` - runs a command on the command line in a Linux shell. Arguments: - * `command` - the command to run - * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true. -* `kill` - kills a background command - * `command_id` - the ID of the background command to kill -* `browse` - opens a web page. Arguments: - * `url` - the URL to open -* `message` - make a plan, set a goal, record your thoughts, or ask for more input from the user. 
Arguments: - * `content` - the message to record - * `wait_for_response` - set to `true` to wait for the user to respond before proceeding -* `add_task` - add a task to your plan. Arguments: - * `parent` - the ID of the parent task (leave empty if it should go at the top level) - * `goal` - the goal of the task - * `subtasks` - a list of subtasks, each of which is a map with a `goal` key. -* `modify_task` - close a task. Arguments: - * `task_id` - the ID of the task to close - * `state` - set to 'in_progress' to start the task, 'completed' to finish it, 'verified' to assert that it was successful, 'abandoned' to give up on it permanently, or `open` to stop working on it for now. -* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working. - -You MUST take time to think in between read, write, run, kill, browse, and recall actions--do this with the `message` action. -You should never act twice in a row without thinking. But if your last several -actions are all `message` actions, you should consider taking a different action. - -What is your next thought or action? Again, you must reply with JSON, and only with JSON. - -You just changed a file. You should think about how it affects your plan. diff --git a/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_005.log b/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_005.log deleted file mode 100644 index 4d53aaa560..0000000000 --- a/tests/integration/mock/PlannerAgent/test_write_simple_script/prompt_005.log +++ /dev/null @@ -1,171 +0,0 @@ - - ----------- - - -# Task -You're a diligent software engineer AI. You can't see, draw, or interact with a -browser, but you can read and write files, and you can run commands, and you can think. - -You've been given the following task: - -Write a shell script 'hello.sh' that prints 'hello'. 
Do not ask me for confirmation at any point. - -## Plan -As you complete this task, you're building a plan and keeping -track of your progress. Here's a JSON representation of your plan: - -{ - "id": "", - "goal": "", - "state": "in_progress", - "subtasks": [ - { - "id": "0", - "goal": "Write a shell script 'hello.sh' that prints 'hello'", - "state": "in_progress", - "subtasks": [] - } - ] -} - - -You're currently working on this task: -Write a shell script 'hello.sh' that prints 'hello'. -If it's not achievable AND verifiable with a SINGLE action, you MUST break it down into subtasks NOW. - -You're responsible for managing this plan and the status of tasks in -it, by using the `add_task` and `modify_task` actions described below. - -If the History below contradicts the state of any of these tasks, you -MUST modify the task using the `modify_task` action described below. - -Be sure NOT to duplicate any tasks. Do NOT use the `add_task` action for -a task that's already represented. Every task must be represented only once. - -Tasks that are sequential MUST be siblings. They must be added in order -to their parent task. - -If you mark a task as 'completed', 'verified', or 'abandoned', -all non-abandoned subtasks will be marked the same way. -So before closing a task this way, you MUST not only be sure that it has -been completed successfully--you must ALSO be sure that all its subtasks -are ready to be marked the same way. - -If, and only if, ALL tasks have already been marked verified, -you MUST respond with the `finish` action. - -## History -Here is a recent history of actions you've taken in service of this plan, -as well as observations you've made. This only includes the MOST RECENT -ten actions--more happened before that. - -[ - { - "source": "user", - "action": "message", - "args": { - "content": "Write a shell script 'hello.sh' that prints 'hello'. 
Do not ask me for confirmation at any point.", - "wait_for_response": false - } - }, - { - "source": "agent", - "action": "add_task", - "args": { - "parent": "", - "goal": "Write a shell script 'hello.sh' that prints 'hello'", - "subtasks": [], - "thought": "" - } - }, - { - "source": "agent", - "action": "modify_task", - "args": { - "task_id": "0", - "state": "in_progress", - "thought": "" - } - }, - { - "source": "agent", - "action": "write", - "args": { - "path": "hello.sh", - "content": "#!/bin/bash\necho 'hello'", - "start": 0, - "end": -1, - "thought": "" - } - }, - { - "source": "agent", - "observation": "write", - "content": "", - "extras": { - "path": "hello.sh" - } - }, - { - "source": "agent", - "action": "run", - "args": { - "command": "bash hello.sh", - "background": false, - "thought": "" - } - }, - { - "source": "agent", - "observation": "run", - "content": "hello", - "extras": { - "command_id": -1, - "command": "bash hello.sh", - "exit_code": 0 - } - } -] - - -Your most recent action is at the bottom of that history. - -## Action -What is your next thought or action? Your response must be in JSON format. - -It must be an object, and it must contain two fields: -* `action`, which is one of the actions below -* `args`, which is a map of key-value pairs, specifying the arguments for that action - -* `read` - reads the content of a file. Arguments: - * `path` - the path of the file to read -* `write` - writes the content to a file. Arguments: - * `path` - the path of the file to write - * `content` - the content to write to the file -* `run` - runs a command on the command line in a Linux shell. Arguments: - * `command` - the command to run - * `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true. 
-* `kill` - kills a background command - * `command_id` - the ID of the background command to kill -* `browse` - opens a web page. Arguments: - * `url` - the URL to open -* `message` - make a plan, set a goal, record your thoughts, or ask for more input from the user. Arguments: - * `content` - the message to record - * `wait_for_response` - set to `true` to wait for the user to respond before proceeding -* `add_task` - add a task to your plan. Arguments: - * `parent` - the ID of the parent task (leave empty if it should go at the top level) - * `goal` - the goal of the task - * `subtasks` - a list of subtasks, each of which is a map with a `goal` key. -* `modify_task` - close a task. Arguments: - * `task_id` - the ID of the task to close - * `state` - set to 'in_progress' to start the task, 'completed' to finish it, 'verified' to assert that it was successful, 'abandoned' to give up on it permanently, or `open` to stop working on it for now. -* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working. - -You MUST take time to think in between read, write, run, kill, browse, and recall actions--do this with the `message` action. -You should never act twice in a row without thinking. But if your last several -actions are all `message` actions, you should consider taking a different action. - -What is your next thought or action? Again, you must reply with JSON, and only with JSON. - -You should think about the command you just ran, what output it gave, and how that affects your plan. 
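The prompt requires a reply that is JSON and only JSON, yet some recorded responses in these fixtures wrap the object in a json-tagged Markdown code fence while others are bare. A consumer of such replies might tolerate both forms along these lines. This is a hedged sketch: `parse_response` and `FENCE_RE` are hypothetical names, not taken from the agent's code, and the validation only checks the two fields that the prompt says are mandatory.

```python
import json
import re

# Matches an optional Markdown fence (with or without a "json" tag) around the body.
FENCE_RE = re.compile(r"^```(?:json)?\s*\n(.*?)\n```\s*$", re.DOTALL)


def parse_response(text: str) -> dict:
    """Parse a model reply into an action dict, accepting fenced or bare JSON."""
    body = text.strip()
    match = FENCE_RE.match(body)
    if match:
        body = match.group(1)
    data = json.loads(body)
    # The prompt requires an object with exactly these two fields present.
    if not isinstance(data, dict) or "action" not in data or "args" not in data:
        raise ValueError("response must be an object with 'action' and 'args'")
    return data


fenced = '```json\n{"action": "modify_task", "args": {"task_id": "0", "state": "completed"}}\n```'
bare = '{"action": "run", "args": {"command": "bash hello.sh", "background": false}}'

fenced_action = parse_response(fenced)
bare_action = parse_response(bare)
print(fenced_action["action"], bare_action["args"]["command"])
```

Stripping the fence before calling `json.loads` keeps the parser strict about the payload itself, so a malformed object still fails loudly instead of being silently repaired.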
diff --git a/tests/integration/mock/PlannerAgent/test_write_simple_script/response_001.log b/tests/integration/mock/PlannerAgent/test_write_simple_script/response_001.log deleted file mode 100644 index e6618ccaf3..0000000000 --- a/tests/integration/mock/PlannerAgent/test_write_simple_script/response_001.log +++ /dev/null @@ -1,8 +0,0 @@ -{ - "action": "add_task", - "args": { - "parent": "", - "goal": "Write a shell script 'hello.sh' that prints 'hello'", - "subtasks": [] - } -} diff --git a/tests/integration/mock/PlannerAgent/test_write_simple_script/response_002.log b/tests/integration/mock/PlannerAgent/test_write_simple_script/response_002.log deleted file mode 100644 index a15bacbd59..0000000000 --- a/tests/integration/mock/PlannerAgent/test_write_simple_script/response_002.log +++ /dev/null @@ -1,7 +0,0 @@ -{ - "action": "modify_task", - "args": { - "task_id": "0", - "state": "in_progress" - } -} diff --git a/tests/integration/mock/PlannerAgent/test_write_simple_script/response_003.log b/tests/integration/mock/PlannerAgent/test_write_simple_script/response_003.log deleted file mode 100644 index f97464936b..0000000000 --- a/tests/integration/mock/PlannerAgent/test_write_simple_script/response_003.log +++ /dev/null @@ -1,7 +0,0 @@ -{ - "action": "write", - "args": { - "path": "hello.sh", - "content": "#!/bin/bash\necho 'hello'" - } -} diff --git a/tests/integration/mock/PlannerAgent/test_write_simple_script/response_004.log b/tests/integration/mock/PlannerAgent/test_write_simple_script/response_004.log deleted file mode 100644 index 7055344e27..0000000000 --- a/tests/integration/mock/PlannerAgent/test_write_simple_script/response_004.log +++ /dev/null @@ -1,7 +0,0 @@ -{ - "action": "run", - "args": { - "command": "bash hello.sh", - "background": false - } -} diff --git a/tests/integration/mock/PlannerAgent/test_write_simple_script/response_005.log b/tests/integration/mock/PlannerAgent/test_write_simple_script/response_005.log deleted file mode 100644 index 
25eccafe13..0000000000 --- a/tests/integration/mock/PlannerAgent/test_write_simple_script/response_005.log +++ /dev/null @@ -1,10 +0,0 @@ -```json -{ - "action": "modify_task", - "args": { - "task_id": "0", - "state": "completed", - "thought": "The script 'hello.sh' was written and executed successfully, producing the expected output 'hello'." - } -} -``` diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_001.log b/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_001.log deleted file mode 100644 index 6bf1a28cc7..0000000000 --- a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_001.log +++ /dev/null @@ -1,122 +0,0 @@ -SYSTEM INFO: -You am an autonomous coding agent, here to provide solutions for coding issues. -You have been designed to assist you with a wide range of programming tasks, from code editing and debugging to testing and deployment. -You have access to a variety of tools and commands that you can use to help you solve problems efficiently. - -INSTRUCTIONS: -Now, you're going to solve this issue on your own. You can use any bash commands or custom commands you wish to complete your task. Edit all the files you need to and run any checks or tests that you want. -Remember, YOU CAN ONLY ENTER ONE COMMAND AT A TIME. You should always wait for feedback after every command. -When you're satisfied with all of the changes you've made, you can indicate that you are done by running the exit command. -Note however that you cannot use any interactive session commands (e.g. python, vim, node) in this environment, but you can write scripts and run them. E.g. you can write a python script and then run it with `python .py`. - -NOTE ABOUT THE write COMMAND: Indentation really matters! When editing a file, make sure to insert appropriate indentation before each line! - -IMPORTANT TIPS: -1. Reproduce the bug: Always start by trying to replicate the bug that the issue discusses. 
If the issue includes code for reproducing the bug, we recommend that you re-implement that in your environment and run it to ensure you can reproduce the bug. Then, start trying to fix it. When you think you've fixed the bug, re-run the bug reproduction script to make sure that the issue has indeed been resolved. - If the bug reproduction script does not print anything when it successfully runs, we recommend adding a print("Script completed successfully, no errors.") command at the end of the file, so that you can be sure the script ran fine all the way through. -2. Try different commands: If you run a command and it doesn't work, try running a different command. A command that did not work once will not work the second time unless you modify it. -3. Navigate large files: If you open a file and need to get to an area around a specific line that is not in the first 100 lines, say line 583, you would use the 'read' command like this: 'read 583'. This is a much faster way to read through the file. -4. Handle input files: If the bug reproduction script requires inputting/reading a specific file, such as 'buggy-input.png', and you'd like to understand how to input that file, conduct a search in the existing repository code to see whether someone else has already done that. Do this by running the command: 'search_dir "buggy-input.png"'. If that doesn't work, use the Linux 'find' command. -5. Understand your context: Always make sure to look at the currently open file and the current working directory. The currently open file might be in a different directory than the working directory. -6. Verify your edits: When editing files, it is easy to accidentally specify a wrong line number or to write code with incorrect indentation. Always check the code after you issue an edit to make sure that it reflects what you wanted to accomplish. If it didn't, issue another command to fix it. -7. 
Thoroughly test your solution: After making any changes to fix a bug, be sure to thoroughly test your solution to ensure the bug has been resolved. Re-run the bug reproduction script and verify that the issue has been addressed. - - -DOCUMENTATION: -It is recommend that you use the commands provided for interacting with files and your directory because they have been specially built for you. -They will make it much easier for you to look at files and make changes. Using these commands will help you be better at your task. -You can open an file by using either the read or write operations. -- If a file already exists you should read it before making any changes. Use the `edit` command to make changes once you have read it. -- If you are creating a new file use the write command. Use the `edit` command to make changes once you have created the new file. - -Commands: -exit - Executed when task is complete -read [] [] - Shows a given file's contents starting from up to . Default: start_line = 0, end_line = -1. By default the whole file will be read. -write [] [] - Modifies a by replacing the current lines between and with . Default start_line = 0 and end_line = -1. Calling this with no line args will replace the whole file. -browse - Returns the text version of any url, this can be useful to look up documentation or finding issues on github -scroll_up - Takes no arguments. This will scroll up and show you the 100 lines above your current lines -scroll_down - Takes no arguments. This will scroll down and show you the 100 lines below your current lines -edit - This will modify lines in the currently open file. use start_line and end_line to designate which lines to change and then write the multiline changes. Set end_line to -1 to denote the end of the file -goto - This will take you directly to a line and show you the 100 lines below it. - - You can use any bash command you need (cd, ls, rm, grep, dir, mv, wget, git, zip, etc.) 
with their arguments included -pip install - You can use this to import python packages. Make sure you include the correct package name when using this command. -ls - Use the ls command to view all the files in your current directory, this is a good starting point. -NOT ALLOWED - You cannot use interactive commands like python or node -Custom bash commands: -None - - -The following commands require an open file to be used: edit, scroll_up, scroll_down, goto -To modify the current file use 'edit'. To move through the current file use 'goto' or 'scroll_up'/'scroll_down' -when using write and edit do not surround the code with any "" just write the code.These are your past 4 actions: -===== Previous Actions ===== - -Memory 0: -Previous Action: -source: user -action: message -args: - wait_for_response: False - -Output from Action: -observation: null -content: -extras: -======= End Actions ======= -Use these memories to provide additional context to the problem you are solving. -Remember that you have already completed these steps so you do not need to perform them again.RESPONSE FORMAT: -This is the format of the response you will make in order to solve the current issue. -You will be given multiple iterations to complete this task so break it into steps and solve them one by one. - -Your output must contain the following: -- First, thoughts about what your next action should be and plan it out. - - You will have a memory of your thoughts so you can use this to remember things for the next step. - - Use your thoughts to think about what you are currently doing, what you have done on prior steps and how that relates to solving the problem. -- Second, create a piece of code that will execute your next action based on the thoughts you have. - - Remember that you can only have one action for each thought, do not include multiple actions. 
- -Your code MUST be surrounded in triple back ticks EXACTLY like this: -``` - -``` - -Notes: -- Adhere to the format so that the program loop continues smoothly, it is very important to only give one command per output. -- DO NOT give more than one command within the triple backticks. This will just throw an error and nothing will happen as a result. -- Do not give multiple code blocks, if you do only the second one will be captured and run, this might give an error if the first one was necessary. -- To execute multiple commands you should write them down in your thoughts section so you can remember it on the next step and execute them then. -- The only commands you are not capable of executing are interactive commands like `python` or `node` by themselves. -- If you think that you have completed the task that has been given to you based on your previous actions and outputs then use ``` exit ``` as the command to let the system know that you are done. -- DO NOT make any copies of your previous memories those will be provided to you at each step, making copies just wastes time and energy. Think smarter not harder. -- The write and edit commands requires proper indentation in the content section ex. `write hw.py def hello(): - print('Hello World')` this is how you would have to format your write command. - - The white spaces matter as the code changes will be added to the code so they must have proper syntax. - -This is a template using the format described above -Items in <> are suggestions for you, fill them out based on the context of the problem you are solving. - -[ FORMAT ] -Thoughts: - - - -Action: -``` - -``` -[ END FORMAT ] - -Do not provide anything extra just your thought and action. - -You are currently trying to complete this task: -Write a shell script 'hello.sh' that prints 'hello'. Do not ask me for confirmation at any point. - -None - -Keep all of the guidelines above in mind when you are thinking and making code. 
-Please come up with a thought and action based on your current task and latest steps. -Make sure that you do not repeat the same actions, there will not be any changes in result if you do not changes anything. -Be very strict about the formatting that you use and make sure you follow the guidelines. -NEVER output multiple commands. ONLY take ONE STEP at a time. -When you have completed your task run the "exit" command. -Begin with your thought about the next step and then come up with an action to perform your thought. diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_002.log b/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_002.log deleted file mode 100644 index fc3748b1bb..0000000000 --- a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_002.log +++ /dev/null @@ -1,147 +0,0 @@ -SYSTEM INFO: -You am an autonomous coding agent, here to provide solutions for coding issues. -You have been designed to assist you with a wide range of programming tasks, from code editing and debugging to testing and deployment. -You have access to a variety of tools and commands that you can use to help you solve problems efficiently. - -INSTRUCTIONS: -Now, you're going to solve this issue on your own. You can use any bash commands or custom commands you wish to complete your task. Edit all the files you need to and run any checks or tests that you want. -Remember, YOU CAN ONLY ENTER ONE COMMAND AT A TIME. You should always wait for feedback after every command. -When you're satisfied with all of the changes you've made, you can indicate that you are done by running the exit command. -Note however that you cannot use any interactive session commands (e.g. python, vim, node) in this environment, but you can write scripts and run them. E.g. you can write a python script and then run it with `python .py`. - -NOTE ABOUT THE write COMMAND: Indentation really matters! 
When editing a file, make sure to insert appropriate indentation before each line! - -IMPORTANT TIPS: -1. Reproduce the bug: Always start by trying to replicate the bug that the issue discusses. If the issue includes code for reproducing the bug, we recommend that you re-implement that in your environment and run it to ensure you can reproduce the bug. Then, start trying to fix it. When you think you've fixed the bug, re-run the bug reproduction script to make sure that the issue has indeed been resolved. - If the bug reproduction script does not print anything when it successfully runs, we recommend adding a print("Script completed successfully, no errors.") command at the end of the file, so that you can be sure the script ran fine all the way through. -2. Try different commands: If you run a command and it doesn't work, try running a different command. A command that did not work once will not work the second time unless you modify it. -3. Navigate large files: If you open a file and need to get to an area around a specific line that is not in the first 100 lines, say line 583, you would use the 'read' command like this: 'read 583'. This is a much faster way to read through the file. -4. Handle input files: If the bug reproduction script requires inputting/reading a specific file, such as 'buggy-input.png', and you'd like to understand how to input that file, conduct a search in the existing repository code to see whether someone else has already done that. Do this by running the command: 'search_dir "buggy-input.png"'. If that doesn't work, use the Linux 'find' command. -5. Understand your context: Always make sure to look at the currently open file and the current working directory. The currently open file might be in a different directory than the working directory. -6. Verify your edits: When editing files, it is easy to accidentally specify a wrong line number or to write code with incorrect indentation. 
Always check the code after you issue an edit to make sure that it reflects what you wanted to accomplish. If it didn't, issue another command to fix it. -7. Thoroughly test your solution: After making any changes to fix a bug, be sure to thoroughly test your solution to ensure the bug has been resolved. Re-run the bug reproduction script and verify that the issue has been addressed. - - -DOCUMENTATION: -It is recommend that you use the commands provided for interacting with files and your directory because they have been specially built for you. -They will make it much easier for you to look at files and make changes. Using these commands will help you be better at your task. -You can open an file by using either the read or write operations. -- If a file already exists you should read it before making any changes. Use the `edit` command to make changes once you have read it. -- If you are creating a new file use the write command. Use the `edit` command to make changes once you have created the new file. - -Commands: -exit - Executed when task is complete -read [] [] - Shows a given file's contents starting from up to . Default: start_line = 0, end_line = -1. By default the whole file will be read. -write [] [] - Modifies a by replacing the current lines between and with . Default start_line = 0 and end_line = -1. Calling this with no line args will replace the whole file. -browse - Returns the text version of any url, this can be useful to look up documentation or finding issues on github -scroll_up - Takes no arguments. This will scroll up and show you the 100 lines above your current lines -scroll_down - Takes no arguments. This will scroll down and show you the 100 lines below your current lines -edit - This will modify lines in the currently open file. use start_line and end_line to designate which lines to change and then write the multiline changes. 
Set end_line to -1 to denote the end of the file -goto - This will take you directly to a line and show you the 100 lines below it. - - You can use any bash command you need (cd, ls, rm, grep, dir, mv, wget, git, zip, etc.) with their arguments included -pip install - You can use this to import python packages. Make sure you include the correct package name when using this command. -ls - Use the ls command to view all the files in your current directory, this is a good starting point. -NOT ALLOWED - You cannot use interactive commands like python or node -Custom bash commands: -None - - -The following commands require an open file to be used: edit, scroll_up, scroll_down, goto -To modify the current file use 'edit'. To move through the current file use 'goto' or 'scroll_up'/'scroll_down' -when using write and edit do not surround the code with any "" just write the code.These are your past 4 actions: -===== Previous Actions ===== - -Memory 0: -Previous Action: -source: user -action: message -args: - wait_for_response: False - -Output from Action: -observation: null -content: -extras: - -Memory 1: -Previous Action: -source: agent -action: write -args: - path: hello.sh - start: 0 - end: -1 - thought: Thoughts: - I need to create a shell script named 'hello.sh' that prints 'hello' without asking for confirmation. I will need to ensure that the script is executable. - - Action: - -Output from Action: -source: agent -observation: write -content: -extras: - path: hello.sh -======= End Actions ======= -Use these memories to provide additional context to the problem you are solving. -Remember that you have already completed these steps so you do not need to perform them again.RESPONSE FORMAT: -This is the format of the response you will make in order to solve the current issue. -You will be given multiple iterations to complete this task so break it into steps and solve them one by one. 
- -Your output must contain the following: -- First, thoughts about what your next action should be and plan it out. - - You will have a memory of your thoughts so you can use this to remember things for the next step. - - Use your thoughts to think about what you are currently doing, what you have done on prior steps and how that relates to solving the problem. -- Second, create a piece of code that will execute your next action based on the thoughts you have. - - Remember that you can only have one action for each thought, do not include multiple actions. - -Your code MUST be surrounded in triple back ticks EXACTLY like this: -``` - -``` - -Notes: -- Adhere to the format so that the program loop continues smoothly, it is very important to only give one command per output. -- DO NOT give more than one command within the triple backticks. This will just throw an error and nothing will happen as a result. -- Do not give multiple code blocks, if you do only the second one will be captured and run, this might give an error if the first one was necessary. -- To execute multiple commands you should write them down in your thoughts section so you can remember it on the next step and execute them then. -- The only commands you are not capable of executing are interactive commands like `python` or `node` by themselves. -- If you think that you have completed the task that has been given to you based on your previous actions and outputs then use ``` exit ``` as the command to let the system know that you are done. -- DO NOT make any copies of your previous memories those will be provided to you at each step, making copies just wastes time and energy. Think smarter not harder. -- The write and edit commands requires proper indentation in the content section ex. `write hw.py def hello(): - print('Hello World')` this is how you would have to format your write command. - - The white spaces matter as the code changes will be added to the code so they must have proper syntax. 
- -This is a template using the format described above -Items in <> are suggestions for you, fill them out based on the context of the problem you are solving. - -[ FORMAT ] -Thoughts: - - - -Action: -``` - -``` -[ END FORMAT ] - -Do not provide anything extra just your thought and action. - -You are currently trying to complete this task: -Write a shell script 'hello.sh' that prints 'hello'. Do not ask me for confirmation at any point. - -CURRENT WORKSPACE: - Open File: hello.sh on line 0 - You can use these commands with the current file: - Navigation: `scroll_up`, `scroll_down`, and `goto ` - Modification: `edit ` - - -Keep all of the guidelines above in mind when you are thinking and making code. -Please come up with a thought and action based on your current task and latest steps. -Make sure that you do not repeat the same actions, there will not be any changes in result if you do not changes anything. -Be very strict about the formatting that you use and make sure you follow the guidelines. -NEVER output multiple commands. ONLY take ONE STEP at a time. -When you have completed your task run the "exit" command. -Begin with your thought about the next step and then come up with an action to perform your thought. diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_003.log b/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_003.log deleted file mode 100644 index b2032a257d..0000000000 --- a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_003.log +++ /dev/null @@ -1,167 +0,0 @@ -SYSTEM INFO: -You am an autonomous coding agent, here to provide solutions for coding issues. -You have been designed to assist you with a wide range of programming tasks, from code editing and debugging to testing and deployment. -You have access to a variety of tools and commands that you can use to help you solve problems efficiently. - -INSTRUCTIONS: -Now, you're going to solve this issue on your own. 
You can use any bash commands or custom commands you wish to complete your task. Edit all the files you need to and run any checks or tests that you want. -Remember, YOU CAN ONLY ENTER ONE COMMAND AT A TIME. You should always wait for feedback after every command. -When you're satisfied with all of the changes you've made, you can indicate that you are done by running the exit command. -Note however that you cannot use any interactive session commands (e.g. python, vim, node) in this environment, but you can write scripts and run them. E.g. you can write a python script and then run it with `python .py`. - -NOTE ABOUT THE write COMMAND: Indentation really matters! When editing a file, make sure to insert appropriate indentation before each line! - -IMPORTANT TIPS: -1. Reproduce the bug: Always start by trying to replicate the bug that the issue discusses. If the issue includes code for reproducing the bug, we recommend that you re-implement that in your environment and run it to ensure you can reproduce the bug. Then, start trying to fix it. When you think you've fixed the bug, re-run the bug reproduction script to make sure that the issue has indeed been resolved. - If the bug reproduction script does not print anything when it successfully runs, we recommend adding a print("Script completed successfully, no errors.") command at the end of the file, so that you can be sure the script ran fine all the way through. -2. Try different commands: If you run a command and it doesn't work, try running a different command. A command that did not work once will not work the second time unless you modify it. -3. Navigate large files: If you open a file and need to get to an area around a specific line that is not in the first 100 lines, say line 583, you would use the 'read' command like this: 'read 583'. This is a much faster way to read through the file. -4. 
Handle input files: If the bug reproduction script requires inputting/reading a specific file, such as 'buggy-input.png', and you'd like to understand how to input that file, conduct a search in the existing repository code to see whether someone else has already done that. Do this by running the command: 'search_dir "buggy-input.png"'. If that doesn't work, use the Linux 'find' command. -5. Understand your context: Always make sure to look at the currently open file and the current working directory. The currently open file might be in a different directory than the working directory. -6. Verify your edits: When editing files, it is easy to accidentally specify a wrong line number or to write code with incorrect indentation. Always check the code after you issue an edit to make sure that it reflects what you wanted to accomplish. If it didn't, issue another command to fix it. -7. Thoroughly test your solution: After making any changes to fix a bug, be sure to thoroughly test your solution to ensure the bug has been resolved. Re-run the bug reproduction script and verify that the issue has been addressed. - - -DOCUMENTATION: -It is recommend that you use the commands provided for interacting with files and your directory because they have been specially built for you. -They will make it much easier for you to look at files and make changes. Using these commands will help you be better at your task. -You can open an file by using either the read or write operations. -- If a file already exists you should read it before making any changes. Use the `edit` command to make changes once you have read it. -- If you are creating a new file use the write command. Use the `edit` command to make changes once you have created the new file. - -Commands: -exit - Executed when task is complete -read [] [] - Shows a given file's contents starting from up to . Default: start_line = 0, end_line = -1. By default the whole file will be read. 
-write [] [] - Modifies a by replacing the current lines between and with . Default start_line = 0 and end_line = -1. Calling this with no line args will replace the whole file. -browse - Returns the text version of any url, this can be useful to look up documentation or finding issues on github -scroll_up - Takes no arguments. This will scroll up and show you the 100 lines above your current lines -scroll_down - Takes no arguments. This will scroll down and show you the 100 lines below your current lines -edit - This will modify lines in the currently open file. use start_line and end_line to designate which lines to change and then write the multiline changes. Set end_line to -1 to denote the end of the file -goto - This will take you directly to a line and show you the 100 lines below it. - - You can use any bash command you need (cd, ls, rm, grep, dir, mv, wget, git, zip, etc.) with their arguments included -pip install - You can use this to import python packages. Make sure you include the correct package name when using this command. -ls - Use the ls command to view all the files in your current directory, this is a good starting point. -NOT ALLOWED - You cannot use interactive commands like python or node -Custom bash commands: -None - - -The following commands require an open file to be used: edit, scroll_up, scroll_down, goto -To modify the current file use 'edit'. 
To move through the current file use 'goto' or 'scroll_up'/'scroll_down' -when using write and edit do not surround the code with any "" just write the code.These are your past 4 actions: -===== Previous Actions ===== - -Memory 0: -Previous Action: -source: user -action: message -args: - wait_for_response: False - -Output from Action: -observation: null -content: -extras: - -Memory 1: -Previous Action: -source: agent -action: write -args: - path: hello.sh - start: 0 - end: -1 - thought: Thoughts: - I need to create a shell script named 'hello.sh' that prints 'hello' without asking for confirmation. I will need to ensure that the script is executable. - - Action: - -Output from Action: -source: agent -observation: write -content: -extras: - path: hello.sh - -Memory 2: -Previous Action: -source: agent -action: write -args: - path: hello.sh - start: 0 - end: -1 - thought: Thoughts: - I need to add the code to the 'hello.sh' script that will print 'hello' without asking for confirmation. I will ensure that the script is executable after making the necessary changes. - - Action: - -Output from Action: -source: agent -observation: write -content: -extras: - path: hello.sh -======= End Actions ======= -Use these memories to provide additional context to the problem you are solving. -Remember that you have already completed these steps so you do not need to perform them again.RESPONSE FORMAT: -This is the format of the response you will make in order to solve the current issue. -You will be given multiple iterations to complete this task so break it into steps and solve them one by one. - -Your output must contain the following: -- First, thoughts about what your next action should be and plan it out. - - You will have a memory of your thoughts so you can use this to remember things for the next step. - - Use your thoughts to think about what you are currently doing, what you have done on prior steps and how that relates to solving the problem. 
-- Second, create a piece of code that will execute your next action based on the thoughts you have. - - Remember that you can only have one action for each thought, do not include multiple actions. - -Your code MUST be surrounded in triple back ticks EXACTLY like this: -``` - -``` - -Notes: -- Adhere to the format so that the program loop continues smoothly, it is very important to only give one command per output. -- DO NOT give more than one command within the triple backticks. This will just throw an error and nothing will happen as a result. -- Do not give multiple code blocks, if you do only the second one will be captured and run, this might give an error if the first one was necessary. -- To execute multiple commands you should write them down in your thoughts section so you can remember it on the next step and execute them then. -- The only commands you are not capable of executing are interactive commands like `python` or `node` by themselves. -- If you think that you have completed the task that has been given to you based on your previous actions and outputs then use ``` exit ``` as the command to let the system know that you are done. -- DO NOT make any copies of your previous memories those will be provided to you at each step, making copies just wastes time and energy. Think smarter not harder. -- The write and edit commands requires proper indentation in the content section ex. `write hw.py def hello(): - print('Hello World')` this is how you would have to format your write command. - - The white spaces matter as the code changes will be added to the code so they must have proper syntax. - -This is a template using the format described above -Items in <> are suggestions for you, fill them out based on the context of the problem you are solving. - -[ FORMAT ] -Thoughts: - - - -Action: -``` - -``` -[ END FORMAT ] - -Do not provide anything extra just your thought and action. 
- -You are currently trying to complete this task: -Write a shell script 'hello.sh' that prints 'hello'. Do not ask me for confirmation at any point. - -CURRENT WORKSPACE: - Open File: hello.sh on line 0 - You can use these commands with the current file: - Navigation: `scroll_up`, `scroll_down`, and `goto ` - Modification: `edit ` - - -Keep all of the guidelines above in mind when you are thinking and making code. -Please come up with a thought and action based on your current task and latest steps. -Make sure that you do not repeat the same actions, there will not be any changes in result if you do not changes anything. -Be very strict about the formatting that you use and make sure you follow the guidelines. -NEVER output multiple commands. ONLY take ONE STEP at a time. -When you have completed your task run the "exit" command. -Begin with your thought about the next step and then come up with an action to perform your thought. diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_004.log b/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_004.log deleted file mode 100644 index bcc0482ea2..0000000000 --- a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_004.log +++ /dev/null @@ -1,187 +0,0 @@ -SYSTEM INFO: -You am an autonomous coding agent, here to provide solutions for coding issues. -You have been designed to assist you with a wide range of programming tasks, from code editing and debugging to testing and deployment. -You have access to a variety of tools and commands that you can use to help you solve problems efficiently. - -INSTRUCTIONS: -Now, you're going to solve this issue on your own. You can use any bash commands or custom commands you wish to complete your task. Edit all the files you need to and run any checks or tests that you want. -Remember, YOU CAN ONLY ENTER ONE COMMAND AT A TIME. You should always wait for feedback after every command. 
-When you're satisfied with all of the changes you've made, you can indicate that you are done by running the exit command. -Note however that you cannot use any interactive session commands (e.g. python, vim, node) in this environment, but you can write scripts and run them. E.g. you can write a python script and then run it with `python .py`. - -NOTE ABOUT THE write COMMAND: Indentation really matters! When editing a file, make sure to insert appropriate indentation before each line! - -IMPORTANT TIPS: -1. Reproduce the bug: Always start by trying to replicate the bug that the issue discusses. If the issue includes code for reproducing the bug, we recommend that you re-implement that in your environment and run it to ensure you can reproduce the bug. Then, start trying to fix it. When you think you've fixed the bug, re-run the bug reproduction script to make sure that the issue has indeed been resolved. - If the bug reproduction script does not print anything when it successfully runs, we recommend adding a print("Script completed successfully, no errors.") command at the end of the file, so that you can be sure the script ran fine all the way through. -2. Try different commands: If you run a command and it doesn't work, try running a different command. A command that did not work once will not work the second time unless you modify it. -3. Navigate large files: If you open a file and need to get to an area around a specific line that is not in the first 100 lines, say line 583, you would use the 'read' command like this: 'read 583'. This is a much faster way to read through the file. -4. Handle input files: If the bug reproduction script requires inputting/reading a specific file, such as 'buggy-input.png', and you'd like to understand how to input that file, conduct a search in the existing repository code to see whether someone else has already done that. Do this by running the command: 'search_dir "buggy-input.png"'. 
If that doesn't work, use the Linux 'find' command. -5. Understand your context: Always make sure to look at the currently open file and the current working directory. The currently open file might be in a different directory than the working directory. -6. Verify your edits: When editing files, it is easy to accidentally specify a wrong line number or to write code with incorrect indentation. Always check the code after you issue an edit to make sure that it reflects what you wanted to accomplish. If it didn't, issue another command to fix it. -7. Thoroughly test your solution: After making any changes to fix a bug, be sure to thoroughly test your solution to ensure the bug has been resolved. Re-run the bug reproduction script and verify that the issue has been addressed. - - -DOCUMENTATION: -It is recommend that you use the commands provided for interacting with files and your directory because they have been specially built for you. -They will make it much easier for you to look at files and make changes. Using these commands will help you be better at your task. -You can open an file by using either the read or write operations. -- If a file already exists you should read it before making any changes. Use the `edit` command to make changes once you have read it. -- If you are creating a new file use the write command. Use the `edit` command to make changes once you have created the new file. - -Commands: -exit - Executed when task is complete -read [] [] - Shows a given file's contents starting from up to . Default: start_line = 0, end_line = -1. By default the whole file will be read. -write [] [] - Modifies a by replacing the current lines between and with . Default start_line = 0 and end_line = -1. Calling this with no line args will replace the whole file. -browse - Returns the text version of any url, this can be useful to look up documentation or finding issues on github -scroll_up - Takes no arguments. 
This will scroll up and show you the 100 lines above your current lines -scroll_down - Takes no arguments. This will scroll down and show you the 100 lines below your current lines -edit - This will modify lines in the currently open file. use start_line and end_line to designate which lines to change and then write the multiline changes. Set end_line to -1 to denote the end of the file -goto - This will take you directly to a line and show you the 100 lines below it. - - You can use any bash command you need (cd, ls, rm, grep, dir, mv, wget, git, zip, etc.) with their arguments included -pip install - You can use this to import python packages. Make sure you include the correct package name when using this command. -ls - Use the ls command to view all the files in your current directory, this is a good starting point. -NOT ALLOWED - You cannot use interactive commands like python or node -Custom bash commands: -None - - -The following commands require an open file to be used: edit, scroll_up, scroll_down, goto -To modify the current file use 'edit'. To move through the current file use 'goto' or 'scroll_up'/'scroll_down' -when using write and edit do not surround the code with any "" just write the code.These are your past 4 actions: -===== Previous Actions ===== - -Memory 0: -Previous Action: -source: user -action: message -args: - wait_for_response: False - -Output from Action: -observation: null -content: -extras: - -Memory 1: -Previous Action: -source: agent -action: write -args: - path: hello.sh - start: 0 - end: -1 - thought: Thoughts: - I need to create a shell script named 'hello.sh' that prints 'hello' without asking for confirmation. I will need to ensure that the script is executable. 
- - Action: - -Output from Action: -source: agent -observation: write -content: -extras: - path: hello.sh - -Memory 2: -Previous Action: -source: agent -action: write -args: - path: hello.sh - start: 0 - end: -1 - thought: Thoughts: - I need to add the code to the 'hello.sh' script that will print 'hello' without asking for confirmation. I will ensure that the script is executable after making the necessary changes. - - Action: - -Output from Action: -source: agent -observation: write -content: -extras: - path: hello.sh - -Memory 3: -Previous Action: -source: agent -action: write -args: - path: hello.sh - start: 0 - end: -1 - thought: Thoughts: - I need to add the code to the 'hello.sh' script that will print 'hello' without asking for confirmation. I will ensure that the script is executable after making the necessary changes. - - Action: - -Output from Action: -source: agent -observation: write -content: -extras: - path: hello.sh -======= End Actions ======= -Use these memories to provide additional context to the problem you are solving. -Remember that you have already completed these steps so you do not need to perform them again.RESPONSE FORMAT: -This is the format of the response you will make in order to solve the current issue. -You will be given multiple iterations to complete this task so break it into steps and solve them one by one. - -Your output must contain the following: -- First, thoughts about what your next action should be and plan it out. - - You will have a memory of your thoughts so you can use this to remember things for the next step. - - Use your thoughts to think about what you are currently doing, what you have done on prior steps and how that relates to solving the problem. -- Second, create a piece of code that will execute your next action based on the thoughts you have. - - Remember that you can only have one action for each thought, do not include multiple actions. 
- -Your code MUST be surrounded in triple back ticks EXACTLY like this: -``` - -``` - -Notes: -- Adhere to the format so that the program loop continues smoothly, it is very important to only give one command per output. -- DO NOT give more than one command within the triple backticks. This will just throw an error and nothing will happen as a result. -- Do not give multiple code blocks, if you do only the second one will be captured and run, this might give an error if the first one was necessary. -- To execute multiple commands you should write them down in your thoughts section so you can remember it on the next step and execute them then. -- The only commands you are not capable of executing are interactive commands like `python` or `node` by themselves. -- If you think that you have completed the task that has been given to you based on your previous actions and outputs then use ``` exit ``` as the command to let the system know that you are done. -- DO NOT make any copies of your previous memories those will be provided to you at each step, making copies just wastes time and energy. Think smarter not harder. -- The write and edit commands requires proper indentation in the content section ex. `write hw.py def hello(): - print('Hello World')` this is how you would have to format your write command. - - The white spaces matter as the code changes will be added to the code so they must have proper syntax. - -This is a template using the format described above -Items in <> are suggestions for you, fill them out based on the context of the problem you are solving. - -[ FORMAT ] -Thoughts: - - - -Action: -``` - -``` -[ END FORMAT ] - -Do not provide anything extra just your thought and action. - -You are currently trying to complete this task: -Write a shell script 'hello.sh' that prints 'hello'. Do not ask me for confirmation at any point. 
- -CURRENT WORKSPACE: - Open File: hello.sh on line 0 - You can use these commands with the current file: - Navigation: `scroll_up`, `scroll_down`, and `goto ` - Modification: `edit ` - - -Keep all of the guidelines above in mind when you are thinking and making code. -Please come up with a thought and action based on your current task and latest steps. -Make sure that you do not repeat the same actions, there will not be any changes in result if you do not changes anything. -Be very strict about the formatting that you use and make sure you follow the guidelines. -NEVER output multiple commands. ONLY take ONE STEP at a time. -When you have completed your task run the "exit" command. -Begin with your thought about the next step and then come up with an action to perform your thought. diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_005.log b/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_005.log deleted file mode 100644 index 4d30dce03d..0000000000 --- a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_005.log +++ /dev/null @@ -1,195 +0,0 @@ -SYSTEM INFO: -You am an autonomous coding agent, here to provide solutions for coding issues. -You have been designed to assist you with a wide range of programming tasks, from code editing and debugging to testing and deployment. -You have access to a variety of tools and commands that you can use to help you solve problems efficiently. - -INSTRUCTIONS: -Now, you're going to solve this issue on your own. You can use any bash commands or custom commands you wish to complete your task. Edit all the files you need to and run any checks or tests that you want. -Remember, YOU CAN ONLY ENTER ONE COMMAND AT A TIME. You should always wait for feedback after every command. -When you're satisfied with all of the changes you've made, you can indicate that you are done by running the exit command. -Note however that you cannot use any interactive session commands (e.g. 
python, vim, node) in this environment, but you can write scripts and run them. E.g. you can write a python script and then run it with `python .py`. - -NOTE ABOUT THE write COMMAND: Indentation really matters! When editing a file, make sure to insert appropriate indentation before each line! - -IMPORTANT TIPS: -1. Reproduce the bug: Always start by trying to replicate the bug that the issue discusses. If the issue includes code for reproducing the bug, we recommend that you re-implement that in your environment and run it to ensure you can reproduce the bug. Then, start trying to fix it. When you think you've fixed the bug, re-run the bug reproduction script to make sure that the issue has indeed been resolved. - If the bug reproduction script does not print anything when it successfully runs, we recommend adding a print("Script completed successfully, no errors.") command at the end of the file, so that you can be sure the script ran fine all the way through. -2. Try different commands: If you run a command and it doesn't work, try running a different command. A command that did not work once will not work the second time unless you modify it. -3. Navigate large files: If you open a file and need to get to an area around a specific line that is not in the first 100 lines, say line 583, you would use the 'read' command like this: 'read 583'. This is a much faster way to read through the file. -4. Handle input files: If the bug reproduction script requires inputting/reading a specific file, such as 'buggy-input.png', and you'd like to understand how to input that file, conduct a search in the existing repository code to see whether someone else has already done that. Do this by running the command: 'search_dir "buggy-input.png"'. If that doesn't work, use the Linux 'find' command. -5. Understand your context: Always make sure to look at the currently open file and the current working directory. 
The currently open file might be in a different directory than the working directory. -6. Verify your edits: When editing files, it is easy to accidentally specify a wrong line number or to write code with incorrect indentation. Always check the code after you issue an edit to make sure that it reflects what you wanted to accomplish. If it didn't, issue another command to fix it. -7. Thoroughly test your solution: After making any changes to fix a bug, be sure to thoroughly test your solution to ensure the bug has been resolved. Re-run the bug reproduction script and verify that the issue has been addressed. - - -DOCUMENTATION: -It is recommend that you use the commands provided for interacting with files and your directory because they have been specially built for you. -They will make it much easier for you to look at files and make changes. Using these commands will help you be better at your task. -You can open an file by using either the read or write operations. -- If a file already exists you should read it before making any changes. Use the `edit` command to make changes once you have read it. -- If you are creating a new file use the write command. Use the `edit` command to make changes once you have created the new file. - -Commands: -exit - Executed when task is complete -read [] [] - Shows a given file's contents starting from up to . Default: start_line = 0, end_line = -1. By default the whole file will be read. -write [] [] - Modifies a by replacing the current lines between and with . Default start_line = 0 and end_line = -1. Calling this with no line args will replace the whole file. -browse - Returns the text version of any url, this can be useful to look up documentation or finding issues on github -scroll_up - Takes no arguments. This will scroll up and show you the 100 lines above your current lines -scroll_down - Takes no arguments. 
This will scroll down and show you the 100 lines below your current lines -edit - This will modify lines in the currently open file. use start_line and end_line to designate which lines to change and then write the multiline changes. Set end_line to -1 to denote the end of the file -goto - This will take you directly to a line and show you the 100 lines below it. - - You can use any bash command you need (cd, ls, rm, grep, dir, mv, wget, git, zip, etc.) with their arguments included -pip install - You can use this to import python packages. Make sure you include the correct package name when using this command. -ls - Use the ls command to view all the files in your current directory, this is a good starting point. -NOT ALLOWED - You cannot use interactive commands like python or node -Custom bash commands: -None - - -The following commands require an open file to be used: edit, scroll_up, scroll_down, goto -To modify the current file use 'edit'. To move through the current file use 'goto' or 'scroll_up'/'scroll_down' -when using write and edit do not surround the code with any "" just write the code.These are your past 4 actions: -===== Previous Actions ===== - -Memory 0: -Previous Action: -source: agent -action: write -args: - path: hello.sh - start: 0 - end: -1 - thought: Thoughts: - I need to create a shell script named 'hello.sh' that prints 'hello' without asking for confirmation. I will need to ensure that the script is executable. - - Action: - -Output from Action: -source: agent -observation: write -content: -extras: - path: hello.sh - -Memory 1: -Previous Action: -source: agent -action: write -args: - path: hello.sh - start: 0 - end: -1 - thought: Thoughts: - I need to add the code to the 'hello.sh' script that will print 'hello' without asking for confirmation. I will ensure that the script is executable after making the necessary changes. 
- - Action: - -Output from Action: -source: agent -observation: write -content: -extras: - path: hello.sh - -Memory 2: -Previous Action: -source: agent -action: write -args: - path: hello.sh - start: 0 - end: -1 - thought: Thoughts: - I need to add the code to the 'hello.sh' script that will print 'hello' without asking for confirmation. I will ensure that the script is executable after making the necessary changes. - - Action: - -Output from Action: -source: agent -observation: write -content: -extras: - path: hello.sh - -Memory 3: -Previous Action: -source: agent -action: write -args: - path: hello.sh - start: 0 - end: -1 - thought: Thoughts: - I need to add the code to the 'hello.sh' script that will print 'hello' without asking for confirmation. I will ensure that the script is executable after making the necessary changes. - - Action: - -Output from Action: -source: agent -observation: write -content: -extras: - path: hello.sh -======= End Actions ======= -Use these memories to provide additional context to the problem you are solving. -Remember that you have already completed these steps so you do not need to perform them again.RESPONSE FORMAT: -This is the format of the response you will make in order to solve the current issue. -You will be given multiple iterations to complete this task so break it into steps and solve them one by one. - -Your output must contain the following: -- First, thoughts about what your next action should be and plan it out. - - You will have a memory of your thoughts so you can use this to remember things for the next step. - - Use your thoughts to think about what you are currently doing, what you have done on prior steps and how that relates to solving the problem. -- Second, create a piece of code that will execute your next action based on the thoughts you have. - - Remember that you can only have one action for each thought, do not include multiple actions. 
- -Your code MUST be surrounded in triple back ticks EXACTLY like this: -``` - -``` - -Notes: -- Adhere to the format so that the program loop continues smoothly, it is very important to only give one command per output. -- DO NOT give more than one command within the triple backticks. This will just throw an error and nothing will happen as a result. -- Do not give multiple code blocks, if you do only the second one will be captured and run, this might give an error if the first one was necessary. -- To execute multiple commands you should write them down in your thoughts section so you can remember it on the next step and execute them then. -- The only commands you are not capable of executing are interactive commands like `python` or `node` by themselves. -- If you think that you have completed the task that has been given to you based on your previous actions and outputs then use ``` exit ``` as the command to let the system know that you are done. -- DO NOT make any copies of your previous memories those will be provided to you at each step, making copies just wastes time and energy. Think smarter not harder. -- The write and edit commands requires proper indentation in the content section ex. `write hw.py def hello(): - print('Hello World')` this is how you would have to format your write command. - - The white spaces matter as the code changes will be added to the code so they must have proper syntax. - -This is a template using the format described above -Items in <> are suggestions for you, fill them out based on the context of the problem you are solving. - -[ FORMAT ] -Thoughts: - - - -Action: -``` - -``` -[ END FORMAT ] - -Do not provide anything extra just your thought and action. - -You are currently trying to complete this task: -Write a shell script 'hello.sh' that prints 'hello'. Do not ask me for confirmation at any point. 
- -CURRENT WORKSPACE: - Open File: hello.sh on line 0 - You can use these commands with the current file: - Navigation: `scroll_up`, `scroll_down`, and `goto ` - Modification: `edit ` - - -Keep all of the guidelines above in mind when you are thinking and making code. -Please come up with a thought and action based on your current task and latest steps. -Make sure that you do not repeat the same actions, there will not be any changes in result if you do not changes anything. -Be very strict about the formatting that you use and make sure you follow the guidelines. -NEVER output multiple commands. ONLY take ONE STEP at a time. -When you have completed your task run the "exit" command. -Begin with your thought about the next step and then come up with an action to perform your thought. diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_006.log b/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_006.log deleted file mode 100644 index df9c5de4e7..0000000000 --- a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_006.log +++ /dev/null @@ -1,195 +0,0 @@ -SYSTEM INFO: -You am an autonomous coding agent, here to provide solutions for coding issues. -You have been designed to assist you with a wide range of programming tasks, from code editing and debugging to testing and deployment. -You have access to a variety of tools and commands that you can use to help you solve problems efficiently. - -INSTRUCTIONS: -Now, you're going to solve this issue on your own. You can use any bash commands or custom commands you wish to complete your task. Edit all the files you need to and run any checks or tests that you want. -Remember, YOU CAN ONLY ENTER ONE COMMAND AT A TIME. You should always wait for feedback after every command. -When you're satisfied with all of the changes you've made, you can indicate that you are done by running the exit command. -Note however that you cannot use any interactive session commands (e.g. 
python, vim, node) in this environment, but you can write scripts and run them. E.g. you can write a python script and then run it with `python .py`. - -NOTE ABOUT THE write COMMAND: Indentation really matters! When editing a file, make sure to insert appropriate indentation before each line! - -IMPORTANT TIPS: -1. Reproduce the bug: Always start by trying to replicate the bug that the issue discusses. If the issue includes code for reproducing the bug, we recommend that you re-implement that in your environment and run it to ensure you can reproduce the bug. Then, start trying to fix it. When you think you've fixed the bug, re-run the bug reproduction script to make sure that the issue has indeed been resolved. - If the bug reproduction script does not print anything when it successfully runs, we recommend adding a print("Script completed successfully, no errors.") command at the end of the file, so that you can be sure the script ran fine all the way through. -2. Try different commands: If you run a command and it doesn't work, try running a different command. A command that did not work once will not work the second time unless you modify it. -3. Navigate large files: If you open a file and need to get to an area around a specific line that is not in the first 100 lines, say line 583, you would use the 'read' command like this: 'read 583'. This is a much faster way to read through the file. -4. Handle input files: If the bug reproduction script requires inputting/reading a specific file, such as 'buggy-input.png', and you'd like to understand how to input that file, conduct a search in the existing repository code to see whether someone else has already done that. Do this by running the command: 'search_dir "buggy-input.png"'. If that doesn't work, use the Linux 'find' command. -5. Understand your context: Always make sure to look at the currently open file and the current working directory. 
The currently open file might be in a different directory than the working directory. -6. Verify your edits: When editing files, it is easy to accidentally specify a wrong line number or to write code with incorrect indentation. Always check the code after you issue an edit to make sure that it reflects what you wanted to accomplish. If it didn't, issue another command to fix it. -7. Thoroughly test your solution: After making any changes to fix a bug, be sure to thoroughly test your solution to ensure the bug has been resolved. Re-run the bug reproduction script and verify that the issue has been addressed. - - -DOCUMENTATION: -It is recommend that you use the commands provided for interacting with files and your directory because they have been specially built for you. -They will make it much easier for you to look at files and make changes. Using these commands will help you be better at your task. -You can open an file by using either the read or write operations. -- If a file already exists you should read it before making any changes. Use the `edit` command to make changes once you have read it. -- If you are creating a new file use the write command. Use the `edit` command to make changes once you have created the new file. - -Commands: -exit - Executed when task is complete -read [] [] - Shows a given file's contents starting from up to . Default: start_line = 0, end_line = -1. By default the whole file will be read. -write [] [] - Modifies a by replacing the current lines between and with . Default start_line = 0 and end_line = -1. Calling this with no line args will replace the whole file. -browse - Returns the text version of any url, this can be useful to look up documentation or finding issues on github -scroll_up - Takes no arguments. This will scroll up and show you the 100 lines above your current lines -scroll_down - Takes no arguments. 
This will scroll down and show you the 100 lines below your current lines -edit - This will modify lines in the currently open file. use start_line and end_line to designate which lines to change and then write the multiline changes. Set end_line to -1 to denote the end of the file -goto - This will take you directly to a line and show you the 100 lines below it. - - You can use any bash command you need (cd, ls, rm, grep, dir, mv, wget, git, zip, etc.) with their arguments included -pip install - You can use this to import python packages. Make sure you include the correct package name when using this command. -ls - Use the ls command to view all the files in your current directory, this is a good starting point. -NOT ALLOWED - You cannot use interactive commands like python or node -Custom bash commands: -None - - -The following commands require an open file to be used: edit, scroll_up, scroll_down, goto -To modify the current file use 'edit'. To move through the current file use 'goto' or 'scroll_up'/'scroll_down' -when using write and edit do not surround the code with any "" just write the code.These are your past 4 actions: -===== Previous Actions ===== - -Memory 0: -Previous Action: -source: agent -action: write -args: - path: hello.sh - start: 0 - end: -1 - thought: Thoughts: - I need to add the code to the 'hello.sh' script that will print 'hello' without asking for confirmation. I will ensure that the script is executable after making the necessary changes. - - Action: - -Output from Action: -source: agent -observation: write -content: -extras: - path: hello.sh - -Memory 1: -Previous Action: -source: agent -action: write -args: - path: hello.sh - start: 0 - end: -1 - thought: Thoughts: - I need to add the code to the 'hello.sh' script that will print 'hello' without asking for confirmation. I will ensure that the script is executable after making the necessary changes. 
- - Action: - -Output from Action: -source: agent -observation: write -content: -extras: - path: hello.sh - -Memory 2: -Previous Action: -source: agent -action: write -args: - path: hello.sh - start: 0 - end: -1 - thought: Thoughts: - I need to add the code to the 'hello.sh' script that will print 'hello' without asking for confirmation. I will ensure that the script is executable after making the necessary changes. - - Action: - -Output from Action: -source: agent -observation: write -content: -extras: - path: hello.sh - -Memory 3: -Previous Action: -source: agent -action: write -args: - path: hello.sh - start: 0 - end: -1 - thought: Thoughts: - I need to add the code to the 'hello.sh' script that will print 'hello' without asking for confirmation. I will ensure that the script is executable after making the necessary changes. - - Action: - -Output from Action: -source: agent -observation: write -content: -extras: - path: hello.sh -======= End Actions ======= -Use these memories to provide additional context to the problem you are solving. -Remember that you have already completed these steps so you do not need to perform them again.RESPONSE FORMAT: -This is the format of the response you will make in order to solve the current issue. -You will be given multiple iterations to complete this task so break it into steps and solve them one by one. - -Your output must contain the following: -- First, thoughts about what your next action should be and plan it out. - - You will have a memory of your thoughts so you can use this to remember things for the next step. - - Use your thoughts to think about what you are currently doing, what you have done on prior steps and how that relates to solving the problem. -- Second, create a piece of code that will execute your next action based on the thoughts you have. - - Remember that you can only have one action for each thought, do not include multiple actions. 
- -Your code MUST be surrounded in triple back ticks EXACTLY like this: -``` - -``` - -Notes: -- Adhere to the format so that the program loop continues smoothly, it is very important to only give one command per output. -- DO NOT give more than one command within the triple backticks. This will just throw an error and nothing will happen as a result. -- Do not give multiple code blocks, if you do only the second one will be captured and run, this might give an error if the first one was necessary. -- To execute multiple commands you should write them down in your thoughts section so you can remember it on the next step and execute them then. -- The only commands you are not capable of executing are interactive commands like `python` or `node` by themselves. -- If you think that you have completed the task that has been given to you based on your previous actions and outputs then use ``` exit ``` as the command to let the system know that you are done. -- DO NOT make any copies of your previous memories those will be provided to you at each step, making copies just wastes time and energy. Think smarter not harder. -- The write and edit commands requires proper indentation in the content section ex. `write hw.py def hello(): - print('Hello World')` this is how you would have to format your write command. - - The white spaces matter as the code changes will be added to the code so they must have proper syntax. - -This is a template using the format described above -Items in <> are suggestions for you, fill them out based on the context of the problem you are solving. - -[ FORMAT ] -Thoughts: - - - -Action: -``` - -``` -[ END FORMAT ] - -Do not provide anything extra just your thought and action. - -You are currently trying to complete this task: -Write a shell script 'hello.sh' that prints 'hello'. Do not ask me for confirmation at any point. 
- -CURRENT WORKSPACE: - Open File: hello.sh on line 0 - You can use these commands with the current file: - Navigation: `scroll_up`, `scroll_down`, and `goto ` - Modification: `edit ` - - -Keep all of the guidelines above in mind when you are thinking and making code. -Please come up with a thought and action based on your current task and latest steps. -Make sure that you do not repeat the same actions, there will not be any changes in result if you do not changes anything. -Be very strict about the formatting that you use and make sure you follow the guidelines. -NEVER output multiple commands. ONLY take ONE STEP at a time. -When you have completed your task run the "exit" command. -Begin with your thought about the next step and then come up with an action to perform your thought. diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_007.log b/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_007.log deleted file mode 100644 index 4445cc31a7..0000000000 --- a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_007.log +++ /dev/null @@ -1,193 +0,0 @@ -SYSTEM INFO: -You am an autonomous coding agent, here to provide solutions for coding issues. -You have been designed to assist you with a wide range of programming tasks, from code editing and debugging to testing and deployment. -You have access to a variety of tools and commands that you can use to help you solve problems efficiently. - -INSTRUCTIONS: -Now, you're going to solve this issue on your own. You can use any bash commands or custom commands you wish to complete your task. Edit all the files you need to and run any checks or tests that you want. -Remember, YOU CAN ONLY ENTER ONE COMMAND AT A TIME. You should always wait for feedback after every command. -When you're satisfied with all of the changes you've made, you can indicate that you are done by running the exit command. -Note however that you cannot use any interactive session commands (e.g. 
python, vim, node) in this environment, but you can write scripts and run them. E.g. you can write a python script and then run it with `python .py`. - -NOTE ABOUT THE write COMMAND: Indentation really matters! When editing a file, make sure to insert appropriate indentation before each line! - -IMPORTANT TIPS: -1. Reproduce the bug: Always start by trying to replicate the bug that the issue discusses. If the issue includes code for reproducing the bug, we recommend that you re-implement that in your environment and run it to ensure you can reproduce the bug. Then, start trying to fix it. When you think you've fixed the bug, re-run the bug reproduction script to make sure that the issue has indeed been resolved. - If the bug reproduction script does not print anything when it successfully runs, we recommend adding a print("Script completed successfully, no errors.") command at the end of the file, so that you can be sure the script ran fine all the way through. -2. Try different commands: If you run a command and it doesn't work, try running a different command. A command that did not work once will not work the second time unless you modify it. -3. Navigate large files: If you open a file and need to get to an area around a specific line that is not in the first 100 lines, say line 583, you would use the 'read' command like this: 'read 583'. This is a much faster way to read through the file. -4. Handle input files: If the bug reproduction script requires inputting/reading a specific file, such as 'buggy-input.png', and you'd like to understand how to input that file, conduct a search in the existing repository code to see whether someone else has already done that. Do this by running the command: 'search_dir "buggy-input.png"'. If that doesn't work, use the Linux 'find' command. -5. Understand your context: Always make sure to look at the currently open file and the current working directory. 
The currently open file might be in a different directory than the working directory. -6. Verify your edits: When editing files, it is easy to accidentally specify a wrong line number or to write code with incorrect indentation. Always check the code after you issue an edit to make sure that it reflects what you wanted to accomplish. If it didn't, issue another command to fix it. -7. Thoroughly test your solution: After making any changes to fix a bug, be sure to thoroughly test your solution to ensure the bug has been resolved. Re-run the bug reproduction script and verify that the issue has been addressed. - - -DOCUMENTATION: -It is recommend that you use the commands provided for interacting with files and your directory because they have been specially built for you. -They will make it much easier for you to look at files and make changes. Using these commands will help you be better at your task. -You can open an file by using either the read or write operations. -- If a file already exists you should read it before making any changes. Use the `edit` command to make changes once you have read it. -- If you are creating a new file use the write command. Use the `edit` command to make changes once you have created the new file. - -Commands: -exit - Executed when task is complete -read [] [] - Shows a given file's contents starting from up to . Default: start_line = 0, end_line = -1. By default the whole file will be read. -write [] [] - Modifies a by replacing the current lines between and with . Default start_line = 0 and end_line = -1. Calling this with no line args will replace the whole file. -browse - Returns the text version of any url, this can be useful to look up documentation or finding issues on github -scroll_up - Takes no arguments. This will scroll up and show you the 100 lines above your current lines -scroll_down - Takes no arguments. 
This will scroll down and show you the 100 lines below your current lines -edit - This will modify lines in the currently open file. use start_line and end_line to designate which lines to change and then write the multiline changes. Set end_line to -1 to denote the end of the file -goto - This will take you directly to a line and show you the 100 lines below it. - - You can use any bash command you need (cd, ls, rm, grep, dir, mv, wget, git, zip, etc.) with their arguments included -pip install - You can use this to import python packages. Make sure you include the correct package name when using this command. -ls - Use the ls command to view all the files in your current directory, this is a good starting point. -NOT ALLOWED - You cannot use interactive commands like python or node -Custom bash commands: -None - - -The following commands require an open file to be used: edit, scroll_up, scroll_down, goto -To modify the current file use 'edit'. To move through the current file use 'goto' or 'scroll_up'/'scroll_down' -when using write and edit do not surround the code with any "" just write the code.These are your past 4 actions: -===== Previous Actions ===== - -Memory 0: -Previous Action: -source: agent -action: write -args: - path: hello.sh - start: 0 - end: -1 - thought: Thoughts: - I need to add the code to the 'hello.sh' script that will print 'hello' without asking for confirmation. I will ensure that the script is executable after making the necessary changes. - - Action: - -Output from Action: -source: agent -observation: write -content: -extras: - path: hello.sh - -Memory 1: -Previous Action: -source: agent -action: write -args: - path: hello.sh - start: 0 - end: -1 - thought: Thoughts: - I need to add the code to the 'hello.sh' script that will print 'hello' without asking for confirmation. I will ensure that the script is executable after making the necessary changes. 
- - Action: - -Output from Action: -source: agent -observation: write -content: -extras: - path: hello.sh - -Memory 2: -Previous Action: -source: agent -action: write -args: - path: hello.sh - start: 0 - end: -1 - thought: Thoughts: - I need to add the code to the 'hello.sh' script that will print 'hello' without asking for confirmation. I will ensure that the script is executable after making the necessary changes. - - Action: - -Output from Action: -source: agent -observation: write -content: -extras: - path: hello.sh - -Memory 3: -Previous Action: -source: agent -action: run -args: - command: chmod +x hello.sh - background: False - thought: - -Output from Action: -source: agent -observation: run -content: -extras: - command_id: -1 - command: chmod +x hello.sh - exit_code: 0 -======= End Actions ======= -Use these memories to provide additional context to the problem you are solving. -Remember that you have already completed these steps so you do not need to perform them again.RESPONSE FORMAT: -This is the format of the response you will make in order to solve the current issue. -You will be given multiple iterations to complete this task so break it into steps and solve them one by one. - -Your output must contain the following: -- First, thoughts about what your next action should be and plan it out. - - You will have a memory of your thoughts so you can use this to remember things for the next step. - - Use your thoughts to think about what you are currently doing, what you have done on prior steps and how that relates to solving the problem. -- Second, create a piece of code that will execute your next action based on the thoughts you have. - - Remember that you can only have one action for each thought, do not include multiple actions. - -Your code MUST be surrounded in triple back ticks EXACTLY like this: -``` - -``` - -Notes: -- Adhere to the format so that the program loop continues smoothly, it is very important to only give one command per output. 
-- DO NOT give more than one command within the triple backticks. This will just throw an error and nothing will happen as a result. -- Do not give multiple code blocks, if you do only the second one will be captured and run, this might give an error if the first one was necessary. -- To execute multiple commands you should write them down in your thoughts section so you can remember it on the next step and execute them then. -- The only commands you are not capable of executing are interactive commands like `python` or `node` by themselves. -- If you think that you have completed the task that has been given to you based on your previous actions and outputs then use ``` exit ``` as the command to let the system know that you are done. -- DO NOT make any copies of your previous memories those will be provided to you at each step, making copies just wastes time and energy. Think smarter not harder. -- The write and edit commands requires proper indentation in the content section ex. `write hw.py def hello(): - print('Hello World')` this is how you would have to format your write command. - - The white spaces matter as the code changes will be added to the code so they must have proper syntax. - -This is a template using the format described above -Items in <> are suggestions for you, fill them out based on the context of the problem you are solving. - -[ FORMAT ] -Thoughts: - - - -Action: -``` - -``` -[ END FORMAT ] - -Do not provide anything extra just your thought and action. - -You are currently trying to complete this task: -Write a shell script 'hello.sh' that prints 'hello'. Do not ask me for confirmation at any point. - -CURRENT WORKSPACE: - Open File: hello.sh on line 0 - You can use these commands with the current file: - Navigation: `scroll_up`, `scroll_down`, and `goto ` - Modification: `edit ` - - -Keep all of the guidelines above in mind when you are thinking and making code. 
-Please come up with a thought and action based on your current task and latest steps. -Make sure that you do not repeat the same actions, there will not be any changes in result if you do not changes anything. -Be very strict about the formatting that you use and make sure you follow the guidelines. -NEVER output multiple commands. ONLY take ONE STEP at a time. -When you have completed your task run the "exit" command. -Begin with your thought about the next step and then come up with an action to perform your thought. diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_008.log b/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_008.log deleted file mode 100644 index d8fb41585f..0000000000 --- a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_008.log +++ /dev/null @@ -1,191 +0,0 @@ -SYSTEM INFO: -You am an autonomous coding agent, here to provide solutions for coding issues. -You have been designed to assist you with a wide range of programming tasks, from code editing and debugging to testing and deployment. -You have access to a variety of tools and commands that you can use to help you solve problems efficiently. - -INSTRUCTIONS: -Now, you're going to solve this issue on your own. You can use any bash commands or custom commands you wish to complete your task. Edit all the files you need to and run any checks or tests that you want. -Remember, YOU CAN ONLY ENTER ONE COMMAND AT A TIME. You should always wait for feedback after every command. -When you're satisfied with all of the changes you've made, you can indicate that you are done by running the exit command. -Note however that you cannot use any interactive session commands (e.g. python, vim, node) in this environment, but you can write scripts and run them. E.g. you can write a python script and then run it with `python .py`. - -NOTE ABOUT THE write COMMAND: Indentation really matters! 
When editing a file, make sure to insert appropriate indentation before each line! - -IMPORTANT TIPS: -1. Reproduce the bug: Always start by trying to replicate the bug that the issue discusses. If the issue includes code for reproducing the bug, we recommend that you re-implement that in your environment and run it to ensure you can reproduce the bug. Then, start trying to fix it. When you think you've fixed the bug, re-run the bug reproduction script to make sure that the issue has indeed been resolved. - If the bug reproduction script does not print anything when it successfully runs, we recommend adding a print("Script completed successfully, no errors.") command at the end of the file, so that you can be sure the script ran fine all the way through. -2. Try different commands: If you run a command and it doesn't work, try running a different command. A command that did not work once will not work the second time unless you modify it. -3. Navigate large files: If you open a file and need to get to an area around a specific line that is not in the first 100 lines, say line 583, you would use the 'read' command like this: 'read 583'. This is a much faster way to read through the file. -4. Handle input files: If the bug reproduction script requires inputting/reading a specific file, such as 'buggy-input.png', and you'd like to understand how to input that file, conduct a search in the existing repository code to see whether someone else has already done that. Do this by running the command: 'search_dir "buggy-input.png"'. If that doesn't work, use the Linux 'find' command. -5. Understand your context: Always make sure to look at the currently open file and the current working directory. The currently open file might be in a different directory than the working directory. -6. Verify your edits: When editing files, it is easy to accidentally specify a wrong line number or to write code with incorrect indentation. 
Always check the code after you issue an edit to make sure that it reflects what you wanted to accomplish. If it didn't, issue another command to fix it. -7. Thoroughly test your solution: After making any changes to fix a bug, be sure to thoroughly test your solution to ensure the bug has been resolved. Re-run the bug reproduction script and verify that the issue has been addressed. - - -DOCUMENTATION: -It is recommend that you use the commands provided for interacting with files and your directory because they have been specially built for you. -They will make it much easier for you to look at files and make changes. Using these commands will help you be better at your task. -You can open an file by using either the read or write operations. -- If a file already exists you should read it before making any changes. Use the `edit` command to make changes once you have read it. -- If you are creating a new file use the write command. Use the `edit` command to make changes once you have created the new file. - -Commands: -exit - Executed when task is complete -read [] [] - Shows a given file's contents starting from up to . Default: start_line = 0, end_line = -1. By default the whole file will be read. -write [] [] - Modifies a by replacing the current lines between and with . Default start_line = 0 and end_line = -1. Calling this with no line args will replace the whole file. -browse - Returns the text version of any url, this can be useful to look up documentation or finding issues on github -scroll_up - Takes no arguments. This will scroll up and show you the 100 lines above your current lines -scroll_down - Takes no arguments. This will scroll down and show you the 100 lines below your current lines -edit - This will modify lines in the currently open file. use start_line and end_line to designate which lines to change and then write the multiline changes. 
Set end_line to -1 to denote the end of the file -goto - This will take you directly to a line and show you the 100 lines below it. - - You can use any bash command you need (cd, ls, rm, grep, dir, mv, wget, git, zip, etc.) with their arguments included -pip install - You can use this to import python packages. Make sure you include the correct package name when using this command. -ls - Use the ls command to view all the files in your current directory, this is a good starting point. -NOT ALLOWED - You cannot use interactive commands like python or node -Custom bash commands: -None - - -The following commands require an open file to be used: edit, scroll_up, scroll_down, goto -To modify the current file use 'edit'. To move through the current file use 'goto' or 'scroll_up'/'scroll_down' -when using write and edit do not surround the code with any "" just write the code.These are your past 4 actions: -===== Previous Actions ===== - -Memory 0: -Previous Action: -source: agent -action: write -args: - path: hello.sh - start: 0 - end: -1 - thought: Thoughts: - I need to add the code to the 'hello.sh' script that will print 'hello' without asking for confirmation. I will ensure that the script is executable after making the necessary changes. - - Action: - -Output from Action: -source: agent -observation: write -content: -extras: - path: hello.sh - -Memory 1: -Previous Action: -source: agent -action: write -args: - path: hello.sh - start: 0 - end: -1 - thought: Thoughts: - I need to add the code to the 'hello.sh' script that will print 'hello' without asking for confirmation. I will ensure that the script is executable after making the necessary changes. 
- - Action: - -Output from Action: -source: agent -observation: write -content: -extras: - path: hello.sh - -Memory 2: -Previous Action: -source: agent -action: run -args: - command: chmod +x hello.sh - background: False - thought: - -Output from Action: -source: agent -observation: run -content: -extras: - command_id: -1 - command: chmod +x hello.sh - exit_code: 0 - -Memory 3: -Previous Action: -source: agent -action: run -args: - command: bash hello.sh - background: False - thought: - -Output from Action: -source: agent -observation: run -content: hello -extras: - command_id: -1 - command: bash hello.sh - exit_code: 0 -======= End Actions ======= -Use these memories to provide additional context to the problem you are solving. -Remember that you have already completed these steps so you do not need to perform them again.RESPONSE FORMAT: -This is the format of the response you will make in order to solve the current issue. -You will be given multiple iterations to complete this task so break it into steps and solve them one by one. - -Your output must contain the following: -- First, thoughts about what your next action should be and plan it out. - - You will have a memory of your thoughts so you can use this to remember things for the next step. - - Use your thoughts to think about what you are currently doing, what you have done on prior steps and how that relates to solving the problem. -- Second, create a piece of code that will execute your next action based on the thoughts you have. - - Remember that you can only have one action for each thought, do not include multiple actions. - -Your code MUST be surrounded in triple back ticks EXACTLY like this: -``` - -``` - -Notes: -- Adhere to the format so that the program loop continues smoothly, it is very important to only give one command per output. -- DO NOT give more than one command within the triple backticks. This will just throw an error and nothing will happen as a result. 
-- Do not give multiple code blocks, if you do only the second one will be captured and run, this might give an error if the first one was necessary. -- To execute multiple commands you should write them down in your thoughts section so you can remember it on the next step and execute them then. -- The only commands you are not capable of executing are interactive commands like `python` or `node` by themselves. -- If you think that you have completed the task that has been given to you based on your previous actions and outputs then use ``` exit ``` as the command to let the system know that you are done. -- DO NOT make any copies of your previous memories those will be provided to you at each step, making copies just wastes time and energy. Think smarter not harder. -- The write and edit commands requires proper indentation in the content section ex. `write hw.py def hello(): - print('Hello World')` this is how you would have to format your write command. - - The white spaces matter as the code changes will be added to the code so they must have proper syntax. - -This is a template using the format described above -Items in <> are suggestions for you, fill them out based on the context of the problem you are solving. - -[ FORMAT ] -Thoughts: - - - -Action: -``` - -``` -[ END FORMAT ] - -Do not provide anything extra just your thought and action. - -You are currently trying to complete this task: -Write a shell script 'hello.sh' that prints 'hello'. Do not ask me for confirmation at any point. - -CURRENT WORKSPACE: - Open File: hello.sh on line 0 - You can use these commands with the current file: - Navigation: `scroll_up`, `scroll_down`, and `goto ` - Modification: `edit ` - - -Keep all of the guidelines above in mind when you are thinking and making code. -Please come up with a thought and action based on your current task and latest steps. -Make sure that you do not repeat the same actions, there will not be any changes in result if you do not changes anything. 
-Be very strict about the formatting that you use and make sure you follow the guidelines. -NEVER output multiple commands. ONLY take ONE STEP at a time. -When you have completed your task run the "exit" command. -Begin with your thought about the next step and then come up with an action to perform your thought. diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_009.log b/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_009.log deleted file mode 100644 index 4833bb34db..0000000000 --- a/tests/integration/mock/SWEAgent/test_write_simple_script/prompt_009.log +++ /dev/null @@ -1,189 +0,0 @@ -SYSTEM INFO: -You am an autonomous coding agent, here to provide solutions for coding issues. -You have been designed to assist you with a wide range of programming tasks, from code editing and debugging to testing and deployment. -You have access to a variety of tools and commands that you can use to help you solve problems efficiently. - -INSTRUCTIONS: -Now, you're going to solve this issue on your own. You can use any bash commands or custom commands you wish to complete your task. Edit all the files you need to and run any checks or tests that you want. -Remember, YOU CAN ONLY ENTER ONE COMMAND AT A TIME. You should always wait for feedback after every command. -When you're satisfied with all of the changes you've made, you can indicate that you are done by running the exit command. -Note however that you cannot use any interactive session commands (e.g. python, vim, node) in this environment, but you can write scripts and run them. E.g. you can write a python script and then run it with `python .py`. - -NOTE ABOUT THE write COMMAND: Indentation really matters! When editing a file, make sure to insert appropriate indentation before each line! - -IMPORTANT TIPS: -1. Reproduce the bug: Always start by trying to replicate the bug that the issue discusses. 
If the issue includes code for reproducing the bug, we recommend that you re-implement that in your environment and run it to ensure you can reproduce the bug. Then, start trying to fix it. When you think you've fixed the bug, re-run the bug reproduction script to make sure that the issue has indeed been resolved. - If the bug reproduction script does not print anything when it successfully runs, we recommend adding a print("Script completed successfully, no errors.") command at the end of the file, so that you can be sure the script ran fine all the way through. -2. Try different commands: If you run a command and it doesn't work, try running a different command. A command that did not work once will not work the second time unless you modify it. -3. Navigate large files: If you open a file and need to get to an area around a specific line that is not in the first 100 lines, say line 583, you would use the 'read' command like this: 'read 583'. This is a much faster way to read through the file. -4. Handle input files: If the bug reproduction script requires inputting/reading a specific file, such as 'buggy-input.png', and you'd like to understand how to input that file, conduct a search in the existing repository code to see whether someone else has already done that. Do this by running the command: 'search_dir "buggy-input.png"'. If that doesn't work, use the Linux 'find' command. -5. Understand your context: Always make sure to look at the currently open file and the current working directory. The currently open file might be in a different directory than the working directory. -6. Verify your edits: When editing files, it is easy to accidentally specify a wrong line number or to write code with incorrect indentation. Always check the code after you issue an edit to make sure that it reflects what you wanted to accomplish. If it didn't, issue another command to fix it. -7. 
Thoroughly test your solution: After making any changes to fix a bug, be sure to thoroughly test your solution to ensure the bug has been resolved. Re-run the bug reproduction script and verify that the issue has been addressed. - - -DOCUMENTATION: -It is recommend that you use the commands provided for interacting with files and your directory because they have been specially built for you. -They will make it much easier for you to look at files and make changes. Using these commands will help you be better at your task. -You can open an file by using either the read or write operations. -- If a file already exists you should read it before making any changes. Use the `edit` command to make changes once you have read it. -- If you are creating a new file use the write command. Use the `edit` command to make changes once you have created the new file. - -Commands: -exit - Executed when task is complete -read [] [] - Shows a given file's contents starting from up to . Default: start_line = 0, end_line = -1. By default the whole file will be read. -write [] [] - Modifies a by replacing the current lines between and with . Default start_line = 0 and end_line = -1. Calling this with no line args will replace the whole file. -browse - Returns the text version of any url, this can be useful to look up documentation or finding issues on github -scroll_up - Takes no arguments. This will scroll up and show you the 100 lines above your current lines -scroll_down - Takes no arguments. This will scroll down and show you the 100 lines below your current lines -edit - This will modify lines in the currently open file. use start_line and end_line to designate which lines to change and then write the multiline changes. Set end_line to -1 to denote the end of the file -goto - This will take you directly to a line and show you the 100 lines below it. - - You can use any bash command you need (cd, ls, rm, grep, dir, mv, wget, git, zip, etc.) 
with their arguments included -pip install - You can use this to import python packages. Make sure you include the correct package name when using this command. -ls - Use the ls command to view all the files in your current directory, this is a good starting point. -NOT ALLOWED - You cannot use interactive commands like python or node -Custom bash commands: -None - - -The following commands require an open file to be used: edit, scroll_up, scroll_down, goto -To modify the current file use 'edit'. To move through the current file use 'goto' or 'scroll_up'/'scroll_down' -when using write and edit do not surround the code with any "" just write the code.These are your past 4 actions: -===== Previous Actions ===== - -Memory 0: -Previous Action: -source: agent -action: write -args: - path: hello.sh - start: 0 - end: -1 - thought: Thoughts: - I need to add the code to the 'hello.sh' script that will print 'hello' without asking for confirmation. I will ensure that the script is executable after making the necessary changes. 
- - Action: - -Output from Action: -source: agent -observation: write -content: -extras: - path: hello.sh - -Memory 1: -Previous Action: -source: agent -action: run -args: - command: chmod +x hello.sh - background: False - thought: - -Output from Action: -source: agent -observation: run -content: -extras: - command_id: -1 - command: chmod +x hello.sh - exit_code: 0 - -Memory 2: -Previous Action: -source: agent -action: run -args: - command: bash hello.sh - background: False - thought: - -Output from Action: -source: agent -observation: run -content: hello -extras: - command_id: -1 - command: bash hello.sh - exit_code: 0 - -Memory 3: -Previous Action: -source: agent -action: run -args: - command: bash hello.sh - background: False - thought: - -Output from Action: -source: agent -observation: run -content: hello -extras: - command_id: -1 - command: bash hello.sh - exit_code: 0 -======= End Actions ======= -Use these memories to provide additional context to the problem you are solving. -Remember that you have already completed these steps so you do not need to perform them again.RESPONSE FORMAT: -This is the format of the response you will make in order to solve the current issue. -You will be given multiple iterations to complete this task so break it into steps and solve them one by one. - -Your output must contain the following: -- First, thoughts about what your next action should be and plan it out. - - You will have a memory of your thoughts so you can use this to remember things for the next step. - - Use your thoughts to think about what you are currently doing, what you have done on prior steps and how that relates to solving the problem. -- Second, create a piece of code that will execute your next action based on the thoughts you have. - - Remember that you can only have one action for each thought, do not include multiple actions. 
- -Your code MUST be surrounded in triple back ticks EXACTLY like this: -``` - -``` - -Notes: -- Adhere to the format so that the program loop continues smoothly, it is very important to only give one command per output. -- DO NOT give more than one command within the triple backticks. This will just throw an error and nothing will happen as a result. -- Do not give multiple code blocks, if you do only the second one will be captured and run, this might give an error if the first one was necessary. -- To execute multiple commands you should write them down in your thoughts section so you can remember it on the next step and execute them then. -- The only commands you are not capable of executing are interactive commands like `python` or `node` by themselves. -- If you think that you have completed the task that has been given to you based on your previous actions and outputs then use ``` exit ``` as the command to let the system know that you are done. -- DO NOT make any copies of your previous memories those will be provided to you at each step, making copies just wastes time and energy. Think smarter not harder. -- The write and edit commands requires proper indentation in the content section ex. `write hw.py def hello(): - print('Hello World')` this is how you would have to format your write command. - - The white spaces matter as the code changes will be added to the code so they must have proper syntax. - -This is a template using the format described above -Items in <> are suggestions for you, fill them out based on the context of the problem you are solving. - -[ FORMAT ] -Thoughts: - - - -Action: -``` - -``` -[ END FORMAT ] - -Do not provide anything extra just your thought and action. - -You are currently trying to complete this task: -Write a shell script 'hello.sh' that prints 'hello'. Do not ask me for confirmation at any point. 
- -CURRENT WORKSPACE: - Open File: hello.sh on line 0 - You can use these commands with the current file: - Navigation: `scroll_up`, `scroll_down`, and `goto ` - Modification: `edit ` - - -Keep all of the guidelines above in mind when you are thinking and making code. -Please come up with a thought and action based on your current task and latest steps. -Make sure that you do not repeat the same actions, there will not be any changes in result if you do not changes anything. -Be very strict about the formatting that you use and make sure you follow the guidelines. -NEVER output multiple commands. ONLY take ONE STEP at a time. -When you have completed your task run the "exit" command. -Begin with your thought about the next step and then come up with an action to perform your thought. diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/response_001.log b/tests/integration/mock/SWEAgent/test_write_simple_script/response_001.log deleted file mode 100644 index 2924117174..0000000000 --- a/tests/integration/mock/SWEAgent/test_write_simple_script/response_001.log +++ /dev/null @@ -1,7 +0,0 @@ -Thoughts: -I need to create a shell script named 'hello.sh' that prints 'hello' without asking for confirmation. I will need to ensure that the script is executable. - -Action: -``` -write hello.sh "echo 'hello'" -``` diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/response_002.log b/tests/integration/mock/SWEAgent/test_write_simple_script/response_002.log deleted file mode 100644 index c3b065960d..0000000000 --- a/tests/integration/mock/SWEAgent/test_write_simple_script/response_002.log +++ /dev/null @@ -1,7 +0,0 @@ -Thoughts: -I need to add the code to the 'hello.sh' script that will print 'hello' without asking for confirmation. I will ensure that the script is executable after making the necessary changes. 
- -Action: -``` -edit 0 -1 "echo 'hello'" -``` diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/response_003.log b/tests/integration/mock/SWEAgent/test_write_simple_script/response_003.log deleted file mode 100644 index c3b065960d..0000000000 --- a/tests/integration/mock/SWEAgent/test_write_simple_script/response_003.log +++ /dev/null @@ -1,7 +0,0 @@ -Thoughts: -I need to add the code to the 'hello.sh' script that will print 'hello' without asking for confirmation. I will ensure that the script is executable after making the necessary changes. - -Action: -``` -edit 0 -1 "echo 'hello'" -``` diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/response_004.log b/tests/integration/mock/SWEAgent/test_write_simple_script/response_004.log deleted file mode 100644 index c3b065960d..0000000000 --- a/tests/integration/mock/SWEAgent/test_write_simple_script/response_004.log +++ /dev/null @@ -1,7 +0,0 @@ -Thoughts: -I need to add the code to the 'hello.sh' script that will print 'hello' without asking for confirmation. I will ensure that the script is executable after making the necessary changes. - -Action: -``` -edit 0 -1 "echo 'hello'" -``` diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/response_005.log b/tests/integration/mock/SWEAgent/test_write_simple_script/response_005.log deleted file mode 100644 index c3b065960d..0000000000 --- a/tests/integration/mock/SWEAgent/test_write_simple_script/response_005.log +++ /dev/null @@ -1,7 +0,0 @@ -Thoughts: -I need to add the code to the 'hello.sh' script that will print 'hello' without asking for confirmation. I will ensure that the script is executable after making the necessary changes. 
- -Action: -``` -edit 0 -1 "echo 'hello'" -``` diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/response_006.log b/tests/integration/mock/SWEAgent/test_write_simple_script/response_006.log deleted file mode 100644 index 24af89f73c..0000000000 --- a/tests/integration/mock/SWEAgent/test_write_simple_script/response_006.log +++ /dev/null @@ -1,7 +0,0 @@ -Thoughts: -I have already written the initial version of the 'hello.sh' script that prints 'hello' without asking for confirmation. I need to ensure that the script is executable to run successfully. - -Action: -``` -chmod +x hello.sh -``` diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/response_007.log b/tests/integration/mock/SWEAgent/test_write_simple_script/response_007.log deleted file mode 100644 index d86e7c959f..0000000000 --- a/tests/integration/mock/SWEAgent/test_write_simple_script/response_007.log +++ /dev/null @@ -1,7 +0,0 @@ -Thoughts: -I have already added the code to the 'hello.sh' script that prints 'hello' without asking for confirmation. The script is also made executable. The next step is to run the script to ensure it functions as expected. - -Action: -``` -bash hello.sh -``` diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/response_008.log b/tests/integration/mock/SWEAgent/test_write_simple_script/response_008.log deleted file mode 100644 index defb21379b..0000000000 --- a/tests/integration/mock/SWEAgent/test_write_simple_script/response_008.log +++ /dev/null @@ -1,7 +0,0 @@ -Thoughts: -I have already created the 'hello.sh' script that prints 'hello' without asking for confirmation. The script is also made executable. The next step is to ensure that the script runs successfully without any issues. 
- -Action: -``` -bash hello.sh -``` diff --git a/tests/integration/mock/SWEAgent/test_write_simple_script/response_009.log b/tests/integration/mock/SWEAgent/test_write_simple_script/response_009.log deleted file mode 100644 index 6bade8b475..0000000000 --- a/tests/integration/mock/SWEAgent/test_write_simple_script/response_009.log +++ /dev/null @@ -1,7 +0,0 @@ -Thoughts: -I have already created the 'hello.sh' script that prints 'hello' without asking for confirmation. The script is executable, and I have run it successfully. The next step is to finalize the task and exit the current environment. - -Action: -``` -exit -``` diff --git a/tests/integration/regenerate.sh b/tests/integration/regenerate.sh index 86a9caf30f..d2cb17d36a 100755 --- a/tests/integration/regenerate.sh +++ b/tests/integration/regenerate.sh @@ -20,7 +20,7 @@ WORKSPACE_MOUNT_PATH_IN_SANDBOX="/workspace" SANDBOX_TYPE="${SANDBOX_TYPE:-ssh}" MAX_ITERATIONS=10 -agents=("MonologueAgent" "CodeActAgent" "PlannerAgent" "SWEAgent") +agents=("MonologueAgent" "CodeActAgent") tasks=( "Fix typos in bad.txt." "Write a shell script 'hello.sh' that prints 'hello'." diff --git a/tests/integration/test_agent.py b/tests/integration/test_agent.py index 515d3ca00c..ac898f6c12 100644 --- a/tests/integration/test_agent.py +++ b/tests/integration/test_agent.py @@ -38,10 +38,6 @@ def test_write_simple_script(): os.getenv('AGENT') == 'CodeActAgent' and os.getenv('SANDBOX_TYPE').lower() != 'ssh', reason='CodeActAgent only supports ssh sandbox which is stateful', ) -@pytest.mark.skipif( - os.getenv('AGENT') == 'SWEAgent', - reason='SWEAgent is not capable of this task right now', -) @pytest.mark.skipif( os.getenv('SANDBOX_TYPE') == 'local', reason='local sandbox shows environment-dependent absolute path for pwd command',