* add replace-based block edit & preliminary test case fix
* further fix the insert behavior
* make edit only work on first occurence
* bump codeact version since we now use new edit agentskills
* update prompt for new agentskills
* update integration tests
* make run_infer.sh executable
* remove code block for edit_file
* update integration test for prompt changes
* default to not use hint for eval
* fix insert emptyfile bug
* throw value error when `to_replace` is empty
* make `_edit_or_insert_file` return string so we can try to fix some linter errors (best attempt)
* add todo
* update integration test
* fix sandbox test for this PR
* tmp
* tmp
* merge main
* feat: auto build image cache
* remove plugins
* use config file
* update mamba setup shell
* support agnostic sandbox image autobuild
* remove config
* Update .gitignore
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
* Update opendevin/runtime/docker/ssh_box.py
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
* update setup.sh
* readd sudo
* add sudo in dockerfile
* remove export
* move od-runtime dependencies to sandbox dockerfile
* factor out re-build logic into a separate util file
* tweak existing plugin to use OD specific sandbox
* update testcase
* attempt to fix unit test using image built in ghcr
* use cache tag
* try to fix unit tests
* add unittest
* add unittest
* add some unittests
* revert gh workflow changes
* feat: optimize sandbox image naming rule
* add pull latest image hint
* add opendevin python hint and use mamba to install gcc
* update docker image naming rule and fix mamba issue
* Update opendevin/runtime/docker/ssh_box.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* fix: opendevin user use correct pip
* fix lint issue
* fix custom sandbox base image
* rename test name
* add skipif
---------
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
Co-authored-by: tobitege <tobitege@gmx.de>
* added tests related to backticks
* updated .gitignore
* added extra linter test for #2210
* hotfix for integration test
* added test_ipython unit test
* added test_ipython unit test
* remove draft test from test_ipython.py
---------
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
* added tests related to backticks
* updated .gitignore
* added extra linter test for #2210
* hotfix for integration test
---------
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
* update swe_bench prompt;
use minimal prompt for codeact;
* upgrade agentskills and update testcases
* update infer prompt
* fix cwd
* add icl for swebench
* also log in_context_example to run infer
* remove extra print
* change prompt to abs path
* update error message to include current file info
* change cwd for jupyter if needed
* update edit error message
* update prompt
* improve git get patch
* update hint string
* default to 50 turns
* revert changes from codeact agent and create new CodeActSWEAgent
* revert changes to codeact
* revert instructions for run infer
* revert instructions for run infer
* update README
* update max iter
* add codeact swe agent
* fix issue for CodeActSWEAgent
* allow specifying max iter in cmdline script
* stop printing
* Update agenthub/codeact_swe_agent/README.md
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
* Fix prompt regression in jupyter plugin
---------
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* support specifying custom cost per token
* fix test for new attrs
* add to docs
---------
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
* feat: add max_budget_per_task configuration to control task cost
* Fix test_arg_parser.py
* Use the config.max_budget_per_task as default value
* Add max_budget_per_task to core/main.py as well
* Update opendevin/controller/agent_controller.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
---------
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* add draft for skills
* Implement and test agentskills functions: open_file, goto_line, scroll_down, scroll_up, create_file, search_dir, search_file, find_file
* Remove new_sample.txt file
* add some work from opendevin w/ fixes
* Add unit tests for agentskills module
* fix some issues and updated tests
* add more tests for open
* tweak and handle goto_line
* add tests for some edge cases
* add tests for scrolling
* add tests for edit
* add tests for search_dir
* update tests to use pytest
* use pytest --forked to avoid file op unit tests to interfere with each other via global var
* update doc based on swe agent tool
* update and add tests for find_file and search_file
* move agent_skills to plugins
* add agentskills as plugin and docs
* add agentskill to ssh box and fix sandbox integration
* remove extra returns in doc
* add agentskills to initial tool for jupyter
* support re-init jupyter kernel (for agentskills) after restart
* fix print window's issue with indentation and add testcases
* add prompt for codeact with the newest edit primitives
* modify the way line number is presented (remove leading space)
* change prompt to the newest display format
* support tracking of costs via metrics
* Update opendevin/runtime/plugins/agent_skills/README.md
* Update opendevin/runtime/plugins/agent_skills/README.md
* implement and add tests for py linting
* remove extra text arg for incompatible subprocess ver
* remove sample.txt
* update test_edits integration tests
* fix all integration
* Update opendevin/runtime/plugins/agent_skills/README.md
* Update opendevin/runtime/plugins/agent_skills/README.md
* Update opendevin/runtime/plugins/agent_skills/README.md
* Update agenthub/codeact_agent/prompt.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* Update agenthub/codeact_agent/prompt.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* Update agenthub/codeact_agent/prompt.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* Update opendevin/runtime/plugins/agent_skills/agentskills.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* correctly setup plugins for swebench eval
* bump swe-bench version and add logging
* correctly setup plugins for swebench eval
* bump swe-bench version and add logging
* Revert "correctly setup plugins for swebench eval"
This reverts commit 2bd1055673.
* bump version
* remove _AGENT_SKILLS_DOCS
* move flake8 to test dep
* update poetry.lock
* remove extra arg
* reduce max iter for eval
* update poetry
* fix integration tests
---------
Co-authored-by: OpenDevin <opendevin@opendevin.ai>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* Refactor monologue to use the messages in state history
* add messages, clean up
* fix monologue
* update integration tests
* move private method
* update SWE agent to use the history from State
* integration tests for SWE agent
* rename monologue to initial_thoughts, since that is what it is
* Refactor monologue to use the messages in state history
remove now unused method
* is_stuck update
* fix is_stuck
* unit tests
* fix tests
* Revert "Refactor monologue to use the messages in state history"
This reverts commit 76b4b765ef.
* Override eq for CmdOutputObservation to ignore the pid, compare the actual command only
* Revert "Override eq for CmdOutputObservation to ignore the pid, compare the actual command only"
This reverts commit 6418d856b5.
* Feat: add stream output to exec_run
* Using command timeout to control the exec_box's timeout.
* add bash -c to source command to compatible for sh.
Signed-off-by: ifuryst <ifuryst@gmail.com>
* Feat: add stream output to SSHBox execute
Signed-off-by: ifuryst <ifuryst@gmail.com>
* fix the test case fail.
Signed-off-by: ifuryst <ifuryst@gmail.com>
* fix the test case import wrong path for method.
Signed-off-by: ifuryst <ifuryst@gmail.com>
---------
Signed-off-by: ifuryst <ifuryst@gmail.com>
* add draft dockerfile for build all
* add rsync for build
* add all-in-one docker
* update prepare scripts
* Update swe_env_box.py
* Add swe_entry.sh (buggy now)
* Parse the test command in swe_entry.sh
* Update README for instance eval in sandbox
* revert specialized config
* replace run_as_devin as an init arg
* set container & run_as_root via args
* update swe entry script
* update env
* remove mounting
* allow error after swe_entry
* update swe_env_box
* move file
* update gitignore
* get swe_env_box a working demo
* support faking user response & provide sandox ahead of time;
also return state for controller
* tweak main to support adding controller kwargs
* add module
* initialize plugin for provided sandbox
* add pip cache to plugin & fix jupyter kernel waiting
* better print Observation output
* add run infer scripts
* update readme
* add utility for getting diff patch
* use get_diff_patch in infer
* update readme
* support cost tracking for codeact
* add swe agent edit hack
* disable color in git diff
* fix git diff cmd
* fix state return
* support limit eval
* increase t imeout and export pip cache
* add eval limit config
* return state when hit turn limit
* save log to file; allow agent to give up
* run eval with max 50 turns
* add outputs to gitignore
* save swe_instance & instruction
* add uuid to swebench
* add streamlit dep
* fix save series
* fix the issue where session id might be duplicated
* allow setting temperature for llm (use 0 for eval)
* Get report from agent running log
* support evaluating task success right after inference.
* remove extra log
* comment out prompt for baseline
* add visualizer for eval
* use plaintext for instruction
* reduce timeout for all; only increase timeout for init
* reduce timeout for all; only increase timeout for init
* ignore sid for swe env
* close sandbox in each eval loop
* update visualizer instruction
* increase max chars
* add finish action to history too
* show test result in metrics
* add sidebars for visualizer
* also visualize swe_instance
* cleanup browser when agent controller finish runinng
* do not mount workspace for swe-eval to avoid accidentally overwrite files
* Revert "do not mount workspace for swe-eval to avoid accidentally overwrite files"
This reverts commit 8ef7739054.
* Revert "Revert "do not mount workspace for swe-eval to avoid accidentally overwrite files""
This reverts commit 016cfbb9f0.
* run jupyter command via copy to, instead of cp to mount
* only print mixin output when failed
* change ssh box logging
* add visualizer for pass rate
* add instance id to sandbox name
* only remove container we created
* use opendevin logger in main
* support multi-processing infer
* add back metadata, support keyboard interrupt
* remove container with startswith
* make pbar behave correctly
* update instruction w/ multi-processing
* show resolved rate by repo
* rename tmp dir name
* attempt to fix racing for copy to ssh_box
* fix script
* bump swe-bench-all version
* fix ipython with self-contained commands
* add jupyter demo to swe_env_box
* make resolved count two column
* increase height
* do not add glob to url params
* analyze obs length
* print instance id prior to removal handler
* add gold patch in visualizer
* fix interactive git by adding a git --no-pager as alias
* increase max_char to 10k to cover 98% of swe-bench obs cases
* allow parsing note
* prompt v2
* add iteration reminder
* adjust user response
* adjust order
* fix return eval
* fix typo
* add reminder before logging
* remove other resolve rate
* re adjust to new folder structure
* support adding eval note
* fix eval note path
* make sure first log of each instance is printed
* add eval note
* fix the display for visualizer
* tweak visualizer for better git patch reading
* exclude empty patch
* add retry mechanism for swe_env_box start
* fix ssh timeout issue
* add stat field for apply test patch success
* add visualization for fine-grained report
* attempt to support monologue agent by constraining it to single thread
* also log error msg when stopeed
* save error as well
* override WORKSPACE_MOUNT_PATH and WORKSPACE_BASE for monologue to work in mp
* add retry mechanism for sshbox
* remove retry for swe env box
* try to handle loop state stopped
* Add get report scripts
* Add script to convert agent output to swe-bench format
* Merge fine grained report for visualizer
* Update eval readme
* Update README.md
* Add CodeAct gpt4-1106 output and eval logs on swe-bench-lite
* Update the script to get model report
* Update get_model_report.sh
* Update get_agent_report.sh
* Update report merge script
* Add agent output conversion script
* Update swe_lite_env_setup.sh
* Add example swe-bench output files
* Update eval readme
* Remove redundant scripts
* set iteration count down to false by default
* fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model execption out of litellm (#1666)
* fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model execption out of litellm
* Review Feedback
* Missing None Check
* Review feedback and improved error handling
---------
Co-authored-by: Robert Brennan <accounts@rbren.io>
* fix prepare_swe_util scripts
* update builder images
* update setup script
* remove swe-bench build workflow
* update lock
* remove experiments since they are moved to hf
* remove visualizer (since it is moved to hf repo)
* simply jupyter execution via heredoc
* update ssh_box
* add initial docker readme
* add pkg-config as dependency
* add script for swe_bench all-in-one docker
* add rsync to builder
* rename var
* update commit
* update readme
* update lock
* support specify timeout for long running tasks
* fix path
* separate building of all deps and files
* support returning states at the end of controller
* remove return None
* support specify timeout for long running tasks
* add timeout for all existing sandbox impl
* fix swe_env_box for new codebase
* update llm config in config.py
* support pass sandbox in
* remove force set
* update eval script
* fix issue of overriding final state
* change default eval output to hf demo
* change default eval output to hf demo
* fix config
* only close it when it is NOT external sandbox
* add scripts
* tweak config
* only put in hostory when state has history attr
* fix agent controller on the case of run out interaction budget
* always assume state is always not none
* remove print of final state
* catch all exception when cannot compute completion cost
* Update README.md
* save source into json
* fix path
* update docker path
* return the final state on close
* merge AgentState with State
* fix integration test
* merge AgentState with State
* fix integration test
* add ChangeAgentStateAction to history in attempt to fix integration
* add back set agent state
* update tests
* update tests
* move scripts for setup
* update script and readme for infer
* do not reset logger when n processes == 1
* update eval_infer scripts and readme
* simplify readme
* copy over dir after eval
* copy over dir after eval
* directly return get state
* update lock
* fix output saving of infer
* replace print with logger
* update eval_infer script
* add back the missing .close
* increase timeout
* copy all swe_bench_format file
* attempt to fix output parsing
* log git commit id as metadata
* fix eval script
* update lock
* update unit tests
* fix argparser unit test
* fix lock
* the deps are now lightweight enough to be incude in make build
* add spaces for tests
* add eval outputs to gitignore
* remove git submodule
* readme
* tweak git email
* update upload instruction
* bump codeact version for eval
---------
Co-authored-by: Bowen Li <libowen.ne@gmail.com>
Co-authored-by: huybery <huybery@gmail.com>
Co-authored-by: Bart Shappee <bshappee@gmail.com>
Co-authored-by: Robert Brennan <accounts@rbren.io>
* enable browsing in codeact, and arbitrary browsergym DSL support
* fix
* fix unit test case
* update frontend for the new interactive browsing action
* bump ver
* Fix integration tests
---------
Co-authored-by: OpenDevinBot <bot@opendevin.com>
* fix plugin use in jupyter
* fix jupyter plugin potential port conflict
* update integration test
* wait a bit for jupyter execution
* add one unit tests for sandbox
* fix integration test
* fix integration
* fix integration yet again
* init sandbox plugins in the server