* [eval] increase timeout for swebench eval init/complete
* allow CmdRunAction to optionally block when .timeout is setted
* fix unit test for serialization
* fix unit tests for security analyzer
* fix integration tests
* add more timeout
* Add Handling of Cache Prompt When Formatting Messages
* Fix Value for Cache Control
* Fix Value for Cache Control
* Update openhands/core/message.py
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
* Fix lint error
* Serialize Messages if Propt Caching Is Enabled
* Remove formatting message change
---------
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: tobitege <10787084+tobitege@users.noreply.github.com>
* CodeActAgent: fix message prep if prompt caching is not supported
* fix python version in regen tests workflow
* fix in conftest "mock_completion" method
* add disable_vision to LLMConfig; revert change in message parsing in llm.py
* format messages in several files for completion
* refactored message(s) formatting (llm.py); added vision_is_active()
* fix a unit test
* regenerate: added LOG_TO_FILE and FORCE_REGENERATE env flags
* try to fix path to logs folder in workflow
* llm: prevent index error
* try FORCE_USE_LLM in regenerate
* tweaks everywhere...
* fix 2 random unit test errors :(
* added FORCE_REGENERATE_TESTS=true to regenerate CLI
* fix test_lint_file_fail_typescript again
* double-quotes for env vars in workflow; llm logger set to debug
* fix typo in regenerate
* regenerate iterations now 20; applied iteration counter fix by Li
* regenerate: pass FORCE_REGENERATE flag into env
* fixes for int tests. several mock files updated.
* browsing_agent: fix response_parser.py adding ) to empty response
* test_browse_internet: fix skipif and revert obsolete mock files
* regenerate: fi bracketing for http server start/kill conditions
* disable test_browse_internet for CodeAct*Agents; mock files updated after merge
* missed to include more mock files earlier
* reverts after review feedback from Li
* forgot one
* browsing agent test, partial fixes and updated mock files
* test_browse_internet works in my WSL now!
* adapt unit test test_prompt_caching.py
* add DEBUG to regenerate workflow command
* convert regenerate workflow params to inputs
* more integration test mock files updated
* more files
* test_prompt_caching: restored test_prompt_caching_headers purpose
* file_ops: fix potential exception, like "cross device copy"; fixed mock files accordingly
* reverts/changes wrt feedback from xingyao
* updated docs and config template
* code cleanup wrt review feedback
* improve file editing prompts and unit test
converted most raise calls to a _output_error call in file_ops.py
* tweaks in test_agent_skill.py wrt to SEP separator
* tweaked the separator
* remove server runtime remnants and TEST_RUNTIME references
* restore use of TEST_RUNTIME args and variables
* fix integration tests
* added hint to properly escape docstrings
* revert latest prompt change
---------
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
* feat: refactor building logic into runtime builder
* return image name
* fix testcases
* use runtime builder for eventstream runtime
* have runtime builder return str
* add api_key to sandbox config
* draft remote runtime
* remove extra if clause
* initialize runtime based on box class
* add build logic
* use base64 for file upload
* get runtime image prefix from API
* replace ___ with _s_ to make it a valid image name
* use /build to start build and /build_status to check the build progress
* update logging
* fix exit code
* always use port
* add remote runtime
* rename runtime
* fix tests import
* make dir first if work_dir does not exists;
* update debug print to remote runtime
* fix exit close_sync
* update logging
* add retry for stop
* use all box class for test keep prompt
* fix test browsing
* add retry stop
* merge init commands to save startup time
* fix await
* remove sandbox url
* support execute through specific runtime url
* fix file ops
* simplify close
* factor out runtime retry code
* fix exception handling
* fix content type error (e.g., bad gateway when runtime is not ready)
* add retry for wait until alive;
add retry for check image exists
* Revert "add retry for wait until alive;"
This reverts commit dd013cd268.
* retry when wait until alive
* clean up msg
* directly save sdist to temp dir for _put_source_code_to_dir
* support running testcases in parallel
* tweak logging;
try to close session
* try to close session even on exception
* update poetry lock
* support remote to run integration tests
* add warning for workspace base on remote runtime
* set default runtime api
* remove server runtime
* update poetry lock
* support running swe-bench (n=1) eval on remoteruntime
* add a timeout of 30 min
* add todo for docker namespace
* update poetry loc
* fix conftest.py option (#3573)
* try to fix fixture base_container_image in runtime conftest
* fix integration test mock files due to #3548
* fix test_ipython.py integration test
* try to fix pip unavailable
* update test case for pip
* force rebuild in CI
* remove extra symlink
* fix newline
* added semi-colon to line 31
* Dockerfile.j2: activate env at the end
* Revert "Dockerfile.j2: activate env at the end"
This reverts commit cf2f565102.
* cleanup Dockerfile
* switch default python image
* remove image agnostic (no longer used)
* fix tests
* simplify integration tests default image
* add nodejs specific runtime tests
* update tests and workflows
* switch to nikolaik/python-nodejs:python3.11-nodejs22
* update build sh to output image name correctly
* increase custom images to test
* fix test
* fix test
* fix double quote
* try fixing ci
* update ghcr workflow
* fix artifact name
* try to fix ghcr again
* fix workflow
* save built image to correct dir
* remove extra -docker-image
* make last tag to be human readable image tag
* fix hyphen to underscore
* run test runtime on all tags
* revert app build
* separate ghcr workflow
* update dockerfile for eval
* fix tag for test run
* try fix tag
* try fix tag via matrix output
* try workflow again
* update comments
* try fixing test matrix
* fix artifact name
* try fix tag again
* Revert "try fix tag again"
This reverts commit b369badd8c.
* tweak filename
* try different path
* fix filepath
* try fix tag artifact path again
* save json instead of line
* update matrix
* print all tags in workflow
* support only streaming diff logs from the runtime client
* remove strip from log line to fix indentation
* get py interpreter for jupyter
* rstrip to remove newline on the rightside for logging
* fix blocking issue for stream logs
* set python interpreter path in bash ps1
* update testcase for jupyter py interpreter path
* remove accidentally added changes
* remove accidentally added changes
* only print dockerfile when debug
* add docs
* remove extra tests that weren't supposed to be in this pr
* add back missing test
* revert
* make LogBuffer synchronous to fix hang in integration tests
* fix integration tests
* Update opendevin/runtime/client/client.py
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
* fix test case
* fix integration tests
* change deque to list
* update integration tests
* rename test runtime
* fix docs
* rename opendevin to openhands in tests
---------
Co-authored-by: tobitege <tobitege@gmx.de>
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: tobitege <10787084+tobitege@users.noreply.github.com>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
* try to fix pip unavailable
* update test case for pip
* force rebuild in CI
* remove extra symlink
* fix newline
* added semi-colon to line 31
* Dockerfile.j2: activate env at the end
* Revert "Dockerfile.j2: activate env at the end"
This reverts commit cf2f565102.
* cleanup Dockerfile
* switch default python image
* remove image agnostic (no longer used)
* fix tests
* simplify integration tests default image
* add nodejs specific runtime tests
* update tests and workflows
* switch to nikolaik/python-nodejs:python3.11-nodejs22
* update build sh to output image name correctly
* increase custom images to test
* fix test
* fix test
* fix double quote
* try fixing ci
* update ghcr workflow
* fix artifact name
* try to fix ghcr again
* fix workflow
* save built image to correct dir
* remove extra -docker-image
* make last tag to be human readable image tag
* fix hyphen to underscore
* run test runtime on all tags
* revert app build
* separate ghcr workflow
* update dockerfile for eval
* fix tag for test run
* try fix tag
* try fix tag via matrix output
* try workflow again
* update comments
* try fixing test matrix
* fix artifact name
* try fix tag again
* Revert "try fix tag again"
This reverts commit b369badd8c.
* tweak filename
* try different path
* fix filepath
* try fix tag artifact path again
* save json instead of line
* update matrix
* print all tags in workflow
* fix DOCKER_IMAGE to avoid ghcr.io/opendevin/ghcr.io/opendevin/od_runtime
* fix test matrix to only load unique test image tags
* try fix matrix again!!!!!
* add all runtime tests passed
---------
Co-authored-by: tobitege <tobitege@gmx.de>
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: tobitege <10787084+tobitege@users.noreply.github.com>
* remove unused plugin mixin
* change the entire jupyter PWD with bash;
print jupyter pwd in obs as well;
* remove unused field
* remove unused comments
* change the entire jupyter PWD with bash;
print jupyter pwd in obs as well;
* fix runtime tests for jupyter
* update intgeration tests
* fix test again
---------
Co-authored-by: Graham Neubig <neubig@gmail.com>
* try to fix pip unavailable
* update test case for pip
* force rebuild in CI
* remove extra symlink
* fix newline
* added semi-colon to line 31
* Dockerfile.j2: activate env at the end
* Revert "Dockerfile.j2: activate env at the end"
This reverts commit cf2f565102.
* cleanup Dockerfile
* switch default python image
* remove image agnostic (no longer used)
* fix tests
* switch to nikolaik/python-nodejs:python3.11-nodejs22
* fix test
* fix test
* revert docker
* update template
---------
Co-authored-by: tobitege <tobitege@gmx.de>
Co-authored-by: Graham Neubig <neubig@gmail.com>
* clean up sandbox and ssh related stuff
* remove ssh hostname
* remove ssh hostname
* remove ssh password
* update config
* fix typo that breaks the test
* switch default to eventstream runtime
* remove pull docker from makefile
* fix unittest
* fix file store path
* try deprecate server runtime
* remove persist sandbox
* move file utils
* remove server runtime related workflow
* remove unused method
* attempt to remove the reliance on filestore for BE
* fix async for list file
* fix list_files to post
* fix list files
* add suffix to directory
* make sure list file returns abs path;
make sure other backend endpoints accpets abs path
* remove server runtime test workflow
* set git config in runtime
* support the ability to specify whether to keep prompt
* fix action serialization
* fix jupyter pwd with ~ as user name
* add test for keep_prompt;
add missing await close for some tests
* update integration tests for eventstream runtime
* fix integration tests for server runtime
* Remove global config from memory
* Remove runtime global config
* Remove from storage
* Remove global config
* Fix event stream tests
* Fix sandbox issue
* Change config
* Removed transferred tests
* Add swe env box
* Fixes on testing
* Fixed some tests
* Merge with stashed changes
* Fix typing
* Fix ipython test
* Revive function
* Make temp_dir fixture
* Remove test to avoid circular import
* fix eventstream filestore for test_runtime
* fix parse arg issue that cause integration test to fail
* support swebench pull from custom namespace
* add back simple tests for runtime
* move multi-line bash tests to test_runtime;
support multi-line bash for esruntime;
* add testcase to handle PS2 prompt
* use bashlex for bash parsing to handle multi-line commands;
add testcases for multi-line commands
* revert ghcr runtime change
* Apply stash
* fix run as other user;
make test async;
* fix test runtime for run as od
* add run-as-devin to all the runtime tests
* handle the case when username is root
* move all run-as-devin tests from sandbox;
only tests a few cases on different user to save time;
* move over multi-line echo related tests to test_runtime
* fix user-specific jupyter by fixing the pypoetry virtualenv folder
* make plugin's init async;
chdir at initialization of jupyter plugin;
move ipy simple testcase to test runtime;
* support agentskills import in
move tests for jupyter pwd tests;
overload `add_env_vars` for EventStreamRuntime to update env var also in Jupyter;
make agentskills read env var lazily, in case env var is updated;
* fix ServerRuntime agentskills issue
* move agnostic image test to test_runtime
* merge runtime tests in CI
* fix enable auto lint as env var
* update warning message
* update warning message
* test for different container images
* change parsing output as debug
* add exception handling for update_pwd_decorator
* fix unit test indentation
* add plugins as default input to Runtime class;
remove init_sandbox_plugins;
implement add_env_var (include jupyter) in the base class;
* fix server runtime auto lint
* Revert "add exception handling for update_pwd_decorator"
This reverts commit 2b668b1506.
* tries to print debugging info for agentskills
* explictly setting uid (try fix permission issue)
* Revert "tries to print debugging info for agentskills"
This reverts commit 8be4c86756.
* set sandbox user id during testing to hopefully fix the permission issue
* add browser tools for server runtime
* try to debug for old pwd
* update debug cmd
* only test agnostic runtime when TEST_RUNTIME is Server
* fix temp dir mkdir
* load TEST_RUNTIME at the beginning
* remove ipython tests
* only log to file when DEBUG
* default logging to project root
* temporarily remove log to file
* fix LLM logger dir
* fix logger
* make set pwd an optional aux action
* fix prev pwd
* fix infinity recursion
* simplify
* do not import the whole od library to avoid logger folder by jupyter
* fix browsing
* increase timeout
* attempt to fix agentskills yet again
* clean up in testcases, since CI maybe run as non-root
* add _cause attribute for event.id
* remove parent
* add a bunch of debugging statement again for CI :(
* fix temp_dir fixture
* change all temp dir to follow pytest's tmp_path_factory
* remove extra bracket
* clean up error printing a bit
* jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization
* jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization
* add typing for tmp dir fixture
* clear the directory before running the test to avoid weird CI temp dir
* remove agnostic test case for server runtime
* Revert "remove agnostic test case for server runtime"
This reverts commit 30e2181c3f.
* disable agnostic tests in CI
* fix test
* make sure plugin arg is not passed when no plugin is specified;
remove redundant on_event function;
* move mock prompt
* rename runtime
* remove extra logging
* refactor run_controller's interface;
support multiple runtime for integration test;
filter out hostname for prompt
* uncomment other tests
* pass the right runtime to controller
* log runtime when start
* uncomment tests
* improve symbol filters
* add intergration test prompts that seemd ok
* add integration test workflow
* add python3 to default ubuntu image
* symlink python and fix permission to jupyter pip
* add retry for jupyter execute server
* fix jupyter pip install;
add post-process for jupyter pip install;
simplify init by add agent_skills path to PYTHONPATH;
add testcase to tests jupyter pip install;
* fix bug
* use ubuntu:22.04 for eventstream integration tests
* add todo
* update testcase
* remove redundant code
* fix unit test
* reduce dependency for runtime
* try making llama-index an optional dependency that's not installed by default
* remove pip install since it seemd not needed
* log ipython execution;
await write message since it returns a future
* update ipy testcase
* do not install llama-index in CI
* do not install llama-index in the app docker as well
* set sandbox container image in the integration test script
* log plugins & env var for runtime
* update conftest for sha256
* add git
* remove all non-alphanumeric chalracters
* add working ipy module tests!
* default to use host network
* remove is_async from browser to make thing a little more reliable;
retry loading browser when error;
* add sleep to wait a bit for http server
* kill http server before regenerate browsing tests
* fix browsing
* only set sandbox container image if undefined
* skip empty config value
* update evaluation to use the latest run_controller
* revert logger in execute_server to be compatible with server runtime
* revert logging level to fix jupyter
* set logger level
* revert the logging
* chmod for workspace to fix permission
* support getting timeout from action
* update test for server runtime
* try to fix file permission
* fix test_cmd_run_action_serialization_deserialization test (added timeout)
* poetry: pip 24.2, torch 2.2.2
* revert adding pip to pyproject.toml
* add build to dependencies in pyproject.toml
* forgot poetry lock --no-update
* fix a DelegatorAgent prompt_002.log (timeout)
* fix a DelegatorAgent prompt_003.log (timeout)
* couple more timeout attribs in prompt files
* some more prompt files
* prompts galore
* add clarification comment for timeout
* default timeout to config
* add assert
* update integraton tests for eventstream
* update integration tests
* fix timeout for action<->dict
* remove redundant on_event
* fix action execution timeout
* updatelock
---------
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: tobitege <tobitege@gmx.de>
* make sure codeact agent produce message in u/a/u/a order
* integration tests
* sync message changes to codeact swe
* fix integration tests
---------
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
* Updated documentation using ruff's autofix feature
* Updated pyproject.toml to include docstring validations
* Updated documentation using ruff's autofix feature
* Updated pyproject.toml to include docstring validations
* Updated docstrings using ruff's autfix feature
* Deleted opendevin/runtime/utils/soource.py, Keeping in sync with main
---------
Co-authored-by: Graham Neubig <neubig@gmail.com>
* add replace-based block edit & preliminary test case fix
* further fix the insert behavior
* make edit only work on first occurence
* bump codeact version since we now use new edit agentskills
* update prompt for new agentskills
* update integration tests
* make run_infer.sh executable
* remove code block for edit_file
* update integration test for prompt changes
* default to not use hint for eval
* fix insert emptyfile bug
* throw value error when `to_replace` is empty
* make `_edit_or_insert_file` return string so we can try to fix some linter errors (best attempt)
* add todo
* update integration test
* fix sandbox test for this PR
* fix inserting with additional newline
* rename to edit_file_by_replace
* add back `edit_file_by_line`
* update prompt for new editing tool
* fix integration tests
* bump codeact version since there are more changes
* add back append file
* fix current line for append
* fix append unit tests
* change the location where we show edited line no to agent and fix tests
* update integration tests
* fix global window size affect by open_file bug
* fix global window size affect by open_file bug
* increase window size to 300
* add file beginning and ending marker to avoid looping
* expand the editor window to better display edit error for model
* refractor to breakdown edit to internal functions
* reduce window to 200
* move window to 100
* refractor to cleanup some logic into _calculate_window_bounds
* fix integration tests
* fix sandbox test on new prompt
* update demonstration with new changes
* fix integration
* initialize llm inside process_instance to circumvent "AttributeError: Can't pickle local object"
* update kwargs
* retry for internal server error
* fix max iteration
* override max iter from config
* fix integration tests
* remove edit file by line
* fix integration tests
* add instruction to avoid hanging
* Revert "add instruction to avoid hanging"
This reverts commit 06fd2c5938.
* handle content policy violation error
* fix integration tests
* fix typo in prompt - the window is 100
* update all integration tests
---------
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
Currently, OpenDevin uses a global singleton LLM config and a global singleton agent config. This PR allows customers to configure an LLM config for each agent. A hypothetically useful scenario is to use a cheaper LLM for repo exploration / code search, and a more powerful LLM to actually do the problem solving (CodeActAgent).
Partially solves #2075 (web GUI improvement is not the goal of this PR)