Commit Graph

34 Commits

Author SHA1 Message Date
Xingyao Wang
a5195b0e65 chore: clean up sandbox and ssh related configs (#3301)
* clean up sandbox and ssh related stuff

* remove ssh hostname

* remove ssh hostname

* remove ssh password

* update config

* fix typo that breaks the test
2024-08-08 22:15:40 +00:00
Xingyao Wang
90d0a62469 (arch) Switch default runtime to EventStream Runtime (#3271)
* switch default to eventstream runtime

* remove pull docker from makefile

* fix unittest

* fix file store path

* try deprecate server runtime

* remove persist sandbox

* move file utils

* remove server runtime related workflow

* remove unused method

* attempt to remove the reliance on filestore for BE

* fix async for list file

* fix list_files to post

* fix list files

* add suffix to directory

* make sure list file returns abs path;
make sure other backend endpoints accpets abs path

* remove server runtime test workflow

* set git config in runtime
2024-08-08 10:11:49 +08:00
Xingyao Wang
4f0a454ed6 [Arch] Support integration tests using EventStream Runtime (#3184)
* Remove global config from memory

* Remove runtime global config

* Remove from storage

* Remove global config

* Fix event stream tests

* Fix sandbox issue

* Change config

* Removed transferred tests

* Add swe env box

* Fixes on testing

* Fixed some tests

* Merge with stashed changes

* Fix typing

* Fix ipython test

* Revive function

* Make temp_dir fixture

* Remove test to avoid circular import

* fix eventstream filestore for test_runtime

* fix parse arg issue that cause integration test to fail

* support swebench pull from custom namespace

* add back simple tests for runtime

* move multi-line bash tests to test_runtime;
support multi-line bash for esruntime;

* add testcase to handle PS2 prompt

* use bashlex for bash parsing to handle multi-line commands;
add testcases for multi-line commands

* revert ghcr runtime change

* Apply stash

* fix run as other user;
make test async;

* fix test runtime for run as od

* add run-as-devin to all the runtime tests

* handle the case when username is root

* move all run-as-devin tests from sandbox;
only tests a few cases on different user to save time;

* move over multi-line echo related tests to test_runtime

* fix user-specific jupyter by fixing the pypoetry virtualenv folder

* make plugin's init async;
chdir at initialization of jupyter plugin;
move ipy simple testcase to test runtime;

* support agentskills import in
move tests for jupyter pwd tests;
overload `add_env_vars` for EventStreamRuntime to update env var also in Jupyter;
make agentskills read env var lazily, in case env var is updated;

* fix ServerRuntime agentskills issue

* move agnostic image test to test_runtime

* merge runtime tests in CI

* fix enable auto lint as env var

* update warning message

* update warning message

* test for different container images

* change parsing output as debug

* add exception handling for update_pwd_decorator

* fix unit test indentation

* add plugins as default input to Runtime class;
remove init_sandbox_plugins;
implement add_env_var (include jupyter) in the base class;

* fix server runtime auto lint

* Revert "add exception handling for update_pwd_decorator"

This reverts commit 2b668b1506.

* tries to print debugging info for agentskills

* explictly setting uid (try fix permission issue)

* Revert "tries to print debugging info for agentskills"

This reverts commit 8be4c86756.

* set sandbox user id during testing to hopefully fix the permission issue

* add browser tools for server runtime

* try to debug for old pwd

* update debug cmd

* only test agnostic runtime when TEST_RUNTIME is Server

* fix temp dir mkdir

* load TEST_RUNTIME at the beginning

* remove ipython tests

* only log to file when DEBUG

* default logging to project root

* temporarily remove log to file

* fix LLM logger dir

* fix logger

* make set pwd an optional aux action

* fix prev pwd

* fix infinity recursion

* simplify

* do not import the whole od library to avoid logger folder by jupyter

* fix browsing

* increase timeout

* attempt to fix agentskills yet again

* clean up in testcases, since CI maybe run as non-root

* add _cause attribute for event.id

* remove parent

* add a bunch of debugging statement again for CI :(

* fix temp_dir fixture

* change all temp dir to follow pytest's tmp_path_factory

* remove extra bracket

* clean up error printing a bit

* jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization

* jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization

* add typing for tmp dir fixture

* clear the directory before running the test to avoid weird CI temp dir

* remove agnostic test case for server runtime

* Revert "remove agnostic test case for server runtime"

This reverts commit 30e2181c3f.

* disable agnostic tests in CI

* fix test

* make sure plugin arg is not passed when no plugin is specified;
remove redundant on_event function;

* move mock prompt

* rename runtime

* remove extra logging

* refactor run_controller's interface;
support multiple runtime for integration test;
filter out hostname for prompt

* uncomment other tests

* pass the right runtime to controller

* log runtime when start

* uncomment tests

* improve symbol filters

* add intergration test prompts that seemd ok

* add integration test workflow

* add python3 to default ubuntu image

* symlink python and fix permission to jupyter pip

* add retry for jupyter execute server

* fix jupyter pip install;
add post-process for jupyter pip install;
simplify init by add agent_skills path to PYTHONPATH;
add testcase to tests jupyter pip install;

* fix bug

* use ubuntu:22.04 for eventstream integration tests

* add todo

* update testcase

* remove redundant code

* fix unit test

* reduce dependency for runtime

* try making llama-index an optional dependency that's not installed by default

* remove pip install since it seemd not needed

* log ipython execution;
await write message since it returns a future

* update ipy testcase

* do not install llama-index in CI

* do not install llama-index in the app docker as well

* set sandbox container image in the integration test script

* log plugins & env var for runtime

* update conftest for sha256

* add git

* remove all non-alphanumeric chalracters

* add working ipy module tests!

* default to use host network

* remove is_async from browser to make thing a little more reliable;
retry loading browser when error;

* add sleep to wait a bit for http server

* kill http server before regenerate browsing tests

* fix browsing

* only set sandbox container image if undefined

* skip empty config value

* update evaluation to use the latest run_controller

* revert logger in execute_server to be compatible with server runtime

* revert logging level to fix jupyter

* set logger level

* revert the logging

* chmod for workspace to fix permission

* support getting timeout from action

* update test for server runtime

* try to fix file permission

* fix test_cmd_run_action_serialization_deserialization test (added timeout)

* poetry: pip 24.2, torch 2.2.2

* revert adding pip to pyproject.toml

* add build to dependencies in pyproject.toml

* forgot poetry lock --no-update

* fix a DelegatorAgent prompt_002.log (timeout)

* fix a DelegatorAgent prompt_003.log (timeout)

* couple more timeout attribs in prompt files

* some more prompt files

* prompts galore

* add clarification comment for timeout

* default timeout to config

* add assert

* update integraton tests for eventstream

* update integration tests

* fix timeout for action<->dict

* remove redundant on_event

* fix action execution timeout

* updatelock

---------

Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: tobitege <tobitege@gmx.de>
2024-08-01 22:07:39 +00:00
Engel Nyst
21ea9953b3 don't use realpath with non-existent files (#3200) 2024-08-01 01:11:22 +02:00
Graham Neubig
3a21198424 Remove monologue agent (#3036)
* Remove monologue agent

* Fixes
2024-07-19 19:25:05 +00:00
tobitege
5a5713009f INT: prevent error on repeat integration tests after failed test(s) (#2935)
* Integration tests: prevent File not found error

* forgot to remove debug calls in regenerate.sh
2024-07-18 06:29:15 +02:00
Boxuan Li
ebbc0e6803 Integration testing: unset irrelevant env variables (#2902) 2024-07-12 22:12:37 +08:00
மனோஜ்குமார் பழனிச்சாமி
1d4f422638 Doc: Mention FORCE_REGENERATE var (#2833)
* Mention FORCE_REGENERATE var in doc

* Update tests/integration/README.md

---------

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
2024-07-11 04:01:15 +00:00
Boxuan Li
c68478f470 Customize LLM config per agent (#2756)
Currently, OpenDevin uses a global singleton LLM config and a global singleton agent config. This PR allows customers to configure an LLM config for each agent. A hypothetically useful scenario is to use a cheaper LLM for repo exploration / code search, and a more powerful LLM to actually do the problem solving (CodeActAgent).

Partially solves #2075 (web GUI improvement is not the goal of this PR)
2024-07-09 22:05:54 -07:00
மனோஜ்குமார் பழனிச்சாமி
c6aa50779d Update regenerate.sh (#2832) 2024-07-07 23:52:03 +02:00
Xingyao Wang
a47713ecb0 [Arch] Remove supports for Background Commands (#2803)
* depracting docker exec box

* remove doc exec from workflow and docs

* remove background commands

* Update tests/unit/test_sandbox.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* replace for-loop with assignment

* fix integration tests

* fix integration tests for shell script

* fix integration tests

* increase max iter to fix some monologue agent issue

* fix integration test again

* fix integration tests (seems related to run_user issue)

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-07-06 03:38:05 +08:00
மனோஜ்குமார் பழனிச்சாமி
143f38d25a Refactored sandbox config and added fast boot (#2455)
* Refactored sandbox config and added fastboot

* added tests

* fixed tests

* fixed tests

* intimate user about breaking change

* remove default config from eval

* check for lowercase env

* add test

* Revert Migration

* migrate old sandbox configs

* resolve merge conflict

* revert migration 2

* Revert "remove default config from eval"

This reverts commit de57c588db.

* change type to box_type

* fix var name

* linted

* lint

* lint comments

* fix tests

* fix tests

* fix typo

* fix box_type, remove fast_boot

* add tests for sandbox config

* fix test

* update eval docs

* small removal comments

* adapt toml template

* old fields shouldn't be in the app dataclass

* fix old keys in app config

* clean up exec box

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-07-05 03:30:21 +00:00
tobitege
823298e0d0 fix: Agentskills enhancements (#2384)
* avoid repeat logging of unneeded messages

* refactored append/edit_file (tests next)

* agentskills and unit test fixes

* testing

* more changes and test prompts

* smaller changes

* final test fixes

* remove dead code from test_agent.py

* reverting unneeded changes

* updated tests, more tweaks to skills

* refactor (#2442)

* chores: fix DelegatorAgent description (#2446)

* change

* change comments

* fix

* stopped container to prevent port issues. (#2447)

* chore: remove useless browsing code in CodeActSWEAgent (#2438)

* remove useless

* fix integration test

* Regenerate test_ipython_module artifacts for CodeActSWEAgent

---------

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* Merge remote-tracking branch 'upstream/main' into agent-fileops

* unneeded tweak

* * fix edit_file to not introduce extra newline
* updated docstrings with more details for LLM
* fix legacy typo in prompts causing ]] instead of ]
* several mock files regenerated

* Regen'ed CodeActSWEAgent integration tests

* fix _print_window signature; explicit exception type in _is_valid_path

* splitlines with named param

---------

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-06-16 15:06:46 -04:00
Boxuan Li
dd1095cf6b regenerate.sh: Exit upon common known errors (#2385)
* Exit regenerate.sh upon common known errors

* More fixes

* Remove mention of transient issue

* Use tmp file instead of tty

* Remove redundant cleanup
2024-06-13 23:42:58 -07:00
Yufan Song
90ec0095df Add integration test for CodeActSWEAgent (#2377)
* add test log

* remove browsing internet

* add test by GPT-4o

* fix prompts

* change test_agent

* fix test

* fix nits
2024-06-12 02:46:15 +08:00
tobitege
9605106e72 feat: append_file incl. all tests [agentskills] (#2346)
* new skill: append_file incl. all tests

* more tests needed caring

* file_name for append_file/edit_file; updated tests
2024-06-10 17:18:40 +00:00
Boxuan Li
a9a2f10170 Revamp AgentRejectAction and allow ManagerAgent to handle rejection (#1735)
* Fix AgentRejectAction handling

* Add ManagerAgent to integration tests

* Fix regenerate.sh

* Fix merge

* Update README for micro-agents

* Add test reject to regenerate.sh

* regenerate.sh: Add support for running a specific test and/or agent

* Refine reject schema, and allow ManagerAgent to handle reject

* Add test artifacts for test_simple_task_rejection

* Fix manager agent tests

* Fix README

* test_simple_task_rejection: check final agent state

* Integration test: exit if mock prompt not found

* Update test_simple_task_rejection tests

* Fix test_edits test artifacts after prompt update

* Fix ManagerAgent test_edits

* WIP

* Fix tests

* update test_edits for ManagerAgent

* Skip local sandbox for reject test

* Fix test comparison
2024-06-08 23:12:30 -07:00
Boxuan Li
9b371b1b5f Refactor agent delegation and tweak micro agents (#1910)
This PR fixes #1897. In addition, this PR fixes and tweaks a few micro-agents.

For the first time, I am able to use ManagerAgent to complete test_write_simple_script and test_edits tasks in integration tests, so this PR also adds ManagerAgent as part of integration tests. test_write_simple_script involves delegation to CoderAgent while test_edits involves delegation to TypoFixerAgent.

Also for the first time, I am able to use DelegateAgent to complete test_write_simple_script and test_edits tasks in integration tests, so this PR also adds DelegateAgent as part of integration tests. It involves delegation to StudyRepoForTaskAgent, CoderAgent and VerifierAgent.

This PR is a blocker for #1735 and likely #1945.
2024-05-28 20:01:16 -07:00
Boxuan Li
78241d9d43 Add tests for browser agent (#2031)
Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-05-24 09:59:40 +00:00
Boxuan Li
acb430eef5 Refactor integration testing CI, add optional Mac tests, and mark a few agents as deprecated (#1888)
* Add MacOS to integration tests

* Switch back to python 3.11

* Install Docker for macos pipeline

* regenerate.sh: Use environmental variable for sandbox type

* Pack different agents' tests into a single check

* Fix CodeAct tests

* Reduce file match and extensive debug logs

* Add TEST_IN_CI mode that reports codecov

* Small fix: don't quit if reusing old responses failed

* Merge codecov results

* Fix typos

* Remove coverage merge step - codecov automatically does that

* Make mac integration tests as optional - too slow

* Fix codecov args

* Add comments in yaml

* Include sandbox type in codecov report name

* Fix codecov report merge

* Revert renaming of test_matrix_success

* Remove SWEAgent and PlannerAgent from tests

* Mark planner agent and SWE agent as deprecated

* CodeCov: Ignore planner and sweagent

* Revert "Remove SWEAgent and PlannerAgent from tests"

This reverts commit 040cb3bfb9.

* Remove all tests for SWE Agent

* Only keep basic tests for MonologueAgent and PlannerAgent

* Mark SWE Agent as deprecated, and ignore code coverage for it

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-05-22 20:38:57 -07:00
Robert Brennan
0ecba83e53 Move message history out of CodeAct (#1847)
* stop keeping history state in codeact

* regenerate tests

* Update agenthub/codeact_agent/codeact_agent.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* revert tests

* regen tests

* refactor codeact a bit

* regenerate without using LLM

* simplify logic

* change to heredoc

* fix heredoc

* fix end_of_edit docs

* regen tests

* regenerate

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-18 18:39:27 +00:00
மனோஜ்குமார் பழனிச்சாமி
b0b44ed467 Auto restarted Jupyter kernel (#1808)
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-18 08:40:31 +05:30
Boxuan Li
b6ff201780 Refactor integration test framework and relieve the pain of regeneration (#1818)
* Update README.md

* Fix WORKSPACE_MOUNT_PATH_IN_SANDBOX variable in regenerate.sh

* Regenerate prompts without calling real LLM

* Disable pytest warning capture

* Change planner agent prompt by a bit for demo

* Regenerate prompt files following prompt changes

* doc: elaborate on FORCE_USE_LLM

* Add another prompt change to monologue_agent for demo purpose

* Regenerate prompts with FORCE_USE_LLM=true

---------

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
2024-05-16 08:30:29 -07:00
Boxuan Li
6714000b2c CodeActAgent: Fix iteration reminder (#1803)
This PR includes three changes:
1) Iteration reminder should start with MAX_ITERATIONS from config rather than default value 100
2) In the first prompt, we should tell the LLM it has `MAX_ITERATIONS - 1` turns left, rather than `MAX_ITERATIONS - 2`
3) Remove legacy ITERATION_REMINDER config
2024-05-15 13:48:47 +08:00
Robert Brennan
82a798990c refactor remind_iterations (#1760)
* refactor remind_iterations

* regenerate tests

* concatenate iteration message

* fix merge issues

* update integration tests
2024-05-14 08:27:12 -04:00
Boxuan Li
3d53d363b4 Integration test: Verify finish state & add auto-rerun in regenerate.sh (#1773)
* regenerate.sh: Allow testing on a specific agent and/or test

* Check agent finish state

* rengerate.sh: Rerun after fixing the prompts

* Fix SWEAgent test_write_simple_script

* Add more help message

* Add a known issue to README.md

* regenerate.sh: Fix help message typo

* Fix a typo in README
2024-05-14 03:50:29 -04:00
Robert Brennan
e28b3ef9e8 Fix integration tests (#1764)
* refactor remind_iterations

* regenerate tests

* concatenate iteration message

* add some helpers to the tests

* regenerate tests

* add to logs

* regenerate tests

* add debug info

* fix exit_on_message

* fix regen script

* regenerate tests

* Revert "Merge branch 'rb/test-regen' of ssh://github.com/opendevin/opendevin into rb/test-regen"

This reverts commit b9cd1acbf2, reversing
changes made to c888285304.

* remove prints

* revert files

* revert more

* revert more

* regenerate for the last time I hope

* add back remind_iter

* regenerate

* add back remind_iter

* regenerate

* fix remind_iter

* regenerate yet again

* regen

* remove comment

* regen again
2024-05-13 18:08:59 -04:00
Graham Neubig
b13d4647ab Print out the regenerate command (#1759)
* Print out the output of the regenerate command

* Update regenerate.sh
2024-05-13 18:43:58 +00:00
Boxuan Li
eba5ef8e67 Fix test_ipython (#1750) 2024-05-12 16:15:32 -07:00
Xingyao Wang
4db4a84e2e Simply Jupyter execution via heredoc (#1728)
* simply jupyter execution via heredoc

* make sure /tmp always exists

* add integration test for jupyter exec
2024-05-13 04:57:06 +08:00
Xingyao Wang
8bfae8413e Support passing sandbox as argument and iteration reminder (#1730)
* support custom sandbox;
add iteration_reminder

* Enable iteration reminder in CodeActAgent integration test

* Don't remove numbers when comparing prompts

* Update tests/integration/README.md

---------

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-12 07:57:33 +00:00
Boxuan Li
bde12f4a09 CodeActAgent: Fix hack for multiple edits in same command (#1684)
* Fix edit hack for multiple edits in same command

This PR changes ([\s\S]*) to ([\s\S]*?) to make the capturing
group non-greedy. This change ensures that the regex captures
the smallest set of characters that extends up to the first
end_of_edit it encounters, rather than extending across multiple
edit commands.

Without the fix, a bash command consisting of multiple edits
would be corrupt and lead to unexpected edit results.
2024-05-10 23:32:09 -07:00
Boxuan Li
a60a6a40d6 Only regenerate integratio tests for failed ones (#1661) 2024-05-09 09:32:00 -04:00
Robert Brennan
242c4a0df6 Remove extra message actions (#1608)
* remove extra actions

* remove message observations

* support null obs

* handle null obs

* fix frontend for changes

* fix the way messages flow to the UI

* change think to message

* add regen script

* regenerate all integration tests

* change task

* remove gh test

* fix messages

* fix tests

* help agent exit after hitting max iter

* Update opendevin/events/observation/success.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* Update agenthub/codeact_agent/codeact_agent.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-05-07 21:13:08 +00:00