OpenHands

mirror of https://github.com/All-Hands-AI/OpenHands.git synced 2026-01-09 14:57:59 -05:00

Author	SHA1	Message	Date
Xingyao Wang	a5195b0e65	chore: clean up sandbox and ssh related configs (#3301) * clean up sandbox and ssh related stuff * remove ssh hostname * remove ssh hostname * remove ssh password * update config * fix typo that breaks the test	2024-08-08 22:15:40 +00:00
Xingyao Wang	90d0a62469	(arch) Switch default runtime to EventStream Runtime (#3271) * switch default to eventstream runtime * remove pull docker from makefile * fix unittest * fix file store path * try to deprecate server runtime * remove persist sandbox * move file utils * remove server runtime related workflow * remove unused method * attempt to remove the reliance on filestore for BE * fix async for list file * fix list_files to post * fix list files * add suffix to directory * make sure list file returns abs path; make sure other backend endpoints accept abs path * remove server runtime test workflow * set git config in runtime	2024-08-08 10:11:49 +08:00
Xingyao Wang	4f0a454ed6	[Arch] Support integration tests using EventStream Runtime (#3184) * Remove global config from memory * Remove runtime global config * Remove from storage * Remove global config * Fix event stream tests * Fix sandbox issue * Change config * Removed transferred tests * Add swe env box * Fixes on testing * Fixed some tests * Merge with stashed changes * Fix typing * Fix ipython test * Revive function * Make temp_dir fixture * Remove test to avoid circular import * fix eventstream filestore for test_runtime * fix parse arg issue that causes integration tests to fail * support swebench pull from custom namespace * add back simple tests for runtime * move multi-line bash tests to test_runtime; support multi-line bash for esruntime; * add testcase to handle PS2 prompt * use bashlex for bash parsing to handle multi-line commands; add testcases for multi-line commands * revert ghcr runtime change * Apply stash * fix run as other user; make test async; * fix test runtime for run as od * add run-as-devin to all the runtime tests * handle the case when username is root * move all run-as-devin tests from sandbox; only test a few cases as a different user to save time; * move over multi-line echo related tests to test_runtime * fix user-specific jupyter by fixing the pypoetry virtualenv folder * make plugin's init async; chdir at initialization of jupyter plugin; move ipy simple testcase to test runtime; * support agentskills import in move tests for jupyter pwd tests; overload `add_env_vars` for EventStreamRuntime to update env var also in Jupyter; make agentskills read env var lazily, in case env var is updated; * fix ServerRuntime agentskills issue * move agnostic image test to test_runtime * merge runtime tests in CI * fix enable auto lint as env var * update warning message * update warning message * test for different container images * change parsing output as debug * add exception handling for update_pwd_decorator * fix unit test indentation * add plugins as default input to Runtime class; remove init_sandbox_plugins; implement add_env_var (include jupyter) in the base class; * fix server runtime auto lint * Revert "add exception handling for update_pwd_decorator" This reverts commit `2b668b1506`. * tries to print debugging info for agentskills * explicitly set uid (try to fix permission issue) * Revert "tries to print debugging info for agentskills" This reverts commit `8be4c86756`. * set sandbox user id during testing to hopefully fix the permission issue * add browser tools for server runtime * try to debug for old pwd * update debug cmd * only test agnostic runtime when TEST_RUNTIME is Server * fix temp dir mkdir * load TEST_RUNTIME at the beginning * remove ipython tests * only log to file when DEBUG * default logging to project root * temporarily remove log to file * fix LLM logger dir * fix logger * make set pwd an optional aux action * fix prev pwd * fix infinite recursion * simplify * do not import the whole od library to avoid logger folder by jupyter * fix browsing * increase timeout * attempt to fix agentskills yet again * clean up in testcases, since CI may run as non-root * add _cause attribute for event.id * remove parent * add a bunch of debugging statements again for CI :( * fix temp_dir fixture * change all temp dirs to follow pytest's tmp_path_factory * remove extra bracket * clean up error printing a bit * jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization * jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization * add typing for tmp dir fixture * clear the directory before running the test to avoid weird CI temp dir * remove agnostic test case for server runtime * Revert "remove agnostic test case for server runtime" This reverts commit `30e2181c3f`. * disable agnostic tests in CI * fix test * make sure plugin arg is not passed when no plugin is specified; remove redundant on_event function; * move mock prompt * rename runtime * remove extra logging * refactor run_controller's interface; support multiple runtimes for integration test; filter out hostname for prompt * uncomment other tests * pass the right runtime to controller * log runtime when starting * uncomment tests * improve symbol filters * add integration test prompts that seemed OK * add integration test workflow * add python3 to default ubuntu image * symlink python and fix permission to jupyter pip * add retry for jupyter execute server * fix jupyter pip install; add post-process for jupyter pip install; simplify init by adding agent_skills path to PYTHONPATH; add testcase to test jupyter pip install; * fix bug * use ubuntu:22.04 for eventstream integration tests * add todo * update testcase * remove redundant code * fix unit test * reduce dependency for runtime * try making llama-index an optional dependency that's not installed by default * remove pip install since it seemed not needed * log ipython execution; await write message since it returns a future * update ipy testcase * do not install llama-index in CI * do not install llama-index in the app docker as well * set sandbox container image in the integration test script * log plugins & env var for runtime * update conftest for sha256 * add git * remove all non-alphanumeric characters * add working ipy module tests! * default to use host network * remove is_async from browser to make things a little more reliable; retry loading browser on error; * add sleep to wait a bit for http server * kill http server before regenerating browsing tests * fix browsing * only set sandbox container image if undefined * skip empty config value * update evaluation to use the latest run_controller * revert logger in execute_server to be compatible with server runtime * revert logging level to fix jupyter * set logger level * revert the logging * chmod for workspace to fix permission * support getting timeout from action * update test for server runtime * try to fix file permission * fix test_cmd_run_action_serialization_deserialization test (added timeout) * poetry: pip 24.2, torch 2.2.2 * revert adding pip to pyproject.toml * add build to dependencies in pyproject.toml * forgot poetry lock --no-update * fix a DelegatorAgent prompt_002.log (timeout) * fix a DelegatorAgent prompt_003.log (timeout) * couple more timeout attribs in prompt files * some more prompt files * prompts galore * add clarification comment for timeout * default timeout to config * add assert * update integration tests for eventstream * update integration tests * fix timeout for action<->dict * remove redundant on_event * fix action execution timeout * update lock --------- Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: tobitege <tobitege@gmx.de>	2024-08-01 22:07:39 +00:00
Engel Nyst	21ea9953b3	don't use realpath with non-existent files (#3200)	2024-08-01 01:11:22 +02:00
Graham Neubig	3a21198424	Remove monologue agent (#3036) * Remove monologue agent * Fixes	2024-07-19 19:25:05 +00:00
tobitege	5a5713009f	INT: prevent error on repeated integration test runs after failed test(s) (#2935) * Integration tests: prevent File not found error * remove forgotten debug calls in regenerate.sh	2024-07-18 06:29:15 +02:00
Boxuan Li	ebbc0e6803	Integration testing: unset irrelevant env variables (#2902)	2024-07-12 22:12:37 +08:00
மனோஜ்குமார் பழனிச்சாமி	1d4f422638	Doc: Mention FORCE_REGENERATE var (#2833) * Mention FORCE_REGENERATE var in doc * Update tests/integration/README.md --------- Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>	2024-07-11 04:01:15 +00:00
Boxuan Li	c68478f470	Customize LLM config per agent (#2756) Currently, OpenDevin uses a global singleton LLM config and a global singleton agent config. This PR allows users to specify an LLM config for each agent. One hypothetical use case is to use a cheaper LLM for repo exploration / code search, and a more powerful LLM to do the actual problem solving (CodeActAgent). Partially solves #2075 (web GUI improvement is not the goal of this PR).	2024-07-09 22:05:54 -07:00
மனோஜ்குமார் பழனிச்சாமி	c6aa50779d	Update regenerate.sh (#2832)	2024-07-07 23:52:03 +02:00
Xingyao Wang	a47713ecb0	[Arch] Remove support for Background Commands (#2803) * deprecate docker exec box * remove docker exec from workflow and docs * remove background commands * Update tests/unit/test_sandbox.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * replace for-loop with assignment * fix integration tests * fix integration tests for shell script * fix integration tests * increase max iter to fix some monologue agent issues * fix integration test again * fix integration tests (seems related to run_user issue) --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-07-06 03:38:05 +08:00
மனோஜ்குமார் பழனிச்சாமி	143f38d25a	Refactored sandbox config and added fast boot (#2455) * Refactored sandbox config and added fastboot * added tests * fixed tests * fixed tests * notify users about breaking change * remove default config from eval * check for lowercase env * add test * Revert Migration * migrate old sandbox configs * resolve merge conflict * revert migration 2 * Revert "remove default config from eval" This reverts commit `de57c588db`. * change type to box_type * fix var name * linted * lint * lint comments * fix tests * fix tests * fix typo * fix box_type, remove fast_boot * add tests for sandbox config * fix test * update eval docs * small removal comments * adapt toml template * old fields shouldn't be in the app dataclass * fix old keys in app config * clean up exec box --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-07-05 03:30:21 +00:00
tobitege	823298e0d0	fix: Agentskills enhancements (#2384) * avoid repeated logging of unneeded messages * refactored append/edit_file (tests next) * agentskills and unit test fixes * testing * more changes and test prompts * smaller changes * final test fixes * remove dead code from test_agent.py * reverting unneeded changes * updated tests, more tweaks to skills * refactor (#2442) * chores: fix DelegatorAgent description (#2446) * change * change comments * fix * stopped container to prevent port issues. (#2447) * chore: remove useless browsing code in CodeActSWEAgent (#2438) * remove useless * fix integration test * Regenerate test_ipython_module artifacts for CodeActSWEAgent --------- Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * Merge remote-tracking branch 'upstream/main' into agent-fileops * unneeded tweak * fix edit_file to not introduce extra newline * updated docstrings with more details for LLM * fix legacy typo in prompts causing ]] instead of ] * several mock files regenerated * Regen'ed CodeActSWEAgent integration tests * fix _print_window signature; explicit exception type in _is_valid_path * splitlines with named param --------- Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-06-16 15:06:46 -04:00
Boxuan Li	dd1095cf6b	regenerate.sh: Exit upon common known errors (#2385) * Exit regenerate.sh upon common known errors * More fixes * Remove mention of transient issue * Use tmp file instead of tty * Remove redundant cleanup	2024-06-13 23:42:58 -07:00
Yufan Song	90ec0095df	Add integration test for CodeActSWEAgent (#2377) * add test log * remove internet browsing * add test by GPT-4o * fix prompts * change test_agent * fix test * fix nits	2024-06-12 02:46:15 +08:00
tobitege	9605106e72	feat: append_file incl. all tests [agentskills] (#2346) * new skill: append_file incl. all tests * more tests that needed care * file_name for append_file/edit_file; updated tests	2024-06-10 17:18:40 +00:00
Boxuan Li	a9a2f10170	Revamp AgentRejectAction and allow ManagerAgent to handle rejection (#1735) * Fix AgentRejectAction handling * Add ManagerAgent to integration tests * Fix regenerate.sh * Fix merge * Update README for micro-agents * Add test reject to regenerate.sh * regenerate.sh: Add support for running a specific test and/or agent * Refine reject schema, and allow ManagerAgent to handle reject * Add test artifacts for test_simple_task_rejection * Fix manager agent tests * Fix README * test_simple_task_rejection: check final agent state * Integration test: exit if mock prompt not found * Update test_simple_task_rejection tests * Fix test_edits test artifacts after prompt update * Fix ManagerAgent test_edits * WIP * Fix tests * update test_edits for ManagerAgent * Skip local sandbox for reject test * Fix test comparison	2024-06-08 23:12:30 -07:00
Boxuan Li	9b371b1b5f	Refactor agent delegation and tweak micro-agents (#1910) This PR fixes #1897. In addition, this PR fixes and tweaks a few micro-agents. For the first time, I am able to use ManagerAgent to complete the test_write_simple_script and test_edits tasks in integration tests, so this PR also adds ManagerAgent to the integration tests. test_write_simple_script involves delegation to CoderAgent, while test_edits involves delegation to TypoFixerAgent. Also for the first time, I am able to use DelegatorAgent to complete the test_write_simple_script and test_edits tasks in integration tests, so this PR also adds DelegatorAgent to the integration tests. It involves delegation to StudyRepoForTaskAgent, CoderAgent and VerifierAgent. This PR is a blocker for #1735 and likely #1945.	2024-05-28 20:01:16 -07:00
Boxuan Li	78241d9d43	Add tests for browser agent (#2031) Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-05-24 09:59:40 +00:00
Boxuan Li	acb430eef5	Refactor integration testing CI, add optional Mac tests, and mark a few agents as deprecated (#1888) * Add MacOS to integration tests * Switch back to python 3.11 * Install Docker for macos pipeline * regenerate.sh: Use environment variable for sandbox type * Pack different agents' tests into a single check * Fix CodeAct tests * Reduce file match and extensive debug logs * Add TEST_IN_CI mode that reports codecov * Small fix: don't quit if reusing old responses failed * Merge codecov results * Fix typos * Remove coverage merge step - codecov automatically does that * Make mac integration tests optional - too slow * Fix codecov args * Add comments in yaml * Include sandbox type in codecov report name * Fix codecov report merge * Revert renaming of test_matrix_success * Remove SWEAgent and PlannerAgent from tests * Mark planner agent and SWE agent as deprecated * CodeCov: Ignore planner and sweagent * Revert "Remove SWEAgent and PlannerAgent from tests" This reverts commit `040cb3bfb9`. * Remove all tests for SWE Agent * Only keep basic tests for MonologueAgent and PlannerAgent * Mark SWE Agent as deprecated, and ignore code coverage for it --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-05-22 20:38:57 -07:00
Robert Brennan	0ecba83e53	Move message history out of CodeAct (#1847) * stop keeping history state in codeact * regenerate tests * Update agenthub/codeact_agent/codeact_agent.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * revert tests * regen tests * refactor codeact a bit * regenerate without using LLM * simplify logic * change to heredoc * fix heredoc * fix end_of_edit docs * regen tests * regenerate --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-05-18 18:39:27 +00:00
மனோஜ்குமார் பழனிச்சாமி	b0b44ed467	Auto-restart Jupyter kernel (#1808) Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-05-18 08:40:31 +05:30
Boxuan Li	b6ff201780	Refactor integration test framework and relieve the pain of regeneration (#1818) * Update README.md * Fix WORKSPACE_MOUNT_PATH_IN_SANDBOX variable in regenerate.sh * Regenerate prompts without calling real LLM * Disable pytest warning capture * Change planner agent prompt a bit for demo * Regenerate prompt files following prompt changes * doc: elaborate on FORCE_USE_LLM * Add another prompt change to monologue_agent for demo purposes * Regenerate prompts with FORCE_USE_LLM=true --------- Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>	2024-05-16 08:30:29 -07:00
Boxuan Li	6714000b2c	CodeActAgent: Fix iteration reminder (#1803) This PR includes three changes: 1) the iteration reminder should start from MAX_ITERATIONS in the config rather than the default value of 100; 2) in the first prompt, we should tell the LLM it has `MAX_ITERATIONS - 1` turns left, rather than `MAX_ITERATIONS - 2`; 3) remove the legacy ITERATION_REMINDER config.	2024-05-15 13:48:47 +08:00
Robert Brennan	82a798990c	refactor remind_iterations (#1760) * refactor remind_iterations * regenerate tests * concatenate iteration message * fix merge issues * update integration tests	2024-05-14 08:27:12 -04:00
Boxuan Li	3d53d363b4	Integration test: Verify finish state & add auto-rerun in regenerate.sh (#1773) * regenerate.sh: Allow testing on a specific agent and/or test * Check agent finish state * regenerate.sh: Rerun after fixing the prompts * Fix SWEAgent test_write_simple_script * Add more help messages * Add a known issue to README.md * regenerate.sh: Fix help message typo * Fix a typo in README	2024-05-14 03:50:29 -04:00
Robert Brennan	e28b3ef9e8	Fix integration tests (#1764) * refactor remind_iterations * regenerate tests * concatenate iteration message * add some helpers to the tests * regenerate tests * add to logs * regenerate tests * add debug info * fix exit_on_message * fix regen script * regenerate tests * Revert "Merge branch 'rb/test-regen' of ssh://github.com/opendevin/opendevin into rb/test-regen" This reverts commit `b9cd1acbf2`, reversing changes made to `c888285304`. * remove prints * revert files * revert more * revert more * regenerate for the last time I hope * add back remind_iter * regenerate * add back remind_iter * regenerate * fix remind_iter * regenerate yet again * regen * remove comment * regen again	2024-05-13 18:08:59 -04:00
Graham Neubig	b13d4647ab	Print out the regenerate command (#1759) * Print out the output of the regenerate command * Update regenerate.sh	2024-05-13 18:43:58 +00:00
Boxuan Li	eba5ef8e67	Fix test_ipython (#1750)	2024-05-12 16:15:32 -07:00
Xingyao Wang	4db4a84e2e	Simplify Jupyter execution via heredoc (#1728) * simplify jupyter execution via heredoc * make sure /tmp always exists * add integration test for jupyter exec	2024-05-13 04:57:06 +08:00
Xingyao Wang	8bfae8413e	Support passing sandbox as argument and iteration reminder (#1730) * support custom sandbox; add iteration_reminder * Enable iteration reminder in CodeActAgent integration test * Don't remove numbers when comparing prompts * Update tests/integration/README.md --------- Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-05-12 07:57:33 +00:00
Boxuan Li	bde12f4a09	CodeActAgent: Fix hack for multiple edits in same command (#1684) * Fix edit hack for multiple edits in same command This PR changes `([\s\S]*)` to `([\s\S]*?)` to make the capturing group non-greedy. This change ensures that the regex captures the smallest set of characters that extends up to the first end_of_edit it encounters, rather than extending across multiple edit commands. Without the fix, a bash command consisting of multiple edits would be corrupted and lead to unexpected edit results.	2024-05-10 23:32:09 -07:00
Boxuan Li	a60a6a40d6	Only regenerate integration tests for failed ones (#1661)	2024-05-09 09:32:00 -04:00
Robert Brennan	242c4a0df6	Remove extra message actions (#1608) * remove extra actions * remove message observations * support null obs * handle null obs * fix frontend for changes * fix the way messages flow to the UI * change think to message * add regen script * regenerate all integration tests * change task * remove gh test * fix messages * fix tests * help agent exit after hitting max iter * Update opendevin/events/observation/success.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * Update agenthub/codeact_agent/codeact_agent.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-05-07 21:13:08 +00:00

34 Commits
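The per-agent LLM configuration described in c68478f470 (#2756) can be pictured as a lookup from agent name to a named LLM config, falling back to a global default. This is a minimal sketch under assumptions, not the OpenDevin implementation. The class names `LLMConfig` and `AppConfig`, the fields `llms` and `agent_llm`, the method `llm_for_agent`, the agent name `RepoExplorerAgent`, and the model strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LLMConfig:
    model: str
    temperature: float = 0.0


@dataclass
class AppConfig:
    # Hypothetical layout: named LLM configs; "default" is used by any agent
    # that has no explicit override.
    llms: dict[str, LLMConfig] = field(default_factory=dict)
    # Hypothetical mapping from agent class name to a key in `llms`.
    agent_llm: dict[str, str] = field(default_factory=dict)

    def llm_for_agent(self, agent_name: str) -> LLMConfig:
        # Fall back to the global default when the agent has no override
        # or points at a config name that does not exist.
        name = self.agent_llm.get(agent_name, "default")
        return self.llms.get(name, self.llms["default"])


config = AppConfig(
    llms={
        "default": LLMConfig(model="strong-model"),
        "cheap": LLMConfig(model="cheap-model"),
    },
    agent_llm={"RepoExplorerAgent": "cheap"},
)

explorer_llm = config.llm_for_agent("RepoExplorerAgent")
solver_llm = config.llm_for_agent("CodeActAgent")
print(explorer_llm.model, solver_llm.model)
```

Keeping the default as an ordinary named entry means any agent not mentioned in the mapping behaves exactly as it would under a single global LLM config.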
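The iteration-reminder arithmetic in 6714000b2c (#1803) can be checked in a few lines. Only the two rules come from the commit message: read `MAX_ITERATIONS` from the config instead of assuming 100, and report `MAX_ITERATIONS - 1` turns left in the first prompt. The `Config` dataclass, the `turns_left` and `reminder` helpers, the 1-based `step` convention, and the reminder wording are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical stand-in for the app config; 100 mirrors the old default.
    max_iterations: int = 100


def turns_left(config: Config, step: int) -> int:
    """Turns remaining once the current step is taken (step is 1-based, hypothetical)."""
    # Use the configured limit rather than a hard-coded 100.
    return config.max_iterations - step


def reminder(config: Config, step: int) -> str:
    # Hypothetical reminder wording appended to each prompt.
    return f"You have {turns_left(config, step)} turns left to complete the task."


cfg = Config(max_iterations=30)
first = reminder(cfg, 1)  # the first prompt is built for step 1
print(first)
```

With a 1-based step counter, the first prompt naturally yields `MAX_ITERATIONS - 1`; subtracting an extra 1 would reproduce the off-by-one the commit fixed.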
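The greedy versus non-greedy difference behind bde12f4a09 (#1684) can be reproduced with Python's `re` module. The pattern and the sample command are hypothetical simplifications, not the regex CodeActAgent actually uses. They only show why `[\s\S]*?` stops at the first `end_of_edit`, while `[\s\S]*` runs to the last one.

```python
import re

# Hypothetical command containing two edit blocks, each terminated by end_of_edit.
cmd = (
    "edit 1:1\nfoo\nend_of_edit\n"
    "edit 5:5\nbar\nend_of_edit\n"
)

# Greedy body: [\s\S]* consumes as much as possible, then backtracks only
# as far as the LAST end_of_edit in the string.
GREEDY = re.compile(r"edit (\d+):(\d+)\n([\s\S]*)end_of_edit")
# Non-greedy body: [\s\S]*? stops at the FIRST end_of_edit it reaches.
LAZY = re.compile(r"edit (\d+):(\d+)\n([\s\S]*?)end_of_edit")

greedy_bodies = [m.group(3) for m in GREEDY.finditer(cmd)]
lazy_bodies = [m.group(3) for m in LAZY.finditer(cmd)]

print(greedy_bodies)
print(lazy_bodies)
```

With the greedy form, the two edits collapse into a single match whose body still contains the second `edit` command. This is the kind of corruption the commit message describes.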