OpenHands

mirror of https://github.com/All-Hands-AI/OpenHands.git synced 2026-04-29 03:00:45 -04:00

Author	SHA1	Message	Date
மனோஜ்குமார் பழனிச்சாமி	143f38d25a	Refactored sandbox config and added fast boot (#2455 ) * Refactored sandbox config and added fastboot * added tests * fixed tests * fixed tests * intimate user about breaking change * remove default config from eval * check for lowercase env * add test * Revert Migration * migrate old sandbox configs * resolve merge conflict * revert migration 2 * Revert "remove default config from eval" This reverts commit `de57c588db`. * change type to box_type * fix var name * linted * lint * lint comments * fix tests * fix tests * fix typo * fix box_type, remove fast_boot * add tests for sandbox config * fix test * update eval docs * small removal comments * adapt toml template * old fields shouldn't be in the app dataclass * fix old keys in app config * clean up exec box --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-07-05 03:30:21 +00:00
Xingyao Wang	0d3b3ffbf8	[Arch] Removing docker exec box (#2802) * deprecating docker exec box * remove doc exec from workflow and docs	2024-07-04 23:15:25 +00:00
Engel Nyst	0b8d357bef	Add event synchronously (#2700 ) * add to event stream sync * remove async from tests	2024-07-05 00:15:51 +02:00
Xingyao Wang	41ddba84bd	[Agent] (Potentially) improve Editing using `diff` (#2685) * add replace-based block edit & preliminary test case fix * further fix the insert behavior * make edit only work on first occurrence * bump codeact version since we now use new edit agentskills * update prompt for new agentskills * update integration tests * make run_infer.sh executable * remove code block for edit_file * update integration test for prompt changes * default to not use hint for eval * fix insert emptyfile bug * throw value error when `to_replace` is empty * make `_edit_or_insert_file` return string so we can try to fix some linter errors (best attempt) * add todo * update integration test * fix sandbox test for this PR	2024-07-02 11:50:15 +09:00
Boxuan Li	e45b311c35	Remove MAX_CHARS traffic control (#2694 ) * Remove MAX_CHARS limiting * More cleanup	2024-06-29 12:59:41 -07:00
Boxuan Li	5835680292	Add test for auto_lint after file edit (#2655 )	2024-06-27 17:13:45 +09:00
Boxuan Li	01fa52d062	Enforce linter in tests folder (#2557 )	2024-06-20 21:50:34 -07:00
மனோஜ்குமார் பழனிச்சாமி	41564c2eac	Use :main instead of :latest (#2539 ) Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-06-21 03:57:50 +00:00
Shimada666	26fc3c886a	Make plugins sandbox-agnostic (#2101 ) * tmp * tmp * merge main * feat: auto build image cache * remove plugins * use config file * update mamba setup shell * support agnostic sandbox image autobuild * remove config * Update .gitignore Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> * Update opendevin/runtime/docker/ssh_box.py Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> * update setup.sh * readd sudo * add sudo in dockerfile * remove export * move od-runtime dependencies to sandbox dockerfile * factor out re-build logic into a separate util file * tweak existing plugin to use OD specific sandbox * update testcase * attempt to fix unit test using image built in ghcr * use cache tag * try to fix unit tests * add unittest * add unittest * add some unittests * revert gh workflow changes * feat: optimize sandbox image naming rule * add pull latest image hint * add opendevin python hint and use mamba to install gcc * update docker image naming rule and fix mamba issue * Update opendevin/runtime/docker/ssh_box.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * fix: opendevin user use correct pip * fix lint issue * fix custom sandbox base image * rename test name * add skipif --------- Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> Co-authored-by: tobitege <tobitege@gmx.de>	2024-06-19 19:58:07 -07:00
Engel Nyst	80fe13f4be	rename our completion as a drop-in replacement of litellm completion (#2509 )	2024-06-19 05:25:25 +02:00
Engel Nyst	b2307db010	Document, rename Agent* exceptions to LLM* (#2508 ) * rename "Agent" exceptions to LLM, document LLMResponseError	2024-06-18 22:30:22 +00:00
tobitege	d2509a19c8	fix: logger with more masking of sensitive data (#2470 ) * fix: more logger sensitive masking * fix: test_config.py updated for more sensitive patterns * added one more...	2024-06-16 17:32:26 -04:00
tobitege	823298e0d0	fix: Agentskills enhancements (#2384 ) * avoid repeat logging of unneeded messages * refactored append/edit_file (tests next) * agentskills and unit test fixes * testing * more changes and test prompts * smaller changes * final test fixes * remove dead code from test_agent.py * reverting unneeded changes * updated tests, more tweaks to skills * refactor (#2442) * chores: fix DelegatorAgent description (#2446) * change * change comments * fix * stopped container to prevent port issues. (#2447) * chore: remove useless browsing code in CodeActSWEAgent (#2438) * remove useless * fix integration test * Regenerate test_ipython_module artifacts for CodeActSWEAgent --------- Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * Merge remote-tracking branch 'upstream/main' into agent-fileops * unneeded tweak * * fix edit_file to not introduce extra newline * updated docstrings with more details for LLM * fix legacy typo in prompts causing ]] instead of ] * several mock files regenerated * Regen'ed CodeActSWEAgent integration tests * fix _print_window signature; explicit exception type in _is_valid_path * splitlines with named param --------- Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-06-16 15:06:46 -04:00
Engel Nyst	bb4ea1e6cb	Adjust is-stuck check for the same steps to 3 until it's stopped (#2437 )	2024-06-14 19:20:12 +05:30
Engel Nyst	1cc70be616	workspace_mount_path sentinel: an undefined string (#2431 )	2024-06-14 10:39:33 +05:30
tobitege	9605106e72	feat: append_file incl. all tests [agentskills] (#2346 ) * new skill: append_file incl. all tests * more tests needed caring * file_name for append_file/edit_file; updated tests	2024-06-10 17:18:40 +00:00
Engel Nyst	fab8c9003b	remove deprecated github-token config (#2334 ) Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>	2024-06-09 09:50:24 +02:00
tobitege	a97d0767e9	fix: Backticks get always escaped by runtime; add Ipython test (#2321 ) * added tests related to backticks * updated .gitignore * added extra linter test for #2210 * hotfix for integration test * added test_ipython unit test * added test_ipython unit test * remove draft test from test_ipython.py --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-06-08 21:02:27 +00:00
Xingyao Wang	903381f16e	Add back jupyter PWD env var for agentskills (#2327 ) * add back jupyter pwd env var for agentskills * add unit test for pwd change in execute_cli	2024-06-08 08:51:42 +00:00
tobitege	b431fce938	tests: more Agentskills tests; updated .gitignore (#2307 ) * added tests related to backticks * updated .gitignore * added extra linter test for #2210 * hotfix for integration test --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-06-07 16:29:03 +00:00
tobitege	1fa09e0414	fix: test_sandbox tests didn't close dockers (#2274 ) * fix test_sandbox tests to close dockers * removed try/finally --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-06-06 03:45:45 +00:00
Frank Xu	48151bdbb0	[feat] WebArena benchmark, MiniWoB++ benchmark and related arch changes (#2170 ) * add webarena, and revamp messaging for webarena eval * add changes for browsergym * update infer script * fix unit tests * update * add multiple run for miniwob * update instruction, remove personal path * update * add code for getting final reward, fix integration, add results * add avg cost calculation	2024-06-06 09:01:20 +08:00
tobitege	44bbe5e208	Fix agentskills tests (#2242 ) * Fix agentskills tests * Improved test_agent_skill --------- Co-authored-by: Leo <ifuryst@gmail.com>	2024-06-04 21:33:32 +00:00
tobitege	0082640ac8	fix test_config to prevent leaks (#2245 )	2024-06-04 21:32:46 +02:00
Graham Neubig	7a2122ebc2	Default to gpt-4o (#2158 ) * Default to gpt-4o * Fix default	2024-05-31 14:44:07 +00:00
Xingyao Wang	01ef90205d	Add CodeActSWEAgent to remove browsing & github + improvements on agentskills (#2105 ) * update swe_bench prompt; use minimal prompt for codeact; * upgrade agentskills and update testcases * update infer prompt * fix cwd * add icl for swebench * also log in_context_example to run infer * remove extra print * change prompt to abs path * update error message to include current file info * change cwd for jupyter if needed * update edit error message * update prompt * improve git get patch * update hint string * default to 50 turns * revert changes from codeact agent and create new CodeActSWEAgent * revert changes to codeact * revert instructions for run infer * revert instructions for run infer * update README * update max iter * add codeact swe agent * fix issue for CodeActSWEAgent * allow specifying max iter in cmdline script * stop printing * Update agenthub/codeact_swe_agent/README.md Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * Fix prompt regression in jupyter plugin --------- Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-05-29 21:19:00 -07:00
Engel Nyst	55fdee31ad	Remove unnecessary stuff from the sandboxes tests (#2095 )	2024-05-27 20:50:02 +05:30
Xingyao Wang	ae8cda1495	Support specifying custom cost per token (#2083 ) * support specifying custom cost per token * fix test for new attrs * add to docs --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-05-27 19:35:34 +08:00
Aleksandar	18d07bda89	feat: add max_budget_per_task configuration to control task cost (#2070 ) * feat: add max_budget_per_task configuration to control task cost * Fix test_arg_parser.py * Use the config.max_budget_per_task as default value * Add max_budget_per_task to core/main.py as well * Update opendevin/controller/agent_controller.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> --------- Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-05-27 02:04:31 +08:00
Engel Nyst	783fea62a0	Ignore pid for loop detection (Was: override eq...) (#2045 ) * rewrite, implement pid ignore in the controller * make the helper method private	2024-05-26 19:27:12 +02:00
Shimada666	b31f7701eb	Integrate Multimodal tools to `agentskills`. (#2016 ) * suport reading multimodal files * move file * update dependency * remove useless pip install * add comments * update the comment * Apply suggestions from code review * Add unit test for TXTReader * pre-commit hook corrupted utf16 test txt * Revert unnecessary dependency upgrades * feat: import some readers for agentskill * add dependencies * Integrate some multimodal tools * add shell pip dependency * update dependencies * update dependencies * update print window * remove __main__ * locally import cv2 * add c library for opencv * update lock file * update prompt * remove unuseful file * add some unittest * add unittest & remove excel-related parser * rollback poetry lock * remove markdown * remove requests * optimize parse_video output * Fix integration tests for CodeActAgent * remove test_parse_image unittest * Add a TODO to containers/sandbox/Dockerfile * update dependencies * remove pyproject.toml useless package * change document via openai key * Fix prompts after removing some actions --------- Co-authored-by: Mingchen Zhuge <mczhuge@gmail.com> Co-authored-by: yufansong <yufan@risingwave-labs.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> Co-authored-by: Mingchen Zhuge <64179323+mczhuge@users.noreply.github.com> Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>	2024-05-25 18:58:49 +08:00
Robert Brennan	9ca2007201	fix json encoding (#2018 ) * fix json encoding * add test * add another test * fix integration tests --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-05-23 23:36:15 +00:00
Xingyao Wang	602ffcdffb	Implement `agentskills` for OpenDevin to helpfully improve edit AND including more useful tools/skills (#1941 ) * add draft for skills * Implement and test agentskills functions: open_file, goto_line, scroll_down, scroll_up, create_file, search_dir, search_file, find_file * Remove new_sample.txt file * add some work from opendevin w/ fixes * Add unit tests for agentskills module * fix some issues and updated tests * add more tests for open * tweak and handle goto_line * add tests for some edge cases * add tests for scrolling * add tests for edit * add tests for search_dir * update tests to use pytest * use pytest --forked to avoid file op unit tests to interfere with each other via global var * update doc based on swe agent tool * update and add tests for find_file and search_file * move agent_skills to plugins * add agentskills as plugin and docs * add agentskill to ssh box and fix sandbox integration * remove extra returns in doc * add agentskills to initial tool for jupyter * support re-init jupyter kernel (for agentskills) after restart * fix print window's issue with indentation and add testcases * add prompt for codeact with the newest edit primitives * modify the way line number is presented (remove leading space) * change prompt to the newest display format * support tracking of costs via metrics * Update opendevin/runtime/plugins/agent_skills/README.md * Update opendevin/runtime/plugins/agent_skills/README.md * implement and add tests for py linting * remove extra text arg for incompatible subprocess ver * remove sample.txt * update test_edits integration tests * fix all integration * Update opendevin/runtime/plugins/agent_skills/README.md * Update opendevin/runtime/plugins/agent_skills/README.md * Update opendevin/runtime/plugins/agent_skills/README.md * Update agenthub/codeact_agent/prompt.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * Update agenthub/codeact_agent/prompt.py Co-authored-by: Boxuan Li 
<liboxuan@connect.hku.hk> * Update agenthub/codeact_agent/prompt.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * Update opendevin/runtime/plugins/agent_skills/agentskills.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * correctly setup plugins for swebench eval * bump swe-bench version and add logging * correctly setup plugins for swebench eval * bump swe-bench version and add logging * Revert "correctly setup plugins for swebench eval" This reverts commit `2bd1055673`. * bump version * remove _AGENT_SKILLS_DOCS * move flake8 to test dep * update poetry.lock * remove extra arg * reduce max iter for eval * update poetry * fix integration tests --------- Co-authored-by: OpenDevin <opendevin@opendevin.ai> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-05-23 16:04:09 +00:00
Engel Nyst	0eccf31604	Refactor monologue and SWE agent to use the messages in state history (#1863 ) * Refactor monologue to use the messages in state history * add messages, clean up * fix monologue * update integration tests * move private method * update SWE agent to use the history from State * integration tests for SWE agent * rename monologue to initial_thoughts, since that is what it is	2024-05-23 07:29:12 +00:00
Robert Brennan	5bdacf738d	Refactor session management (#1810 ) * refactor session mgmt * defer file handling to runtime * add todo * refactor sessions a bit more * remove messages logic from FE * fix up socket handshake * refactor frontend auth a bit * first pass at redoing file explorer * implement directory suffix * fix up file tree * close agent on websocket close * remove session saving * move file refresh * remove getWorkspace * plumb path/code differently * fix build issues * fix the tests * fix npm build * add session rehydration * fix event serialization * logspam * fix user message rehydration * add get_event fn * agent state restoration * change history tracking for codeact * fix responsiveness of init * fix lint * lint * delint * fix prop * update tests * logspam * lint * fix test * revert codeact * change fileService to use API * fix up session loading * delint * delint * fix integration tests * revert test * fix up access to options endpoints * fix initial files load * delint * fix file initialization * fix mock server * fixl int * fix auth for html * Update frontend/src/i18n/translation.json Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> * refactor sessions and sockets * avoid reinitializing the same session * fix reconnect issue * change up intro message * more guards on reinit * rename agent_session * delint * fix a bunch of tests * delint * fix last test * remove code editor context * fix build * fix any * fix dot notation * Update frontend/src/services/api.ts Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * fix up error handling * Update opendevin/server/session/agent.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * Update opendevin/server/session/agent.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * Update frontend/src/services/session.ts Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * fix build errs * fix else * add closed state * delint * Update opendevin/server/session/session.py Co-authored-by: Engel Nyst 
<enyst@users.noreply.github.com> --------- Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-05-22 18:33:16 +00:00
Engel Nyst	46352e890b	Logging security (#1943 ) * update .gitignore * Rename the confusing 'INFO' style to 'DETAIL' * override str and repr * feat: api_key desensitize * feat: add SensitiveDataFilter in file handler * tweak regex, add tests * more tweaks, include other attrs * add env vars, those with equivalent config * fix tests * tests are invaluable --------- Co-authored-by: Shimada666 <649940882@qq.com>	2024-05-22 18:27:38 +02:00
Yufan Song	4292998ee2	doc: add more cmd in unit test documentation (#1963 )	2024-05-22 19:47:03 +08:00
Yufan Song	d18e6c85a0	feat: add metrics related to cost for better observability (#1944 ) * add metrics for total_cost * make lint * refact codeact * change metrics into llm * add costs list, add into state * refactor log completion * refactor and test others * make lint * Update opendevin/core/metrics.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * Update opendevin/llm/llm.py Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> * refactor * add code --------- Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>	2024-05-22 08:53:31 +00:00
Engel Nyst	1e51bb9276	Fix/update controller is_stuck() (#1891 ) * Refactor monologue to use the messages in state history remove now unused method * is_stuck update * fix is_stuck * unit tests * fix tests * Revert "Refactor monologue to use the messages in state history" This reverts commit `76b4b765ef`. * Override eq for CmdOutputObservation to ignore the pid, compare the actual command only * Revert "Override eq for CmdOutputObservation to ignore the pid, compare the actual command only" This reverts commit `6418d856b5`.	2024-05-21 22:56:59 +08:00
Robert Brennan	110b878dd9	fix up serialization and deserialization of events (#1850 ) * fix up serialization and deserialization of events * fix tests * remove prints * fix test * regenerate tests * add try blocks	2024-05-17 01:09:15 +00:00
Engel Nyst	b3a45ed7fe	Fix workspace paths defaults (#1845 ) * workspace_mount_path is set to the workspace_base if unset * unit tests for paths * workspace_base is absolute path	2024-05-16 17:53:31 -04:00
Leo	e89cc8f19b	Feat: add stream output to exec_run (#1625 ) * Feat: add stream output to exec_run * Using command timeout to control the exec_box's timeout. * add bash -c to source command to compatible for sh. Signed-off-by: ifuryst <ifuryst@gmail.com> * Feat: add stream output to SSHBox execute Signed-off-by: ifuryst <ifuryst@gmail.com> * fix the test case fail. Signed-off-by: ifuryst <ifuryst@gmail.com> * fix the test case import wrong path for method. Signed-off-by: ifuryst <ifuryst@gmail.com> --------- Signed-off-by: ifuryst <ifuryst@gmail.com>	2024-05-16 14:37:49 +00:00
Xingyao Wang	2406b901df	feat(SWE-Bench environment) integrate SWE-Bench sandbox (#1468 ) * add draft dockerfile for build all * add rsync for build * add all-in-one docker * update prepare scripts * Update swe_env_box.py * Add swe_entry.sh (buggy now) * Parse the test command in swe_entry.sh * Update README for instance eval in sandbox * revert specialized config * replace run_as_devin as an init arg * set container & run_as_root via args * update swe entry script * update env * remove mounting * allow error after swe_entry * update swe_env_box * move file * update gitignore * get swe_env_box a working demo * support faking user response & provide sandox ahead of time; also return state for controller * tweak main to support adding controller kwargs * add module * initialize plugin for provided sandbox * add pip cache to plugin & fix jupyter kernel waiting * better print Observation output * add run infer scripts * update readme * add utility for getting diff patch * use get_diff_patch in infer * update readme * support cost tracking for codeact * add swe agent edit hack * disable color in git diff * fix git diff cmd * fix state return * support limit eval * increase t imeout and export pip cache * add eval limit config * return state when hit turn limit * save log to file; allow agent to give up * run eval with max 50 turns * add outputs to gitignore * save swe_instance & instruction * add uuid to swebench * add streamlit dep * fix save series * fix the issue where session id might be duplicated * allow setting temperature for llm (use 0 for eval) * Get report from agent running log * support evaluating task success right after inference. 
* remove extra log * comment out prompt for baseline * add visualizer for eval * use plaintext for instruction * reduce timeout for all; only increase timeout for init * reduce timeout for all; only increase timeout for init * ignore sid for swe env * close sandbox in each eval loop * update visualizer instruction * increase max chars * add finish action to history too * show test result in metrics * add sidebars for visualizer * also visualize swe_instance * cleanup browser when agent controller finish runinng * do not mount workspace for swe-eval to avoid accidentally overwrite files * Revert "do not mount workspace for swe-eval to avoid accidentally overwrite files" This reverts commit `8ef7739054`. * Revert "Revert "do not mount workspace for swe-eval to avoid accidentally overwrite files"" This reverts commit `016cfbb9f0`. * run jupyter command via copy to, instead of cp to mount * only print mixin output when failed * change ssh box logging * add visualizer for pass rate * add instance id to sandbox name * only remove container we created * use opendevin logger in main * support multi-processing infer * add back metadata, support keyboard interrupt * remove container with startswith * make pbar behave correctly * update instruction w/ multi-processing * show resolved rate by repo * rename tmp dir name * attempt to fix racing for copy to ssh_box * fix script * bump swe-bench-all version * fix ipython with self-contained commands * add jupyter demo to swe_env_box * make resolved count two column * increase height * do not add glob to url params * analyze obs length * print instance id prior to removal handler * add gold patch in visualizer * fix interactive git by adding a git --no-pager as alias * increase max_char to 10k to cover 98% of swe-bench obs cases * allow parsing note * prompt v2 * add iteration reminder * adjust user response * adjust order * fix return eval * fix typo * add reminder before logging * remove other resolve rate * re adjust to new 
folder structure * support adding eval note * fix eval note path * make sure first log of each instance is printed * add eval note * fix the display for visualizer * tweak visualizer for better git patch reading * exclude empty patch * add retry mechanism for swe_env_box start * fix ssh timeout issue * add stat field for apply test patch success * add visualization for fine-grained report * attempt to support monologue agent by constraining it to single thread * also log error msg when stopeed * save error as well * override WORKSPACE_MOUNT_PATH and WORKSPACE_BASE for monologue to work in mp * add retry mechanism for sshbox * remove retry for swe env box * try to handle loop state stopped * Add get report scripts * Add script to convert agent output to swe-bench format * Merge fine grained report for visualizer * Update eval readme * Update README.md * Add CodeAct gpt4-1106 output and eval logs on swe-bench-lite * Update the script to get model report * Update get_model_report.sh * Update get_agent_report.sh * Update report merge script * Add agent output conversion script * Update swe_lite_env_setup.sh * Add example swe-bench output files * Update eval readme * Remove redundant scripts * set iteration count down to false by default * fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model execption out of litellm (#1666) * fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model execption out of litellm * Review Feedback * Missing None Check * Review feedback and improved error handling --------- Co-authored-by: Robert Brennan <accounts@rbren.io> * fix prepare_swe_util scripts * update builder images * update setup script * remove swe-bench build workflow * update lock * remove experiments since they are moved to hf * remove visualizer (since it is moved to hf repo) * simply jupyter execution via heredoc * update ssh_box * add initial docker readme * add pkg-config as dependency * add 
script for swe_bench all-in-one docker * add rsync to builder * rename var * update commit * update readme * update lock * support specify timeout for long running tasks * fix path * separate building of all deps and files * support returning states at the end of controller * remove return None * support specify timeout for long running tasks * add timeout for all existing sandbox impl * fix swe_env_box for new codebase * update llm config in config.py * support pass sandbox in * remove force set * update eval script * fix issue of overriding final state * change default eval output to hf demo * change default eval output to hf demo * fix config * only close it when it is NOT external sandbox * add scripts * tweak config * only put in hostory when state has history attr * fix agent controller on the case of run out interaction budget * always assume state is always not none * remove print of final state * catch all exception when cannot compute completion cost * Update README.md * save source into json * fix path * update docker path * return the final state on close * merge AgentState with State * fix integration test * merge AgentState with State * fix integration test * add ChangeAgentStateAction to history in attempt to fix integration * add back set agent state * update tests * update tests * move scripts for setup * update script and readme for infer * do not reset logger when n processes == 1 * update eval_infer scripts and readme * simplify readme * copy over dir after eval * copy over dir after eval * directly return get state * update lock * fix output saving of infer * replace print with logger * update eval_infer script * add back the missing .close * increase timeout * copy all swe_bench_format file * attempt to fix output parsing * log git commit id as metadata * fix eval script * update lock * update unit tests * fix argparser unit test * fix lock * the deps are now lightweight enough to be incude in make build * add spaces for tests * add eval 
outputs to gitignore * remove git submodule * readme * tweak git email * update upload instruction * bump codeact version for eval --------- Co-authored-by: Bowen Li <libowen.ne@gmail.com> Co-authored-by: huybery <huybery@gmail.com> Co-authored-by: Bart Shappee <bshappee@gmail.com> Co-authored-by: Robert Brennan <accounts@rbren.io>	2024-05-15 16:15:55 +00:00
Frank Xu	a84d19f03c	Enable CodeAct agents with browsing, and also enable arbitrary BrowserGym action support (#1807 ) * enable browsing in codeact, and arbitrary browsergym DSL support * fix * fix unit test case * update frontend for the new interactive browsing action * bump ver * Fix integration tests --------- Co-authored-by: OpenDevinBot <bot@opendevin.com>	2024-05-15 11:59:58 -04:00
Xingyao Wang	8d8ed0c3be	hotfix: Initialize plugin with new runtime (#1795 ) * fix plugin use in jupyter * fix jupyter plugin potential port conflict * update integration test * wait a bit for jupyter execution * add one unit tests for sandbox * fix integration test * fix integration * fix integration yet again * init sandbox plugins in the server	2024-05-14 21:15:19 +00:00
Robert Brennan	dcb5d1ce0a	Add permanent storage option for EventStream (#1697 ) * add storage classes * add minio * add event stream storage * storage test working * use fixture * event stream test passing * better serialization * factor out serialization pkg * move more serialization * fix tests * fix test * remove __all__ * add rehydration test * add more rehydration test * fix fixture * fix dict init * update tests * lock * regenerate tests * Update opendevin/events/stream.py * revert tests * revert old integration tests * only add fields if present * regen tests * pin pyarrow * fix unit tests * remove cause from memories * revert tests * regen tests	2024-05-14 11:09:45 -04:00
Robert Brennan	beb74a19f6	Use event stream for the runtime (#1776) * rebuild PR from scratch * fix max_iter * regenerate tests * cut down on history * Update opendevin/controller/agent_controller.py * regenerate tests * revert swe agent * revert some codeact changes * regenerate tests * add source to dict * only add source if not none * try to fix coverage issue * lock * add gevent	2024-05-14 13:35:25 +00:00
Robert Brennan	b028bd46bb	Use messages to drive tasks (#1688 ) * finish is working * start reworking main_goal * remove main_goal from microagents * remove main_goal from other agents * fix issues * revert codeact line * make plan a subclass of task * fix frontend for new plan setup * lint * fix type * more lint * fix build issues * fix codeact mgs * fix edge case in regen script * fix task validation errors * regenerate integration tests * fix up tests * fix sweagent * revert codeact prompt * update integration tests * update integration tests * handle loading state * Update agenthub/codeact_agent/codeact_agent.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * Update opendevin/controller/agent_controller.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * Update agenthub/codeact_agent/codeact_agent.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * Update opendevin/controller/state/plan.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * update docs * regenerate tests * remove none from state type * revert test files * update integration tests * rename plan to root_task * revert plugin perms * regen integration tests * tweak integration script * prettier * fix test * set workspace up for regeneration * regenerate tests * Change directory of copy * Updated tests * Disable PlannerAgent test * Fix listen * Updated prompts * Disable planner again * Make codecov more lenient * Update agenthub/README.md * Update opendevin/server/README.md * re-enable planner tests * finish top level tasks * regen planner * fix root task factory --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-05-13 23:14:15 +00:00
Engel Nyst	e5f1dbf5e7	Move json utility to the custom json parsing; apply it to the monologue-like agents (#1740 )	2024-05-12 13:39:38 -04:00
Robert Brennan	efd0d61e70	Fix the tests (#1737 ) * fix config patching * revert tests	2024-05-12 11:02:10 -04:00

66 Commits