Commit Graph

88 Commits

Author SHA1 Message Date
tobitege
0082640ac8 fix test_config to prevent leaks (#2245) 2024-06-04 21:32:46 +02:00
Graham Neubig
7a2122ebc2 Default to gpt-4o (#2158)
* Default to gpt-4o

* Fix default
2024-05-31 14:44:07 +00:00
மனோஜ்குமார் பழனிச்சாமி
961c96a2a1 Added ssh_password to config setup (#2139)
Co-authored-by: Aleksandar <isavitaisa@gmail.com>
2024-05-31 07:26:16 +05:30
Xingyao Wang
01ef90205d Add CodeActSWEAgent to remove browsing & github + improvements on agentskills (#2105)
* update swe_bench prompt;
use minimal prompt for codeact;

* upgrade agentskills and update testcases

* update infer prompt

* fix cwd

* add icl for swebench

* also log in_context_example to run infer

* remove extra print

* change prompt to abs path

* update error message to include current file info

* change cwd for jupyter if needed

* update edit error message

* update prompt

* improve git get patch

* update hint string

* default to 50 turns

* revert changes from codeact agent and create new CodeActSWEAgent

* revert changes to codeact

* revert instructions for run infer

* revert instructions for run infer

* update README

* update max iter

* add codeact swe agent

* fix issue for CodeActSWEAgent

* allow specifying max iter in cmdline script

* stop printing

* Update agenthub/codeact_swe_agent/README.md

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>

* Fix prompt regression in jupyter plugin

---------

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-29 21:19:00 -07:00
Rahul Anand
b3cce763a2 fix #2123 (#2125) 2024-05-29 17:56:45 -04:00
Boxuan Li
9b371b1b5f Refactor agent delegation and tweak micro agents (#1910)
This PR fixes #1897. In addition, this PR fixes and tweaks a few micro-agents.

For the first time, I am able to use ManagerAgent to complete test_write_simple_script and test_edits tasks in integration tests, so this PR also adds ManagerAgent as part of integration tests. test_write_simple_script involves delegation to CoderAgent while test_edits involves delegation to TypoFixerAgent.

Also for the first time, I am able to use DelegateAgent to complete test_write_simple_script and test_edits tasks in integration tests, so this PR also adds DelegateAgent as part of integration tests. It involves delegation to StudyRepoForTaskAgent, CoderAgent and VerifierAgent.

This PR is a blocker for #1735 and likely #1945.
2024-05-28 20:01:16 -07:00
Engel Nyst
55fdee31ad Remove unnecessary stuff from the sandboxes tests (#2095) 2024-05-27 20:50:02 +05:30
Xingyao Wang
ae8cda1495 Support specifying custom cost per token (#2083)
* support specifying custom cost per token

* fix test for new attrs

* add to docs

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-05-27 19:35:34 +08:00
Aleksandar
18d07bda89 feat: add max_budget_per_task configuration to control task cost (#2070)
* feat: add max_budget_per_task configuration to control task cost

* Fix test_arg_parser.py

* Use the config.max_budget_per_task as default value

* Add max_budget_per_task to core/main.py as well

* Update opendevin/controller/agent_controller.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

---------

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-27 02:04:31 +08:00
Engel Nyst
783fea62a0 Ignore pid for loop detection (Was: override eq...) (#2045)
* rewrite, implement pid ignore in the controller

* make the helper method private
2024-05-26 19:27:12 +02:00
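Commit 783fea62a0 above mentions ignoring the pid during loop detection. The idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `CmdOutput`, `is_stuck`, and the `window` parameter are hypothetical names, not the classes or helpers in the OpenDevin controller.

```python
from dataclasses import dataclass


@dataclass
class CmdOutput:
    """Hypothetical stand-in for a command observation (not the project's class)."""
    command: str
    content: str
    pid: int  # changes on every run, so it should not affect comparison


def _same_ignoring_pid(a: CmdOutput, b: CmdOutput) -> bool:
    # Compare only the fields that describe what actually happened.
    return (a.command, a.content) == (b.command, b.content)


def is_stuck(history: list[CmdOutput], window: int = 3) -> bool:
    """Return True when the last `window` observations repeat, pid aside."""
    if len(history) < window:
        return False
    last = history[-window:]
    return all(_same_ignoring_pid(last[0], other) for other in last[1:])
```

Keeping the comparison in a private helper, rather than overriding `__eq__`, matches the direction the commit title hints at ("Was: override eq..."), though the real implementation may differ.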
Shimada666
b31f7701eb Integrate Multimodal tools to agentskills. (#2016)
* support reading multimodal files

* move file

* update dependency

* remove useless pip install

* add comments

* update the comment

* Apply suggestions from code review

* Add unit test for TXTReader

* pre-commit hook corrupted utf16 test txt

* Revert unnecessary dependency upgrades

* feat: import some readers for agentskill

* add dependencies

* Integrate some multimodal tools

* add shell pip dependency

* update dependencies

* update dependencies

* update print window

* remove __main__

* locally import cv2

* add c library for opencv

* update lock file

* update prompt

* remove unuseful file

* add some unittest

* add unittest & remove excel-related parser

* rollback poetry lock

* remove markdown

* remove requests

* optimize parse_video output

* Fix integration tests for CodeActAgent

* remove test_parse_image unittest

* Add a TODO to containers/sandbox/Dockerfile

* update dependencies

* remove pyproject.toml useless package

* change document via openai key

* Fix prompts after removing some actions

---------

Co-authored-by: Mingchen Zhuge <mczhuge@gmail.com>
Co-authored-by: yufansong <yufan@risingwave-labs.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Co-authored-by: Mingchen Zhuge <64179323+mczhuge@users.noreply.github.com>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
2024-05-25 18:58:49 +08:00
Boxuan Li
78241d9d43 Add tests for browser agent (#2031)
Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-05-24 09:59:40 +00:00
Boxuan Li
c59bcbbffd Minor docstring & prompt fixes for AgentSkills (#2028)
* A few minor fixes to agentskills

* Regenerate prompts

* Remove redundant comment
2024-05-24 13:30:48 +08:00
Boxuan Li
633ece5f9c Fix integration tests (#2024) 2024-05-23 20:24:31 -07:00
Robert Brennan
9ca2007201 fix json encoding (#2018)
* fix json encoding

* add test

* add another test

* fix integration tests

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-05-23 23:36:15 +00:00
Xingyao Wang
602ffcdffb Implement agentskills for OpenDevin to hopefully improve edit and include more useful tools/skills (#1941)
* add draft for skills

* Implement and test agentskills functions: open_file, goto_line, scroll_down, scroll_up, create_file, search_dir, search_file, find_file

* Remove new_sample.txt file

* add some work from opendevin w/ fixes

* Add unit tests for agentskills module

* fix some issues and updated tests

* add more tests for open

* tweak and handle goto_line

* add tests for some edge cases

* add tests for scrolling

* add tests for edit

* add tests for search_dir

* update tests to use pytest

* use pytest --forked to keep file op unit tests from interfering with each other via global var

* update doc based on swe agent tool

* update and add tests for find_file and search_file

* move agent_skills to plugins

* add agentskills as plugin and docs

* add agentskill to ssh box and fix sandbox integration

* remove extra returns in doc

* add agentskills to initial tool for jupyter

* support re-init jupyter kernel (for agentskills) after restart

* fix print window's issue with indentation and add testcases

* add prompt for codeact with the newest edit primitives

* modify the way line number is presented (remove leading space)

* change prompt to the newest display format

* support tracking of costs via metrics

* Update opendevin/runtime/plugins/agent_skills/README.md

* Update opendevin/runtime/plugins/agent_skills/README.md

* implement and add tests for py linting

* remove extra text arg for incompatible subprocess ver

* remove sample.txt

* update test_edits integration tests

* fix all integration

* Update opendevin/runtime/plugins/agent_skills/README.md

* Update opendevin/runtime/plugins/agent_skills/README.md

* Update opendevin/runtime/plugins/agent_skills/README.md

* Update agenthub/codeact_agent/prompt.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* Update agenthub/codeact_agent/prompt.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* Update agenthub/codeact_agent/prompt.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* Update opendevin/runtime/plugins/agent_skills/agentskills.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* correctly setup plugins for swebench eval

* bump swe-bench version and add logging

* correctly setup plugins for swebench eval

* bump swe-bench version and add logging

* Revert "correctly setup plugins for swebench eval"

This reverts commit 2bd1055673.

* bump version

* remove _AGENT_SKILLS_DOCS

* move flake8 to test dep

* update poetry.lock

* remove extra arg

* reduce max iter for eval

* update poetry

* fix integration tests

---------

Co-authored-by: OpenDevin <opendevin@opendevin.ai>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-23 16:04:09 +00:00
Engel Nyst
0eccf31604 Refactor monologue and SWE agent to use the messages in state history (#1863)
* Refactor monologue to use the messages in state history

* add messages, clean up

* fix monologue

* update integration tests

* move private method

* update SWE agent to use the history from State

* integration tests for SWE agent

* rename monologue to initial_thoughts, since that is what it is
2024-05-23 07:29:12 +00:00
Boxuan Li
acb430eef5 Refactor integration testing CI, add optional Mac tests, and mark a few agents as deprecated (#1888)
* Add MacOS to integration tests

* Switch back to python 3.11

* Install Docker for macos pipeline

* regenerate.sh: Use environmental variable for sandbox type

* Pack different agents' tests into a single check

* Fix CodeAct tests

* Reduce file match and extensive debug logs

* Add TEST_IN_CI mode that reports codecov

* Small fix: don't quit if reusing old responses failed

* Merge codecov results

* Fix typos

* Remove coverage merge step - codecov automatically does that

* Make Mac integration tests optional - too slow

* Fix codecov args

* Add comments in yaml

* Include sandbox type in codecov report name

* Fix codecov report merge

* Revert renaming of test_matrix_success

* Remove SWEAgent and PlannerAgent from tests

* Mark planner agent and SWE agent as deprecated

* CodeCov: Ignore planner and sweagent

* Revert "Remove SWEAgent and PlannerAgent from tests"

This reverts commit 040cb3bfb9.

* Remove all tests for SWE Agent

* Only keep basic tests for MonologueAgent and PlannerAgent

* Mark SWE Agent as deprecated, and ignore code coverage for it

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-05-22 20:38:57 -07:00
Robert Brennan
5bdacf738d Refactor session management (#1810)
* refactor session mgmt

* defer file handling to runtime

* add todo

* refactor sessions a bit more

* remove messages logic from FE

* fix up socket handshake

* refactor frontend auth a bit

* first pass at redoing file explorer

* implement directory suffix

* fix up file tree

* close agent on websocket close

* remove session saving

* move file refresh

* remove getWorkspace

* plumb path/code differently

* fix build issues

* fix the tests

* fix npm build

* add session rehydration

* fix event serialization

* logspam

* fix user message rehydration

* add get_event fn

* agent state restoration

* change history tracking for codeact

* fix responsiveness of init

* fix lint

* lint

* delint

* fix prop

* update tests

* logspam

* lint

* fix test

* revert codeact

* change fileService to use API

* fix up session loading

* delint

* delint

* fix integration tests

* revert test

* fix up access to options endpoints

* fix initial files load

* delint

* fix file initialization

* fix mock server

* fix lint

* fix auth for html

* Update frontend/src/i18n/translation.json

Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>

* refactor sessions and sockets

* avoid reinitializing the same session

* fix reconnect issue

* change up intro message

* more guards on reinit

* rename agent_session

* delint

* fix a bunch of tests

* delint

* fix last test

* remove code editor context

* fix build

* fix any

* fix dot notation

* Update frontend/src/services/api.ts

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* fix up error handling

* Update opendevin/server/session/agent.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* Update opendevin/server/session/agent.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* Update frontend/src/services/session.ts

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* fix build errs

* fix else

* add closed state

* delint

* Update opendevin/server/session/session.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

---------

Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-05-22 18:33:16 +00:00
Engel Nyst
46352e890b Logging security (#1943)
* update .gitignore

* Rename the confusing 'INFO' style to 'DETAIL'

* override str and repr

* feat: api_key desensitize

* feat: add SensitiveDataFilter in file handler

* tweak regex, add tests

* more tweaks, include other attrs

* add env vars, those with equivalent config

* fix tests

* tests are invaluable

---------

Co-authored-by: Shimada666 <649940882@qq.com>
2024-05-22 18:27:38 +02:00
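Commit 46352e890b above adds a `SensitiveDataFilter` to the file handler and masks values such as `api_key`. A minimal sketch of that kind of filter, assuming a hypothetical key list and regex (the actual pattern and attribute list in OpenDevin are not shown in this log):

```python
import logging
import re

# Hypothetical list of attribute names whose values should be hidden.
SENSITIVE_KEYS = ("api_key", "ssh_password")

# Matches key=value or key: value, with an optional quote around the value.
_PATTERN = re.compile(
    r"(" + "|".join(SENSITIVE_KEYS) + r")\b(\s*[=:]\s*)(['\"]?)[^'\"\s,]+\3",
    re.IGNORECASE,
)


def mask(text: str) -> str:
    """Replace the value after any sensitive key with asterisks."""
    return _PATTERN.sub(r"\1\2\3******\3", text)


class SensitiveDataFilter(logging.Filter):
    """Logging filter that masks sensitive values before a record is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so values passed as args are masked too.
        record.msg = mask(record.getMessage())
        record.args = ()
        return True
```

A handler would pick this up with `handler.addFilter(SensitiveDataFilter())`. Because the pattern has no leading word boundary, environment-style names such as `LLM_API_KEY` are covered as well.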
Yufan Song
4292998ee2 doc: add more cmd in unit test documentation (#1963) 2024-05-22 19:47:03 +08:00
Yufan Song
d18e6c85a0 feat: add metrics related to cost for better observability (#1944)
* add metrics for total_cost

* make lint

* refactor codeact

* change metrics into llm

* add costs list, add into state

* refactor log completion

* refactor and test others

* make lint

* Update opendevin/core/metrics.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* Update opendevin/llm/llm.py

Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>

* refactor

* add code

---------

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
2024-05-22 08:53:31 +00:00
Engel Nyst
1e51bb9276 Fix/update controller is_stuck() (#1891)
* Refactor monologue to use the messages in state history

remove now unused method

* is_stuck update

* fix is_stuck

* unit tests

* fix tests

* Revert "Refactor monologue to use the messages in state history"

This reverts commit 76b4b765ef.

* Override eq for CmdOutputObservation to ignore the pid, compare the actual command only

* Revert "Override eq for CmdOutputObservation to ignore the pid, compare the actual command only"

This reverts commit 6418d856b5.
2024-05-21 22:56:59 +08:00
Robert Brennan
0ecba83e53 Move message history out of CodeAct (#1847)
* stop keeping history state in codeact

* regenerate tests

* Update agenthub/codeact_agent/codeact_agent.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* revert tests

* regen tests

* refactor codeact a bit

* regenerate without using LLM

* simplify logic

* change to heredoc

* fix heredoc

* fix end_of_edit docs

* regen tests

* regenerate

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-18 18:39:27 +00:00
மனோஜ்குமார் பழனிச்சாமி
b0b44ed467 Auto restarted Jupyter kernel (#1808)
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-18 08:40:31 +05:30
Boxuan Li
735fbbfe3e (test) Include message separators in mock prompts (#1855)
* Add message separator to prompts in tests

* DEMO: remove existing prompts for PlannerAgent

* Add results after prompt regeneration
2024-05-18 00:33:55 +02:00
Robert Brennan
110b878dd9 fix up serialization and deserialization of events (#1850)
* fix up serialization and deserialization of events

* fix tests

* remove prints

* fix test

* regenerate tests

* add try blocks
2024-05-17 01:09:15 +00:00
Engel Nyst
b3a45ed7fe Fix workspace paths defaults (#1845)
* workspace_mount_path is set to the workspace_base if unset

* unit tests for paths

* workspace_base is absolute path
2024-05-16 17:53:31 -04:00
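The defaults described in commit b3a45ed7fe above (an absolute `workspace_base`, and `workspace_mount_path` falling back to it when unset) could look roughly like this. The `WorkspaceConfig` class and the `./workspace` default are assumptions for illustration, not the project's actual config object.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkspaceConfig:
    """Hypothetical config holder; only the two fields from the commit are modeled."""
    workspace_base: str = "./workspace"  # assumed default, for illustration only
    workspace_mount_path: Optional[str] = None

    def __post_init__(self) -> None:
        # workspace_base is always stored as an absolute path.
        self.workspace_base = os.path.abspath(self.workspace_base)
        # workspace_mount_path falls back to workspace_base when unset.
        if self.workspace_mount_path is None:
            self.workspace_mount_path = self.workspace_base
```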
Boxuan Li
b6ff201780 Refactor integration test framework and relieve the pain of regeneration (#1818)
* Update README.md

* Fix WORKSPACE_MOUNT_PATH_IN_SANDBOX variable in regenerate.sh

* Regenerate prompts without calling real LLM

* Disable pytest warning capture

* Change planner agent prompt by a bit for demo

* Regenerate prompt files following prompt changes

* doc: elaborate on FORCE_USE_LLM

* Add another prompt change to monologue_agent for demo purpose

* Regenerate prompts with FORCE_USE_LLM=true

---------

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
2024-05-16 08:30:29 -07:00
Leo
e89cc8f19b Feat: add stream output to exec_run (#1625)
* Feat: add stream output to exec_run

* Using command timeout to control the exec_box's timeout.
* add bash -c to the source command for compatibility with sh.

Signed-off-by: ifuryst <ifuryst@gmail.com>

* Feat: add stream output to SSHBox execute

Signed-off-by: ifuryst <ifuryst@gmail.com>

* fix the failing test case.

Signed-off-by: ifuryst <ifuryst@gmail.com>

* fix the test case importing the method from the wrong path.

Signed-off-by: ifuryst <ifuryst@gmail.com>

---------

Signed-off-by: ifuryst <ifuryst@gmail.com>
2024-05-16 14:37:49 +00:00
Xingyao Wang
2406b901df feat(SWE-Bench environment) integrate SWE-Bench sandbox (#1468)
* add draft dockerfile for build all

* add rsync for build

* add all-in-one docker

* update prepare scripts

* Update swe_env_box.py

* Add swe_entry.sh (buggy now)

* Parse the test command in swe_entry.sh

* Update README for instance eval in sandbox

* revert specialized config

* replace run_as_devin as an init arg

* set container & run_as_root via args

* update swe entry script

* update env

* remove mounting

* allow error after swe_entry

* update swe_env_box

* move file

* update gitignore

* get swe_env_box a working demo

* support faking user response & provide sandbox ahead of time;
also return state for controller

* tweak main to support adding controller kwargs

* add module

* initialize plugin for provided sandbox

* add pip cache to plugin & fix jupyter kernel waiting

* better print Observation output

* add run infer scripts

* update readme

* add utility for getting diff patch

* use get_diff_patch in infer

* update readme

* support cost tracking for codeact

* add swe agent edit hack

* disable color in git diff

* fix git diff cmd

* fix state return

* support limit eval

* increase timeout and export pip cache

* add eval limit config

* return state when hit turn limit

* save log to file; allow agent to give up

* run eval with max 50 turns

* add outputs to gitignore

* save swe_instance & instruction

* add uuid to swebench

* add streamlit dep

* fix save series

* fix the issue where session id might be duplicated

* allow setting temperature for llm (use 0 for eval)

* Get report from agent running log

* support evaluating task success right after inference.

* remove extra log

* comment out prompt for baseline

* add visualizer for eval

* use plaintext for instruction

* reduce timeout for all; only increase timeout for init

* reduce timeout for all; only increase timeout for init

* ignore sid for swe env

* close sandbox in each eval loop

* update visualizer instruction

* increase max chars

* add finish action to history too

* show test result in metrics

* add sidebars for visualizer

* also visualize swe_instance

* cleanup browser when agent controller finish running

* do not mount workspace for swe-eval to avoid accidentally overwrite files

* Revert "do not mount workspace for swe-eval to avoid accidentally overwrite files"

This reverts commit 8ef7739054.

* Revert "Revert "do not mount workspace for swe-eval to avoid accidentally overwrite files""

This reverts commit 016cfbb9f0.

* run jupyter command via copy to, instead of cp to mount

* only print mixin output when failed

* change ssh box logging

* add visualizer for pass rate

* add instance id to sandbox name

* only remove container we created

* use opendevin logger in main

* support multi-processing infer

* add back metadata, support keyboard interrupt

* remove container with startswith

* make pbar behave correctly

* update instruction w/ multi-processing

* show resolved rate by repo

* rename tmp dir name

* attempt to fix racing for copy to ssh_box

* fix script

* bump swe-bench-all version

* fix ipython with self-contained commands

* add jupyter demo to swe_env_box

* make resolved count two column

* increase height

* do not add glob to url params

* analyze obs length

* print instance id prior to removal handler

* add gold patch in visualizer

* fix interactive git by adding a git --no-pager as alias

* increase max_char to 10k to cover 98% of swe-bench obs cases

* allow parsing note

* prompt v2

* add iteration reminder

* adjust user response

* adjust order

* fix return eval

* fix typo

* add reminder before logging

* remove other resolve rate

* re adjust to new folder structure

* support adding eval note

* fix eval note path

* make sure first log of each instance is printed

* add eval note

* fix the display for visualizer

* tweak visualizer for better git patch reading

* exclude empty patch

* add retry mechanism for swe_env_box start

* fix ssh timeout issue

* add stat field for apply test patch success

* add visualization for fine-grained report

* attempt to support monologue agent by constraining it to single thread

* also log error msg when stopped

* save error as well

* override WORKSPACE_MOUNT_PATH and WORKSPACE_BASE for monologue to work in mp

* add retry mechanism for sshbox

* remove retry for swe env box

* try to handle loop state stopped

* Add get report scripts

* Add script to convert agent output to swe-bench format

* Merge fine grained report for visualizer

* Update eval readme

* Update README.md

* Add CodeAct gpt4-1106 output and eval logs on swe-bench-lite

* Update the script to get model report

* Update get_model_report.sh

* Update get_agent_report.sh

* Update report merge script

* Add agent output conversion script

* Update swe_lite_env_setup.sh

* Add example swe-bench output files

* Update eval readme

* Remove redundant scripts

* set iteration count down to false by default

* fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model exception out of litellm (#1666)

* fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model exception out of litellm

* Review Feedback

* Missing None Check

* Review feedback and improved error handling

---------

Co-authored-by: Robert Brennan <accounts@rbren.io>

* fix prepare_swe_util scripts

* update builder images

* update setup script

* remove swe-bench build workflow

* update lock

* remove experiments since they are moved to hf

* remove visualizer (since it is moved to hf repo)

* simplify jupyter execution via heredoc

* update ssh_box

* add initial docker readme

* add pkg-config as dependency

* add script for swe_bench all-in-one docker

* add rsync to builder

* rename var

* update commit

* update readme

* update lock

* support specify timeout for long running tasks

* fix path

* separate building of all deps and files

* support returning states at the end of controller

* remove return None

* support specify timeout for long running tasks

* add timeout for all existing sandbox impl

* fix swe_env_box for new codebase

* update llm config in config.py

* support pass sandbox in

* remove force set

* update eval script

* fix issue of overriding final state

* change default eval output to hf demo

* change default eval output to hf demo

* fix config

* only close it when it is NOT external sandbox

* add scripts

* tweak config

* only put in history when state has history attr

* fix agent controller for the case of running out of interaction budget

* always assume state is always not none

* remove print of final state

* catch all exception when cannot compute completion cost

* Update README.md

* save source into json

* fix path

* update docker path

* return the final state on close

* merge AgentState with State

* fix integration test

* merge AgentState with State

* fix integration test

* add ChangeAgentStateAction to history in attempt to fix integration

* add back set agent state

* update tests

* update tests

* move scripts for setup

* update script and readme for infer

* do not reset logger when n processes == 1

* update eval_infer scripts and readme

* simplify readme

* copy over dir after eval

* copy over dir after eval

* directly return get state

* update lock

* fix output saving of infer

* replace print with logger

* update eval_infer script

* add back the missing .close

* increase timeout

* copy all swe_bench_format file

* attempt to fix output parsing

* log git commit id as metadata

* fix eval script

* update lock

* update unit tests

* fix argparser unit test

* fix lock

* the deps are now lightweight enough to be included in make build

* add spaces for tests

* add eval outputs to gitignore

* remove git submodule

* readme

* tweak git email

* update upload instruction

* bump codeact version for eval

---------

Co-authored-by: Bowen Li <libowen.ne@gmail.com>
Co-authored-by: huybery <huybery@gmail.com>
Co-authored-by: Bart Shappee <bshappee@gmail.com>
Co-authored-by: Robert Brennan <accounts@rbren.io>
2024-05-15 16:15:55 +00:00
Frank Xu
a84d19f03c Enable CodeAct agents with browsing, and also enable arbitrary BrowserGym action support (#1807)
* enable browsing in codeact, and arbitrary browsergym DSL support

* fix

* fix unit test case

* update frontend for the new interactive browsing action

* bump ver

* Fix integration tests

---------

Co-authored-by: OpenDevinBot <bot@opendevin.com>
2024-05-15 11:59:58 -04:00
Xia Zhenhua
bf14b47890 feat: make other agents support asking user input in MessageAction. (#1777)
* feat: make other agents support asking user input in MessageAction.

* Update agenthub/micro/_instructions/actions/message.md

Co-authored-by: Robert Brennan <accounts@rbren.io>

* Update agenthub/micro/_instructions/actions/message.md

Co-authored-by: Robert Brennan <accounts@rbren.io>

* feat: make other agents support asking user input in MessageAction.

* Regenerate test artifacts

---------

Co-authored-by: aaren.xzh <aaren.xzh@antfin.com>
Co-authored-by: Robert Brennan <accounts@rbren.io>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-15 00:44:45 -07:00
Boxuan Li
6714000b2c CodeActAgent: Fix iteration reminder (#1803)
This PR includes three changes:
1) Iteration reminder should start with MAX_ITERATIONS from config rather than default value 100
2) In the first prompt, we should tell the LLM it has `MAX_ITERATIONS - 1` turns left, rather than `MAX_ITERATIONS - 2`
3) Remove legacy ITERATION_REMINDER config
2024-05-15 13:48:47 +08:00
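The arithmetic in commit 6714000b2c above (the first prompt should report `MAX_ITERATIONS - 1` turns left) can be illustrated with a small sketch. The function name, the 1-indexed `turn_number` argument, and the reminder wording are hypothetical; only the counting rule comes from the commit message.

```python
def iteration_reminder(max_iterations: int, turn_number: int) -> str:
    """Build a reminder for the given 1-indexed turn.

    The current turn counts as used, so the first prompt (turn_number == 1)
    reports max_iterations - 1 turns left.
    """
    turns_left = max_iterations - turn_number
    return f"You have {turns_left} turns left to complete the task."
```

Passing `max_iterations` in from configuration, instead of relying on a hard-coded default such as 100, corresponds to the first change listed in the commit.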
Xingyao Wang
d1fd277ad4 Support return final task states for evaluation (#1755)
* support returning states at the end of controller

* remove return None

* fix issue of overriding final state

* return the final state on close

* merge AgentState with State

* fix integration test

* add ChangeAgentStateAction to history in attempt to fix integration

* add back set agent state

* update tests

* update tests

* directly return get state

* add back the missing .close()

* Update typo in opendevin/core/main.py

---------

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-15 03:43:01 +00:00
Graham Neubig
3cef8ee187 Add GitHub prompt to CodeAct (#1792)
* Added github to CodeAct

* More codeact

* Simplify prompt

* Modify codeact prompt

* fix integration test for CodeAct

* yet another integration test fix for codeact

* fix plugin use in jupyter

* update edit tests

* fix jupyter plugin potential port conflict

* fix test ipython with latest ipython fix

* update integration test

* wait a bit for jupyter execution

* add one unit tests for sandbox

* fix integration test

---------

Co-authored-by: OpenDevinBot <bot@opendevin.com>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
2024-05-14 21:25:21 +00:00
Xingyao Wang
8d8ed0c3be hotfix: Initialize plugin with new runtime (#1795)
* fix plugin use in jupyter

* fix jupyter plugin potential port conflict

* update integration test

* wait a bit for jupyter execution

* add one unit tests for sandbox

* fix integration test

* fix integration

* fix integration yet again

* init sandbox plugins in the server
2024-05-14 21:15:19 +00:00
Robert Brennan
dcb5d1ce0a Add permanent storage option for EventStream (#1697)
* add storage classes

* add minio

* add event stream storage

* storage test working

* use fixture

* event stream test passing

* better serialization

* factor out serialization pkg

* move more serialization

* fix tests

* fix test

* remove __all__

* add rehydration test

* add more rehydration test

* fix fixture

* fix dict init

* update tests

* lock

* regenerate tests

* Update opendevin/events/stream.py

* revert tests

* revert old integration tests

* only add fields if present

* regen tests

* pin pyarrow

* fix unit tests

* remove cause from memories

* revert tests

* regen tests
2024-05-14 11:09:45 -04:00
Robert Brennan
beb74a19f6 Use event stream for the runtime (#1776)
* rebuild PR from scratch

* fix max_iter

* regenerate tests

* cut down on history

* Update opendevin/controller/agent_controller.py

* regenerate tests

* revert swe agent

* revert some codeact changes

* regenerate tests

* add source to dict

* only add source if not none

* try to fix coverage issue

* lock

* add gevent
2024-05-14 13:35:25 +00:00
Robert Brennan
82a798990c refactor remind_iterations (#1760)
* refactor remind_iterations

* regenerate tests

* concatenate iteration message

* fix merge issues

* update integration tests
2024-05-14 08:27:12 -04:00
Boxuan Li
3d53d363b4 Integration test: Verify finish state & add auto-rerun in regenerate.sh (#1773)
* regenerate.sh: Allow testing on a specific agent and/or test

* Check agent finish state

* regenerate.sh: Rerun after fixing the prompts

* Fix SWEAgent test_write_simple_script

* Add more help message

* Add a known issue to README.md

* regenerate.sh: Fix help message typo

* Fix a typo in README
2024-05-14 03:50:29 -04:00
Boxuan Li
b84f25ab35 Integration test: exit if no prompt match (#1772) 2024-05-13 20:03:09 -07:00
Robert Brennan
b028bd46bb Use messages to drive tasks (#1688)
* finish is working

* start reworking main_goal

* remove main_goal from microagents

* remove main_goal from other agents

* fix issues

* revert codeact line

* make plan a subclass of task

* fix frontend for new plan setup

* lint

* fix type

* more lint

* fix build issues

* fix codeact mgs

* fix edge case in regen script

* fix task validation errors

* regenerate integration tests

* fix up tests

* fix sweagent

* revert codeact prompt

* update integration tests

* update integration tests

* handle loading state

* Update agenthub/codeact_agent/codeact_agent.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* Update opendevin/controller/agent_controller.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* Update agenthub/codeact_agent/codeact_agent.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* Update opendevin/controller/state/plan.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* update docs

* regenerate tests

* remove none from state type

* revert test files

* update integration tests

* rename plan to root_task

* revert plugin perms

* regen integration tests

* tweak integration script

* prettier

* fix test

* set workspace up for regeneration

* regenerate tests

* Change directory of copy

* Updated tests

* Disable PlannerAgent test

* Fix listen

* Updated prompts

* Disable planner again

* Make codecov more lenient

* Update agenthub/README.md

* Update opendevin/server/README.md

* re-enable planner tests

* finish top level tasks

* regen planner

* fix root task factory

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-13 23:14:15 +00:00
Robert Brennan
e28b3ef9e8 Fix integration tests (#1764)
* refactor remind_iterations

* regenerate tests

* concatenate iteration message

* add some helpers to the tests

* regenerate tests

* add to logs

* regenerate tests

* add debug info

* fix exit_on_message

* fix regen script

* regenerate tests

* Revert "Merge branch 'rb/test-regen' of ssh://github.com/opendevin/opendevin into rb/test-regen"

This reverts commit b9cd1acbf2, reversing
changes made to c888285304.

* remove prints

* revert files

* revert more

* revert more

* regenerate for the last time I hope

* add back remind_iter

* regenerate

* add back remind_iter

* regenerate

* fix remind_iter

* regenerate yet again

* regen

* remove comment

* regen again
2024-05-13 18:08:59 -04:00
Graham Neubig
b13d4647ab Print out the regenerate command (#1759)
* Print out the output of the regenerate command

* Update regenerate.sh
2024-05-13 18:43:58 +00:00
Boxuan Li
eba5ef8e67 Fix test_ipython (#1750) 2024-05-12 16:15:32 -07:00
Xingyao Wang
4db4a84e2e Simplify Jupyter execution via heredoc (#1728)
* simplify jupyter execution via heredoc

* make sure /tmp always exists

* add integration test for jupyter exec
2024-05-13 04:57:06 +08:00
Boxuan Li
49de262577 opendevin/core/main.py: Graceful shutdown (#1731)
* opendevin/core/main.py: Graceful shutdown

* Shutdown controller at exit

* Update opendevin/core/main.py

---------

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-05-12 13:56:35 -07:00
Engel Nyst
e5f1dbf5e7 Move json utility to the custom json parsing; apply it to the monologue-like agents (#1740) 2024-05-12 13:39:38 -04:00
Robert Brennan
efd0d61e70 Fix the tests (#1737)
* fix config patching

* revert tests
2024-05-12 11:02:10 -04:00