152 Commits

Author SHA1 Message Date
Leo
9ada36e30b fix: restore python linting. (#2228)
* fix: restore python linting.

Signed-off-by: ifuryst <ifuryst@gmail.com>

* update: extend the Python lint check to evaluation.

Signed-off-by: ifuryst <ifuryst@gmail.com>

* Update evaluation/logic_reasoning/instruction.txt

---------

Signed-off-by: ifuryst <ifuryst@gmail.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-06-04 06:36:19 +00:00
RainRat
ed6dcc8381 fix typos (#2187)
* fix typos

no functional change

* fix typos
2024-06-01 20:40:30 +00:00
Aaron Xia
42c6b506b5 Lazy launching BrowseEnv / making BrowseEnv optional (#2155)
* feat: lazy launching browser; browser optional for diffrent agents.

* style: lint

* fix: integration test fail due to browser not started.

* fix: run by cli and integration test failed.

* fix: lint

* fix: lint

---------

Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-05-31 16:40:42 -04:00
Xingyao Wang
01ef90205d Add CodeActSWEAgent to remove browsing & github + improvements on agentskills (#2105)
* update swe_bench prompt;
use minimal prompt for codeact;

* upgrade agentskills and update testcases

* update infer prompt

* fix cwd

* add icl for swebench

* also log in_context_example to run infer

* remove extra print

* change prompt to abs path

* update error message to include current file info

* change cwd for jupyter if needed

* update edit error message

* update prompt

* improve git get patch

* update hint string

* default to 50 turns

* revert changes from codeact agent and create new CodeActSWEAgent

* revert changes to codeact

* revert instructions for run infer

* revert instructions for run infer

* update README

* update max iter

* add codeact swe agent

* fix issue for CodeActSWEAgent

* allow specifying max iter in cmdline script

* stop printing

* Update agenthub/codeact_swe_agent/README.md

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>

* Fix prompt regression in jupyter plugin

---------

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-29 21:19:00 -07:00
மனோஜ்குமார் பழனிச்சாமி
d4ccd48af8 Persistent docker session (#1998)
Co-authored-by: Robert Brennan <accounts@rbren.io>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-05-29 13:22:34 +00:00
Boxuan Li
9b371b1b5f Refactor agent delegation and tweak micro agents (#1910)
This PR fixes #1897. In addition, this PR fixes and tweaks a few micro-agents.

For the first time, I am able to use ManagerAgent to complete test_write_simple_script and test_edits tasks in integration tests, so this PR also adds ManagerAgent as part of integration tests. test_write_simple_script involves delegation to CoderAgent while test_edits involves delegation to TypoFixerAgent.

Also for the first time, I am able to use DelegateAgent to complete test_write_simple_script and test_edits tasks in integration tests, so this PR also adds DelegateAgent as part of integration tests. It involves delegation to StudyRepoForTaskAgent, CoderAgent and VerifierAgent.

This PR is a blocker for #1735 and likely #1945.
2024-05-28 20:01:16 -07:00
Yufan Song
3d29ec0418 add version (#2078) 2024-05-26 20:46:39 -07:00
Frank Xu
53f64ffa06 Improve browsing agent prompts, allowing agent to properly finish when done (#1993)
* improve browsing agent, allowing it to properly finish.

* handle parsing error, show user what the agent's browsing thoughts in the front end

---------

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-24 00:02:19 -07:00
Xingyao Wang
602ffcdffb Implement agentskills for OpenDevin to helpfully improve edit AND including more useful tools/skills (#1941)
* add draft for skills

* Implement and test agentskills functions: open_file, goto_line, scroll_down, scroll_up, create_file, search_dir, search_file, find_file

* Remove new_sample.txt file

* add some work from opendevin w/ fixes

* Add unit tests for agentskills module

* fix some issues and updated tests

* add more tests for open

* tweak and handle goto_line

* add tests for some edge cases

* add tests for scrolling

* add tests for edit

* add tests for search_dir

* update tests to use pytest

* use pytest --forked to avoid file op unit tests to interfere with each other via global var

* update doc based on swe agent tool

* update and add tests for find_file and search_file

* move agent_skills to plugins

* add agentskills as plugin and docs

* add agentskill to ssh box and fix sandbox integration

* remove extra returns in doc

* add agentskills to initial tool for jupyter

* support re-init jupyter kernel (for agentskills) after restart

* fix print window's issue with indentation and add testcases

* add prompt for codeact with the newest edit primitives

* modify the way line number is presented (remove leading space)

* change prompt to the newest display format

* support tracking of costs via metrics

* Update opendevin/runtime/plugins/agent_skills/README.md

* Update opendevin/runtime/plugins/agent_skills/README.md

* implement and add tests for py linting

* remove extra text arg for incompatible subprocess ver

* remove sample.txt

* update test_edits integration tests

* fix all integration

* Update opendevin/runtime/plugins/agent_skills/README.md

* Update opendevin/runtime/plugins/agent_skills/README.md

* Update opendevin/runtime/plugins/agent_skills/README.md

* Update agenthub/codeact_agent/prompt.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* Update agenthub/codeact_agent/prompt.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* Update agenthub/codeact_agent/prompt.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* Update opendevin/runtime/plugins/agent_skills/agentskills.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* correctly setup plugins for swebench eval

* bump swe-bench version and add logging

* correctly setup plugins for swebench eval

* bump swe-bench version and add logging

* Revert "correctly setup plugins for swebench eval"

This reverts commit 2bd1055673.

* bump version

* remove _AGENT_SKILLS_DOCS

* move flake8 to test dep

* update poetry.lock

* remove extra arg

* reduce max iter for eval

* update poetry

* fix integration tests

---------

Co-authored-by: OpenDevin <opendevin@opendevin.ai>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-23 16:04:09 +00:00
Engel Nyst
0eccf31604 Refactor monologue and SWE agent to use the messages in state history (#1863)
* Refactor monologue to use the messages in state history

* add messages, clean up

* fix monologue

* update integration tests

* move private method

* update SWE agent to use the history from State

* integration tests for SWE agent

* rename monologue to initial_thoughts, since that is what it is
2024-05-23 07:29:12 +00:00
Jeremi Joslin
3235836b00 Fix typo in prompt (#1992) 2024-05-23 07:11:46 +00:00
Boxuan Li
acb430eef5 Refactor integration testing CI, add optional Mac tests, and mark a few agents as deprecated (#1888)
* Add MacOS to integration tests

* Switch back to python 3.11

* Install Docker for macos pipeline

* regenerate.sh: Use environmental variable for sandbox type

* Pack different agents' tests into a single check

* Fix CodeAct tests

* Reduce file match and extensive debug logs

* Add TEST_IN_CI mode that reports codecov

* Small fix: don't quit if reusing old responses failed

* Merge codecov results

* Fix typos

* Remove coverage merge step - codecov automatically does that

* Make mac integration tests as optional - too slow

* Fix codecov args

* Add comments in yaml

* Include sandbox type in codecov report name

* Fix codecov report merge

* Revert renaming of test_matrix_success

* Remove SWEAgent and PlannerAgent from tests

* Mark planner agent and SWE agent as deprecated

* CodeCov: Ignore planner and sweagent

* Revert "Remove SWEAgent and PlannerAgent from tests"

This reverts commit 040cb3bfb9.

* Remove all tests for SWE Agent

* Only keep basic tests for MonologueAgent and PlannerAgent

* Mark SWE Agent as deprecated, and ignore code coverage for it

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-05-22 20:38:57 -07:00
Engel Nyst
b9a5be2569 Add ruff for shared mutable defaults (B) (#1938)
* Add ruff for shared mutable defaults (B)

* Apply B006, B008 on current files, except fast API

* Update agenthub/SWE_agent/prompts.py

Co-authored-by: Graham Neubig <neubig@gmail.com>

* fix unintended behavior change

* this is correct, tell Ruff to leave it alone

---------

Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-22 20:06:00 -07:00
Engel Nyst
46352e890b Logging security (#1943)
* update .gitignore

* Rename the confusing 'INFO' style to 'DETAIL'

* override str and repr

* feat: api_key desensitize

* feat: add SensitiveDataFilter in file handler

* tweak regex, add tests

* more tweaks, include other attrs

* add env vars, those with equivalent config

* fix tests

* tests are invaluable

---------

Co-authored-by: Shimada666 <649940882@qq.com>
2024-05-22 18:27:38 +02:00
Yufan Song
d18e6c85a0 feat: add metrics related to cost for better observability (#1944)
* add metrics for total_cost

* make lint

* refact codeact

* change metrics into llm

* add costs list, add into state

* refactor log completion

* refactor and test others

* make lint

* Update opendevin/core/metrics.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* Update opendevin/llm/llm.py

Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>

* refactor

* add code

---------

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
2024-05-22 08:53:31 +00:00
Frank Xu
1fe290adf9 [Feat] A competitive Web Browsing agent (#1856)
* initial attempt at a browsing only agent

* add browsing agent

* update

* implement agent

* update

* fix comments

* remove unnecessary things from memory extras

* update image processing

---------

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
2024-05-21 19:20:33 +00:00
Boxuan Li
b845a38169 Small improvements & fixes to SWE-Bench (#1874)
I was able to run a few benchmark instances from SWE-Bench by myself following the documentation - it was great! In general the experience was smooth, thanks to @xingyaoww, @libowen2121 and the team! I made a few small enhancements and fixes to further improve the developer experience.

Always use poetry run python (using python from poetry's virtual environment) over python or python3 in scripts to make sure the behavior is consistent.
Make AGENT configurable. One can use an argument to control which agent they would like to benchmark. To facilitate this, I removed hardcoded CodeActAgent from run_infer.sh, and also added VERSION attribute to all agents, as the benchmark needs to record the agent version.
Make EVAL_LIMIT configurable. One can use an argument to control how many instances they'd like to benchmark. Useful for debugging & development purposes.
Fix 'eval_output_dir' not defined error in run_infer.py.
Other enhancements to the README file and logs.
I also notice that a lot of code from run_infer.py could be shared by other benchmarks, but since we only have one benchmark now, I think we could avoid over-engineering. A refactor and code dedup would be useful in the future once we have more benchmarks, though.
2024-05-20 08:03:30 +00:00
Robert Brennan
0ecba83e53 Move message history out of CodeAct (#1847)
* stop keeping history state in codeact

* regenerate tests

* Update agenthub/codeact_agent/codeact_agent.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* revert tests

* regen tests

* refactor codeact a bit

* regenerate without using LLM

* simplify logic

* change to heredoc

* fix heredoc

* fix end_of_edit docs

* regen tests

* regenerate

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-18 18:39:27 +00:00
Gant
f950e3b48e make CodeAct paper link correct (#1870) 2024-05-18 03:54:10 +00:00
மனோஜ்குமார் பழனிச்சாமி
b0b44ed467 Auto restarted Jupyter kernel (#1808)
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-18 08:40:31 +05:30
Frank Xu
9856e76c1f add BrowseInteractiveAction in dummy agent (#1852) 2024-05-17 00:04:17 +00:00
Boxuan Li
b6ff201780 Refactor integration test framework and relieve the pain of regeneration (#1818)
* Update README.md

* Fix WORKSPACE_MOUNT_PATH_IN_SANDBOX variable in regenerate.sh

* Regenerate prompts without calling real LLM

* Disable pytest warning capture

* Change planner agent prompt by a bit for demo

* Regenerate prompt files following prompt changes

* doc: elaborate on FORCE_USE_LLM

* Add another prompt change to monologue_agent for demo purpose

* Regenerate prompts with FORCE_USE_LLM=true

---------

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
2024-05-16 08:30:29 -07:00
Xingyao Wang
2406b901df feat(SWE-Bench environment) integrate SWE-Bench sandbox (#1468)
* add draft dockerfile for build all

* add rsync for build

* add all-in-one docker

* update prepare scripts

* Update swe_env_box.py

* Add swe_entry.sh (buggy now)

* Parse the test command in swe_entry.sh

* Update README for instance eval in sandbox

* revert specialized config

* replace run_as_devin as an init arg

* set container & run_as_root via args

* update swe entry script

* update env

* remove mounting

* allow error after swe_entry

* update swe_env_box

* move file

* update gitignore

* get swe_env_box a working demo

* support faking user response & provide sandox ahead of time;
also return state for controller

* tweak main to support adding controller kwargs

* add module

* initialize plugin for provided sandbox

* add pip cache to plugin & fix jupyter kernel waiting

* better print Observation output

* add run infer scripts

* update readme

* add utility for getting diff patch

* use get_diff_patch in infer

* update readme

* support cost tracking for codeact

* add swe agent edit hack

* disable color in git diff

* fix git diff cmd

* fix state return

* support limit eval

* increase t imeout and export pip cache

* add eval limit config

* return state when hit turn limit

* save log to file; allow agent to give up

* run eval with max 50 turns

* add outputs to gitignore

* save swe_instance & instruction

* add uuid to swebench

* add streamlit dep

* fix save series

* fix the issue where session id might be duplicated

* allow setting temperature for llm (use 0 for eval)

* Get report from agent running log

* support evaluating task success right after inference.

* remove extra log

* comment out prompt for baseline

* add visualizer for eval

* use plaintext for instruction

* reduce timeout for all; only increase timeout for init

* reduce timeout for all; only increase timeout for init

* ignore sid for swe env

* close sandbox in each eval loop

* update visualizer instruction

* increase max chars

* add finish action to history too

* show test result in metrics

* add sidebars for visualizer

* also visualize swe_instance

* cleanup browser when agent controller finish runinng

* do not mount workspace for swe-eval to avoid accidentally overwrite files

* Revert "do not mount workspace for swe-eval to avoid accidentally overwrite files"

This reverts commit 8ef7739054.

* Revert "Revert "do not mount workspace for swe-eval to avoid accidentally overwrite files""

This reverts commit 016cfbb9f0.

* run jupyter command via copy to, instead of cp to mount

* only print mixin output when failed

* change ssh box logging

* add visualizer for pass rate

* add instance id to sandbox name

* only remove container we created

* use opendevin logger in main

* support multi-processing infer

* add back metadata, support keyboard interrupt

* remove container with startswith

* make pbar behave correctly

* update instruction w/ multi-processing

* show resolved rate by repo

* rename tmp dir name

* attempt to fix racing for copy to ssh_box

* fix script

* bump swe-bench-all version

* fix ipython with self-contained commands

* add jupyter demo to swe_env_box

* make resolved count two column

* increase height

* do not add glob to url params

* analyze obs length

* print instance id prior to removal handler

* add gold patch in visualizer

* fix interactive git by adding a git --no-pager as alias

* increase max_char to 10k to cover 98% of swe-bench obs cases

* allow parsing note

* prompt v2

* add iteration reminder

* adjust user response

* adjust order

* fix return eval

* fix typo

* add reminder before logging

* remove other resolve rate

* re adjust to new folder structure

* support adding eval note

* fix eval note path

* make sure first log of each instance is printed

* add eval note

* fix the display for visualizer

* tweak visualizer for better git patch reading

* exclude empty patch

* add retry mechanism for swe_env_box start

* fix ssh timeout issue

* add stat field for apply test patch success

* add visualization for fine-grained report

* attempt to support monologue agent by constraining it to single thread

* also log error msg when stopeed

* save error as well

* override WORKSPACE_MOUNT_PATH and WORKSPACE_BASE for monologue to work in mp

* add retry mechanism for sshbox

* remove retry for swe env box

* try to handle loop state stopped

* Add get report scripts

* Add script to convert agent output to swe-bench format

* Merge fine grained report for visualizer

* Update eval readme

* Update README.md

* Add CodeAct gpt4-1106 output and eval logs on swe-bench-lite

* Update the script to get model report

* Update get_model_report.sh

* Update get_agent_report.sh

* Update report merge script

* Add agent output conversion script

* Update swe_lite_env_setup.sh

* Add example swe-bench output files

* Update eval readme

* Remove redundant scripts

* set iteration count down to false by default

* fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model execption out of litellm (#1666)

* fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model execption out of litellm

* Review Feedback

* Missing None Check

* Review feedback and improved error handling

---------

Co-authored-by: Robert Brennan <accounts@rbren.io>

* fix prepare_swe_util scripts

* update builder images

* update setup script

* remove swe-bench build workflow

* update lock

* remove experiments since they are moved to hf

* remove visualizer (since it is moved to hf repo)

* simply jupyter execution via heredoc

* update ssh_box

* add initial docker readme

* add pkg-config as dependency

* add script for swe_bench all-in-one docker

* add rsync to builder

* rename var

* update commit

* update readme

* update lock

* support specify timeout for long running tasks

* fix path

* separate building of all deps and files

* support returning states at the end of controller

* remove return None

* support specify timeout for long running tasks

* add timeout for all existing sandbox impl

* fix swe_env_box for new codebase

* update llm config in config.py

* support pass sandbox in

* remove force set

* update eval script

* fix issue of overriding final state

* change default eval output to hf demo

* change default eval output to hf demo

* fix config

* only close it when it is NOT external sandbox

* add scripts

* tweak config

* only put in hostory when state has history attr

* fix agent controller on the case of run out interaction budget

* always assume state is always not none

* remove print of final state

* catch all exception when cannot compute completion cost

* Update README.md

* save source into json

* fix path

* update docker path

* return the final state on close

* merge AgentState with State

* fix integration test

* merge AgentState with State

* fix integration test

* add ChangeAgentStateAction to history in attempt to fix integration

* add back set agent state

* update tests

* update tests

* move scripts for setup

* update script and readme for infer

* do not reset logger when n processes == 1

* update eval_infer scripts and readme

* simplify readme

* copy over dir after eval

* copy over dir after eval

* directly return get state

* update lock

* fix output saving of infer

* replace print with logger

* update eval_infer script

* add back the missing .close

* increase timeout

* copy all swe_bench_format file

* attempt to fix output parsing

* log git commit id as metadata

* fix eval script

* update lock

* update unit tests

* fix argparser unit test

* fix lock

* the deps are now lightweight enough to be incude in make build

* add spaces for tests

* add eval outputs to gitignore

* remove git submodule

* readme

* tweak git email

* update upload instruction

* bump codeact version for eval

---------

Co-authored-by: Bowen Li <libowen.ne@gmail.com>
Co-authored-by: huybery <huybery@gmail.com>
Co-authored-by: Bart Shappee <bshappee@gmail.com>
Co-authored-by: Robert Brennan <accounts@rbren.io>
2024-05-15 16:15:55 +00:00
Frank Xu
a84d19f03c Enable CodeAct agents with browsing, and also enable arbitrary BrowserGym action support (#1807)
* enable browsing in codeact, and  arbitrary browsergym DSL support

* fix

* fix unit test case

* update frontend for the new interactive browsing action

* bump ver

* Fix integration tests

---------

Co-authored-by: OpenDevinBot <bot@opendevin.com>
2024-05-15 11:59:58 -04:00
Xia Zhenhua
76abca361c feat: simplify state.history with to_memory call in micro-agent. Or the call to LLM may exceed the token limit. (#1806)
* feat: simplify state.history with to_memory call in micro-agent.

* feat: merge master and replace to_memory with event_to_memory.

---------

Co-authored-by: aaren.xzh <aaren.xzh@antfin.com>
2024-05-15 14:47:37 +02:00
Xia Zhenhua
bf14b47890 feat: make other agents support asking user input in MessageAction. (#1777)
* feat: make other agents support asking user input in MessageAction.

* Update agenthub/micro/_instructions/actions/message.md

Co-authored-by: Robert Brennan <accounts@rbren.io>

* Update agenthub/micro/_instructions/actions/message.md

Co-authored-by: Robert Brennan <accounts@rbren.io>

* feat: make other agents support asking user input in MessageAction.

* Regenerate test artifacts

---------

Co-authored-by: aaren.xzh <aaren.xzh@antfin.com>
Co-authored-by: Robert Brennan <accounts@rbren.io>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-15 00:44:45 -07:00
Boxuan Li
6714000b2c CodeActAgent: Fix iteration reminder (#1803)
This PR includes three changes:
1) Iteration reminder should start with MAX_ITERATIONS from config rather than default value 100
2) In the first prompt, we should tell the LLM it has `MAX_ITERATIONS - 1` turns left, rather than `MAX_ITERATIONS - 2`
3) Remove legacy ITERATION_REMINDER config
2024-05-15 13:48:47 +08:00
Graham Neubig
3cef8ee187 Add GitHub prompt to CodeAct (#1792)
* Added github to CodeAct

* More codeact

* Simplify prompt

* Modify codeact prompt

* fix integration test for CodeAct

* yet another integration test fix for codeact

* fix plugin use in jupyter

* update edit tests

* fix jupyter plugin potential port conflict

* fix test ipython with latest ipython fix

* update integration test

* wait a bit for jupyter execution

* add one unit tests for sandbox

* fix integration test

---------

Co-authored-by: OpenDevinBot <bot@opendevin.com>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
2024-05-14 21:25:21 +00:00
Robert Brennan
dcb5d1ce0a Add permanent storage option for EventStream (#1697)
* add storage classes

* add minio

* add event stream storage

* storage test working

* use fixture

* event stream test passing

* better serialization

* factor out serialization pkg

* move more serialization

* fix tests

* fix test

* remove __all__

* add rehydration test

* add more rehydration test

* fix fixture

* fix dict init

* update tests

* lock

* regenerate tests

* Update opendevin/events/stream.py

* revert tests

* revert old integration tests

* only add fields if present

* regen tests

* pin pyarrow

* fix unit tests

* remove cause from memories

* revert tests

* regen tests
2024-05-14 11:09:45 -04:00
Robert Brennan
beb74a19f6 Use event stream for the runtime (#1776)
* rebuild PR from scratch

* fix max_iter

* regenerate tests

* cut down on history

* Update opendevin/controller/agent_controller.py

* regenerate tests

* revert swe agent

* revert some codeact chagnes

* regenerate tests

* add source to dict

* only add source if not none

* try to fix coverage issue

* lock

* add gevent
2024-05-14 13:35:25 +00:00
Robert Brennan
82a798990c refactor remind_iterations (#1760)
* refactor remind_iterations

* regenerate tests

* concatenate iteration message

* fix merge issues

* update integration tests
2024-05-14 08:27:12 -04:00
Robert Brennan
b028bd46bb Use messages to drive tasks (#1688)
* finish is working

* start reworking main_goal

* remove main_goal from microagents

* remove main_goal from other agents

* fix issues

* revert codeact line

* make plan a subclass of task

* fix frontend for new plan setup

* lint

* fix type

* more lint

* fix build issues

* fix codeact mgs

* fix edge case in regen script

* fix task validation errors

* regenerate integration tests

* fix up tests

* fix sweagent

* revert codeact prompt

* update integration tests

* update integration tests

* handle loading state

* Update agenthub/codeact_agent/codeact_agent.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* Update opendevin/controller/agent_controller.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* Update agenthub/codeact_agent/codeact_agent.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* Update opendevin/controller/state/plan.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* update docs

* regenerate tests

* remove none from state type

* revert test files

* update integration tests

* rename plan to root_task

* revert plugin perms

* regen integration tests

* tweak integration script

* prettier

* fix test

* set workspace up for regeneration

* regenerate tests

* Change directory of copy

* Updated tests

* Disable PlannerAgent test

* Fix listen

* Updated prompts

* Disable planner again

* Make codecov more lenient

* Update agenthub/README.md

* Update opendevin/server/README.md

* re-enable planner tests

* finish top level tasks

* regen planner

* fix root task factory

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-13 23:14:15 +00:00
Engel Nyst
e5f1dbf5e7 Move json utility to the custom json parsing; apply it to the monologue-like agents (#1740) 2024-05-12 13:39:38 -04:00
Boxuan Li
316a772849 CodeAct: Emphasize open before edit (#1709)
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
2024-05-11 12:20:14 -07:00
Engel Nyst
98adbf54ec Small refactoring (#1614)
* move MemoryCondenser, LongTermMemory, json, out of the monologue

* PlannerAgent and Microagents use the custom json.loads/dumps

* Move short term history out of monologue agent...

* move memory in their package

* add __init__
2024-05-11 17:15:19 +02:00
Boxuan Li
bde12f4a09 CodeActAgent: Fix hack for multiple edits in same command (#1684)
* Fix edit hack for multiple edits in same command

This PR changes ([\s\S]*) to ([\s\S]*?) to make the capturing
group non-greedy. This change ensures that the regex captures
the smallest set of characters that extends up to the first
end_of_edit it encounters, rather than extending across multiple
edit commands.

Without the fix, a bash command consisting of multiple edits
would be corrupt and lead to unexpected edit results.
2024-05-10 23:32:09 -07:00
Graham Neubig
1787a7304e Disable ChromaDB telemetry (#1699) 2024-05-11 01:25:19 +02:00
Bart Shappee
78cd2e5b47 fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model execption out of litellm (#1666)
* fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model execption out of litellm

* Review Feedback

* Missing None Check

* Review feedback and improved error handling

---------

Co-authored-by: Robert Brennan <accounts@rbren.io>
2024-05-10 13:57:37 -04:00
Jim Su
f8d4b1ab0d Use generic types (#1680) 2024-05-10 04:21:22 +02:00
Ammar Ladhani
564739d1db Removes upper-case 'List' imports from typing and replaces them with lower-case 'list' (#1676) 2024-05-09 19:05:46 -04:00
Robert Brennan
26d82841d5 Create runtime implementation (#1626)
* first pass at moving runtime

* fix import

* remove github refs

* remove unnecessary import

* remove unnecessary import

* add e2b

* refactor read and write file ops

* remove github test

* rm action

* revert permissions

* regenerate tests

* re-delete file operations

* regenerate integration tests

* Update opendevin/runtime/runtime.py

Co-authored-by: Graham Neubig <neubig@gmail.com>

* fix ref

* add docs

* remove logspam

---------

Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-05-09 19:04:49 -04:00
Engel Nyst
446eaec1e6 Refactor config to dataclasses (#1552)
* mypy is invaluable

* fix config, add test

* Add new-style toml support

* add singleton, small doc fixes

* fix some cases of loading toml, clean up, try to make it clearer

* Add defaults_dict for UI

* allow config to be mutable
error handling
fix toml parsing

* remove debug stuff

* Adapt Makefile

* Add defaults for temperature and top_p

* update to CodeActAgent

* comments

* fix unit tests

* implement groups of llm settings (CLI)

* fix merge issue

* small fix sandboxes, small refactoring

* adapt LLM init to accept overrides at runtime

* reading config is enough

* Encapsulate minimally embeddings initialization

* agent bug fix; fix tests

* fix sandboxes tests

* refactor globals in sandboxes to properties
2024-05-09 22:48:29 +02:00
Xia Zhenhua
4a72e83938 fix: AgentThinkAction deleted caused bug. (#1662)
* fix: AgentThinkAction deleted caused bug.

* fix: AgentThinkAction deleted caused bug in plannerAgent.

* fix: plan content-not-changed caused frontend crash bug.

---------

Co-authored-by: aaren.xzh <aaren.xzh@antfin.com>
2024-05-09 09:04:02 -04:00
Xingyao Wang
21fe8dc1eb Align codeact with swebench eval (#1612)
* align codeact agent with the slight adjustment on eval branch

* update integration test for new prompt

* Regenerate test artifacts for CodeActAgent

---------

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-09 00:42:07 -07:00
Ikko Eltociear Ashimine
4cc462cc18 Update codeact_agent.py (#1653)
splited -> splitted
2024-05-08 13:55:40 -04:00
Boxuan Li
af5bdf67aa Add AgentRejectAction across multiple modules (#1615)
* Add AgentRejectAction across multiple modules

This commit introduces the AgentRejectAction class and integrates it across various modules and actions. It includes updates to READMEs, action definitions, and agent controllers to handle the new 'reject' action. This functionality will allow agents to properly signal task rejection.

* Fix unit test

* Remove wrong generates attributes from a few micro-agents
2024-05-08 10:03:14 -07:00
Robert Brennan
242c4a0df6 Remove extra message actions (#1608)
* remove extra actions

* remove message observations

* support null obs

* handle null obs

* fix frontend for changes

* fix the way messages flow to the UI

* change think to message

* add regen script

* regenerate all integration tests

* change task

* remove gh test

* fix messages

* fix tests

* help agent exit after hitting max iter

* Update opendevin/events/observation/success.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* Update agenthub/codeact_agent/codeact_agent.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-05-07 21:13:08 +00:00
Aleksandar
4bf4119259 Introduce TypoFixerAgent for in-place typo corrections in agenthub/micro (#1613)
* Add TypoFixerAgent micro-agent to fix typos

* Improve parse_response to accurately extract the first complete JSON object

* Add tests for parse_response function handling complex scenarios

* Fix tests and logic to use action_from_dict

* Fix small formatting issues
2024-05-07 13:25:35 +02:00
Xingyao Wang
356caf0960 Fix the issue of newly import package by including instruction for Kernel restart (#1609)
* fix the issue of newly import package by including instruction for kernel restart

* fix integration test for new prompt

* fix integration yet again
2024-05-07 06:11:05 +08:00
Frank Xu
26dcf4fd7c remove screenshot in browser observation (#1588)
* remove screenshot in browser observation

* refactor utils

* allow only dict

* fix screenshot not showing up in frontend

---------

Co-authored-by: Robert Brennan <accounts@rbren.io>
2024-05-06 13:56:28 -04:00