Commit Graph

826 Commits

Author SHA1 Message Date
Yizhe Zhang
0c829cd067 Support Entity-Deduction-Arena (EDA) Benchmark (#1931)
* adding draft evaluation code for EDA, using chatgpt as the temporal agent for now

* Update README.md

* Delete frontend/package.json

* revert the irrelevant changes

* revert package.json

* use chatgpt as the codeactagent

* integrate with opendevin

* Update evaluation/EDA/README.md

* Update evaluation/EDA/README.md

* Use poetry to manage packages

* integrate with opendevin

* minor update

* minor update

* update poetry

* update README

* clean-up infer scripts

* add run_infer script and improve readme

* log final success and final message & ground truth

---------

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: yufansong <yufan@risingwave-labs.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-25 23:17:04 +08:00
Xingyao Wang
28ab00946b update README for GAIA (#2054)
* update README for GAIA

* Update evaluation/gaia/README.md

* Update evaluation/gaia/README.md

* Update evaluation/gaia/README.md

---------

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
2024-05-25 15:01:03 +00:00
Xingyao Wang
ec68af5b83 fix the openai_api_key detected by agentskills (#2052) 2024-05-25 22:09:07 +08:00
Xingyao Wang
221035d39a Add retry logic to ssh login (#2053)
* add retry logic to ssh login

* Update opendevin/runtime/docker/ssh_box.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-05-25 12:16:24 +00:00
Shimada666
b31f7701eb Integrate Multimodal tools to agentskills. (#2016)
* support reading multimodal files

* move file

* update dependency

* remove useless pip install

* add comments

* update the comment

* Apply suggestions from code review

* Add unit test for TXTReader

* pre-commit hook corrupted utf16 test txt

* Revert unnecessary dependency upgrades

* feat: import some readers for agentskill

* add dependencies

* Integrate some multimodal tools

* add shell pip dependency

* update dependencies

* update dependencies

* update print window

* remove __main__

* locally import cv2

* add c library for opencv

* update lock file

* update prompt

* remove unused file

* add some unittest

* add unittest & remove excel-related parser

* rollback poetry lock

* remove markdown

* remove requests

* optimize parse_video output

* Fix integration tests for CodeActAgent

* remove test_parse_image unittest

* Add a TODO to containers/sandbox/Dockerfile

* update dependencies

* remove useless package from pyproject.toml

* change document via openai key

* Fix prompts after removing some actions

---------

Co-authored-by: Mingchen Zhuge <mczhuge@gmail.com>
Co-authored-by: yufansong <yufan@risingwave-labs.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Co-authored-by: Mingchen Zhuge <64179323+mczhuge@users.noreply.github.com>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
2024-05-25 18:58:49 +08:00
Boxuan Li
91f313c914 BrowserEnv: init exception handling (#2050)
* BrowserEnv: init exception handling

* Revert irrelevant changes

* Remove type ignore
2024-05-25 00:17:25 -07:00
மனோஜ்குமார் பழனிச்சாமி
36ff060c1a Added links in docs (#2051) 2024-05-25 11:23:20 +05:30
மனோஜ்குமார் பழனிச்சாமி
cfae6821fa refactored timeout (#2044) 2024-05-24 18:19:14 +02:00
mamoodi
752ce8c4ea Update bug template to include os version (#1982) 2024-05-24 15:58:05 +00:00
dependabot[bot]
cc6895a65c Bump streamlit from 1.34.0 to 1.35.0 (#2037)
Bumps [streamlit](https://github.com/streamlit/streamlit) from 1.34.0 to 1.35.0.
- [Release notes](https://github.com/streamlit/streamlit/releases)
- [Commits](https://github.com/streamlit/streamlit/compare/1.34.0...1.35.0)

---
updated-dependencies:
- dependency-name: streamlit
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-24 23:00:37 +08:00
dependabot[bot]
5538ee9bde Bump @types/react from 18.3.2 to 18.3.3 in /frontend (#2039)
Bumps [@types/react](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react) from 18.3.2 to 18.3.3.
- [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases)
- [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react)

---
updated-dependencies:
- dependency-name: "@types/react"
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-24 23:00:08 +08:00
dependabot[bot]
9a0bae6d9b Bump @testing-library/react from 13.4.0 to 15.0.7 in /frontend (#2040)
Bumps [@testing-library/react](https://github.com/testing-library/react-testing-library) from 13.4.0 to 15.0.7.
- [Release notes](https://github.com/testing-library/react-testing-library/releases)
- [Changelog](https://github.com/testing-library/react-testing-library/blob/main/CHANGELOG.md)
- [Commits](https://github.com/testing-library/react-testing-library/compare/v13.4.0...v15.0.7)

---
updated-dependencies:
- dependency-name: "@testing-library/react"
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-24 22:59:53 +08:00
dependabot[bot]
de0f30f6cc Bump eslint-plugin-react-hooks from 4.6.0 to 4.6.2 in /frontend (#2041)
Bumps [eslint-plugin-react-hooks](https://github.com/facebook/react/tree/HEAD/packages/eslint-plugin-react-hooks) from 4.6.0 to 4.6.2.
- [Release notes](https://github.com/facebook/react/releases)
- [Changelog](https://github.com/facebook/react/blob/main/packages/eslint-plugin-react-hooks/CHANGELOG.md)
- [Commits](https://github.com/facebook/react/commits/HEAD/packages/eslint-plugin-react-hooks)

---
updated-dependencies:
- dependency-name: eslint-plugin-react-hooks
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-24 22:59:37 +08:00
dependabot[bot]
6ae16dbc48 Bump react-i18next from 14.1.1 to 14.1.2 in /frontend (#2043)
Bumps [react-i18next](https://github.com/i18next/react-i18next) from 14.1.1 to 14.1.2.
- [Changelog](https://github.com/i18next/react-i18next/blob/master/CHANGELOG.md)
- [Commits](https://github.com/i18next/react-i18next/compare/v14.1.1...v14.1.2)

---
updated-dependencies:
- dependency-name: react-i18next
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-24 22:59:18 +08:00
dependabot[bot]
b6be108f49 Bump monaco-editor from 0.48.0 to 0.49.0 in /frontend (#2042)
Bumps [monaco-editor](https://github.com/microsoft/monaco-editor) from 0.48.0 to 0.49.0.
- [Release notes](https://github.com/microsoft/monaco-editor/releases)
- [Changelog](https://github.com/microsoft/monaco-editor/blob/main/CHANGELOG.md)
- [Commits](https://github.com/microsoft/monaco-editor/compare/v0.48.0...v0.49.0)

---
updated-dependencies:
- dependency-name: monaco-editor
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-24 22:58:36 +08:00
dependabot[bot]
ef813af9d7 Bump litellm from 1.38.0 to 1.38.2 (#2038)
Bumps [litellm](https://github.com/BerriAI/litellm) from 1.38.0 to 1.38.2.
- [Release notes](https://github.com/BerriAI/litellm/releases)
- [Commits](https://github.com/BerriAI/litellm/compare/v1.38.0...v1.38.2)

---
updated-dependencies:
- dependency-name: litellm
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-24 14:51:22 +00:00
dependabot[bot]
909d7b45ef Bump boto3 from 1.34.111 to 1.34.112 (#2036)
Bumps [boto3](https://github.com/boto/boto3) from 1.34.111 to 1.34.112.
- [Release notes](https://github.com/boto/boto3/releases)
- [Changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst)
- [Commits](https://github.com/boto/boto3/compare/1.34.111...1.34.112)

---
updated-dependencies:
- dependency-name: boto3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-24 14:50:02 +00:00
Xingyao Wang
e731048ccf Improve action and observation logging for the CLI interface (#2035)
* properly log user messages;
format browser action/obs, summarize action, messages properly for logging

* add source to message

* add spaces for printing
2024-05-24 08:21:25 -04:00
Jiayi Pan
2d52298a1d Support GAIA benchmark (#1911)
* Add gaia test

* Improve gaia prompts

* Fix browser_env hang bug

* Fix gaia bugs

* add gaia to eval readme

* Fix gaia bugs

* minor fix

* add run_infer.sh and update readme

* set num eval worker to 1

* default to 2023 gaia level1 subset

* default to level 1

* add prompt to instruct model to enclose answer within <solution> tag

* add missing break

---------

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Co-authored-by: yufansong <yufan@risingwave-labs.com>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
2024-05-24 11:22:28 +00:00
dependabot[bot]
2f6167b953 Bump framer-motion from 11.2.5 to 11.2.6 in /frontend (#2010)
Bumps [framer-motion](https://github.com/framer/motion) from 11.2.5 to 11.2.6.
- [Changelog](https://github.com/framer/motion/blob/main/CHANGELOG.md)
- [Commits](https://github.com/framer/motion/compare/v11.2.5...v11.2.6)

---
updated-dependencies:
- dependency-name: framer-motion
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-05-24 10:01:42 +00:00
Boxuan Li
78241d9d43 Add tests for browser agent (#2031)
Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-05-24 09:59:40 +00:00
Boxuan Li
b13a40c05c README.md: Add CodeCov badge (#2022)
Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-05-24 09:54:25 +00:00
dependabot[bot]
ad2784d534 Bump ruff from 0.4.4 to 0.4.5 (#2004)
Bumps [ruff](https://github.com/astral-sh/ruff) from 0.4.4 to 0.4.5.
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](https://github.com/astral-sh/ruff/compare/v0.4.4...v0.4.5)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-24 05:51:05 -04:00
Boxuan Li
593b8d468b Fix CI workflows [mac-test] (#2025)
* Fix CI settings

* Stop saving cpu cycles for GitHub

* Conditionally run mac tests

* Random push to trigger CI checks again

---------

Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-05-24 09:25:00 +00:00
sp.wack
ae105c2faf feat(frontend): Add actions to send feedback to backend (#2020)
* Add feedback actions to send to backend

* Uncomment request

* Refactor and disable feedback when sending

* disable defaultProp error

---------

Co-authored-by: amanape <stephanpsaras@gmail.com>
2024-05-24 04:26:06 -04:00
dependabot[bot]
9207a8da01 Bump browsergym from 0.2.6 to 0.3.2 (#2013)
Bumps [browsergym](https://github.com/ServiceNow/BrowserGym) from 0.2.6 to 0.3.2.
- [Release notes](https://github.com/ServiceNow/BrowserGym/releases)
- [Commits](https://github.com/ServiceNow/BrowserGym/compare/v0.2.6...v0.3.2)

---
updated-dependencies:
- dependency-name: browsergym
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Robert Brennan <accounts@rbren.io>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-24 16:03:56 +08:00
Frank Xu
53f64ffa06 Improve browsing agent prompts, allowing agent to properly finish when done (#1993)
* improve browsing agent, allowing it to properly finish.

* handle parsing error, show the user the agent's browsing thoughts in the front end

---------

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-24 00:02:19 -07:00
Boxuan Li
c59bcbbffd Minor docstring & prompt fixes for AgentSkills (#2028)
* A few minor fixes to agentskills

* Regenerate prompts

* Remove redundant comment
2024-05-24 13:30:48 +08:00
Xingyao Wang
cbf4c4b4c4 fix ExceptionPxssh (#2023)
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-23 21:24:21 -07:00
Boxuan Li
633ece5f9c Fix integration tests (#2024) 2024-05-23 20:24:31 -07:00
Robert Brennan
9ca2007201 fix json encoding (#2018)
* fix json encoding

* add test

* add another test

* fix integration tests

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-05-23 23:36:15 +00:00
dependabot[bot]
b492b6293a Bump lint-staged from 15.2.2 to 15.2.4 in /frontend (#2009)
Bumps [lint-staged](https://github.com/okonet/lint-staged) from 15.2.2 to 15.2.4.
- [Release notes](https://github.com/okonet/lint-staged/releases)
- [Changelog](https://github.com/lint-staged/lint-staged/blob/master/CHANGELOG.md)
- [Commits](https://github.com/okonet/lint-staged/compare/v15.2.2...v15.2.4)

---
updated-dependencies:
- dependency-name: lint-staged
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-23 12:25:00 -04:00
dependabot[bot]
0a6b26735b Bump @typescript-eslint/parser from 7.9.0 to 7.10.0 in /frontend (#2008)
Bumps [@typescript-eslint/parser](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/parser) from 7.9.0 to 7.10.0.
- [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases)
- [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/parser/CHANGELOG.md)
- [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v7.10.0/packages/parser)

---
updated-dependencies:
- dependency-name: "@typescript-eslint/parser"
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-23 12:24:42 -04:00
dependabot[bot]
dff0f1be13 Bump @types/react-syntax-highlighter in /frontend (#2007)
Bumps [@types/react-syntax-highlighter](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react-syntax-highlighter) from 15.5.11 to 15.5.13.
- [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases)
- [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react-syntax-highlighter)

---
updated-dependencies:
- dependency-name: "@types/react-syntax-highlighter"
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-23 12:24:07 -04:00
dependabot[bot]
e7306b7226 Bump @react-types/shared from 3.23.0 to 3.23.1 in /frontend (#2006)
Bumps [@react-types/shared](https://github.com/adobe/react-spectrum) from 3.23.0 to 3.23.1.
- [Release notes](https://github.com/adobe/react-spectrum/releases)
- [Commits](https://github.com/adobe/react-spectrum/compare/@react-types/shared@3.23.0...@react-types/shared@3.23.1)

---
updated-dependencies:
- dependency-name: "@react-types/shared"
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-23 12:23:49 -04:00
DaxServer
b118df606f build: Add poetry command to use Python 3.11 for environment setup (#1972) 2024-05-23 12:05:19 -04:00
Xingyao Wang
602ffcdffb Implement agentskills for OpenDevin to helpfully improve edit AND including more useful tools/skills (#1941)
* add draft for skills

* Implement and test agentskills functions: open_file, goto_line, scroll_down, scroll_up, create_file, search_dir, search_file, find_file

* Remove new_sample.txt file

* add some work from opendevin w/ fixes

* Add unit tests for agentskills module

* fix some issues and updated tests

* add more tests for open

* tweak and handle goto_line

* add tests for some edge cases

* add tests for scrolling

* add tests for edit

* add tests for search_dir

* update tests to use pytest

* use pytest --forked to keep file op unit tests from interfering with each other via global var

* update doc based on swe agent tool

* update and add tests for find_file and search_file

* move agent_skills to plugins

* add agentskills as plugin and docs

* add agentskill to ssh box and fix sandbox integration

* remove extra returns in doc

* add agentskills to initial tool for jupyter

* support re-init jupyter kernel (for agentskills) after restart

* fix print window's issue with indentation and add testcases

* add prompt for codeact with the newest edit primitives

* modify the way line number is presented (remove leading space)

* change prompt to the newest display format

* support tracking of costs via metrics

* Update opendevin/runtime/plugins/agent_skills/README.md

* Update opendevin/runtime/plugins/agent_skills/README.md

* implement and add tests for py linting

* remove extra text arg for incompatible subprocess ver

* remove sample.txt

* update test_edits integration tests

* fix all integration

* Update opendevin/runtime/plugins/agent_skills/README.md

* Update opendevin/runtime/plugins/agent_skills/README.md

* Update opendevin/runtime/plugins/agent_skills/README.md

* Update agenthub/codeact_agent/prompt.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* Update agenthub/codeact_agent/prompt.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* Update agenthub/codeact_agent/prompt.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* Update opendevin/runtime/plugins/agent_skills/agentskills.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* correctly setup plugins for swebench eval

* bump swe-bench version and add logging

* correctly setup plugins for swebench eval

* bump swe-bench version and add logging

* Revert "correctly setup plugins for swebench eval"

This reverts commit 2bd1055673.

* bump version

* remove _AGENT_SKILLS_DOCS

* move flake8 to test dep

* update poetry.lock

* remove extra arg

* reduce max iter for eval

* update poetry

* fix integration tests

---------

Co-authored-by: OpenDevin <opendevin@opendevin.ai>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-23 16:04:09 +00:00
Robert Brennan
ea9c785075 fix session state after resuming (#1999)
* fix state resuming

* fix session reconnection

* fix lint
2024-05-23 11:47:36 -04:00
Xingyao Wang
6ff50ed369 Fix SWE-Bench evaluation due to setuptools version (#1995)
* correctly setup plugins for swebench eval

* bump swe-bench version and add logging

* Revert "correctly setup plugins for swebench eval"

This reverts commit 2bd1055673.

* bump version
2024-05-23 23:17:42 +08:00
dependabot[bot]
d6327f99ce Bump litellm from 1.37.20 to 1.38.0 (#2005)
Bumps [litellm](https://github.com/BerriAI/litellm) from 1.37.20 to 1.38.0.
- [Release notes](https://github.com/BerriAI/litellm/releases)
- [Commits](https://github.com/BerriAI/litellm/compare/v1.37.20...v1.38.0)

---
updated-dependencies:
- dependency-name: litellm
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-23 14:57:56 +00:00
dependabot[bot]
1c40ea5222 Bump docker from 7.0.0 to 7.1.0 (#2002)
Bumps [docker](https://github.com/docker/docker-py) from 7.0.0 to 7.1.0.
- [Release notes](https://github.com/docker/docker-py/releases)
- [Commits](https://github.com/docker/docker-py/compare/7.0.0...7.1.0)

---
updated-dependencies:
- dependency-name: docker
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-23 16:45:46 +02:00
dependabot[bot]
58d45a1a8a Bump boto3 from 1.34.110 to 1.34.111 (#2001)
Bumps [boto3](https://github.com/boto/boto3) from 1.34.110 to 1.34.111.
- [Release notes](https://github.com/boto/boto3/releases)
- [Changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst)
- [Commits](https://github.com/boto/boto3/compare/1.34.110...1.34.111)

---
updated-dependencies:
- dependency-name: boto3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-23 14:42:31 +00:00
Niklas Muennighoff
ef6cdb7532 HumanEvalFix integration (#1908)
* Preliminary HumanEvalFix integration

* Clean paths

* fix: set workspace path correctly for config
fix: task in that contains /

* add missing run_infer.sh

* update run_infer w/o hard coded agent

* fix typo

* change `instance_id` to `task_id`

* add the warning and env var setting to run_infer.sh

* reset back workspace mount at the end of each instance

* 10 max iter is probably enough for humanevalfix

* Remove unneeded section

Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>

* Fix link

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>

* Use logger

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>

* Update run_infer.py

fix a bug:

ERROR:concurrent.futures:exception calling callback for <Future at 0x309cbc470 state=finished raised NameError>
concurrent.futures.process._RemoteTraceback:

* Update README.md

* Update README.md

* Update README.md

* Update README.md

added an example

* Update README.md

added: enable_auto_lint = true

* Update pyproject.toml

add: evaluate package

* Delete poetry.lock

update poetry.lock

* update poetry.lock

update poetry.lock

* Update README.md

* Update README.md

---------

Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
Co-authored-by: Robert <871607149@qq.com>
2024-05-23 13:09:40 +00:00
Aaron Xia
f53a91b17c fix: catch the exception raised when the session file does not exist while initializing EventStream (e.g. when creating a new session with no session files stored). (#1994) 2024-05-23 11:35:55 +00:00
Engel Nyst
0eccf31604 Refactor monologue and SWE agent to use the messages in state history (#1863)
* Refactor monologue to use the messages in state history

* add messages, clean up

* fix monologue

* update integration tests

* move private method

* update SWE agent to use the history from State

* integration tests for SWE agent

* rename monologue to initial_thoughts, since that is what it is
2024-05-23 07:29:12 +00:00
Jeremi Joslin
3235836b00 Fix typo in prompt (#1992) 2024-05-23 07:11:46 +00:00
Boxuan Li
a605e59b7e Save CI cycles for backend tests (#1985) 2024-05-23 00:10:13 -07:00
jiangleo
d1475e6e04 Fix Repeated Responses in Chat by Adding IPythonRunCellObservation (#1987)
Co-authored-by: jianghongwei <jianghongwei@58.com>
Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
2024-05-23 11:10:27 +05:30
Boxuan Li
acb430eef5 Refactor integration testing CI, add optional Mac tests, and mark a few agents as deprecated (#1888)
* Add MacOS to integration tests

* Switch back to python 3.11

* Install Docker for macos pipeline

* regenerate.sh: Use environmental variable for sandbox type

* Pack different agents' tests into a single check

* Fix CodeAct tests

* Reduce file match and extensive debug logs

* Add TEST_IN_CI mode that reports codecov

* Small fix: don't quit if reusing old responses failed

* Merge codecov results

* Fix typos

* Remove coverage merge step - codecov automatically does that

* Make mac integration tests optional - too slow

* Fix codecov args

* Add comments in yaml

* Include sandbox type in codecov report name

* Fix codecov report merge

* Revert renaming of test_matrix_success

* Remove SWEAgent and PlannerAgent from tests

* Mark planner agent and SWE agent as deprecated

* CodeCov: Ignore planner and sweagent

* Revert "Remove SWEAgent and PlannerAgent from tests"

This reverts commit 040cb3bfb9.

* Remove all tests for SWE Agent

* Only keep basic tests for MonologueAgent and PlannerAgent

* Mark SWE Agent as deprecated, and ignore code coverage for it

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-05-22 20:38:57 -07:00
Engel Nyst
b9a5be2569 Add ruff for shared mutable defaults (B) (#1938)
* Add ruff for shared mutable defaults (B)

* Apply B006, B008 on current files, except fast API

* Update agenthub/SWE_agent/prompts.py

Co-authored-by: Graham Neubig <neubig@gmail.com>

* fix unintended behavior change

* this is correct, tell Ruff to leave it alone

---------

Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-22 20:06:00 -07:00