Compare commits

...

46 Commits

Author SHA1 Message Date
Graham Neubig 7a2122ebc2 Default to gpt-4o (#2158)
* Default to gpt-4o

* Fix default
2024-05-31 14:44:07 +00:00
dependabot[bot] a7b19a0048 Bump @nextui-org/react from 2.4.0 to 2.4.1 in /frontend (#2161)
Bumps [@nextui-org/react](https://github.com/nextui-org/nextui/tree/HEAD/packages/core/react) from 2.4.0 to 2.4.1.
- [Release notes](https://github.com/nextui-org/nextui/releases)
- [Changelog](https://github.com/nextui-org/nextui/blob/canary/packages/core/react/CHANGELOG.md)
- [Commits](https://github.com/nextui-org/nextui/commits/@nextui-org/react@2.4.1/packages/core/react)

---
updated-dependencies:
- dependency-name: "@nextui-org/react"
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-31 14:32:21 +00:00
dependabot[bot] e6c8e1c9d2 Bump framer-motion from 11.2.9 to 11.2.10 in /frontend (#2160)
Bumps [framer-motion](https://github.com/framer/motion) from 11.2.9 to 11.2.10.
- [Changelog](https://github.com/framer/motion/blob/main/CHANGELOG.md)
- [Commits](https://github.com/framer/motion/compare/v11.2.9...v11.2.10)

---
updated-dependencies:
- dependency-name: framer-motion
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-31 14:30:14 +00:00
Boxuan Li 4d14b44a9a SWE-bench: Add summarise utility script to view passed/failed task IDs (#2137)
* SWE-bench: Add summarise utility script to view passed/failed task IDs

* Fix typos

* Move file

* Prettify

* Use merged jsonl file
2024-05-31 12:32:17 +08:00
Boxuan Li f188abd7a3 Delete evaluation outputs files (#2152)
* Delete evaluation outputs files

* Fix README
2024-05-31 03:12:27 +00:00
மனோஜ்குமார் பழனிச்சாமி 961c96a2a1 Added ssh_password to config setup (#2139)
Co-authored-by: Aleksandar <isavitaisa@gmail.com>
2024-05-31 07:26:16 +05:30
dependabot[bot] f4bc52461a Bump openai from 1.30.4 to 1.30.5 (#2144)
Bumps [openai](https://github.com/openai/openai-python) from 1.30.4 to 1.30.5.
- [Release notes](https://github.com/openai/openai-python/releases)
- [Changelog](https://github.com/openai/openai-python/blob/main/CHANGELOG.md)
- [Commits](https://github.com/openai/openai-python/compare/v1.30.4...v1.30.5)

---
updated-dependencies:
- dependency-name: openai
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-30 23:29:38 +08:00
dependabot[bot] cd6f863a49 Bump litellm from 1.39.2 to 1.39.3 (#2145)
Bumps [litellm](https://github.com/BerriAI/litellm) from 1.39.2 to 1.39.3.
- [Release notes](https://github.com/BerriAI/litellm/releases)
- [Commits](https://github.com/BerriAI/litellm/compare/v1.39.2...v1.39.3)

---
updated-dependencies:
- dependency-name: litellm
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-30 23:29:11 +08:00
dependabot[bot] 486c5d983f Bump json-repair from 0.20.1 to 0.21.0 (#2146)
Bumps [json-repair](https://github.com/mangiucugna/json_repair) from 0.20.1 to 0.21.0.
- [Release notes](https://github.com/mangiucugna/json_repair/releases)
- [Commits](https://github.com/mangiucugna/json_repair/compare/0.20.1...0.21.0)

---
updated-dependencies:
- dependency-name: json-repair
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-30 23:28:55 +08:00
dependabot[bot] 33d9882621 Bump @types/node from 18.19.30 to 20.12.13 in /frontend (#2147)
Bumps [@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node) from 18.19.30 to 20.12.13.
- [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases)
- [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node)

---
updated-dependencies:
- dependency-name: "@types/node"
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-30 23:28:31 +08:00
dependabot[bot] 2fcaa2328e Bump boto3 from 1.34.113 to 1.34.115 (#2143)
Bumps [boto3](https://github.com/boto/boto3) from 1.34.113 to 1.34.115.
- [Release notes](https://github.com/boto/boto3/releases)
- [Changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst)
- [Commits](https://github.com/boto/boto3/compare/1.34.113...1.34.115)

---
updated-dependencies:
- dependency-name: boto3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-30 23:24:59 +08:00
Rahul Anand a0373900be Docs enhancement (#2138)
* fix #2123

* Docs enhancement

* Update docs/src/components/CustomFooter.tsx

Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>

* Update docs/src/components/CustomFooter.tsx

Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>

* Update docs/src/pages/faq.tsx

Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>

---------

Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>
2024-05-30 17:05:09 +03:00
Ren Ma a9823491e6 Support Logic Reasoning Benchmark (#1973) 2024-05-30 16:35:15 +08:00
Xingyao Wang 01ef90205d Add CodeActSWEAgent to remove browsing & github + improvements on agentskills (#2105)
* update swe_bench prompt;
use minimal prompt for codeact;

* upgrade agentskills and update testcases

* update infer prompt

* fix cwd

* add icl for swebench

* also log in_context_example to run infer

* remove extra print

* change prompt to abs path

* update error message to include current file info

* change cwd for jupyter if needed

* update edit error message

* update prompt

* improve git get patch

* update hint string

* default to 50 turns

* revert changes from codeact agent and create new CodeActSWEAgent

* revert changes to codeact

* revert instructions for run infer

* revert instructions for run infer

* update README

* update max iter

* add codeact swe agent

* fix issue for CodeActSWEAgent

* allow specifying max iter in cmdline script

* stop printing

* Update agenthub/codeact_swe_agent/README.md

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>

* Fix prompt regression in jupyter plugin

---------

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-29 21:19:00 -07:00
Aaron Xia b1ec8e5dc2 style: Update agent_controller.py to clean log (#2124) 2024-05-29 18:56:11 -07:00
Rahul Anand b3cce763a2 fix #2123 (#2125) 2024-05-29 17:56:45 -04:00
Robert Brennan 89ac732cb6 Adjust docs a bit (#2135)
* tweak docs a bit

* move warning
2024-05-29 17:56:28 -04:00
dependabot[bot] eb1e0e9da8 Bump llama-index-embeddings-huggingface from 0.2.0 to 0.2.1 (#2132) 2024-05-29 20:48:14 +00:00
dependabot[bot] ab454e122a Bump browsergym from 0.3.3 to 0.3.4 (#2127)
Bumps [browsergym](https://github.com/ServiceNow/BrowserGym) from 0.3.3 to 0.3.4.
- [Release notes](https://github.com/ServiceNow/BrowserGym/releases)
- [Commits](https://github.com/ServiceNow/BrowserGym/compare/v0.3.3...v0.3.4)

---
updated-dependencies:
- dependency-name: browsergym
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-29 15:42:21 -04:00
dependabot[bot] cf95f1aabe Bump ruff from 0.4.5 to 0.4.6 (#2130)
Bumps [ruff](https://github.com/astral-sh/ruff) from 0.4.5 to 0.4.6.
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](https://github.com/astral-sh/ruff/compare/v0.4.5...v0.4.6)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-30 00:53:53 +08:00
dependabot[bot] b011190b40 Bump litellm from 1.38.11 to 1.39.2 (#2133)
Bumps [litellm](https://github.com/BerriAI/litellm) from 1.38.11 to 1.39.2.
- [Release notes](https://github.com/BerriAI/litellm/releases)
- [Commits](https://github.com/BerriAI/litellm/compare/v1.38.11...v1.39.2)

---
updated-dependencies:
- dependency-name: litellm
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-29 15:51:50 +00:00
dependabot[bot] 439e9c0e60 Bump openai from 1.30.3 to 1.30.4 (#2131) 2024-05-29 15:44:51 +00:00
dependabot[bot] 53b3309a5a Bump @typescript-eslint/eslint-plugin from 7.10.0 to 7.11.0 in /frontend (#2129) 2024-05-29 15:29:54 +00:00
dependabot[bot] c45123ddb2 Bump framer-motion from 11.2.6 to 11.2.9 in /frontend (#2128) 2024-05-29 15:29:40 +00:00
dependabot[bot] af3ddddd33 Bump lint-staged from 15.2.4 to 15.2.5 in /frontend (#2126)
Bumps [lint-staged](https://github.com/okonet/lint-staged) from 15.2.4 to 15.2.5.
- [Release notes](https://github.com/okonet/lint-staged/releases)
- [Changelog](https://github.com/lint-staged/lint-staged/blob/master/CHANGELOG.md)
- [Commits](https://github.com/okonet/lint-staged/compare/v15.2.4...v15.2.5)

---
updated-dependencies:
- dependency-name: lint-staged
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-29 15:29:30 +00:00
மனோஜ்குமார் பழனிச்சாமி d4ccd48af8 Persistent docker session (#1998)
Co-authored-by: Robert Brennan <accounts@rbren.io>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-05-29 13:22:34 +00:00
Robert Brennan 03386a81e0 fix file uploads (#2102)
Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>
2024-05-29 13:22:22 +00:00
மனோஜ்குமார் பழனிச்சாமி 343e5c73ae Parsed model_name for model_info (#2122) 2024-05-29 16:54:27 +08:00
Prithvi 13d04fa36c Fix issue #2029: Replace defaultProps with JavaScript default parameters (#2106)
* updated basemodal

Updated the basemodal.tsx file by removing the  BaseModal.defaultProps block and including the default values directly within the function parameters.

* Removed DefaultProps from the files

Removed DefaultProps from the files:
AgentControlBar.tsx, ChatInput.tsx, ExplorerTree.tsx, TreeNode.tsx, IconButton.tsx, HeaderContent.tsx, AutocompleteCombobox.tsx

and replaced the usage of defaultProps with JavaScript default parameters in the given components.

* Removed comments and updated eslintrc

Removed all the comments (Removed the defaultProps block comment), and updated the ESLint rules to ignore the defaultProps warning thrown by ESLint.

* Finished Linting Successfully.

Ran the lint command with the --fix and --write arg to fix all remaining issues and errors before pushing. Thanks a lot @amanape for the support!

---------

Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>
2024-05-29 09:49:50 +03:00
Boxuan Li 9b371b1b5f Refactor agent delegation and tweak micro agents (#1910)
This PR fixes #1897. In addition, this PR fixes and tweaks a few micro-agents.

For the first time, I am able to use ManagerAgent to complete test_write_simple_script and test_edits tasks in integration tests, so this PR also adds ManagerAgent as part of integration tests. test_write_simple_script involves delegation to CoderAgent while test_edits involves delegation to TypoFixerAgent.

Also for the first time, I am able to use DelegateAgent to complete test_write_simple_script and test_edits tasks in integration tests, so this PR also adds DelegateAgent as part of integration tests. It involves delegation to StudyRepoForTaskAgent, CoderAgent and VerifierAgent.

This PR is a blocker for #1735 and likely #1945.
2024-05-28 20:01:16 -07:00
mamoodi c37a474dc5 doc: Small fix for development.md and docs (#2119) 2024-05-28 20:43:58 +00:00
dependabot[bot] b9aee7046c Bump @nextui-org/react from 2.3.6 to 2.4.0 in /frontend (#2115)
Bumps [@nextui-org/react](https://github.com/nextui-org/nextui/tree/HEAD/packages/core/react) from 2.3.6 to 2.4.0.
- [Release notes](https://github.com/nextui-org/nextui/releases)
- [Changelog](https://github.com/nextui-org/nextui/blob/canary/packages/core/react/CHANGELOG.md)
- [Commits](https://github.com/nextui-org/nextui/commits/@nextui-org/react@2.4.0/packages/core/react)

---
updated-dependencies:
- dependency-name: "@nextui-org/react"
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
2024-05-29 00:32:02 +08:00
dependabot[bot] 2a12642228 Bump uvicorn from 0.29.0 to 0.30.0 (#2111)
Bumps [uvicorn](https://github.com/encode/uvicorn) from 0.29.0 to 0.30.0.
- [Release notes](https://github.com/encode/uvicorn/releases)
- [Changelog](https://github.com/encode/uvicorn/blob/master/CHANGELOG.md)
- [Commits](https://github.com/encode/uvicorn/compare/0.29.0...0.30.0)

---
updated-dependencies:
- dependency-name: uvicorn
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-28 23:33:32 +08:00
dependabot[bot] 535d316a89 Bump vite from 5.2.11 to 5.2.12 in /frontend (#2112)
Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) from 5.2.11 to 5.2.12.
- [Release notes](https://github.com/vitejs/vite/releases)
- [Changelog](https://github.com/vitejs/vite/blob/main/packages/vite/CHANGELOG.md)
- [Commits](https://github.com/vitejs/vite/commits/v5.2.12/packages/vite)

---
updated-dependencies:
- dependency-name: vite
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-28 23:33:06 +08:00
dependabot[bot] e3e4aaa31b Bump eslint-plugin-react from 7.34.1 to 7.34.2 in /frontend (#2113)
Bumps [eslint-plugin-react](https://github.com/jsx-eslint/eslint-plugin-react) from 7.34.1 to 7.34.2.
- [Release notes](https://github.com/jsx-eslint/eslint-plugin-react/releases)
- [Changelog](https://github.com/jsx-eslint/eslint-plugin-react/blob/v7.34.2/CHANGELOG.md)
- [Commits](https://github.com/jsx-eslint/eslint-plugin-react/compare/v7.34.1...v7.34.2)

---
updated-dependencies:
- dependency-name: eslint-plugin-react
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-28 23:32:38 +08:00
dependabot[bot] 6640a247c0 Bump @typescript-eslint/parser from 7.10.0 to 7.11.0 in /frontend (#2114)
Bumps [@typescript-eslint/parser](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/parser) from 7.10.0 to 7.11.0.
- [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases)
- [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/parser/CHANGELOG.md)
- [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v7.11.0/packages/parser)

---
updated-dependencies:
- dependency-name: "@typescript-eslint/parser"
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-28 23:32:16 +08:00
dependabot[bot] 5b16ca7a45 Bump litellm from 1.38.10 to 1.38.11 (#2110)
Bumps [litellm](https://github.com/BerriAI/litellm) from 1.38.10 to 1.38.11.
- [Release notes](https://github.com/BerriAI/litellm/releases)
- [Commits](https://github.com/BerriAI/litellm/compare/v1.38.10...v1.38.11)

---
updated-dependencies:
- dependency-name: litellm
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-28 16:57:43 +02:00
Ryan H. Tran 9434bcce48 Support MINT benchmark (MATH, GSM8K subset) (#1955)
* setup boilerplate and README

* setup test script and load dataset

* add temp intg that works

* refactor code

* add solution evaluation through 'fake_user_response_fn'

* finish integrating MATH subset

* Update evaluation/mint/run_infer.py

* Update evaluation/mint/run_infer.sh

* Update opendevin/core/main.py

* remove redundant templates, add eval_note, update README

* use <execute_ipython> tag instead of <execute>

* hardcode AGENT option for run_infer.sh

* Update evaluation/mint/task.py

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>

* fix: bug no message returned when task's success

* change message to make the agent exit

* import bash abstractmethod

* install all required packages inside sandbox before the agent runs, adjust prompt

* add subset eval folder separation and test for gsm8k

* fix bug in Reasoning task result check, add requirements.txt

* Fix syntax error in evaluation/mint/run_infer.py

* update README, add default values for `SUBSET` and `EVAL_LIMIT`

---------

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
Co-authored-by: yufansong <yufan@risingwave-labs.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-28 07:42:52 +00:00
dependabot[bot] 110c530582 Bump browsergym from 0.3.2 to 0.3.3 (#2091)
Bumps [browsergym](https://github.com/ServiceNow/BrowserGym) from 0.3.2 to 0.3.3.
- [Release notes](https://github.com/ServiceNow/BrowserGym/releases)
- [Commits](https://github.com/ServiceNow/BrowserGym/compare/v0.3.2...v0.3.3)

---
updated-dependencies:
- dependency-name: browsergym
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-27 23:32:30 +08:00
dependabot[bot] 1b10c5bcb8 Bump @testing-library/user-event from 13.5.0 to 14.5.2 in /frontend (#2096)
Bumps [@testing-library/user-event](https://github.com/testing-library/user-event) from 13.5.0 to 14.5.2.
- [Release notes](https://github.com/testing-library/user-event/releases)
- [Changelog](https://github.com/testing-library/user-event/blob/main/CHANGELOG.md)
- [Commits](https://github.com/testing-library/user-event/compare/v13.5.0...v14.5.2)

---
updated-dependencies:
- dependency-name: "@testing-library/user-event"
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-27 23:32:21 +08:00
dependabot[bot] e52e24c5d5 Bump jsdom from 24.0.0 to 24.1.0 in /frontend (#2097)
Bumps [jsdom](https://github.com/jsdom/jsdom) from 24.0.0 to 24.1.0.
- [Release notes](https://github.com/jsdom/jsdom/releases)
- [Changelog](https://github.com/jsdom/jsdom/blob/main/Changelog.md)
- [Commits](https://github.com/jsdom/jsdom/compare/24.0.0...24.1.0)

---
updated-dependencies:
- dependency-name: jsdom
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-27 23:32:12 +08:00
dependabot[bot] b570354357 Bump openai from 1.30.1 to 1.30.3 (#2090)
Bumps [openai](https://github.com/openai/openai-python) from 1.30.1 to 1.30.3.
- [Release notes](https://github.com/openai/openai-python/releases)
- [Changelog](https://github.com/openai/openai-python/blob/main/CHANGELOG.md)
- [Commits](https://github.com/openai/openai-python/compare/v1.30.1...v1.30.3)

---
updated-dependencies:
- dependency-name: openai
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-27 23:31:45 +08:00
dependabot[bot] 848746a1c8 Bump boto3 from 1.34.112 to 1.34.113 (#2092)
Bumps [boto3](https://github.com/boto/boto3) from 1.34.112 to 1.34.113.
- [Release notes](https://github.com/boto/boto3/releases)
- [Changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst)
- [Commits](https://github.com/boto/boto3/compare/1.34.112...1.34.113)

---
updated-dependencies:
- dependency-name: boto3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-27 23:31:32 +08:00
dependabot[bot] bbcc1ab171 Bump json-repair from 0.19.2 to 0.20.1 (#2093)
Bumps [json-repair](https://github.com/mangiucugna/json_repair) from 0.19.2 to 0.20.1.
- [Release notes](https://github.com/mangiucugna/json_repair/releases)
- [Commits](https://github.com/mangiucugna/json_repair/compare/0.19.2...0.20.1)

---
updated-dependencies:
- dependency-name: json-repair
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-27 23:31:23 +08:00
dependabot[bot] e6ae9ae259 Bump @vitejs/plugin-react from 4.2.1 to 4.3.0 in /frontend (#2094)
Bumps [@vitejs/plugin-react](https://github.com/vitejs/vite-plugin-react/tree/HEAD/packages/plugin-react) from 4.2.1 to 4.3.0.
- [Release notes](https://github.com/vitejs/vite-plugin-react/releases)
- [Changelog](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react/CHANGELOG.md)
- [Commits](https://github.com/vitejs/vite-plugin-react/commits/v4.3.0/packages/plugin-react)

---
updated-dependencies:
- dependency-name: "@vitejs/plugin-react"
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-27 23:31:17 +08:00
dependabot[bot] 9507f4426a Bump typescript from 5.4.4 to 5.4.5 in /frontend (#2098)
Bumps [typescript](https://github.com/Microsoft/TypeScript) from 5.4.4 to 5.4.5.
- [Release notes](https://github.com/Microsoft/TypeScript/releases)
- [Changelog](https://github.com/microsoft/TypeScript/blob/main/azure-pipelines.release.yml)
- [Commits](https://github.com/Microsoft/TypeScript/compare/v5.4.4...v5.4.5)

---
updated-dependencies:
- dependency-name: typescript
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-27 23:31:05 +08:00
160 changed files with 8302 additions and 1839 deletions
+3
@@ -10,6 +10,9 @@ on:
- main
pull_request:
env:
PERSIST_SANDBOX : "false"
jobs:
test:
runs-on: ubuntu-latest
@@ -15,6 +15,9 @@ on:
- 'evaluation/**'
pull_request:
env:
PERSIST_SANDBOX : "false"
jobs:
integration-tests-on-linux:
name: Integration Tests on Linux
+3
@@ -15,6 +15,9 @@ on:
- 'evaluation/**'
pull_request:
env:
PERSIST_SANDBOX : "false"
jobs:
test-on-macos:
name: Test on macOS
+4 -3
@@ -5,8 +5,8 @@ This guide is for people working on OpenDevin and editing the source code.
### 1. Requirements
* Linux, Mac OS, or [WSL on Windows](https://learn.microsoft.com/en-us/windows/wsl/install)
* [Docker](https://docs.docker.com/engine/install/)(For those on MacOS, make sure to allow the default Docker socket to be used from advanced settings!)
* [Python](https://www.python.org/downloads/) >= 3.11
* [Docker](https://docs.docker.com/engine/install/) (For those on MacOS, make sure to allow the default Docker socket to be used from advanced settings!)
* [Python](https://www.python.org/downloads/) = 3.11
* [NodeJS](https://nodejs.org/en/download/package-manager) >= 18.17.1
* [Poetry](https://python-poetry.org/docs/#installing-with-the-official-installer) >= 1.8
@@ -45,6 +45,7 @@ To configure the LM of your choice, follow these steps:
make setup-config
```
This command will prompt you to enter the LLM API key, model name, and other variables ensuring that OpenDevin is tailored to your specific needs. Note that the model name will apply only when you run headless. If you use the UI, please set the model in the UI.
Set `persist_sandbox` to false if you want to use a clean sandbox for each task. If `persist_sandbox` is set to true, you will need to set the `ssh_password` as well.
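The rule above (a persistent sandbox needs an `ssh_password`) can be sketched as a small check. This is only an illustration: `needs_ssh_password` is a hypothetical helper, not part of OpenDevin, and it assumes the settings arrive as a plain dictionary in which `persist_sandbox` defaults to true, as the setup prompt suggests.

```python
def needs_ssh_password(settings: dict) -> bool:
    """Report whether the settings still lack a required ssh_password.

    Hypothetical helper for illustration only; not an OpenDevin API.
    """
    # Accept both booleans and strings such as "true"/"false".
    # Assumption: a missing persist_sandbox key means "true".
    persist = str(settings.get("persist_sandbox", "true")).lower() == "true"
    # A password is only required when the sandbox is persisted.
    return persist and not settings.get("ssh_password")


# Persistent sandbox without a password: something is missing.
missing = needs_ssh_password({"persist_sandbox": "true"})
# Clean sandbox per task: no password needed.
clean = needs_ssh_password({"persist_sandbox": "false"})
```

A check like this could run right after the configuration is loaded, so a missing password is reported before the sandbox container starts.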
**Note on Alternative Models:**
Some alternative models may prove more challenging to tame than others. Fear not, brave adventurer! We shall soon unveil LLM-specific documentation to guide you on your quest. And if you've already mastered the art of wielding a model other than OpenAI's GPT, we encourage you to [share your setup instructions with us](https://github.com/OpenDevin/OpenDevin/issues/417).
@@ -98,4 +99,4 @@ Please refer to [this README](./tests/integration/README.md) for details.
### 9. Add or update dependency
1. Add your dependency in `pyproject.toml` or use `poetry add xxx`
2. Update the poetry.lock file via `poetry lock --no-update`
2. Update the poetry.lock file via `poetry lock --no-update`
+10 -1
@@ -7,7 +7,7 @@ BACKEND_PORT = 3000
BACKEND_HOST = "127.0.0.1:$(BACKEND_PORT)"
FRONTEND_PORT = 3001
DEFAULT_WORKSPACE_DIR = "./workspace"
DEFAULT_MODEL = "gpt-3.5-turbo"
DEFAULT_MODEL = "gpt-4o"
CONFIG_FILE = config.toml
PRECOMMIT_CONFIG_PATH = "./dev_config/python/.pre-commit-config.yaml"
@@ -226,6 +226,15 @@ setup-config-prompts:
workspace_dir=$${workspace_dir:-$(DEFAULT_WORKSPACE_DIR)}; \
echo "workspace_base=\"$$workspace_dir\"" >> $(CONFIG_FILE).tmp
@read -p "Do you want to persist the sandbox container? [true/false] [default: true]: " persist_sandbox; \
persist_sandbox=$${persist_sandbox:-true}; \
if [ "$$persist_sandbox" = "true" ]; then \
read -p "Enter a password for the sandbox container: " ssh_password; \
echo "ssh_password=\"$$ssh_password\"" >> $(CONFIG_FILE).tmp; \
else \
echo "persist_sandbox=\"$$persist_sandbox\"" >> $(CONFIG_FILE).tmp; \
fi
@echo "" >> $(CONFIG_FILE).tmp
@echo "[llm]" >> $(CONFIG_FILE).tmp
+7 -6
@@ -51,20 +51,21 @@ You must be using Linux, Mac OS, or WSL on Windows.
To start the app, run these commands, replacing `$(pwd)/workspace` with the directory you want OpenDevin to work with.
> [!WARNING]
> OpenDevin runs bash commands within a Docker sandbox, so it should not affect your machine.
> But your workspace directory will be attached to that sandbox, and files in the directory may be modified or deleted.
```bash
# The directory you want OpenDevin to work with. MUST be an absolute path!
export WORKSPACE_BASE=$(pwd)/workspace;
```
> [!WARNING]
> OpenDevin runs bash commands within a Docker sandbox, so it should not affect your machine.
> But your workspace directory will be attached to that sandbox, and files in the directory may be modified or deleted.
```bash
docker run \
-it \
docker run -it \
--pull=always \
-e SANDBOX_USER_ID=$(id -u) \
-e PERSIST_SANDBOX="true" \
-e SSH_PASSWORD="make something up here" \
-e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE \
-v $WORKSPACE_BASE:/opt/workspace_base \
-v /var/run/docker.sock:/var/run/docker.sock \
+2
@@ -12,6 +12,7 @@ from . import ( # noqa: E402
SWE_agent,
browsing_agent,
codeact_agent,
codeact_swe_agent,
delegator_agent,
dummy_agent,
monologue_agent,
@@ -21,6 +22,7 @@ from . import ( # noqa: E402
__all__ = [
'monologue_agent',
'codeact_agent',
'codeact_swe_agent',
'planner_agent',
'SWE_agent',
'delegator_agent',
+15 -9
@@ -105,6 +105,18 @@ def truncate_observation(observation: str, max_chars: int = 10_000) -> str:
)
# FIXME: We can tweak these two settings to create MicroAgents specialized toward different area
def get_system_message() -> str:
if ENABLE_GITHUB:
return f'{SYSTEM_PREFIX}\n{GITHUB_MESSAGE}\n\n{COMMAND_DOCS}\n\n{SYSTEM_SUFFIX}'
else:
return f'{SYSTEM_PREFIX}\n\n{COMMAND_DOCS}\n\n{SYSTEM_SUFFIX}'
def get_in_context_example() -> str:
return EXAMPLES
class CodeActAgent(Agent):
VERSION = '1.5'
"""
@@ -152,11 +164,8 @@ class CodeActAgent(Agent):
]
jupyter_kernel_init_code: str = 'from agentskills import *'
system_message: str = (
f'{SYSTEM_PREFIX}\n{GITHUB_MESSAGE}\n\n{COMMAND_DOCS}\n\n{SYSTEM_SUFFIX}'
if ENABLE_GITHUB
else f'{SYSTEM_PREFIX}\n\n{COMMAND_DOCS}\n\n{SYSTEM_SUFFIX}'
)
system_message: str = get_system_message()
in_context_example: str = f"Here is an example of how you can interact with the environment for task solving:\n{get_in_context_example()}\n\nNOW, LET'S START!"
def __init__(
self,
@@ -194,10 +203,7 @@ class CodeActAgent(Agent):
"""
messages: list[dict[str, str]] = [
{'role': 'system', 'content': self.system_message},
{
'role': 'user',
'content': f"Here is an example of how you can interact with the environment for task solving:\n{EXAMPLES}\n\nNOW, LET'S START!",
},
{'role': 'user', 'content': self.in_context_example},
]
for prev_action, obs in state.history:
+11 -3
@@ -8,17 +8,23 @@ COMMAND_DOCS = (
"Please note that THE `edit_file` FUNCTION REQUIRES PROPER INDENTATION. If the assistant would like to add the line ' print(x)', it must fully write that out, with all those spaces before the code! Indentation is important and code that is not indented correctly will fail and require fixing before it can be run."
)
SYSTEM_PREFIX = """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
# ======= SYSTEM MESSAGE =======
MINIMAL_SYSTEM_PREFIX = """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
The assistant can interact with an interactive Python (Jupyter Notebook) environment and receive the corresponding output when needed. The code should be enclosed using "<execute_ipython>" tag, for example:
<execute_ipython>
print("Hello World!")
</execute_ipython>
The assistant can execute bash commands on behalf of the user by wrapping them with <execute_bash> and </execute_bash>.
For example, you can list the files in the current directory by <execute_bash> ls </execute_bash>.
The assistant can browse the Internet with commands on behalf of the user by wrapping them with <execute_browse> and </execute_browse>.
"""
BROWSING_PREFIX = """The assistant can browse the Internet with commands on behalf of the user by wrapping them with <execute_browse> and </execute_browse>.
For example, you can browse a given URL by <execute_browse> goto("<URL>") </execute_browse>.
The assistant should attempt fewer things at a time instead of putting too many commands OR too much code in one "execute" block.
The assistant can install Python packages using the %pip magic command in an IPython environment by using the following syntax: <execute_ipython> %pip install [package needed] </execute_ipython> and should always import packages and define variables before starting to use them."""
"""
PIP_INSTALL_PREFIX = """The assistant can install Python packages using the %pip magic command in an IPython environment by using the following syntax: <execute_ipython> %pip install [package needed] </execute_ipython> and should always import packages and define variables before starting to use them."""
SYSTEM_PREFIX = MINIMAL_SYSTEM_PREFIX + BROWSING_PREFIX + PIP_INSTALL_PREFIX
GITHUB_MESSAGE = """To do any activities on GitHub, the assistant should use the token in the $GITHUB_TOKEN environment variable.
For instance, to push a local branch `my_branch` to the github repo `owner/repo`, the assistant can use the following four commands:
@@ -30,6 +36,8 @@ The assistant should include ONLY ONE <execute_ipython> or <execute_bash> or <ex
IMPORTANT: Whenever possible, execute the code for the user using <execute_ipython> or <execute_bash> or <execute_browse> instead of providing it.
"""
# ======= EXAMPLE MESSAGE =======
EXAMPLES = """
--- START OF EXAMPLE ---
+7
@@ -0,0 +1,7 @@
# CodeAct (SWE Edit Specialized)
This agent is an adaptation of the original [SWE Agent](https://swe-agent.com/) based on CodeAct using the `agentskills` library of OpenDevin.
Its intended use is **solving GitHub issues**.
It removes the web-browsing and GitHub capabilities of the original CodeAct agent to avoid confusing the agent.
+5
@@ -0,0 +1,5 @@
from opendevin.controller.agent import Agent
from .codeact_swe_agent import CodeActSWEAgent
Agent.register('CodeActSWEAgent', CodeActSWEAgent)
@@ -0,0 +1,246 @@
import re
from agenthub.codeact_swe_agent.prompt import (
COMMAND_DOCS,
MINIMAL_SYSTEM_PREFIX,
SWE_EXAMPLE,
SYSTEM_SUFFIX,
)
from opendevin.controller.agent import Agent
from opendevin.controller.state.state import State
from opendevin.events.action import (
Action,
AgentFinishAction,
BrowseInteractiveAction,
CmdRunAction,
IPythonRunCellAction,
MessageAction,
)
from opendevin.events.observation import (
BrowserOutputObservation,
CmdOutputObservation,
IPythonRunCellObservation,
)
from opendevin.llm.llm import LLM
from opendevin.runtime.plugins import (
AgentSkillsRequirement,
JupyterRequirement,
PluginRequirement,
)
def parse_response(response) -> str:
action = response.choices[0].message.content
for lang in ['bash', 'ipython', 'browse']:
if f'<execute_{lang}>' in action and f'</execute_{lang}>' not in action:
action += f'</execute_{lang}>'
return action
def action_to_str(action: Action) -> str:
if isinstance(action, CmdRunAction):
return f'{action.thought}\n<execute_bash>\n{action.command}\n</execute_bash>'
elif isinstance(action, IPythonRunCellAction):
return f'{action.thought}\n<execute_ipython>\n{action.code}\n</execute_ipython>'
elif isinstance(action, BrowseInteractiveAction):
return f'{action.thought}\n<execute_browse>\n{action.browser_actions}\n</execute_browse>'
elif isinstance(action, MessageAction):
return action.content
return ''
def get_action_message(action: Action) -> dict[str, str] | None:
if (
isinstance(action, BrowseInteractiveAction)
or isinstance(action, CmdRunAction)
or isinstance(action, IPythonRunCellAction)
or isinstance(action, MessageAction)
):
return {
'role': 'user' if action.source == 'user' else 'assistant',
'content': action_to_str(action),
}
return None
def get_observation_message(obs) -> dict[str, str] | None:
if isinstance(obs, CmdOutputObservation):
content = 'OBSERVATION:\n' + truncate_observation(obs.content)
content += (
f'\n[Command {obs.command_id} finished with exit code {obs.exit_code}]'
)
return {'role': 'user', 'content': content}
elif isinstance(obs, IPythonRunCellObservation):
content = 'OBSERVATION:\n' + obs.content
# replace base64 images with a placeholder
splitted = content.split('\n')
for i, line in enumerate(splitted):
if '![image](data:image/png;base64,' in line:
splitted[i] = (
'![image](data:image/png;base64, ...) already displayed to user'
)
content = '\n'.join(splitted)
content = truncate_observation(content)
return {'role': 'user', 'content': content}
elif isinstance(obs, BrowserOutputObservation):
content = 'OBSERVATION:\n' + truncate_observation(obs.content)
return {'role': 'user', 'content': content}
return None
def truncate_observation(observation: str, max_chars: int = 10_000) -> str:
"""
Truncate the middle of the observation if it is too long.
"""
if len(observation) <= max_chars:
return observation
half = max_chars // 2
return (
observation[:half]
+ '\n[... Observation truncated due to length ...]\n'
+ observation[-half:]
)
def get_system_message() -> str:
return f'{MINIMAL_SYSTEM_PREFIX}\n\n{COMMAND_DOCS}\n\n{SYSTEM_SUFFIX}'
def get_in_context_example() -> str:
return SWE_EXAMPLE
class CodeActSWEAgent(Agent):
VERSION = '1.5'
"""
This agent is an adaptation of the original [SWE Agent](https://swe-agent.com/) based on CodeAct 1.5 using the `agentskills` library of OpenDevin.
Its intended use is **solving GitHub issues**.
It removes the web-browsing and GitHub capabilities of the original CodeAct agent to avoid confusing the agent.
"""
sandbox_plugins: list[PluginRequirement] = [
# NOTE: AgentSkillsRequirement needs to go before JupyterRequirement, since
# AgentSkillsRequirement provides a lot of Python functions
# and it needs to be initialized before Jupyter for Jupyter to use those functions.
AgentSkillsRequirement(),
JupyterRequirement(),
]
jupyter_kernel_init_code: str = 'from agentskills import *'
system_message: str = get_system_message()
in_context_example: str = f"Here is an example of how you can interact with the environment for task solving:\n{get_in_context_example()}\n\nNOW, LET'S START!"
def __init__(
self,
llm: LLM,
) -> None:
"""
Initializes a new instance of the CodeActSWEAgent class.
Parameters:
- llm (LLM): The llm to be used by this agent
"""
super().__init__(llm)
self.reset()
def reset(self) -> None:
"""
Resets the CodeAct Agent.
"""
super().reset()
def step(self, state: State) -> Action:
"""
Performs one step using the CodeAct Agent.
This includes gathering info on previous steps and prompting the model to make a command to execute.
Parameters:
- state (State): used to get updated info and background commands
Returns:
- CmdRunAction(command) - bash command to run
- IPythonRunCellAction(code) - IPython code to run
- BrowseInteractiveAction(browsergym_command) - BrowserGym commands to run
- MessageAction(content) - Message action to run (e.g. ask for clarification)
- AgentFinishAction() - end the interaction
"""
messages: list[dict[str, str]] = [
{'role': 'system', 'content': self.system_message},
{'role': 'user', 'content': self.in_context_example},
]
for prev_action, obs in state.history:
action_message = get_action_message(prev_action)
if action_message:
messages.append(action_message)
obs_message = get_observation_message(obs)
if obs_message:
messages.append(obs_message)
latest_user_message = [m for m in messages if m['role'] == 'user'][-1]
if latest_user_message:
if latest_user_message['content'].strip() == '/exit':
return AgentFinishAction()
latest_user_message['content'] += (
f'\n\nENVIRONMENT REMINDER: You have {state.max_iterations - state.iteration} turns left to complete the task.'
)
response = self.llm.do_completion(
messages=messages,
stop=[
'</execute_ipython>',
'</execute_bash>',
'</execute_browse>',
],
temperature=0.0,
)
action_str: str = parse_response(response)
state.num_of_chars += sum(
len(message['content']) for message in messages
) + len(action_str)
if finish_command := re.search(r'<finish>.*</finish>', action_str, re.DOTALL):
thought = action_str.replace(finish_command.group(0), '').strip()
return AgentFinishAction(thought=thought)
if bash_command := re.search(
r'<execute_bash>(.*?)</execute_bash>', action_str, re.DOTALL
):
# remove the command from the action string to get thought
thought = action_str.replace(bash_command.group(0), '').strip()
# a command was found
command_group = bash_command.group(1).strip()
if command_group.strip() == 'exit':
return AgentFinishAction()
return CmdRunAction(command=command_group, thought=thought)
elif python_code := re.search(
r'<execute_ipython>(.*?)</execute_ipython>', action_str, re.DOTALL
):
# a code block was found
code_group = python_code.group(1).strip()
thought = action_str.replace(python_code.group(0), '').strip()
return IPythonRunCellAction(
code=code_group,
thought=thought,
kernel_init_code=self.jupyter_kernel_init_code,
)
elif browse_command := re.search(
r'<execute_browse>(.*)</execute_browse>', action_str, re.DOTALL
):
# BrowserGym actions were found
browse_actions = browse_command.group(1).strip()
thought = action_str.replace(browse_command.group(0), '').strip()
return BrowseInteractiveAction(
browser_actions=browse_actions, thought=thought
)
else:
# We assume the LLM is GOOD enough that when it returns pure natural language
# it wants to talk to the user
return MessageAction(content=action_str, wait_for_response=True)
def search_memory(self, query: str) -> list[str]:
raise NotImplementedError('Implement this abstract method')
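Reviewer note: `step` passes the closing tags as stop sequences, so the completion normally ends before the closing tag; `parse_response` re-appends it, and the regexes then split the text into a command and a "thought". The following is a minimal standalone sketch of that flow on plain strings. The names `close_open_tag` and `extract_action` are hypothetical helpers for illustration, not functions from this commit, and the sketch uses a non-greedy match for every tag, whereas the committed code uses a greedy `(.*)` for `<execute_browse>`.

```python
import re
from typing import Optional

# Same order in which step() checks the tags: bash, then ipython, then browse.
LANGS = ('bash', 'ipython', 'browse')


def close_open_tag(text: str) -> str:
    """Re-append a closing tag that was cut off by a stop sequence."""
    for lang in LANGS:
        if f'<execute_{lang}>' in text and f'</execute_{lang}>' not in text:
            text += f'</execute_{lang}>'
    return text


def extract_action(text: str) -> Optional[tuple]:
    """Return (lang, payload, thought) for the first tag found, else None."""
    for lang in LANGS:
        match = re.search(
            rf'<execute_{lang}>(.*?)</execute_{lang}>', text, re.DOTALL
        )
        if match:
            # Everything outside the tagged block is treated as the thought.
            thought = text.replace(match.group(0), '').strip()
            return lang, match.group(1).strip(), thought
    return None


# Hypothetical completion, truncated at the stop sequence '</execute_bash>'.
raw = 'Let me list the files.\n<execute_bash>\nls -F\n'
parsed = extract_action(close_open_tag(raw))
print(parsed)
```

A response with no tag at all yields `None` here, which corresponds to the `MessageAction` fallback branch in `step`.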
+451
@@ -0,0 +1,451 @@
from opendevin.runtime.plugins import AgentSkillsRequirement
_AGENT_SKILLS_DOCS = AgentSkillsRequirement.documentation
COMMAND_DOCS = (
'\nApart from the standard Python library, the assistant can also use the following functions (already imported) in <execute_ipython> environment:\n'
f'{_AGENT_SKILLS_DOCS}'
"Please note that THE `edit_file` FUNCTION REQUIRES PROPER INDENTATION. If the assistant would like to add the line ' print(x)', it must fully write that out, with all those spaces before the code! Indentation is important and code that is not indented correctly will fail and require fixing before it can be run."
)
# ======= SYSTEM MESSAGE =======
MINIMAL_SYSTEM_PREFIX = """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
The assistant can interact with an interactive Python (Jupyter Notebook) environment and receive the corresponding output when needed. The code should be enclosed using "<execute_ipython>" tag, for example:
<execute_ipython>
print("Hello World!")
</execute_ipython>
The assistant can execute bash commands on behalf of the user by wrapping them with <execute_bash> and </execute_bash>.
For example, you can list the files in the current directory by <execute_bash> ls </execute_bash>.
"""
SYSTEM_SUFFIX = """The assistant's response should be concise.
The assistant should include ONLY ONE <execute_ipython> or <execute_bash> or <execute_browse> in every one of the responses, unless the assistant is finished with the task or needs more input or action from the user in order to proceed.
IMPORTANT: Whenever possible, execute the code for the user using <execute_ipython> or <execute_bash> or <execute_browse> instead of providing it.
"""
SWE_EXAMPLE = """
--- START OF EXAMPLE ---
USER: We are currently solving the following issue within our repository. Here is the issue text:
--- BEGIN ISSUE ---
TimeDelta serialization precision
Hi there!
I just found quite strange behaviour of `TimeDelta` field serialization
```python3
from marshmallow.fields import TimeDelta
from datetime import timedelta
td_field = TimeDelta(precision="milliseconds")
obj = dict()
obj["td_field"] = timedelta(milliseconds=345)
print(td_field.serialize("td_field", obj))
```
Output of this snippet is `344`, but it seems that `345` is correct.
Looks like a rounding issue here: https://github.com/marshmallow-code/marshmallow/blob/dev/src/marshmallow/fields.py#L1474
--- END ISSUE ---
[Current directory: /workspace/marshmallow-code__marshmallow]
ASSISTANT:
Let's first start by reproducing the results of the issue. The issue includes some example code for reproduction, which we can use. We'll create a new file called `reproduce.py` and paste the example code into it.
<execute_ipython>
create_file('/workspace/marshmallow-code__marshmallow/reproduce.py')
</execute_ipython>
USER:
OBSERVATION:
[File: /workspace/marshmallow-code__marshmallow/reproduce.py (1 lines total)]
1|
[File /workspace/marshmallow-code__marshmallow/reproduce.py created.]
ASSISTANT:
Now let's paste in the example code from the issue.
<execute_ipython>
EDITED_CODE = \"\"\"from marshmallow.fields import TimeDelta
from datetime import timedelta
td_field = TimeDelta(precision="milliseconds")
obj = dict()
obj["td_field"] = timedelta(milliseconds=345)
print(td_field.serialize("td_field", obj))
\"\"\"
edit_file(start=1, end=1, content=EDITED_CODE)
</execute_ipython>
USER:
OBSERVATION:
[File: /workspace/marshmallow-code__marshmallow/reproduce.py (9 lines total)]
1|from marshmallow.fields import TimeDelta
2|from datetime import timedelta
3|
4|td_field = TimeDelta(precision="milliseconds")
5|
6|obj = dict()
7|obj["td_field"] = timedelta(milliseconds=345)
8|
9|print(td_field.serialize("td_field", obj))
[File updated. Please review the changes and make sure they are correct (correct indentation, no duplicate lines, etc). Edit the file again if necessary.]
ASSISTANT:
Now let's run the code to see if we see the same output as the issue.
<execute_bash>
python reproduce.py
</execute_bash>
USER:
OBSERVATION:
344
ASSISTANT:
We are indeed seeing the same output as the issue. The issue suggests that we should look at line 1474 of the `fields.py` file to see if there is a rounding issue. Let's open the `fields.py` file and navigate to line 1474 to see the relevant code. First we should find the `fields.py` file in the repository. It is likely to be in the `src/marshmallow` directory, but we should check to be sure.
<execute_bash>
ls -F
</execute_bash>
USER:
OBSERVATION:
AUTHORS.rst
CHANGELOG.rst
CODE_OF_CONDUCT.md
CONTRIBUTING.rst
LICENSE
MANIFEST.in
NOTICE
README.rst
RELEASING.md
azure-pipelines.yml
docs/
examples/
performance/
pyproject.toml
reproduce.py
setup.cfg
setup.py
src/
tests/
tox.ini
ASSISTANT:
It looks like the `src` directory is present, which suggests that the `fields.py` file is likely to be in the `src` directory. Let's use find_file to see where it is.
<execute_ipython>
find_file("fields.py", "/workspace/marshmallow-code__marshmallow/src")
</execute_ipython>
USER:
OBSERVATION:
[Found 1 matches for "fields.py" in /workspace/marshmallow-code__marshmallow/src]
/workspace/marshmallow-code__marshmallow/src/marshmallow/fields.py
ASSISTANT:
It looks like the `fields.py` file is present in the `./src/marshmallow/` directory. The issue also points to a specific URL with line number 1474. We should navigate to that line in fields.py to see the relevant code for the `TimeDelta` serialization.
<execute_ipython>
open_file("/workspace/marshmallow-code__marshmallow/src/marshmallow/fields.py", 1474)
</execute_ipython>
USER:
OBSERVATION:
[File: /workspace/marshmallow-code__marshmallow/src/marshmallow/fields.py (1997 lines total)]
(1458 more lines above)
1459| self.WEEKS,
1460| )
1461|
1462| if precision not in units:
1463| msg = 'The precision must be {} or "{}".'.format(
1464| ", ".join([f'"{each}"' for each in units[:-1]]), units[-1]
1465| )
1466| raise ValueError(msg)
1467|
1468| self.precision = precision
1469| super().__init__(**kwargs)
1470|
1471| def _serialize(self, value, attr, obj, **kwargs):
1472| if value is None:
1473| return None
1474| base_unit = dt.timedelta(**{self.precision: 1})
1475| return int(value.total_seconds() / base_unit.total_seconds())
1476|
1477| def _deserialize(self, value, attr, data, **kwargs):
1478| try:
1479| value = int(value)
1480| except (TypeError, ValueError) as error:
1481| raise self.make_error("invalid") from error
1482|
1483| kwargs = {self.precision: value}
1484|
1485| try:
1486| return dt.timedelta(**kwargs)
1487| except OverflowError as error:
1488| raise self.make_error("invalid") from error
1489|
1490|
1491|class Mapping(Field):
1492| \"\"\"An abstract class for objects with key-value pairs.
1493|
1494| :param keys: A field class or instance for dict keys.
1495| :param values: A field class or instance for dict values.
1496| :param kwargs: The same keyword arguments that :class:`Field` receives.
1497|
1498| .. note::
1499| When the structure of nested data is not known, you may omit the
1500| `keys` and `values` arguments to prevent content validation.
1501|
1502| .. versionadded:: 3.0.0rc4
1503| \"\"\"
1504|
1505| mapping_type = dict
1506|
1507| #: Default error messages.
1508| default_error_messages = {"invalid": "Not a valid mapping type."}
1509|
1510| def __init__(
1511| self,
1512| keys: typing.Optional[typing.Union[Field, type]] = None,
1513| values: typing.Optional[typing.Union[Field, type]] = None,
1514| **kwargs,
1515| ):
1516| super().__init__(**kwargs)
1517| if keys is None:
1518| self.key_field = None
1519| else:
1520| try:
1521| self.key_field = resolve_field_instance(keys)
1522| except FieldInstanceResolutionError as error:
1523| raise ValueError(
1524| '"keys" must be a subclass or instance of '
1525| "marshmallow.base.FieldABC."
1526| ) from error
1527|
1528| if values is None:
1529| self.value_field = None
1530| else:
1531| try:
1532| self.value_field = resolve_field_instance(values)
1533| except FieldInstanceResolutionError as error:
1534| raise ValueError(
1535| '"values" must be a subclass or instance of '
1536| "marshmallow.base.FieldABC."
1537| ) from error
1538| if isinstance(self.value_field, Nested):
1539| self.only = self.value_field.only
1540| self.exclude = self.value_field.exclude
1541|
1542| def _bind_to_schema(self, field_name, schema):
1543| super()._bind_to_schema(field_name, schema)
1544| if self.value_field:
1545| self.value_field = copy.deepcopy(self.value_field)
1546| self.value_field._bind_to_schema(field_name, self)
1547| if isinstance(self.value_field, Nested):
1548| self.value_field.only = self.only
1549| self.value_field.exclude = self.exclude
1550| if self.key_field:
1551| self.key_field = copy.deepcopy(self.key_field)
1552| self.key_field._bind_to_schema(field_name, self)
1553|
1554| def _serialize(self, value, attr, obj, **kwargs):
1555| if value is None:
1556| return None
1557| if not self.value_field and not self.key_field:
1558| return self.mapping_type(value)
(439 more lines below)
ASSISTANT:
We are now looking at the relevant section of the `fields.py` file where the `TimeDelta` serialization occurs. The issue suggests that there is a rounding problem near line 1474. The current implementation on line 1475 passes the result of a float division to `int()`, which truncates the fractional part instead of rounding it, so a quotient that falls just below a whole number is returned one too small. To fix this issue, we should round the result of the division to the nearest integer before converting it to an int. This should preserve the milliseconds precision as expected. Let's make the necessary edit to the code.
<execute_ipython>
edit_file(start=1475, end=1475, content="return int(round(value.total_seconds() / base_unit.total_seconds())) # round to nearest int")
</execute_ipython>
USER:
OBSERVATION:
[Your proposed edit has introduced new syntax error(s). Please understand the errors and retry your edit command.]
ERRORS:
- E999 IndentationError: unexpected indent
[This is how your edit would have looked if applied]
-------------------------------------------------
[File: /marshmallow-code__marshmallow/src/marshmallow/fields.py (1997 lines total)]
(1468 more lines above)
1469| super().__init__(**kwargs)
1470|
1471| def _serialize(self, value, attr, obj, **kwargs):
1472| if value is None:
1473| return None
1474| base_unit = dt.timedelta(**{self.precision: 1})
1475|return int(round(value.total_seconds() / base_unit.total_seconds())) # round to nearest int
1476|
1477| def _deserialize(self, value, attr, data, **kwargs):
1478| try:
1479| value = int(value)
(518 more lines below)
-------------------------------------------------
[This is the original code before your edit]
-------------------------------------------------
[File: /marshmallow-code__marshmallow/src/marshmallow/fields.py (1997 lines total)]
(1469 more lines above)
1470|
1471| def _serialize(self, value, attr, obj, **kwargs):
1472| if value is None:
1473| return None
1474| base_unit = dt.timedelta(**{self.precision: 1})
1475| return int(value.total_seconds() / base_unit.total_seconds())
1476|
1477| def _deserialize(self, value, attr, data, **kwargs):
1478| try:
1479| value = int(value)
1480| except (TypeError, ValueError) as error:
(517 more lines below)
-------------------------------------------------
Your changes have NOT been applied. Please fix your edit command and try again.
You either need to 1) Specify the correct start/end line arguments or 2) Correct your edit code.
DO NOT re-run the same failed edit command. Running it again will lead to the same error.
ASSISTANT:
My edit command did not use the proper indentation, I will fix my syntax in this follow up edit command.
<execute_ipython>
EDITED_CODE=" return int(round(value.total_seconds() / base_unit.total_seconds())) # round to nearest int"
edit_file(start=1475, end=1475, content=EDITED_CODE)
</execute_ipython>
USER:
OBSERVATION:
[File: /marshmallow-code__marshmallow/src/marshmallow/fields.py (1997 lines total)]
(1424 more lines above)
1425|
1426| :param precision: Influences how the integer is interpreted during
1427| (de)serialization. Must be 'days', 'seconds', 'microseconds',
1428| 'milliseconds', 'minutes', 'hours' or 'weeks'.
1429| :param kwargs: The same keyword arguments that :class:`Field` receives.
1430|
1431| .. versionchanged:: 2.0.0
1432| Always serializes to an integer value to avoid rounding errors.
1433| Add `precision` parameter.
1434| \"\"\"
1435|
1436| DAYS = "days"
1437| SECONDS = "seconds"
1438| MICROSECONDS = "microseconds"
1439| MILLISECONDS = "milliseconds"
1440| MINUTES = "minutes"
1441| HOURS = "hours"
1442| WEEKS = "weeks"
1443|
1444| #: Default error messages.
1445| default_error_messages = {
1446| "invalid": "Not a valid period of time.",
1447| "format": "{input!r} cannot be formatted as a timedelta.",
1448| }
1449|
1450| def __init__(self, precision: str = SECONDS, **kwargs):
1451| precision = precision.lower()
1452| units = (
1453| self.DAYS,
1454| self.SECONDS,
1455| self.MICROSECONDS,
1456| self.MILLISECONDS,
1457| self.MINUTES,
1458| self.HOURS,
1459| self.WEEKS,
1460| )
1461|
1462| if precision not in units:
1463| msg = 'The precision must be {} or "{}".'.format(
1464| ", ".join([f'"{each}"' for each in units[:-1]]), units[-1]
1465| )
1466| raise ValueError(msg)
1467|
1468| self.precision = precision
1469| super().__init__(**kwargs)
1470|
1471| def _serialize(self, value, attr, obj, **kwargs):
1472| if value is None:
1473| return None
1474| base_unit = dt.timedelta(**{self.precision: 1})
1475| return int(round(value.total_seconds() / base_unit.total_seconds())) # round to nearest int
1476|
1477| def _deserialize(self, value, attr, data, **kwargs):
1478| try:
1479| value = int(value)
1480| except (TypeError, ValueError) as error:
1481| raise self.make_error("invalid") from error
1482|
1483| kwargs = {self.precision: value}
1484|
1485| try:
1486| return dt.timedelta(**kwargs)
1487| except OverflowError as error:
1488| raise self.make_error("invalid") from error
1489|
1490|
1491|class Mapping(Field):
1492| \"\"\"An abstract class for objects with key-value pairs.
1493|
1494| :param keys: A field class or instance for dict keys.
1495| :param values: A field class or instance for dict values.
1496| :param kwargs: The same keyword arguments that :class:`Field` receives.
1497|
1498| .. note::
1499| When the structure of nested data is not known, you may omit the
1500| `keys` and `values` arguments to prevent content validation.
1501|
1502| .. versionadded:: 3.0.0rc4
1503| \"\"\"
1504|
1505| mapping_type = dict
1506|
1507| #: Default error messages.
1508| default_error_messages = {"invalid": "Not a valid mapping type."}
1509|
1510| def __init__(
1511| self,
1512| keys: typing.Optional[typing.Union[Field, type]] = None,
1513| values: typing.Optional[typing.Union[Field, type]] = None,
1514| **kwargs,
1515| ):
1516| super().__init__(**kwargs)
1517| if keys is None:
1518| self.key_field = None
1519| else:
1520| try:
1521| self.key_field = resolve_field_instance(keys)
1522| except FieldInstanceResolutionError as error:
1523| raise ValueError(
1524| '"keys" must be a subclass or instance of '
(473 more lines below)
[File updated. Please review the changes and make sure they are correct (correct indentation, no duplicate lines, etc). Edit the file again if necessary.]
ASSISTANT:
The code has been updated to use the `round` function, which should fix the rounding issue. Before submitting the changes, it would be prudent to run the reproduce.py code again to ensure that this change has the desired effect. I see that "Current directory" is still `/marshmallow-code__marshmallow`, so we can just run `reproduce.py` directly.
<execute_bash>
python reproduce.py
</execute_bash>
USER:
OBSERVATION:
345
ASSISTANT:
The output has changed from 344 to 345, which suggests that the rounding issue has been fixed. Let's remove the reproduce.py file since it is no longer needed.
<execute_bash>
rm reproduce.py
</execute_bash>
USER:
OBSERVATION:
ASSISTANT:
rm doesn't have any output when it deletes successfully, so that must have worked. Now that we have fixed the issue, we can exit the current task.
<execute_bash>
exit
</execute_bash>
--- END OF EXAMPLE ---
"""
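Reviewer note: the in-context example above hinges on the difference between truncating and rounding a float quotient. A minimal sketch using only the standard library reproduces the arithmetic from the quoted issue without marshmallow; the variable names are illustrative and mirror the two lines shown in the `fields.py` listing.

```python
from datetime import timedelta

value = timedelta(milliseconds=345)
base_unit = timedelta(milliseconds=1)  # what precision="milliseconds" resolves to

# Float division: 0.345 / 0.001 lands just below 345 in binary floating point.
ratio = value.total_seconds() / base_unit.total_seconds()

truncated = int(ratio)       # behaviour before the fix: drops the fraction
rounded = int(round(ratio))  # behaviour after the fix: nearest integer

print(truncated, rounded)
```

The first value matches the `344` reported in the issue, and the second matches the `345` printed after the edit in the example.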
@@ -1,2 +1,2 @@
* `finish` - if you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working. Arguments:
* `finish` - if you're absolutely certain that you've completed your task, use the finish action to stop working. Arguments:
* `outputs` - a dictionary representing the outputs of your task, if any
+1 -2
@@ -55,14 +55,13 @@ class MicroAgent(Agent):
del self.delegates[self.agent_definition['name']]
def step(self, state: State) -> Action:
latest_user_message = state.get_current_user_intent()
prompt = self.prompt_template.render(
state=state,
instructions=instructions,
to_json=to_json,
history_to_json=history_to_json,
delegates=self.delegates,
latest_user_message=latest_user_message,
latest_user_message=state.get_current_user_intent(),
)
messages = [{'content': prompt, 'role': 'user'}]
resp = self.llm.do_completion(messages=messages)
+1 -1
@@ -2,5 +2,5 @@ name: CoderAgent
description: Given a particular task, and a detailed description of the codebase, accomplishes the task
inputs:
task: string
codebase_summary: string
summary: string
outputs: {}
+1 -1
@@ -2,7 +2,7 @@
You are a software engineer. You've inherited an existing codebase, which you
need to modify to complete this task:
{{ latest_user_message }}
{{ state.inputs.task }}
{% if state.inputs.summary %}
Here's a summary of the codebase, as it relates to this task:
+1 -1
@@ -1,7 +1,7 @@
# Task
You are a brilliant mathematician and programmer. You've been given the following problem to solve:
{{ latest_user_message }}
`{{ state.inputs.task }}`
Please write a python script that solves this problem, and prints the answer to stdout.
ONLY print the answer to stdout, nothing else.
+1 -1
@@ -2,7 +2,7 @@
You are a database engineer. You are working on an existing Postgres project, and have been given
the following task:
{{ latest_user_message }}
{{ state.inputs.task }}
You must:
* Investigate the existing migrations to understand the current schema
+4 -1
@@ -4,7 +4,10 @@ import yaml
all_microagents = {}
for dir in os.listdir(os.path.dirname(__file__)):
# Get the list of directories and sort them to preserve determinism
dirs = sorted(os.listdir(os.path.dirname(__file__)))
for dir in dirs:
base = os.path.dirname(__file__) + '/' + dir
if os.path.isfile(base):
continue
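Reviewer note: `os.listdir` returns entries in arbitrary order, which is why the change above sorts the directory names before registering micro-agents. A minimal sketch of the same idea, run against a temporary directory; the helper `list_subdirs` and the folder names are hypothetical and only stand in for the real micro-agent folders.

```python
import os
import tempfile


def list_subdirs(root: str) -> list:
    """Return sub-directory names of root in a deterministic (sorted) order."""
    names = sorted(os.listdir(root))  # sort first: listdir order is arbitrary
    return [n for n in names if os.path.isdir(os.path.join(root, n))]


with tempfile.TemporaryDirectory() as root:
    # Hypothetical agent folders, created in non-alphabetical order.
    for name in ('typo_fixer', 'coder', 'verifier'):
        os.mkdir(os.path.join(root, name))
    # A plain file is skipped, like the os.path.isfile check in the loader.
    open(os.path.join(root, '__init__.py'), 'w').close()
    result = list_subdirs(root)

print(result)
```

Sorting once up front keeps the registration order stable across machines and filesystems, which matters when later behaviour depends on iteration order.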
+44 -6
@@ -1,9 +1,11 @@
# Task
You are a software engineer. You've inherited an existing codebase, which you're
learning about for the first time. You need to study the codebase to find all
the information needed to complete this task:
You are a software architect. Your team has inherited an existing codebase, and
needs to finish a project:
{{ latest_user_message }}
{{ state.inputs.task }}
As an architect, you need to study the codebase to find all the information that
might be helpful for your software engineering team.
## Available Actions
{{ instructions.actions.run }}
@@ -11,11 +13,14 @@ the information needed to complete this task:
{{ instructions.actions.message }}
{{ instructions.actions.finish }}
You must ONLY `run` commands that have no side-effects, like `ls` and `grep`.
You must ONLY `run` commands that have no side-effects, like `ls` and `grep`. You
MUST NOT modify or write to any file.
Do NOT finish until you have a complete understanding of which parts of the
codebase are relevant to the task, including particular files, functions, and classes.
codebase are relevant to the project, including particular files, functions, and classes.
When you're done, put your summary in `outputs.summary` in the `finish` action.
Remember, your task is to explore and study the current repository, not actually
implement the solution. If the codebase is empty, you should call the `finish` action.
## History
{{ instructions.history_truncated }}
@@ -23,3 +28,36 @@ When you're done, put your summary in `outputs.summary` in the `finish` action.
## Format
{{ instructions.format.action }}
## Examples
Here is an example of how you can interact with the environment for task solving:
--- START OF EXAMPLE ---
USER: Can you create a list of numbers from 1 to 10, and create a web page to display them at port 5000?
ASSISTANT:
{
"action": "run",
"args": {
"command": "ls",
"background": false
}
}
USER:
OBSERVATION:
[]
ASSISTANT:
{
"action": "finish",
"args": {
"outputs": {
"summary": "The codebase appears to be empty. Engineers should start everything from scratch."
}
}
}
--- END OF EXAMPLE ---
+2 -1
@@ -1,5 +1,6 @@
name: TypoFixerAgent
description: Fixes typos in files in the current working directory
inputs: {}
inputs:
task: string
outputs:
summary: string
+11 -3
@@ -1,5 +1,13 @@
# Task
You are a proofreader tasked with fixing typos in the files in your current working directory. Your goal is to:
You are a proofreader tasked with fixing typos in the files in your current working directory.
{% if state.inputs.task %}
Specifically, your task is:
{{ state.inputs.task }}
{% endif %}
To achieve this goal, you should:
1. Scan the files for typos
2. Overwrite the files with the typos fixed
3. Provide a summary of the typos fixed
@@ -13,10 +21,10 @@ You are a proofreader tasked with fixing typos in the files in your current work
To complete this task:
1. Use the `read` action to read the contents of the files in your current working directory. Make sure to provide the file path in the format `'./file_name.ext'`.
2. Use the `think` action to analyze the contents and identify typos.
2. Use the `message` action to analyze the contents and identify typos.
3. Use the `write` action to create new versions of the files with the typos fixed.
- Overwrite the original files with the corrected content. Make sure to provide the file path in the format `'./file_name.ext'`.
4. Use the `think` action to generate a summary of the typos fixed, including the original and fixed versions of each typo, and the file(s) they were found in.
4. Use the `message` action to generate a summary of the typos fixed, including the original and fixed versions of each typo, and the file(s) they were found in.
5. Use the `finish` action to return the summary in the `outputs.summary` field.
Do NOT finish until you have fixed all the typos and generated a summary.
+3 -2
@@ -2,9 +2,10 @@
You are a quality assurance engineer. Another engineer has made changes to the
codebase which are supposed to solve this task:
{{ latest_user_message }}
{{ state.inputs.task }}
Your goal is to verify that the changes are correct and bug-free.
Note the changes might have already been applied in-line. You should focus on
validating if the task is solved, nothing else.
## Available Actions
{{ instructions.actions.run }}
-37
@@ -81,43 +81,6 @@ const config: Config = {
},
],
},
footer: {
style: "dark",
links: [
{
title: "OpenDevin",
items: [
{
label: "Docs",
to: "/modules/usage/intro",
},
],
},
{
title: "Community",
items: [
{
label: "Slack",
href: "https://join.slack.com/t/opendevin/shared_invite/zt-2ggtwn3k5-PvAA2LUmqGHVZ~XzGq~ILw"
},
{
label: "Discord",
href: "https://discord.gg/ESHStjSjD4",
},
],
},
{
title: "More",
items: [
{
label: "GitHub",
href: "https://github.com/OpenDevin/OpenDevin",
},
],
},
],
copyright: `Copyright © ${new Date().getFullYear()} OpenDevin`,
},
prism: {
theme: prismThemes.oneLight,
darkTheme: prismThemes.oneDark,
+4 -4
@@ -73,11 +73,11 @@ OpenDevin runs bash commands within a Docker sandbox, so it should not affect yo
:::
```
docker run \
-it \
docker run -it \
--pull=always \
-e LLM_API_KEY \
-e SANDBOX_USER_ID=$(id -u) \
-e PERSIST_SANDBOX="true" \
-e SSH_PASSWORD="make something up here" \
-e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE \
-v $WORKSPACE_BASE:/opt/workspace_base \
-v /var/run/docker.sock:/var/run/docker.sock \
@@ -92,7 +92,7 @@ You'll find OpenDevin running at [http://localhost:3000](http://localhost:3000).
If you want to use the **(unstable!)** bleeding edge, you can use `ghcr.io/opendevin/opendevin:main` as the image (last line).
:::
See [Development.md](https://github.com/OpenDevin/OpenDevin/blob/main/Development.md) for instructions on running OpenDevin without Docker.
For the development workflow, see [Development.md](https://github.com/OpenDevin/OpenDevin/blob/main/Development.md).
Are you having trouble? Check out our [Troubleshooting Guide](https://opendevin.github.io/OpenDevin/modules/usage/troubleshooting).
+61 -10
@@ -11,19 +11,20 @@
"@docusaurus/core": "3.2.1",
"@docusaurus/preset-classic": "3.2.1",
"@mdx-js/react": "^3.0.0",
"autoprefixer": "^10.4.19",
"clsx": "^2.0.0",
"postcss": "^8.4.38",
"prism-react-renderer": "^2.3.0",
"react": "^18.0.0",
"react-dom": "^18.0.0",
"react-use": "^17.5.0",
"tailwindcss": "^3.4.3"
"react-icons": "^5.2.1",
"react-use": "^17.5.0"
},
"devDependencies": {
"@docusaurus/module-type-aliases": "3.2.1",
"@docusaurus/tsconfig": "3.2.1",
"@docusaurus/types": "3.2.1",
"autoprefixer": "^10.4.19",
"postcss": "^8.4.38",
"tailwindcss": "^3.4.3",
"typescript": "~5.2.2"
},
"engines": {
@@ -213,6 +214,7 @@
"version": "5.2.0",
"resolved": "https://registry.npmjs.org/@alloc/quick-lru/-/quick-lru-5.2.0.tgz",
"integrity": "sha512-UrcABB+4bUrFABwbluTIBErXwvbsU/V7TZWfmbgJfbkwiBuziS9gxdODUyuiecfdGQ85jglMW6juS3+z5TsKLw==",
"dev": true,
"engines": {
"node": ">=10"
},
@@ -2763,6 +2765,7 @@
"version": "8.0.2",
"resolved": "https://registry.npmjs.org/@isaacs/cliui/-/cliui-8.0.2.tgz",
"integrity": "sha512-O8jcjabXaleOG9DQ0+ARXWZBTfnP4WNAqzuiJK7ll44AmxGKv/J2M4TPjxjY3znBCfvBXFzucm1twdyFybFqEA==",
"dev": true,
"dependencies": {
"string-width": "^5.1.2",
"string-width-cjs": "npm:string-width@^4.2.0",
@@ -2779,6 +2782,7 @@
"version": "6.0.1",
"resolved": "https://registry.npmjs.org/ansi-regex/-/ansi-regex-6.0.1.tgz",
"integrity": "sha512-n5M855fKb2SsfMIiFFoVrABHJC8QtHwVx+mHWP3QcEqBHYienj5dHSgjbxtC0WEZXYt4wcD6zrQElDPhFuZgfA==",
"dev": true,
"engines": {
"node": ">=12"
},
@@ -2790,6 +2794,7 @@
"version": "7.1.0",
"resolved": "https://registry.npmjs.org/strip-ansi/-/strip-ansi-7.1.0.tgz",
"integrity": "sha512-iq6eVVI64nQQTRYq2KtEg2d2uU7LElhTJwsH4YzIHZshxlgZms/wIc4VoDQTlG/IvVIrBKG06CrZnp0qv7hkcQ==",
"dev": true,
"dependencies": {
"ansi-regex": "^6.0.1"
},
@@ -2970,6 +2975,7 @@
"version": "0.11.0",
"resolved": "https://registry.npmjs.org/@pkgjs/parseargs/-/parseargs-0.11.0.tgz",
"integrity": "sha512-+1VkjdD0QBLPodGrJUeqarH8VAIvQODIbwh9XpP5Syisf7YoQgsJKPNFoqqLQlu+VQ/tVSshMR6loPMn8U+dPg==",
"dev": true,
"optional": true,
"engines": {
"node": ">=14"
@@ -4048,7 +4054,8 @@
"node_modules/any-promise": {
"version": "1.3.0",
"resolved": "https://registry.npmjs.org/any-promise/-/any-promise-1.3.0.tgz",
"integrity": "sha512-7UvmKalWRt1wgjL1RrGxoSJW/0QZFIegpeGvZG9kjp8vrRu55XTHbwnqq2GpXm9uLbcuhxm3IqX9OB4MZR1b2A=="
"integrity": "sha512-7UvmKalWRt1wgjL1RrGxoSJW/0QZFIegpeGvZG9kjp8vrRu55XTHbwnqq2GpXm9uLbcuhxm3IqX9OB4MZR1b2A==",
"dev": true
},
"node_modules/anymatch": {
"version": "3.1.3",
@@ -4472,6 +4479,7 @@
"version": "2.0.1",
"resolved": "https://registry.npmjs.org/camelcase-css/-/camelcase-css-2.0.1.tgz",
"integrity": "sha512-QOSvevhslijgYwRx6Rv7zKdMF8lbRmx+uQGx2+vDc+KI/eBnsy9kit5aj23AgGu3pa4t9AgwbnXWqS+iOY+2aA==",
"dev": true,
"engines": {
"node": ">= 6"
}
@@ -5626,7 +5634,8 @@
"node_modules/didyoumean": {
"version": "1.2.2",
"resolved": "https://registry.npmjs.org/didyoumean/-/didyoumean-1.2.2.tgz",
"integrity": "sha512-gxtyfqMg7GKyhQmb056K7M3xszy/myH8w+B4RT+QXBQsvAOdc3XymqDDPHx1BgPgsdAA5SIifona89YtRATDzw=="
"integrity": "sha512-gxtyfqMg7GKyhQmb056K7M3xszy/myH8w+B4RT+QXBQsvAOdc3XymqDDPHx1BgPgsdAA5SIifona89YtRATDzw==",
"dev": true
},
"node_modules/dir-glob": {
"version": "3.0.1",
@@ -5642,7 +5651,8 @@
"node_modules/dlv": {
"version": "1.1.3",
"resolved": "https://registry.npmjs.org/dlv/-/dlv-1.1.3.tgz",
"integrity": "sha512-+HlytyjlPKnIG8XuRG8WvmBP8xs8P71y+SKKS6ZXWoEgLuePxtDoUEiH7WkdePWrQ5JBpE6aoVqfZfJUQkjXwA=="
"integrity": "sha512-+HlytyjlPKnIG8XuRG8WvmBP8xs8P71y+SKKS6ZXWoEgLuePxtDoUEiH7WkdePWrQ5JBpE6aoVqfZfJUQkjXwA==",
"dev": true
},
"node_modules/dns-packet": {
"version": "5.6.1",
@@ -6464,6 +6474,7 @@
"version": "3.1.1",
"resolved": "https://registry.npmjs.org/foreground-child/-/foreground-child-3.1.1.tgz",
"integrity": "sha512-TMKDUnIte6bfb5nWv7V/caI169OHgvwjb7V4WkeUvbQQdjr5rWKqHFiKWb/fcOwB+CzBT+qbWjvj+DVwRskpIg==",
"dev": true,
"dependencies": {
"cross-spawn": "^7.0.0",
"signal-exit": "^4.0.1"
@@ -6479,6 +6490,7 @@
"version": "4.1.0",
"resolved": "https://registry.npmjs.org/signal-exit/-/signal-exit-4.1.0.tgz",
"integrity": "sha512-bzyZ1e88w9O1iNJbKnOlvYTrWPDl46O1bG0D3XInv+9tkPrxrN8jUUTiFlDkkmKWgn1M6CfIA13SuGqOa9Korw==",
"dev": true,
"engines": {
"node": ">=14"
},
@@ -7958,6 +7970,7 @@
"version": "2.3.6",
"resolved": "https://registry.npmjs.org/jackspeak/-/jackspeak-2.3.6.tgz",
"integrity": "sha512-N3yCS/NegsOBokc8GAdM8UcmfsKiSS8cipheD/nivzr700H+nsMOxJjQnvwOcRYVuFkdH0wGUvW2WbXGmrZGbQ==",
"dev": true,
"dependencies": {
"@isaacs/cliui": "^8.0.2"
},
@@ -10501,6 +10514,7 @@
"version": "7.0.4",
"resolved": "https://registry.npmjs.org/minipass/-/minipass-7.0.4.tgz",
"integrity": "sha512-jYofLM5Dam9279rdkWzqHozUo4ybjdZmCsDHePy5V/PbBcVMiSZR97gmAy45aqi8CK1lG2ECd356FU86avfwUQ==",
"dev": true,
"engines": {
"node": ">=16 || 14 >=14.17"
}
@@ -10534,6 +10548,7 @@
"version": "2.7.0",
"resolved": "https://registry.npmjs.org/mz/-/mz-2.7.0.tgz",
"integrity": "sha512-z81GNO7nnYMEhrGh9LeymoE4+Yr0Wn5McHIZMK5cfQCl+NDX08sCZgUc9/6MHni9IWuFLm1Z3HTCXu2z9fN62Q==",
"dev": true,
"dependencies": {
"any-promise": "^1.0.0",
"object-assign": "^4.0.1",
@@ -10691,6 +10706,7 @@
"version": "3.0.0",
"resolved": "https://registry.npmjs.org/object-hash/-/object-hash-3.0.0.tgz",
"integrity": "sha512-RSn9F68PjH9HqtltsSnqYC1XXoWe9Bju5+213R98cNGttag9q9yAOTzdbsqvIa7aNm5WffBZFpWYr2aWrklWAw==",
"dev": true,
"engines": {
"node": ">= 6"
}
@@ -11029,6 +11045,7 @@
"version": "1.10.2",
"resolved": "https://registry.npmjs.org/path-scurry/-/path-scurry-1.10.2.tgz",
"integrity": "sha512-7xTavNy5RQXnsjANvVvMkEjvloOinkAjv/Z6Ildz9v2RinZ4SBKTWFOVRbaF8p0vpHnyjV/UwNDdKuUv6M5qcA==",
"dev": true,
"dependencies": {
"lru-cache": "^10.2.0",
"minipass": "^5.0.0 || ^6.0.2 || ^7.0.0"
@@ -11044,6 +11061,7 @@
"version": "10.2.1",
"resolved": "https://registry.npmjs.org/lru-cache/-/lru-cache-10.2.1.tgz",
"integrity": "sha512-tS24spDe/zXhWbNPErCHs/AGOzbKGHT+ybSBqmdLm8WZ1xXLWvH8Qn71QPAlqVhd0qUTWjy+Kl9JmISgDdEjsA==",
"dev": true,
"engines": {
"node": "14 || >=16.14"
}
@@ -11094,6 +11112,7 @@
"version": "2.3.0",
"resolved": "https://registry.npmjs.org/pify/-/pify-2.3.0.tgz",
"integrity": "sha512-udgsAY+fTnvv7kI7aaxbqwWNb0AHiB0qBO89PZKPkoTmGOgdbrHDKD+0B2X4uTfJ/FT1R09r9gTsjUjNJotuog==",
"dev": true,
"engines": {
"node": ">=0.10.0"
}
@@ -11102,6 +11121,7 @@
"version": "4.0.6",
"resolved": "https://registry.npmjs.org/pirates/-/pirates-4.0.6.tgz",
"integrity": "sha512-saLsH7WeYYPiD25LDuLRRY/i+6HaPYr6G1OUlN39otzkSTxKnubR9RTxS3/Kk50s1g2JTgFwWQDQyplC5/SHZg==",
"dev": true,
"engines": {
"node": ">= 6"
}
@@ -11320,6 +11340,7 @@
"version": "15.1.0",
"resolved": "https://registry.npmjs.org/postcss-import/-/postcss-import-15.1.0.tgz",
"integrity": "sha512-hpr+J05B2FVYUAXHeK1YyI267J/dDDhMU6B6civm8hSY1jYJnBXxzKDKDswzJmtLHryrjhnDjqqp/49t8FALew==",
"dev": true,
"dependencies": {
"postcss-value-parser": "^4.0.0",
"read-cache": "^1.0.0",
@@ -11336,6 +11357,7 @@
"version": "4.0.1",
"resolved": "https://registry.npmjs.org/postcss-js/-/postcss-js-4.0.1.tgz",
"integrity": "sha512-dDLF8pEO191hJMtlHFPRa8xsizHaM82MLfNkUHdUtVEV3tgTp5oj+8qbEqYM57SLfc74KSbw//4SeJma2LRVIw==",
"dev": true,
"dependencies": {
"camelcase-css": "^2.0.1"
},
@@ -11354,6 +11376,7 @@
"version": "4.0.2",
"resolved": "https://registry.npmjs.org/postcss-load-config/-/postcss-load-config-4.0.2.tgz",
"integrity": "sha512-bSVhyJGL00wMVoPUzAVAnbEoWyqRxkjv64tUl427SKnPrENtq6hJwUojroMz2VB+Q1edmi4IfrAPpami5VVgMQ==",
"dev": true,
"funding": [
{
"type": "opencollective",
@@ -11388,6 +11411,7 @@
"version": "3.1.1",
"resolved": "https://registry.npmjs.org/lilconfig/-/lilconfig-3.1.1.tgz",
"integrity": "sha512-O18pf7nyvHTckunPWCV1XUNXU1piu01y2b7ATJ0ppkUkk8ocqVWBrYjJBCwHDjD/ZWcfyrA0P4gKhzWGi5EINQ==",
"dev": true,
"engines": {
"node": ">=14"
},
@@ -11399,6 +11423,7 @@
"version": "2.4.1",
"resolved": "https://registry.npmjs.org/yaml/-/yaml-2.4.1.tgz",
"integrity": "sha512-pIXzoImaqmfOrL7teGUBt/T7ZDnyeGBWyXQBvOVhLkWLN37GXv8NMLK406UY6dS51JfcQHsmcW5cJ441bHg6Lg==",
"dev": true,
"bin": {
"yaml": "bin.mjs"
},
@@ -11618,6 +11643,7 @@
"version": "6.0.1",
"resolved": "https://registry.npmjs.org/postcss-nested/-/postcss-nested-6.0.1.tgz",
"integrity": "sha512-mEp4xPMi5bSWiMbsgoPfcP74lsWLHkQbZc3sY+jWYd65CUwXrUaTp0fmNpa01ZcETKlIgUdFN/MpS2xZtqL9dQ==",
"dev": true,
"dependencies": {
"postcss-selector-parser": "^6.0.11"
},
@@ -12282,6 +12308,14 @@
"react-dom": "^16.6.0 || ^17.0.0 || ^18.0.0"
}
},
"node_modules/react-icons": {
"version": "5.2.1",
"resolved": "https://registry.npmjs.org/react-icons/-/react-icons-5.2.1.tgz",
"integrity": "sha512-zdbW5GstTzXaVKvGSyTaBalt7HSfuK5ovrzlpyiWHAFXndXTdd/1hdDHI4xBM1Mn7YriT6aqESucFl9kEXzrdw==",
"peerDependencies": {
"react": "*"
}
},
"node_modules/react-is": {
"version": "16.13.1",
"resolved": "https://registry.npmjs.org/react-is/-/react-is-16.13.1.tgz",
@@ -12412,6 +12446,7 @@
"version": "1.0.0",
"resolved": "https://registry.npmjs.org/read-cache/-/read-cache-1.0.0.tgz",
"integrity": "sha512-Owdv/Ft7IjOgm/i0xvNDZ1LrRANRfew4b2prF3OWMQLxLfu3bS8FVhCsrSCMK4lR56Y9ya+AThoTpDCTxCmpRA==",
"dev": true,
"dependencies": {
"pify": "^2.3.0"
}
@@ -13616,6 +13651,7 @@
"version": "4.2.3",
"resolved": "https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz",
"integrity": "sha512-wKyQRQpjJ0sIp62ErSZdGsjMJWsap5oRNihHhu6G7JVO/9jIB6UyevL+tXuOqrng8j/cxKTWyWUwvSTriiZz/g==",
"dev": true,
"dependencies": {
"emoji-regex": "^8.0.0",
"is-fullwidth-code-point": "^3.0.0",
@@ -13628,7 +13664,8 @@
"node_modules/string-width-cjs/node_modules/emoji-regex": {
"version": "8.0.0",
"resolved": "https://registry.npmjs.org/emoji-regex/-/emoji-regex-8.0.0.tgz",
"integrity": "sha512-MSjYzcWNOA0ewAHpz0MxpYFvwg6yjy1NG3xteoqz644VCo/RPgnr1/GGt+ic3iJTzQ8Eu3TdM14SawnVUmGE6A=="
"integrity": "sha512-MSjYzcWNOA0ewAHpz0MxpYFvwg6yjy1NG3xteoqz644VCo/RPgnr1/GGt+ic3iJTzQ8Eu3TdM14SawnVUmGE6A==",
"dev": true
},
"node_modules/string-width/node_modules/ansi-regex": {
"version": "6.0.1",
@@ -13697,6 +13734,7 @@
"version": "6.0.1",
"resolved": "https://registry.npmjs.org/strip-ansi/-/strip-ansi-6.0.1.tgz",
"integrity": "sha512-Y38VPSHcqkFrCpFnQ9vuSXmquuv5oXOKpGeT6aGrr3o3Gc9AlVa6JBfUSOCnbxGGZF+/0ooI7KrPuUSztUdU5A==",
"dev": true,
"dependencies": {
"ansi-regex": "^5.0.1"
},
@@ -13763,6 +13801,7 @@
"version": "3.35.0",
"resolved": "https://registry.npmjs.org/sucrase/-/sucrase-3.35.0.tgz",
"integrity": "sha512-8EbVDiu9iN/nESwxeSxDKe0dunta1GOlHufmSSXxMD2z2/tMZpDMpvXQGsc+ajGo8y2uYUmixaSRUc/QPoQ0GA==",
"dev": true,
"dependencies": {
"@jridgewell/gen-mapping": "^0.3.2",
"commander": "^4.0.0",
@@ -13784,6 +13823,7 @@
"version": "2.0.1",
"resolved": "https://registry.npmjs.org/brace-expansion/-/brace-expansion-2.0.1.tgz",
"integrity": "sha512-XnAIvQ8eM+kC6aULx6wuQiwVsnzsi9d3WxzV3FpWTGA19F621kwdbsAcFKXgKUHZWsy+mY6iL1sHTxWEFCytDA==",
"dev": true,
"dependencies": {
"balanced-match": "^1.0.0"
}
@@ -13792,6 +13832,7 @@
"version": "4.1.1",
"resolved": "https://registry.npmjs.org/commander/-/commander-4.1.1.tgz",
"integrity": "sha512-NOKm8xhkzAjzFx8B2v5OAHT+u5pRQc2UCa2Vq9jYL/31o2wi9mxBA7LIFs3sV5VSC49z6pEhfbMULvShKj26WA==",
"dev": true,
"engines": {
"node": ">= 6"
}
@@ -13800,6 +13841,7 @@
"version": "10.3.12",
"resolved": "https://registry.npmjs.org/glob/-/glob-10.3.12.tgz",
"integrity": "sha512-TCNv8vJ+xz4QiqTpfOJA7HvYv+tNIRHKfUWw/q+v2jdgN4ebz+KY9tGx5J4rHP0o84mNP+ApH66HRX8us3Khqg==",
"dev": true,
"dependencies": {
"foreground-child": "^3.1.0",
"jackspeak": "^2.3.6",
@@ -13821,6 +13863,7 @@
"version": "9.0.4",
"resolved": "https://registry.npmjs.org/minimatch/-/minimatch-9.0.4.tgz",
"integrity": "sha512-KqWh+VchfxcMNRAJjj2tnsSJdNbHsVgnkBhTNrW7AjVo6OvLtxw8zfT9oLw1JSohlFzJ8jCoTgaoXvJ+kHt6fw==",
"dev": true,
"dependencies": {
"brace-expansion": "^2.0.1"
},
@@ -13953,6 +13996,7 @@
"version": "3.4.3",
"resolved": "https://registry.npmjs.org/tailwindcss/-/tailwindcss-3.4.3.tgz",
"integrity": "sha512-U7sxQk/n397Bmx4JHbJx/iSOOv5G+II3f1kpLpY2QeUv5DcPdcTsYLlusZfq1NthHS1c1cZoyFmmkex1rzke0A==",
"dev": true,
"dependencies": {
"@alloc/quick-lru": "^5.2.0",
"arg": "^5.0.2",
@@ -13989,6 +14033,7 @@
"version": "6.0.2",
"resolved": "https://registry.npmjs.org/glob-parent/-/glob-parent-6.0.2.tgz",
"integrity": "sha512-XxwI8EOhVQgWp6iDL+3b0r86f4d6AX6zSU55HfB4ydCEuXLXc5FcYeOu+nnGftS4TEju/11rt4KJPTMgbfmv4A==",
"dev": true,
"dependencies": {
"is-glob": "^4.0.3"
},
@@ -14140,6 +14185,7 @@
"version": "3.3.1",
"resolved": "https://registry.npmjs.org/thenify/-/thenify-3.3.1.tgz",
"integrity": "sha512-RVZSIV5IG10Hk3enotrhvz0T9em6cyHBLkH/YAZuKqd8hRkKhSfCGIcP2KUY0EPxndzANBmNllzWPwak+bheSw==",
"dev": true,
"dependencies": {
"any-promise": "^1.0.0"
}
@@ -14148,6 +14194,7 @@
"version": "1.6.0",
"resolved": "https://registry.npmjs.org/thenify-all/-/thenify-all-1.6.0.tgz",
"integrity": "sha512-RNxQH/qI8/t3thXJDwcstUO4zeqo64+Uy/+sNVRBx4Xn2OX+OZ9oP+iJnNFqplFra2ZUVeKCSa2oVWi3T4uVmA==",
"dev": true,
"dependencies": {
"thenify": ">= 3.1.0 < 4"
},
@@ -14244,7 +14291,8 @@
"node_modules/ts-interface-checker": {
"version": "0.1.13",
"resolved": "https://registry.npmjs.org/ts-interface-checker/-/ts-interface-checker-0.1.13.tgz",
"integrity": "sha512-Y/arvbn+rrz3JCKl9C4kVNfTfSm2/mEp5FSz5EsZSANGPSlQrpRI5M4PKF+mJnE52jOO90PnPSc3Ur3bTQw0gA=="
"integrity": "sha512-Y/arvbn+rrz3JCKl9C4kVNfTfSm2/mEp5FSz5EsZSANGPSlQrpRI5M4PKF+mJnE52jOO90PnPSc3Ur3bTQw0gA==",
"dev": true
},
"node_modules/tslib": {
"version": "2.6.2",
@@ -15202,6 +15250,7 @@
"version": "7.0.0",
"resolved": "https://registry.npmjs.org/wrap-ansi/-/wrap-ansi-7.0.0.tgz",
"integrity": "sha512-YVGIj2kamLSTxw6NsZjoBxfSwsn0ycdesmc4p+Q21c5zPuZ1pl+NfxVdxPtdHvmNVOQ6XSYG4AUtyt/Fi7D16Q==",
"dev": true,
"dependencies": {
"ansi-styles": "^4.0.0",
"string-width": "^4.1.0",
@@ -15217,12 +15266,14 @@
"node_modules/wrap-ansi-cjs/node_modules/emoji-regex": {
"version": "8.0.0",
"resolved": "https://registry.npmjs.org/emoji-regex/-/emoji-regex-8.0.0.tgz",
"integrity": "sha512-MSjYzcWNOA0ewAHpz0MxpYFvwg6yjy1NG3xteoqz644VCo/RPgnr1/GGt+ic3iJTzQ8Eu3TdM14SawnVUmGE6A=="
"integrity": "sha512-MSjYzcWNOA0ewAHpz0MxpYFvwg6yjy1NG3xteoqz644VCo/RPgnr1/GGt+ic3iJTzQ8Eu3TdM14SawnVUmGE6A==",
"dev": true
},
"node_modules/wrap-ansi-cjs/node_modules/string-width": {
"version": "4.2.3",
"resolved": "https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz",
"integrity": "sha512-wKyQRQpjJ0sIp62ErSZdGsjMJWsap5oRNihHhu6G7JVO/9jIB6UyevL+tXuOqrng8j/cxKTWyWUwvSTriiZz/g==",
"dev": true,
"dependencies": {
"emoji-regex": "^8.0.0",
"is-fullwidth-code-point": "^3.0.0",
+5 -4
@@ -18,19 +18,20 @@
"@docusaurus/core": "3.2.1",
"@docusaurus/preset-classic": "3.2.1",
"@mdx-js/react": "^3.0.0",
"autoprefixer": "^10.4.19",
"clsx": "^2.0.0",
"postcss": "^8.4.38",
"prism-react-renderer": "^2.3.0",
"react": "^18.0.0",
"react-dom": "^18.0.0",
"react-use": "^17.5.0",
"tailwindcss": "^3.4.3"
"react-icons": "^5.2.1",
"react-use": "^17.5.0"
},
"devDependencies": {
"@docusaurus/module-type-aliases": "3.2.1",
"@docusaurus/tsconfig": "3.2.1",
"@docusaurus/types": "3.2.1",
"autoprefixer": "^10.4.19",
"postcss": "^8.4.38",
"tailwindcss": "^3.4.3",
"typescript": "~5.2.2"
},
"browserslist": {
+6
@@ -0,0 +1,6 @@
module.exports = {
plugins: {
tailwindcss: {},
autoprefixer: {},
},
};
+27
@@ -0,0 +1,27 @@
import { FaSlack, FaDiscord, FaGithub } from "react-icons/fa";
function CustomFooter() {
return (
<footer style={{ backgroundColor: 'dark' }} className="dark:text-white h-[25vh] bg-gradient-to-b from-gray-900 to-gray-900">
<div className="flex flex-col justify-between w-full items-center p-2 h-full">
<div className="flex gap-2">
<div className="font-bold text-lg md:text-3xl">OpenDevin</div>
<div className="text-sm"><a className="hover:text-white transition-all duration-300 cursor-pointer hover:no-underline" href="/modules/usage/intro">Docs</a></div>
</div>
<div className="uppercase font-light">Community</div>
<div className="flex gap-6 text-3xl">
<div><a className="hover:text-white transition-all duration-300" href="https://join.slack.com/t/opendevin/shared_invite/zt-2ggtwn3k5-PvAA2LUmqGHVZ~XzGq~ILw" target="_blank"><FaSlack /></a></div>
<div><a className="hover:text-white transition-all duration-300" href="https://discord.gg/ESHStjSjD4" target="_blank"><FaDiscord /></a></div>
<div><a className="hover:text-white transition-all duration-300" href="https://github.com/OpenDevin/OpenDevin" target="_blank"><FaGithub /></a></div>
</div>
<div >
</div>
<div >
<p className="uppercase">Copyright &copy; {new Date().getFullYear()} OpenDevin</p>
</div>
</div>
</footer>
);
}
export default CustomFooter;
@@ -7,9 +7,14 @@ import styles from "./index.module.css";
export function HomepageHeader() {
const { siteConfig } = useDocusaurusContext();
return (
<div className={styles.headerContainer}>
<div className={styles.header}>
<Heading as="h1" className="hero__title">
<div className="h-screen bg-gradient-to-t from-slate-600 to-black">
{/* <div className={styles.headerContainer}> */}
<div className={`text-white flex flex-col
items-center p-6 font-light w-full`}>
<Heading as="h1" className="
text-5xl
">
{/* hero__title */}
{siteConfig.title}
</Heading>
<p className="hero__subtitle">{siteConfig.tagline}</p>
@@ -21,8 +26,8 @@ export function HomepageHeader() {
Get Started
</Link>
</div>
</div>{" "}
<Demo />
</div>
</div>
);
}
+8 -5
@@ -1,11 +1,14 @@
import styles from "./styles.module.css";
import "../../pages/index.module.css"
export function Welcome() {
return (
<div className={styles.container}>
<div className={styles.innerContainer}>
<img src="img/logo.png" className={styles.sidebarImage} />
<p className={styles.welcomeText}>
<div className="text-white">
<div className="flex justify-center items-center flex-col md:flex-row bg-gradient-to-b from-slate-600 dark:to-gray-900 to-gray-200">
<img src="img/logo.png" className="
max-sm:h-[40vw] max-sm:w-[40vw]
h-[45vh] w-[45vw]
md:h-[60vh] md:w-[350px]" />
<p className=" px-6 md:p-2 mb-6 font-light text-lg md:text-2xl">
Welcome to OpenDevin, an open-source project aiming to replicate
Devin, an autonomous AI software engineer who is capable of executing
complex engineering tasks and collaborating actively with users on
+4
@@ -0,0 +1,4 @@
/* src/css/main.css */
@tailwind base;
@tailwind components;
@tailwind utilities;
+35 -26
@@ -1,31 +1,32 @@
import Layout from "@theme/Layout";
import CustomFooter from "../components/CustomFooter";
export default function FAQ() {
return (
<>
<Layout title="FAQ" description="Frequently Asked Questions">
<div
id="faq"
style={{
maxWidth: "900px",
margin: "0px auto",
padding: "40px",
textAlign: "justify",
}}
className="m-auto p-6 flex flex-col gap-2 mb-6"
>
<h1 style={{ fontSize: "3rem" }}>Frequently Asked Questions</h1>
<h2 style={{ fontSize: "2rem" }}>Support</h2>
<h3>How can I report an issue with OpenDevin?</h3>
<p>
<div className="flex items-center justify-center text-2xl lg:text-6xl p-2 uppercase font-bold">Frequently Asked Questions</div>
<div className="flex flex-col gap-2 w-full mb-6" >
<div className="uppercase font-bold text-4xl tracking-wider">Support</div>
<div>How can I report an issue with OpenDevin?</div>
<div>
Please file a bug on{" "}
<a href="https://github.com/OpenDevin/OpenDevin/issues">GitHub</a> if
<a href="https://github.com/OpenDevin/OpenDevin/issues" target="_blank">GitHub</a> if
you notice a problem that likely affects others.
If you're having trouble installing, or have general questions, reach out on{" "}
<a href="https://discord.gg/mBuDGRzzES">Discord</a> or{" "}
<a href="https://join.slack.com/t/opendevin/shared_invite/zt-2ggtwn3k5-PvAA2LUmqGHVZ~XzGq~ILw">Slack</a>.
</p>
<h2 style={{ fontSize: "2rem" }}>General</h2>
<h3>What is Devin?</h3>
<p>
<a href="https://discord.gg/mBuDGRzzES" target="_blank">Discord</a> or{" "}
<a href="https://join.slack.com/t/opendevin/shared_invite/zt-2ggtwn3k5-PvAA2LUmqGHVZ~XzGq~ILw" target="_blank">Slack</a>.
</div>
</div>
<div className="flex flex-col gap-2 w-full mb-6">
<div className="uppercase font-bold text-4xl tracking-wider" >General</div>
<div>What is Devin?</div>
<div>
<span style={{ fontWeight: "600", color: "var(--logo)" }}>Devin</span>{" "}
represents a cutting-edge autonomous agent designed to navigate the
complexities of software engineering. It leverages a combination of
@@ -34,8 +35,10 @@ export default function FAQ() {
explore and expand upon Devin's capabilities, identifying both its
strengths and areas for improvement, to guide the progress of open
code models.
</p>
<h3>Why OpenDevin?</h3>
</div>
</div>
<div className="flex flex-col gap-2 w-full mb-6">
<div className="uppercase font-bold text-4xl tracking-wider">Why OpenDevin?</div>
<p>
The{" "}
<span style={{ fontWeight: "600", color: "var(--logo)" }}>
@@ -50,8 +53,11 @@ export default function FAQ() {
scenarios, producing works that significantly contribute to the
community and pave the way for future advancements.
</p>
<h3>How to fix an issue on OpenDevin?</h3>
<p>
</div>
<div className="flex flex-col gap-2 w-full mb-6">
<div className="uppercase font-bold text-4xl tracking-wider">How to fix an issue on OpenDevin?</div>
<div>
To fix an issue on GitHub using OpenDevin, send a prompt to OpenDevin asking it to follow these steps:
<ol>
<li>Read the issue on <a href="https://github.com/OpenDevin/OpenDevin/issues/1611">GitHub</a></li>
@@ -61,16 +67,19 @@ export default function FAQ() {
<li>Tell me the link that I need to go to to send a pull request</li>
</ol>
Before you run OpenDevin, you can do:
<pre>
<div className="flex flex-col p-2 bg-gray-300 rounded-md my-2">
export SANDBOX_ENV_GITHUB_TOKEN=XXX
</pre>
</div>
where XXX is a GitHub token that you created that has permissions to push to the OpenDevin repo. If you don't have write permission to the OpenDevin repo, you might need to change that to:
<pre>
<div className="flex flex-col p-2 bg-gray-300 rounded-md my-2">
4. Push the resulting output to my fork at https://github.com/USERNAME/OpenDevin/ using the GITHUB_TOKEN environment variable
</pre>
</div>
where USERNAME is your GitHub username.
</p>
</div>
</div>
</div>
</Layout>
<CustomFooter/>
</>
);
}
+7 -2
@@ -1,12 +1,14 @@
import useDocusaurusContext from "@docusaurus/useDocusaurusContext";
import Layout from "@theme/Layout";
import '../css/main.css';
import { HomepageHeader } from "../components/HomepageHeader/HomepageHeader";
import { Welcome } from "../components/Welcome/Welcome";
import CustomFooter from "../components/CustomFooter";
export function Header({ title, summary, description }): JSX.Element {
return (
<div>
<h1>{title}</h1>
<h2 style={{ fontSize: "40px" }}>{summary}</h2>
<h3 className="headerDescription">{description}</h3>
</div>
@@ -16,8 +18,9 @@ export function Header({ title, summary, description }): JSX.Element {
export default function Home(): JSX.Element {
const { siteConfig } = useDocusaurusContext();
return (
<>
<Layout
title={`Hello from ${siteConfig.title}`}
title={`${siteConfig.title}`}
description="AI-powered code generation for software engineering."
>
<div>
@@ -27,5 +30,7 @@ export default function Home(): JSX.Element {
</div>
</div>
</Layout>
<CustomFooter />
</>
);
}
+12
@@ -0,0 +1,12 @@
/** @type {import('tailwindcss').Config} */
module.exports = {
content: [
"./src/**/*.{js,jsx,ts,tsx}",
"./src/components/**/*.{js,jsx,ts,tsx}",
"./src/pages/**/*.{js,jsx,ts,tsx}",
],
theme: {
extend: {},
},
plugins: [],
};
+1
@@ -16,6 +16,7 @@ all the preprocessing/evaluation/analysis scripts.
- HumanEvalFix: [`evaluation/humanevalfix`](./humanevalfix)
- GAIA: [`evaluation/gaia`](./gaia)
- Entity deduction Arena (EDA): [`evaluation/EDA`](./EDA)
- MINT: [`evaluation/mint`](./mint)
### Result Visualization
@@ -0,0 +1,12 @@
Cold(Bob, True)
Quiet(Bob, True)
Red(Bob, True)
Smart(Bob, True)
Kind(Charlie, True)
Quiet(Charlie, True)
Red(Charlie, True)
Rough(Charlie, True)
Cold(Dave, True)
Kind(Dave, True)
Smart(Dave, True)
Quiet(Fiona, True)
@@ -0,0 +1,52 @@
fact1
foreach
facts.Quiet($x, True)
facts.Cold($x, True)
assert
facts.Smart($x, True)
fact2
foreach
facts.Red($x, True)
facts.Cold($x, True)
assert
facts.Round($x, True)
fact3
foreach
facts.Kind($x, True)
facts.Rough($x, True)
assert
facts.Red($x, True)
fact4
foreach
facts.Quiet($x, True)
assert
facts.Rough($x, True)
fact5
foreach
facts.Cold($x, True)
facts.Smart($x, True)
assert
facts.Red($x, True)
fact6
foreach
facts.Rough($x, True)
assert
facts.Cold($x, True)
fact7
foreach
facts.Red($x, True)
assert
facts.Rough($x, True)
fact8
foreach
facts.Smart(Dave, True)
facts.Kind(Dave, True)
assert
facts.Quiet(Dave, True)
+35
@@ -0,0 +1,35 @@
# Logic Reasoning Evaluation
This folder contains the evaluation harness for evaluating agents on the logic reasoning benchmarks [ProntoQA](https://github.com/asaparov/prontoqa) and [ProofWriter](https://allenai.org/data/proofwriter).
## Configure OpenDevin and your LLM
Create a `config.toml` file at the root of the workspace if it does not already exist.
Add the following configurations:
```toml
[core]
max_iterations = 100
cache_dir = "/tmp/cache"
ssh_hostname = "localhost"
enable_auto_lint = true
# TODO: Change these to the model you want to evaluate
[eval_gpt4_1106_preview]
model = "gpt-4-1106-preview"
api_key = "XXX"
temperature = 0.0
[eval_some_openai_compatible_model]
model = "openai/MODEL_NAME"
base_url = "https://OPENAI_COMPATIBLE_URL/v1"
api_key = "XXX"
temperature = 0.0
```
## Run Inference on logic_reasoning
The following command runs inference on the first example of the ProntoQA dataset with the `gpt-4o` model.
```bash
./evaluation/logic_reasoning/scripts/run_infer.sh ProntoQA gpt-4o 1
```
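
The logic programs in these benchmarks express rules in the form `premise && premise >>> conclusion`, which the harness rewrites as Pyke forward-chaining rules before running inference. The snippet below is a minimal standalone sketch of that translation, modeled on the `parse_forward_rule` method in this folder. The function name `to_pyke_rule` is hypothetical and used only for illustration.

```python
def to_pyke_rule(index: int, rule: str) -> str:
    """Translate 'A && B >>> C' into the text of a Pyke forward-chaining rule.

    Hypothetical helper for illustration; it mirrors the string handling
    in PykeProgram.parse_forward_rule but is not part of the harness.
    """
    premise, conclusion = rule.split('>>>')
    # Both sides may contain several facts joined by '&&'.
    premises = [p.strip() for p in premise.split('&&')]
    conclusions = [c.strip() for c in conclusion.split('&&')]

    lines = [f'fact{index}', '\tforeach']
    lines += [f'\t\tfacts.{p}' for p in premises]
    lines.append('\tassert')
    lines += [f'\t\tfacts.{c}' for c in conclusions]
    return '\n'.join(lines)


if __name__ == '__main__':
    print(to_pyke_rule(1, 'Quiet($x, True) && Cold($x, True) >>> Smart($x, True)'))
```

Each premise becomes a `foreach` clause and each conclusion an `assert` clause, so a rule with several conclusions still maps to a single Pyke rule.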
@@ -0,0 +1,20 @@
You are a helpful assistant assigned a logic reasoning task. You need to determine the correctness of a query given some facts and rules.
You can interact with an interactive Python (Jupyter Notebook) environment and receive the corresponding output when needed. The code should be enclosed using the "<execute_ipython>" tag.
In this task, you need to use the code in [[logic_inference_path.py]] to help you. Specifically, you first need to instantiate a **LogicInferenceEngine** class and use the **safe_execute_program** method to prove the **logic programs**. You should receive *answer*, *flag*, *error_message* from the output.
An example would look like this:
<execute_ipython>
import sys
sys.path.append(workspace_mount_path)
engine = LogicInferenceEngine(dataset_name, workspace_mount_path)
answer, flag, error_message = engine.safe_execute_program(logic_programs)
</execute_ipython>
Please send the *answer* variable through message.
dataset_name:
[[dataset_name]]
logic_programs:
[[logic_programs]]
@@ -0,0 +1,220 @@
import os
import random
import re
import shutil
from pyke import knowledge_engine
class PykeProgram:
def __init__(
self, logic_program: str, dataset_name='ProntoQA', workspace_mount_path='./'
) -> None:
self.logic_program = logic_program
self.flag = self.parse_logic_program()
self.dataset_name = dataset_name
self.cache_dir = os.path.join(workspace_mount_path, '.cache_program')
# prepare the files for facts and rules
try:
self.create_fact_file(self.Facts)
self.create_rule_file(self.Rules)
self.flag = True
except Exception:
self.flag = False
self.answer_map = {
'ProntoQA': self.answer_map_prontoqa,
'ProofWriter': self.answer_map_proofwriter,
}
def parse_logic_program(self):
keywords = ['Query:', 'Rules:', 'Facts:', 'Predicates:']
program_str = self.logic_program
for keyword in keywords:
try:
program_str, segment_list = self._parse_segment(program_str, keyword)
setattr(self, keyword[:-1], segment_list)
except Exception:
setattr(self, keyword[:-1], None)
return self.validate_program()
def _parse_segment(self, program_str, key_phrase):
remain_program_str, segment = program_str.split(key_phrase)
segment_list = segment.strip().split('\n')
for i in range(len(segment_list)):
segment_list[i] = segment_list[i].split(':::')[0].strip()
return remain_program_str, segment_list
# check if the program is valid; if not, try to fix it
def validate_program(self):
if self.Rules is not None and self.Facts is not None:
if not self.Rules[0] == '' and not self.Facts[0] == '':
return True
# try to fix the program
tmp_rules = []
tmp_facts = []
statements = self.Facts if self.Facts is not None else self.Rules
if statements is None:
return False
for fact in statements:
if fact.find('>>>') >= 0: # this is a rule
tmp_rules.append(fact)
else:
tmp_facts.append(fact)
self.Rules = tmp_rules
self.Facts = tmp_facts
return False
def create_fact_file(self, facts):
with open(os.path.join(self.cache_dir, 'facts.kfb'), 'w') as f:
for fact in facts:
# check for invalid facts
if not fact.find('$x') >= 0:
f.write(fact + '\n')
def create_rule_file(self, rules):
pyke_rules = []
for idx, rule in enumerate(rules):
pyke_rules.append(self.parse_forward_rule(idx + 1, rule))
with open(os.path.join(self.cache_dir, 'rules.krb'), 'w') as f:
f.write('\n\n'.join(pyke_rules))
# example rule: Furry($x, True) && Quiet($x, True) >>> White($x, True)
def parse_forward_rule(self, f_index, rule):
premise, conclusion = rule.split('>>>')
premise = premise.strip()
# split the premise into multiple facts if needed
premise = premise.split('&&')
premise_list = [p.strip() for p in premise]
conclusion = conclusion.strip()
# split the conclusion into multiple facts if needed
conclusion = conclusion.split('&&')
conclusion_list = [c.strip() for c in conclusion]
# create the Pyke rule
pyke_rule = f"""fact{f_index}\n\tforeach"""
for p in premise_list:
pyke_rule += f"""\n\t\tfacts.{p}"""
pyke_rule += """\n\tassert"""
for c in conclusion_list:
pyke_rule += f"""\n\t\tfacts.{c}"""
return pyke_rule
"""
for example: Is Marvin from Mars?
Query: FromMars(Marvin, $label)
"""
def check_specific_predicate(self, subject_name, predicate_name, engine):
results = []
with engine.prove_goal(
f'facts.{predicate_name}({subject_name}, $label)'
) as gen:
for vars, plan in gen:
results.append(vars['label'])
with engine.prove_goal(
f'rules.{predicate_name}({subject_name}, $label)'
) as gen:
for vars, plan in gen:
results.append(vars['label'])
if len(results) == 1:
return results[0]
elif len(results) == 2:
return results[0] and results[1]
elif len(results) == 0:
return None
"""
Input Example: Metallic(Wren, False)
"""
def parse_query(self, query):
pattern = r'(\w+)\(([^,]+),\s*([^)]+)\)'
match = re.match(pattern, query)
if match:
function_name = match.group(1)
arg1 = match.group(2)
arg2 = match.group(3)
arg2 = arg2 == 'True'
return function_name, arg1, arg2
else:
raise ValueError(f'Invalid query: {query}')
def execute_program(self):
# delete the compiled_krb dir
complied_krb_dir = './models/compiled_krb'
if os.path.exists(complied_krb_dir):
print('removing compiled_krb')
# os.system(f'rm -rf {complied_krb_dir}/*')
shutil.rmtree(complied_krb_dir)
# absolute_path = os.path.abspath(complied_krb_dir)
# print(absolute_path)
try:
engine = knowledge_engine.engine(self.cache_dir)
engine.reset()
engine.activate('rules')
engine.get_kb('facts')
# parse the logic query into pyke query
predicate, subject, value_to_check = self.parse_query(self.Query[0])
result = self.check_specific_predicate(subject, predicate, engine)
answer = self.answer_map[self.dataset_name](result, value_to_check)
except Exception as err:
return None, err
return answer, ''
def answer_mapping(self, answer):
return answer
def answer_map_prontoqa(self, result, value_to_check):
if result == value_to_check:
return 'A'
else:
return 'B'
def answer_map_proofwriter(self, result, value_to_check):
if result is None:
return 'C'
elif result == value_to_check:
return 'A'
else:
return 'B'
class LogicInferenceEngine:
def __init__(self, dataset_name, workspace_mount_path):
self.dataset_name = dataset_name
self.workspace_mount_path = workspace_mount_path
def random_backup(self):
if self.dataset_name == 'ProntoQA':
return random.choice(['A', 'B'])
elif self.dataset_name == 'ProofWriter':
return random.choice(['A', 'B', 'C'])
def safe_execute_program(self, logic_program):
program = PykeProgram(
logic_program, self.dataset_name, self.workspace_mount_path
)
# cannot parse the program
if not program.flag:
answer = self.random_backup()
return answer, 'parsing error', ''
# execute the program
answer, error_message = program.execute_program()
# not executable
if answer is None:
answer = self.random_backup()
return answer, 'execution error', error_message
# successfully executed
answer = program.answer_mapping(answer)
return answer, 'success', ''
@@ -0,0 +1,436 @@
import asyncio
import json
import logging
import multiprocessing as mp
import os
import pathlib
import shutil
import time
from concurrent.futures import ProcessPoolExecutor
from datasets import load_dataset
from tqdm import tqdm
from evaluation.swe_bench.swe_env_box import DockerSSHBox
from opendevin.controller.state.state import State
from opendevin.core.config import config, get_llm_config_arg, get_parser
from opendevin.core.logger import get_console_handler
from opendevin.core.logger import opendevin_logger as logger
from opendevin.core.main import main
from opendevin.events.action import MessageAction
from opendevin.events.serialization.event import event_to_dict
def cleanup():
logger.info('Cleaning up child processes...')
for process in mp.active_children():
logger.info(f'Terminating child process: {process.name}')
process.terminate()
process.join()
def codeact_user_response(state: State) -> str:
msg = (
'Please continue working on the task on whatever approach you think is suitable.\n'
'If you think you have solved the task, please run the following command: <execute_bash> exit </execute_bash>.\n'
'IMPORTANT: YOU SHOULD NEVER ASK FOR HUMAN HELP OR USE THE INTERNET TO SOLVE THIS TASK.\n'
)
if state.history:
user_msgs = [
action
for action, _ in state.history
if isinstance(action, MessageAction) and action.source == 'user'
]
if len(user_msgs) >= 2:
# after at least two user messages, let the agent know that it can give up
return (
msg
+ 'If you want to give up, run: <execute_bash> exit </execute_bash>.\n'
)
return msg
def monologue_user_response(state: State) -> str:
raise NotImplementedError('MonologueAgent should never ask for user responses.')
AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {
'CodeActAgent': codeact_user_response,
'MonologueAgent': monologue_user_response,
}
AGENT_CLS_TO_INST_SUFFIX = {
'CodeActAgent': 'When you think you have solved the question, please first send your answer to user through message and then exit.\n'
}
def get_choice(answer_str):
choices = [
'A',
'B',
'C',
'D',
'E',
'F',
'G',
'H',
'A)',
'B)',
'C)',
'D)',
'E)',
'F)',
'G)',
'H)',
'A.',
'B.',
'C.',
'D.',
'E.',
'F.',
'G.',
'H.',
]
for c in choices:
if answer_str.startswith(c):
return c.replace(')', '')
if answer_str.startswith(':'):
return answer_str.replace(':', '').replace('.', '').strip()
return None
def get_test_result(
model_answer: str,
ground_truth: str,
) -> dict:
gold_answer = ground_truth.replace('(', '').replace(')', '').strip()
answer_str = model_answer if model_answer is not None else ''
prediction = get_choice(answer_str)
indicators = [
'the correct option is',
'the correct answer is',
'The correct answer is',
'The correct option is',
'Thus, the answer is',
]
if prediction is None:
for indicator in indicators:
if answer_str.find(indicator) >= 0:
answer_str = answer_str.split(indicator)[1].strip()
prediction = get_choice(answer_str)
break
isTrue = prediction == gold_answer
test_result = {'result': isTrue}
return test_result
def process_instance(
instance,
agent_class,
# metadata,
dataset_name,
skip_workspace_mount,
eval_output_dir,
reset_logger: bool = True,
):
old_workspace_mount_path = config.workspace_mount_path
old_workspace_base = config.workspace_base
workspace_mount_path = os.path.join(config.workspace_mount_path, '_eval_workspace')
# create process-specific workspace dir
# if `not skip_workspace_mount` - we will create a workspace directory for EACH process
# so that different agents don't interfere with each other.
if not skip_workspace_mount:
workspace_mount_path = os.path.join(workspace_mount_path, str(os.getpid()))
pathlib.Path(workspace_mount_path).mkdir(parents=True, exist_ok=True)
# reset workspace to config
config.workspace_base = workspace_mount_path
config.workspace_mount_path = workspace_mount_path
# Set up the logger properly, so you can run multiprocessing to parallelize the evaluation
if reset_logger:
# Set up logger
log_file = os.path.join(
eval_output_dir, 'logs', f'instance_{instance["id"]}.log'
)
# Remove all existing handlers from logger
for handler in logger.handlers[:]:
logger.removeHandler(handler)
# add back the console handler to print ONE line
logger.addHandler(get_console_handler())
logger.info(
f'Starting evaluation for instance {instance["id"]}.\nLOG: tail -f {log_file}'
)
# Remove all existing handlers from logger
for handler in logger.handlers[:]:
logger.removeHandler(handler)
file_handler = logging.FileHandler(log_file)
file_handler.setFormatter(
logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
)
logger.addHandler(file_handler)
if not skip_workspace_mount:
logger.info(f'Process-specific workspace mounted at {workspace_mount_path}')
# sandbox = DockerSSHBox()
logic_inference_path = os.path.join(workspace_mount_path, 'logic_inference.py')
if not os.path.exists(logic_inference_path):
shutil.copyfile(
'./evaluation/logic_reasoning/logic_inference.py', logic_inference_path
)
logger.info(f'logic_inference.py copied to {workspace_mount_path}')
cache_dir = os.path.join(workspace_mount_path, '.cache_program')
if not os.path.exists(cache_dir):
os.makedirs(cache_dir)
# Prepare instruction
with open('./evaluation/logic_reasoning/instruction.txt', 'r') as f:
instruction = f.read()
instance_logic_programs = instance['raw_logic_programs'][0].strip()
instruction = instruction.replace('[[dataset_name]]', dataset_name)
instruction = instruction.replace('[[logic_programs]]', instance_logic_programs)
instruction = instruction.replace(
'[[logic_inference_path.py]]', logic_inference_path
)
# NOTE: You can actually set slightly different instruction for different agents
instruction += AGENT_CLS_TO_INST_SUFFIX.get(agent_class, '')
sandbox = DockerSSHBox()
exit_code, command_output = sandbox.execute('pip install scitools-pyke')
# Here's how you can run the agent (similar to the `main` function) and get the final task state
state: State = asyncio.run(
main(
instruction,
fake_user_response_fn=AGENT_CLS_TO_FAKE_USER_RESPONSE_FN.get(agent_class),
sandbox=sandbox,
)
)
# ======= Attempt to evaluate the agent's edits =======
# If you are working on a simpler benchmark that only evaluates the final model output (e.g., in a MessageAction),
# you can simply get the LAST `MessageAction` from the returned `state.history` and parse it for evaluation.
if state is None:
raise ValueError('State should not be None.')
final_message = ''
messages = []
for action, obs in reversed(state.history):
# if isinstance(act, MessageAction):
messages.append(obs.content)
# print("obs.content:", obs.content)
if str(obs.content) in ["'A'", "'B'", "'C'"]:
final_message = obs.content
break
final_message = final_message.strip("'")
logger.info(f'Predicted answer: {final_message}, Ground truth: {instance["answer"]}')
test_result = get_test_result(
model_answer=final_message, ground_truth=instance['answer']
)
# Save the output
output = {
'id': instance['id'],
'instance': instance,
'instruction': instruction,
# 'metadata': metadata,
'history': [
(event_to_dict(action), event_to_dict(obs)) for action, obs in state.history
],
'final_message': final_message,
'messages': messages,
'error': state.error if state and state.error else None,
'test_result': test_result,
}
config.workspace_mount_path = old_workspace_mount_path
config.workspace_base = old_workspace_base
# Close the sandbox
sandbox.close()
return output
if __name__ == '__main__':
parser = get_parser()
parser.add_argument(
'--dataset',
type=str,
help='the logic reasoning dataset to evaluate on {ProntoQA, ProofWriter}',
default='ProntoQA',
)
parser.add_argument(
'--data_split',
type=str,
help='data split to evaluate on {validation}', # right now we only support validation split
default='validation',
)
args, _ = parser.parse_known_args()
if args.directory:
config.workspace_base = os.path.abspath(args.directory)
print(f'Setting workspace base to {config.workspace_base}')
# NOTE: It is preferable to load datasets from huggingface datasets and perform post-processing
# so we don't need to manage file uploading to OpenDevin's repo
dataset_name = args.dataset
data_split = args.data_split
dataset = load_dataset(f'renma/{dataset_name}')
logic_reasoning_tests = dataset[data_split]
logger.info(f'Evaluating logic reasoning dataset {dataset_name} {data_split} split')
# Check https://github.com/OpenDevin/OpenDevin/blob/main/evaluation/swe_bench/README.md#configure-opendevin-and-your-llm
# for details of how to set `llm_config`
if args.llm_config:
specified_llm_config = get_llm_config_arg(args.llm_config)
if specified_llm_config:
config.llm = specified_llm_config
logger.info(f'Config for evaluation: {config}')
# TEST METADATA
agent_class = args.agent_cls
assert (
agent_class in AGENT_CLS_TO_FAKE_USER_RESPONSE_FN
), f'Unsupported agent class: {agent_class}'
model_name = config.llm.model.split('/')[-1]
max_iterations = args.max_iterations
eval_note = ''
if args.eval_note is not None:
eval_note += '_N_' + args.eval_note
eval_output_dir = os.path.join(
args.eval_output_dir,
'logic_reasoning',
agent_class,
dataset_name,
model_name + '_maxiter_' + str(max_iterations) + eval_note
)
pathlib.Path(eval_output_dir).mkdir(parents=True, exist_ok=True)
pathlib.Path(os.path.join(eval_output_dir, 'logs')).mkdir(
parents=True, exist_ok=True
)
logger.info(f'Using evaluation output directory: {eval_output_dir}')
# LIMIT EVALUATION
eval_n_limit = args.eval_n_limit
if eval_n_limit:
logic_reasoning_tests = logic_reasoning_tests.select(list(range(eval_n_limit)))
logger.info(f'Limiting evaluation to first {eval_n_limit} instances.')
start_time = time.strftime('%Y-%m-%d %H:%M:%S')
# OUTPUT FILE
output_file = os.path.join(eval_output_dir, 'output.jsonl')
logger.info(f'Writing evaluation output to {output_file}')
finished_task_ids = set()
if os.path.exists(output_file):
with open(output_file, 'r') as f:
for line in f:
data = json.loads(line)
finished_task_ids.add(data['id'])
logger.warning(
f'Output file {output_file} already exists. Loaded {len(finished_task_ids)} finished instances.'
)
output_fp = open(output_file, 'a')
logger.info(
f'Evaluation started with Agent {agent_class}, model {model_name}, max iterations {max_iterations}.'
)
# =============================================
# filter out finished instances
new_logic_reasoning_tests = []
for instance in logic_reasoning_tests:
if instance['id'] in finished_task_ids:
logger.info(
f'Skipping instance {instance["id"]} as it is already finished.'
)
continue
new_logic_reasoning_tests.append(instance)
logic_reasoning_tests = new_logic_reasoning_tests
logger.info(
f'Finished instances: {len(finished_task_ids)}, Remaining instances: {len(logic_reasoning_tests)}'
)
# =============================================
pbar = tqdm(total=len(logic_reasoning_tests))
# This function tracks the progress AND writes the output to a JSONL file
def update_progress(future):
pbar.update(1)
output = future.result()
pbar.set_description(f'Instance {output["id"]}')
pbar.set_postfix_str(f'Test Result: {output["test_result"]["result"]}')
logger.info(
f'Finished evaluation for instance {output["id"]}: {output["test_result"]["result"]}'
)
output_fp.write(json.dumps(output) + '\n')
# json.dump(output, output_fp, indent=4)
output_fp.flush()
# This sets up multiprocessing
num_workers = args.eval_num_workers
# num_workers = 1
logger.info(f'Using {num_workers} workers for evaluation.')
# This is SWE-Bench specific - CodeActAgent doesn't require a mounted workspace to work
skip_workspace_mount = False
logger.info(f'Skipping workspace mount: {skip_workspace_mount}')
try:
with ProcessPoolExecutor(num_workers) as executor:
futures = []
# This is how we perform multi-processing
for instance in logic_reasoning_tests:
future = executor.submit(
process_instance,
instance,
agent_class,
dataset_name,
skip_workspace_mount,
eval_output_dir,
reset_logger=bool(num_workers > 1),
)
future.add_done_callback(update_progress)
futures.append(future)
# Wait for all futures to complete
for future in futures:
future.result()
except KeyboardInterrupt:
print('KeyboardInterrupt received. Cleaning up...')
cleanup()
output_fp.close()
with open(output_file, 'r') as f:
test_result = [(json.loads(line))["test_result"]["result"] for line in f]
metadata = {
"Dataset": dataset_name,
"Data split": data_split,
"Number of Samples": len(test_result),
'Agent class': agent_class,
'Model name': model_name,
'Start_time': start_time,
"End_time": time.strftime('%Y-%m-%d %H:%M:%S'),
"Final Accuracy": f"{sum(test_result) / max(len(test_result), 1):.2f}",  # avoid ZeroDivisionError on an empty output file
}
with open(os.path.join(eval_output_dir, 'metadata.json'), 'w') as f:
json.dump(metadata, f, indent=4)
logger.info(f'Metadata: {json.dumps(metadata, indent=4)}')
logger.info(f'Evaluation finished. Metadata saved to {eval_output_dir}/metadata.json')
@@ -0,0 +1,37 @@
#!/bin/bash
DATASET=$1
MODEL_CONFIG=$2
EVAL_LIMIT=$3
AGENT=$4
# ################################################################################
if [ -z "$AGENT" ]; then
echo "Agent not specified, use default CodeActAgent"
AGENT="CodeActAgent"
fi
# IMPORTANT: Because Agent's prompt changes fairly often in the rapidly evolving codebase of OpenDevin
# We need to track the version of Agent in the evaluation to make sure results are comparable
AGENT_VERSION=v$(poetry run python -c "import agenthub; from opendevin.controller.agent import Agent; print(Agent.get_cls('$AGENT').VERSION)")
echo "AGENT: $AGENT"
echo "AGENT_VERSION: $AGENT_VERSION"
echo "MODEL_CONFIG: $MODEL_CONFIG"
COMMAND="poetry run python evaluation/logic_reasoning/run_infer.py \
--agent-cls $AGENT \
--llm-config $MODEL_CONFIG \
--dataset $DATASET \
--max-iterations 10 \
--max-chars 10000000 \
--eval-num-workers 1 \
--eval-note $AGENT_VERSION"
if [ -n "$EVAL_LIMIT" ]; then
echo "EVAL_LIMIT: $EVAL_LIMIT"
COMMAND="$COMMAND --eval-n-limit $EVAL_LIMIT"
fi
# Run the command
eval $COMMAND
@@ -0,0 +1 @@
!requirements.txt
@@ -0,0 +1,45 @@
# MINT Benchmark
This folder contains the evaluation harness for the [MINT benchmark](https://arxiv.org/abs/2309.10691) on LLMs' ability to solve tasks with multi-turn interactions.
## Configure OpenDevin and your LLM
Create a `config.toml` file if it does not exist at the root of the workspace. Please check [README.md](../../README.md) for how to set this up.
## Start the evaluation
We are using the MINT dataset hosted on [Hugging Face](https://huggingface.co/datasets/ryanhoangt/xingyaoww-mint-bench).
The following is the basic command to start the evaluation. Currently, the only agent supported with MINT is `CodeActAgent`.
```bash
./evaluation/mint/scripts/run_infer.sh [model_config] [subset] [eval_limit]
```
where `model_config` is mandatory, while `subset` and `eval_limit` are optional.
- `model_config`, e.g. `eval_gpt4_1106_preview`, is the config group name for your LLM settings, as defined in your `config.toml`.
- `subset`, e.g. `math`, is the subset of the MINT benchmark to evaluate on, defaulting to `math`.
- `eval_limit`, e.g. `2`, limits the evaluation to the first `eval_limit` instances, defaulting to all instances.
Note: in order to use `eval_limit`, you must also set `subset`.
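The argument handling described above can be sketched in Python as follows. This is only an illustration, not the actual contents of `run_infer.sh`: the helper name `build_mint_command` is hypothetical, the path `evaluation/mint/run_infer.py` is an assumption, and the flag names are taken from the argument parsers and scripts elsewhere in this change.

```python
from typing import List, Optional

# Subsets accepted by the `--subset` flag in run_infer.py.
SUPPORTED_SUBSETS = ("math", "gsm8k")


def build_mint_command(
    model_config: str,
    subset: str = "math",
    eval_limit: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper mirroring the three positional script arguments."""
    if not model_config:
        raise ValueError("model_config is mandatory")
    if subset not in SUPPORTED_SUBSETS:
        raise ValueError(f"Unsupported subset: {subset}")
    argv = [
        "poetry", "run", "python",
        "evaluation/mint/run_infer.py",  # assumed location of the entry point
        "--llm-config", model_config,
        "--subset", subset,
    ]
    # No limit means "evaluate all instances", so the flag is simply omitted.
    if eval_limit is not None:
        argv += ["--eval-n-limit", str(eval_limit)]
    return argv


print(" ".join(build_mint_command("eval_gpt4_1106_preview", "gsm8k", 3)))
```

Because the arguments are positional, `eval_limit` can only be given after `subset`, which is why the note above requires both.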
Let's say you'd like to run 3 instances on the `gsm8k` subset using `eval_gpt4_1106_preview`,
then your command would be:
```bash
./evaluation/mint/scripts/run_infer.sh eval_gpt4_1106_preview gsm8k 3
```
## Reference
```
@misc{wang2024mint,
title={MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback},
author={Xingyao Wang and Zihan Wang and Jiateng Liu and Yangyi Chen and Lifan Yuan and Hao Peng and Heng Ji},
year={2024},
eprint={2309.10691},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
@@ -0,0 +1,5 @@
TASK_INFO_MAP = {
# === Reasoning ===
'gsm8k': {'class': 'ReasoningTask', 'type': 'reasoning'},
'math': {'class': 'ReasoningTask', 'type': 'reasoning'},
}
@@ -0,0 +1,82 @@
import enum
from typing import Any, Dict, Tuple
class TaskState:
def __init__(
self,
finished: bool = False,
success: bool = False,
agent_action_count: dict = None,
terminate_reason: str = None,
latest_output: Dict[str, Any] = None,
):
self.finished = finished
self.success = success
self.agent_action_count: Dict[str, int] = agent_action_count or {
'propose_solution': 0,
'use_tool': 0,
'invalid_action': 0,
}
self.terminate_reason = terminate_reason
self.latest_output = latest_output
def to_dict(self) -> Dict[str, Any]:
return {
'finished': self.finished,
'success': self.success,
'agent_action_count': self.agent_action_count,
'terminate_reason': self.terminate_reason,
'latest_output': self.latest_output,
}
class ParseError(Exception):
pass
class FeedbackType(enum.Enum):
FEEDBACK_WITH_GT = 'feedback_with_gt'
FEEDBACK_WO_GT = 'feedback_wo_gt'
NO_FEEDBACK = 'no_feedback'
class StepOutput:
def __init__(
self,
observation: str = None,
success: bool = False,
extra: Dict[str, Any] = None,
turn_info: Tuple[int, int] = None,
):
self.observation: str = observation
self.success: bool = success
self.extra: Dict[str, Any] = extra
self.turn_info = turn_info
def __repr__(self) -> str:
return str(self.observation)
def to_str(self) -> str:
output = 'Observation:\n'
if self.observation is not None:
output += self.observation + '\n'
else:
if not self.success:
output += 'Your answer is wrong.\n'
if self.turn_info is not None:
n_steps_left, n_propose_solution_left = self.turn_info
output += 'You have {} steps left and {} chances to propose solution left.\n'.format(
n_steps_left, n_propose_solution_left
)
if n_steps_left <= 1:
output += 'You should take the last step to propose a solution.\n'
return output
def to_dict(self) -> Dict[str, Any]:
return {
'observation': self.observation,
'success': self.success,
}
@@ -0,0 +1,119 @@
import re
import traceback
from typing import Dict, Optional
from datatypes import ParseError, StepOutput, TaskState
from task import Task
from opendevin.controller.state.state import State
class SimplifiedEnv:
INVALID_INPUT_MESSAGE = (
"I don't understand your input. \n"
'If you want to execute code, please use <execute_ipython> YOUR_CODE_HERE </execute_ipython>.\n'
'If you want to give me an answer, please use <solution> YOUR_SOLUTION_HERE </solution>.\n'
'For example: The answer to the question is <solution> 42 </solution>. \n'
)
def __init__(self, agent_state: State, task: Task, task_config: Dict[str, int]):
self.agent_state = agent_state
self.task = task
self.task_state = TaskState()
self.task_config = task_config
def step(self, lm_message: str):
observation = self.handle_propose_solution(lm_message)
self.check_max_iteration()
turn_info = (
self.task_config['max_iterations'] - self.agent_state.iteration,
self.task_config['max_propose_solution']
- self.task_state.agent_action_count['propose_solution'],
)
output = StepOutput(
observation=observation,
success=self.task_state.success,
turn_info=turn_info,
)
self.log_output(output)
return self.task_state
def handle_propose_solution(self, lm_message) -> Optional[str]:
"""Propose answer to check the task success.
It might set self.task_state.finished = True if the task is successful.
"""
self.task_state.agent_action_count['propose_solution'] += 1
try:
parsed = self.parse_propose_solution(lm_message)
task_success = self.check_task_success(parsed['answer'])
if task_success:
self.task_state.finished = True
self.task_state.success = True
self.task_state.terminate_reason = 'task_success'
# NOTE: do not return early here, because the output still needs to be logged
# Setting task_state.finished = True will terminate the episode
except ParseError:
return SimplifiedEnv.INVALID_INPUT_MESSAGE
except Exception:
error_traceback = traceback.format_exc()
return f'{error_traceback}'
def parse_propose_solution(self, lm_message: str) -> dict:
"""Define the parsing logic."""
lm_output = '\n' + lm_message + '\n'
answer = '\n'.join(
[
i.strip()
for i in re.findall(r'<solution>(.*?)</solution>', lm_output, re.DOTALL)
]
)
if answer == '':
raise ParseError('No answer found.')
return {'answer': answer}
def log_output(self, output: StepOutput) -> None:
if self.task_state.finished:
return
content = output.to_str()
# self.state.history.append({"role": "user", "content": content})
self.task_state.latest_output = output.to_dict()
self.task_state.latest_output['content'] = content
def check_task_success(self, answer: str) -> bool:
# log_message.info(f"STUDENT ANSWER: [{answer}]")
# log_message.info(f"REFERENCE ANSWER: [{self.task.reference}]")
return self.task.success(answer)
def check_max_iteration(self):
"""Check if the agent has reached the max iteration limit.
It might set self.task_state.finished = True if the agent has reached the max iteration limit.
"""
if self.task_state.finished:
# ignore if the episode is already finished (e.g., task success)
return
if (
# the number of proposed solutions has reached the limit
self.task_state.agent_action_count['propose_solution']
>= self.task_config['max_propose_solution']
):
self.task_state.finished = True
self.task_state.success = False
self.task_state.terminate_reason = 'max_propose_steps'
elif (
# the total action count (propose_solution + use_tool + invalid_action) has reached the iteration limit
sum(self.task_state.agent_action_count.values())
>= self.task_config['max_iterations']
):
self.task_state.finished = True
self.task_state.success = False
self.task_state.terminate_reason = 'max_iterations'
@@ -0,0 +1,48 @@
Task:
Evaluate $\int\int_C (x+y)^2 dx,dy$, where C is the circle defined by $x^2 + y^2 = M$, and M is the birth year of Li Bai.
Output format requirement: Round to three decimal places, using half up rounding rule.
Assistant:
<thought> To solve this problem, let's start by determining Li Bai's birth year. </thought>
<execute_ipython> print(wikipedia_search('Li Bai')) </execute_ipython>
Observation:
Page: Li Bai
Summary: Li Bai (Chinese: 李白; pinyin: Lǐ Bái, 701–762), also pronounced as Li Bo, courtesy name Taibai (Chinese: 太白), was a Chinese poet, acclaimed from his own time to the present as a brilliant and romantic figure who took traditional poetic forms to new heights.
Assistant:
<thought>
From the wiki_search result, we can see that Li Bai was born in 701, thus M = 701.
To evaluate the double integral of $(x+y)^2$ over the region $C$, where $C$ is the circle defined by $x^2 + y^2 = 701$, we should use polar coordinates.
In polar coordinates, $x = r\cos(\theta)$ and $y = r\sin(\theta)$, and the differential area element $dx,dy$ becomes $r,dr,d\theta$.
$\int\int_C (x+y)^2 dx,dy = \int_0^{2\pi}\int_0^{\sqrt{701}} (r\cos(\theta) + r\sin(\theta))^2r,dr,d\theta$
Let's use the SymPy package to calculate this.
</thought>
<execute_ipython>
from sympy import symbols, integrate, cos, sin, pi
r, theta = symbols('r theta')
# the integrand
integrand = (r*cos(theta) + r*sin(theta))**2 * r
# perform the double integration
double_integral = integrate(integrate(integrand, (r, 0, 701 ** 0.5)), (theta, 0, 2*pi))
print(double_integral)
</execute_ipython>
Observation:
Out[2]: 245700.5*pi
Assistant:
<thought> Let's round this result to three decimal places using half up rounding rule. </thought>
<execute_ipython>
import math
print(round(245700.5*math.pi, 3))
</execute_ipython>
Observation:
771890.886
Assistant:
The answer is <solution> 771890.886 </solution>.
@@ -0,0 +1,25 @@
import os
from utils import load_file
PROMPT_DIR = os.path.dirname(__file__)
TEMPLATE_WITH_TOOL = load_file(os.path.join(PROMPT_DIR, 'template_with_tool.txt'))
class PromptTemplate:
"""A prompt template."""
def __init__(self, template: str):
self.template: str = template
def __call__(self, **kwargs) -> str:
return self.template.format(**kwargs)
class ToolPromptTemplate(PromptTemplate):
def __init__(self, use_tool: bool):
if use_tool:
template = TEMPLATE_WITH_TOOL
else:
raise NotImplementedError('Evaluation without tool is not supported yet.')
super().__init__(template)
@@ -0,0 +1,19 @@
You are a helpful assistant assigned with the task of problem-solving.
To solve the task, you can only interact with the interactive Python (Jupyter Notebook) environment using <execute_ipython> tag. Other tools cannot be used.
At each turn, you should first provide your step-by-step thinking for solving the task. Your thought process should be enclosed using "<thought>" tag, for example: <thought> I need to print "Hello World!" </thought>.
After that, you have two options:
1) Interact with a Python programming environment and receive the corresponding output.
2) Directly provide a solution by sending your answer to user through message that adheres to the required format for the given task. Your solution should be enclosed using "<solution>" tag, for example: The answer is <solution> A </solution>.
Either you choose to interact with the Python environment or provide a solution, you need to send a message to the user to evaluate your response and provide feedback.
You have {max_total_steps} chances to interact with the environment or propose a solution. You can only propose a solution {max_propose_solution} times.
---
{in_context_example}
---
# Problem statement:
{task_prompt}
@@ -0,0 +1,32 @@
pre-commit
openai
datasets
backoff
charset-normalizer==3.1.0
# Alfworld
pandas==1.4.4
opencv-python
networkx
tqdm
vocab
revtok
Click
ai2thor==2.1.0
transformers
tokenizers
scipy==1.10.1
ipython
matplotlib
cython
nltk
gym==0.15.4
pipreqs
pyyaml
pytz
visdom
sympy
pycocotools
seaborn
google-generativeai
python-dateutil
statsmodels
@@ -0,0 +1,357 @@
import asyncio
import functools
import json
import logging
import multiprocessing as mp
import os
import pathlib
import subprocess
import time
from concurrent.futures import ProcessPoolExecutor
from typing import Dict
from datasets import load_dataset
from datatypes import TaskState
from env import SimplifiedEnv
from prompts import ToolPromptTemplate
from task import ReasoningTask, Task
from tqdm import tqdm
from evaluation.swe_bench.swe_env_box import DockerSSHBox
from opendevin.controller.state.state import State
from opendevin.core.config import config, get_llm_config_arg, get_parser
from opendevin.core.logger import get_console_handler
from opendevin.core.logger import opendevin_logger as logger
from opendevin.core.main import main
from opendevin.events.serialization.event import event_to_dict
def cleanup():
print('Cleaning up child processes...')
for process in mp.active_children():
print(f'Terminating child process: {process.name}')
process.terminate()
process.join()
def codeact_user_response(state: State, task: Task, task_config: Dict[str, int]):
logger.info(f'Gold reference: {task.reference}')
logger.info(f'Task config: {task_config}')
env = SimplifiedEnv(
agent_state=state,
task=task,
task_config=task_config,
)
last_action, _ = state.history[-1]
result_state: TaskState = env.step(last_action.message)
state.task_state = result_state
if not result_state.latest_output:
if result_state.success:
msg = 'Your answer is correct. Please EXIT using the following command: <execute_bash> exit </execute_bash>.'
else:
msg = 'Something went wrong! No output from the model.'
else:
msg = result_state.latest_output['content']
logger.info('User response:' + msg)
return msg
def monologue_user_response(state: State) -> str:
raise NotImplementedError('MonologueAgent should never ask for user responses.')
AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {
'CodeActAgent': codeact_user_response,
'MonologueAgent': monologue_user_response,
}
AGENT_CLS_TO_INST_SUFFIX = {
'CodeActAgent': '\nIMPORTANT: When your answer is confirmed by the user to be correct, you can exit using the following command: <execute_bash> exit </execute_bash>.\n'
}
def process_instance(
instance: Task,
agent_class,
metadata,
skip_workspace_mount,
eval_output_dir,
reset_logger: bool = True,
):
workspace_mount_path = os.path.join(config.workspace_mount_path, '_eval_workspace')
# create process-specific workspace dir
# if `not skip_workspace_mount` - we will create a workspace directory for EACH process
# so that different agents don't interfere with each other.
if not skip_workspace_mount:
workspace_mount_path = os.path.join(workspace_mount_path, str(os.getpid()))
pathlib.Path(workspace_mount_path).mkdir(parents=True, exist_ok=True)
# Set up the logger properly, so you can run multiprocessing to parallelize the evaluation
if reset_logger:
# Set up logger
log_file = os.path.join(
eval_output_dir, 'logs', f'instance_{instance.task_id}.log'
)
# Remove all existing handlers from logger
for handler in logger.handlers[:]:
logger.removeHandler(handler)
# add back the console handler to print ONE line
logger.addHandler(get_console_handler())
logger.info(
f'Starting evaluation for instance {instance.task_id}.\nHint: run "tail -f {log_file}" to see live logs in a separate shell'
)
# Remove all existing handlers from logger
for handler in logger.handlers[:]:
logger.removeHandler(handler)
file_handler = logging.FileHandler(log_file)
file_handler.setFormatter(
logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
)
logger.addHandler(file_handler)
if not skip_workspace_mount:
logger.info(f'Process-specific workspace mounted at {workspace_mount_path}')
sandbox = DockerSSHBox()
requirements_host_src = 'evaluation/mint/requirements.txt'
requirements_sandbox_dest = '/opendevin/plugins/mint/requirements.txt'
sandbox.copy_to(
host_src=requirements_host_src,
sandbox_dest=requirements_sandbox_dest,
recursive=False,
)
logger.info(
f'Copied files from [{requirements_host_src}] to [{requirements_sandbox_dest}] inside sandbox.'
)
exit_code, output = sandbox.execute(f'pip install -r {requirements_sandbox_dest}')
# Prepare instruction
instruction = ToolPromptTemplate(use_tool=True)(
max_total_steps=metadata['max_iterations'],
max_propose_solution=metadata['max_propose_solution'],
in_context_example=instance.in_context_example(
use_tool=True, with_feedback=False
),
task_prompt='Task:\n' + instance.prompt,
)
instruction += 'IMPORTANT: You should ONLY interact with the environment provided to you or provide the solution inside <solution> tag AND NEVER ASK FOR HUMAN HELP.\n'
# NOTE: You can actually set slightly different instruction for different agents
instruction += AGENT_CLS_TO_INST_SUFFIX.get(agent_class, '')
# Here's how you can run the agent (similar to the `main` function) and get the final task state
fake_user_response_fn = functools.partial(
AGENT_CLS_TO_FAKE_USER_RESPONSE_FN.get(agent_class),
task=instance,
task_config={
'max_iterations': metadata['max_iterations'],
'max_propose_solution': metadata['max_propose_solution'],
},
)
state: State = asyncio.run(
main(
instruction,
fake_user_response_fn=fake_user_response_fn,
sandbox=sandbox,
)
)
if state is None:
raise ValueError('State should not be None.')
logger.info('Msgs: ' + str(state.history))
task_state: TaskState = state.task_state
logger.info('Task state: ' + str(task_state.to_dict()))
# Save the output
output = {
'id': instance.task_id,
'instance': instance.to_dict(),
'instruction': instruction,
'metadata': metadata,
'history': [
(event_to_dict(action), event_to_dict(obs)) for action, obs in state.history
],
'error': state.error if state and state.error else None,
'test_result': task_state.success,
}
# Close the sandbox
sandbox.close()
return output
if __name__ == '__main__':
parser = get_parser()
parser.add_argument(
'--subset',
default='math',
choices=['math', 'gsm8k'],
type=str,
help='subset of the dataset to be used',
)
parser.add_argument(
'--max-propose-solution',
default=2,
type=int,
help='maximum number of times the agent can propose a solution',
)
args, _ = parser.parse_known_args()
# NOTE: It is preferable to load datasets from huggingface datasets and perform post-processing
# so we don't need to manage file uploading to OpenDevin's repo
mint_dataset = load_dataset(
'ryanhoangt/xingyaoww-mint-bench', name=args.subset, split='test'
)
logger.info(f'Evaluating MINT - {args.subset} subset')
# Check https://github.com/OpenDevin/OpenDevin/blob/main/evaluation/swe_bench/README.md#configure-opendevin-and-your-llm
# for details of how to set `llm_config`
if args.llm_config:
specified_llm_config = get_llm_config_arg(args.llm_config)
if specified_llm_config:
config.llm = specified_llm_config
logger.info(f'Config for evaluation: {config}')
# TEST METADATA
agent_class = args.agent_cls
assert (
agent_class in AGENT_CLS_TO_FAKE_USER_RESPONSE_FN
), f'Unsupported agent class: {agent_class}'
model_name = config.llm.model.split('/')[-1]
max_iterations = args.max_iterations
eval_note = ''
if args.eval_note is not None:
eval_note += '_N_' + args.eval_note
eval_output_dir = os.path.join(
args.eval_output_dir,
'mint',
agent_class,
model_name + '_maxiter_' + str(max_iterations) + eval_note,
args.subset,
)
pathlib.Path(eval_output_dir).mkdir(parents=True, exist_ok=True)
pathlib.Path(os.path.join(eval_output_dir, 'logs')).mkdir(
parents=True, exist_ok=True
)
logger.info(f'Using evaluation output directory: {eval_output_dir}')
metadata = {
'agent_class': agent_class,
'model_name': model_name,
'max_iterations': max_iterations,
'max_propose_solution': args.max_propose_solution,
'eval_output_dir': eval_output_dir,
'start_time': time.strftime('%Y-%m-%d %H:%M:%S'),
# get the commit id of the current repo for reproducibility
'git_commit': subprocess.check_output(['git', 'rev-parse', 'HEAD'])
.decode('utf-8')
.strip(),
}
logger.info(f'Metadata: {metadata}')
with open(os.path.join(eval_output_dir, 'metadata.json'), 'w') as f:
json.dump(metadata, f)
# LIMIT EVALUATION
eval_n_limit = args.eval_n_limit
if eval_n_limit:
mint_dataset = mint_dataset.select(range(eval_n_limit))
logger.info(f'Limiting evaluation to first {eval_n_limit} instances.')
# OUTPUT FILE
output_file = os.path.join(eval_output_dir, 'output.jsonl')
logger.info(f'Writing evaluation output to {output_file}')
finished_instance_ids = set()
if os.path.exists(output_file):
with open(output_file, 'r') as f:
for line in f:
data = json.loads(line)
finished_instance_ids.add(data['id'])
logger.warning(
f'Output file {output_file} already exists. Loaded {len(finished_instance_ids)} finished instances.'
)
output_fp = open(output_file, 'a')
logger.info(
f'Evaluation started with Agent {agent_class}, model {model_name}, max iterations {max_iterations}, max propose solution {args.max_propose_solution}.'
)
# =============================================
# filter out finished instances
task_class = ReasoningTask
new_mint_tests: list[ReasoningTask] = []
for instance in mint_dataset:
if instance['id'] in finished_instance_ids:
logger.info(
f'Skipping instance {instance["id"]} as it is already finished.'
)
continue
# convert to Task object
instance = ReasoningTask(**instance)
new_mint_tests.append(instance)
mint_dataset = new_mint_tests
logger.info(
f'Finished instances: {len(finished_instance_ids)}, Remaining instances: {len(mint_dataset)}'
)
# =============================================
pbar = tqdm(total=len(mint_dataset))
# This function tracks the progress AND writes the output to a JSONL file
def update_progress(future):
pbar.update(1)
output = future.result()
# logger.info('Output: ', output)
# pbar.set_description(f'Instance {output["instance_id"]}')
# pbar.set_postfix_str(f'Test Result: {output["test_result"]["result"]}')
# logger.info(
# f'Finished evaluation for instance {output["instance_id"]}: {output["test_result"]["result"]}'
# )
output_fp.write(json.dumps(output) + '\n')
output_fp.flush()
# This sets up multi-processing
num_workers = args.eval_num_workers
logger.info(f'Using {num_workers} workers for evaluation.')
# This is SWE-Bench specific - CodeActAgent doesn't require mounted workspace to work
skip_workspace_mount = agent_class == 'CodeActAgent'
logger.info(f'Skipping workspace mount: {skip_workspace_mount}')
try:
with ProcessPoolExecutor(num_workers) as executor:
futures = []
# This is how we perform multi-processing
for instance in mint_dataset:
future = executor.submit(
process_instance,
instance,
agent_class,
metadata,
skip_workspace_mount,
eval_output_dir,
reset_logger=bool(num_workers > 1),
)
future.add_done_callback(update_progress)
futures.append(future)
# Wait for all futures to complete
for future in futures:
future.result()
except KeyboardInterrupt:
print('KeyboardInterrupt received. Cleaning up...')
cleanup()
output_fp.close()
logger.info('Evaluation finished.')
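The resume behaviour of the script above (read the ids already present in `output.jsonl`, skip those instances, append one JSON line per newly finished instance) can be sketched in isolation. This is a minimal sketch under stated assumptions, not the script's own code: `fake_process` and `run_resumable` are hypothetical names, a thread pool stands in for the process pool, and results are written from the main thread via `as_completed` rather than from a done-callback.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def load_finished_ids(output_file):
    """Collect the 'id' of every record already written to the JSONL file."""
    finished = set()
    if os.path.exists(output_file):
        with open(output_file, 'r') as f:
            for line in f:
                if line.strip():
                    finished.add(json.loads(line)['id'])
    return finished


def fake_process(instance):
    """Hypothetical stand-in for process_instance: returns one output record."""
    return {'id': instance['id'], 'test_result': True}


def run_resumable(instances, output_file, process=fake_process, num_workers=2):
    """Process only unfinished instances and append their results as JSON lines."""
    finished = load_finished_ids(output_file)
    todo = [inst for inst in instances if inst['id'] not in finished]
    with open(output_file, 'a') as output_fp, ThreadPoolExecutor(num_workers) as executor:
        futures = [executor.submit(process, inst) for inst in todo]
        # Write each result as soon as it is ready; flushing keeps the file
        # usable as a checkpoint if the run is interrupted.
        for future in as_completed(futures):
            output_fp.write(json.dumps(future.result()) + '\n')
            output_fp.flush()
    return len(todo)
```

Running `run_resumable` twice on the same output file processes nothing the second time, which is the property the `finished_instance_ids` filter relies on.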
+37
@@ -0,0 +1,37 @@
#!/bin/bash
MODEL_CONFIG=$1
SUBSET=$2
EVAL_LIMIT=$3
# Only 'CodeActAgent' is supported for MINT now
AGENT="CodeActAgent"
# We need to track the version of Agent in the evaluation to make sure results are comparable
AGENT_VERSION=v$(poetry run python -c "import agenthub; from opendevin.controller.agent import Agent; print(Agent.get_cls('$AGENT').VERSION)")
echo "AGENT: $AGENT"
echo "AGENT_VERSION: $AGENT_VERSION"
export PYTHONPATH=$(pwd)
COMMAND="poetry run python ./evaluation/mint/run_infer.py \
--max-iterations 5 \
--max-propose-solution 2 \
--eval-note $AGENT_VERSION"
if [ -n "$SUBSET" ]; then
echo "SUBSET: $SUBSET"
COMMAND="$COMMAND --subset $SUBSET"
# otherwise default to use the math subset
else
echo "SUBSET: math"
COMMAND="$COMMAND --subset math"
fi
if [ -n "$EVAL_LIMIT" ]; then
echo "EVAL_LIMIT: $EVAL_LIMIT"
COMMAND="$COMMAND --eval-n-limit $EVAL_LIMIT"
fi
# Run the command
eval $COMMAND
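The shell script above assembles a command string and runs it through `eval`. As written, it reads `MODEL_CONFIG` from `$1` but does not pass it on to `run_infer.py`. The same argument assembly can be sketched as a list in Python, which avoids re-parsing a string. This is only an illustration: `build_mint_command` is a hypothetical helper, and it mirrors just the flags that appear in the script.

```python
import shlex


def build_mint_command(agent_version, subset=None, eval_limit=None):
    """Build the run_infer.py argument list the way the shell script does."""
    command = [
        'poetry', 'run', 'python', './evaluation/mint/run_infer.py',
        '--max-iterations', '5',
        '--max-propose-solution', '2',
        '--eval-note', agent_version,
        # The script falls back to the math subset when none is given.
        '--subset', subset or 'math',
    ]
    if eval_limit:
        command += ['--eval-n-limit', str(eval_limit)]
    return command


if __name__ == '__main__':
    # shlex.join renders the list as a shell-quoted string for display only.
    print(shlex.join(build_mint_command('v1.5', subset='gsm8k', eval_limit=3)))
```

A list built this way can be handed directly to `subprocess.run` without invoking a shell.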
+121
@@ -0,0 +1,121 @@
import json
import logging
import os
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple
from utils import load_file
LOGGER = logging.getLogger('MINT')
class Task(ABC):
"""Base class for a task instance."""
task_name: str = 'base'
in_context_example_dir = os.path.join(
os.path.dirname(os.path.abspath(__file__)),
'in_context_examples',
)
def __init__(self, **kwargs) -> None:
if 'loaded_history' in kwargs:
self.loaded_history = kwargs['loaded_history']
else:
self.loaded_history = None
# pre-load the in-context example
task_dir = os.path.join(self.in_context_example_dir, self.task_name)
self._in_context_example = {
'with_tool': load_file(os.path.join(task_dir, 'with_tool.txt')),
}
self.metadata = {}
@property
def task_id(self) -> str:
"""Return the task id."""
assert hasattr(self, '_id'), 'Task does not have an id.'
return self._id
def in_context_example(
self, use_tool: bool = True, with_feedback: bool = False
) -> str:
"""Return the in-context example for the task."""
if use_tool and not with_feedback:
return self._in_context_example['with_tool']
else:
raise NotImplementedError
@property
def prompt(self) -> str:
"""Return the task prompt."""
assert hasattr(self, '_prompt'), 'Task does not have a prompt.'
return self._prompt
@property
def reference(self) -> str:
"""Return the reference solution for the task."""
assert hasattr(self, '_reference'), 'Task does not have a reference solution.'
return self._reference
@abstractmethod
def extract_answer(self, solution: str) -> Optional[str]:
"""Extract the answer from the given solution."""
pass
@abstractmethod
def success(self, solution: str) -> bool:
"""This checks whether the given solution can complete the current task.
Can be used to provide binary feedback.
"""
answer = self.extract_answer(solution)
return answer == self.reference
@classmethod
def load_tasks(cls, path: str) -> Tuple[List['Task'], int]:
"""Load all the tasks from a given jsonl file."""
assert path.endswith('.jsonl') or path.endswith('.json')
with open(path, 'r') as f:
tasks = [cls(**json.loads(line)) for line in f.readlines()]
LOGGER.info(f'Loaded {len(tasks)} tasks from {path}')
return tasks, len(tasks)
def to_dict(self) -> dict:
"""Convert the task to a dictionary."""
return {
'task_name': self.task_name,
'task_id': self.task_id,
'prompt': self.prompt,
'reference': self.reference,
'metadata': self.metadata,
}
class ReasoningTask(Task):
task_name = 'reasoning'
def __init__(self, id: str, prompt: str, reference: str, **kwargs):
super().__init__(**kwargs)
self._id = id
self._prompt = prompt.strip()
self._reference = str(reference).strip().lower()
def extract_answer(self, solution: str) -> Optional[str]:
"""Extract the answer from the given solution."""
return solution.lower().strip()
def compare_w_digits(self, reference: str, answer: str) -> bool:
"""Compare the reference and answer with digits."""
# if reference and answer can both be converted to floats by float()
try:
float(reference)
float(answer)
return abs(float(reference) - float(answer)) <= 0.05 * abs(float(reference))
except ValueError:
return reference in answer
except Exception:
raise ValueError(f'Cannot compare {reference} and {answer}')
def success(self, solution: str) -> bool:
answer = self.extract_answer(solution)
return self.compare_w_digits(self._reference, answer)
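The comparison in `compare_w_digits` accepts a numeric answer within 5% of the reference and falls back to a substring check when either value is not a number. A standalone sketch of that rule follows; `compare_with_tolerance` and the `rel_tol` parameter are illustrative names, not part of the diff.

```python
def compare_with_tolerance(reference: str, answer: str, rel_tol: float = 0.05) -> bool:
    """Return True if answer matches reference numerically (within rel_tol) or textually."""
    try:
        ref_value = float(reference)
        ans_value = float(answer)
    except ValueError:
        # At least one side is not a plain number: fall back to substring matching.
        return reference in answer
    # Relative tolerance is measured against the reference value.
    return abs(ref_value - ans_value) <= rel_tol * abs(ref_value)


if __name__ == '__main__':
    print(compare_with_tolerance('100', '104'))    # within 5% of 100
    print(compare_with_tolerance('100', '106'))    # outside 5% of 100
    print(compare_with_tolerance('paris', 'the answer is paris'))
```

Because the tolerance scales with the reference, a reference of `0` only accepts an answer that is exactly zero, and an answer such as `1/2` is never parsed as a number.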
+10
@@ -0,0 +1,10 @@
import functools
# use cache to avoid loading the same file multiple times
# which can lead to a "too many open files" error
@functools.lru_cache(maxsize=128)
def load_file(filepath: str) -> str:
with open(filepath, 'r') as f:
content = f.read()
return content
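The effect of `functools.lru_cache` on `load_file` can be observed through `cache_info()`. The sketch below repeats the function and adds a hypothetical `read_twice` helper; it also illustrates one consequence of caching by path, namely that later changes to the file are not seen until the cache is cleared.

```python
import functools


@functools.lru_cache(maxsize=128)
def load_file(filepath: str) -> str:
    """Read a file once; repeated calls with the same path return the cached text."""
    with open(filepath, 'r') as f:
        return f.read()


def read_twice(filepath: str):
    """Hypothetical helper: load the same path twice and report cache statistics."""
    first = load_file(filepath)
    second = load_file(filepath)
    info = load_file.cache_info()
    return first == second, info.hits, info.misses
```

If the in-context example files are edited during a long-running process, calling `load_file.cache_clear()` forces the next call to read from disk again.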
+12 -1
@@ -116,9 +116,11 @@ selected_ids = ['sphinx-doc__sphinx-8721', 'sympy__sympy-14774', 'scikit-learn__
Then only these tasks (rows whose `instance_id` is in the above list) will be evaluated.
In this case, `eval_limit` option applies to tasks that are in the `selected_ids` list.
After running the inference, you will obtain an `output.jsonl` file (by default it will be saved to `evaluation/evaluation_outputs`).
## Evaluate Generated Patches
After running the inference described in the previous section, you will obtain a `output.jsonl` (by default it will save to `evaluation/evaluation_outputs`). Then you can run this one line script to evaluate generated patches, and produce a fine-grained report:
With the `output.jsonl` file, you can run `eval_infer.sh` to evaluate the generated patches and produce a fine-grained report.
If you want to evaluate existing results, you should first run this to clone existing outputs
@@ -185,6 +187,15 @@ It will contains an additional field `fine_grained_report` (see example below) c
Please refer to [EVAL_PATCH.md](./EVAL_PATCH.md) if you want to learn more about how to evaluate patches that are already generated (e.g., not by OpenDevin).
## View Result Summary
If you just want to know the resolve rate, and/or a summary of which tests pass and which don't, you can run
```bash
poetry run python ./evaluation/swe_bench/scripts/summarise_results.py <path_to_output_merged_jsonl_file>
# e.g. poetry run python ./evaluation/swe_bench/scripts/summarise_results.py ./evaluation/evaluation_outputs/outputs/swe_bench_lite/CodeActSWEAgent/gpt-4o-2024-05-13_maxiter_50_N_v1.5-no-hint/output.merged.jsonl
```
## Submit your evaluation results
You can start your own fork of [our huggingface evaluation outputs](https://huggingface.co/spaces/OpenDevin/evaluation) and submit a PR of your evaluation results following the guide [here](https://huggingface.co/docs/hub/en/repositories-pull-requests-discussions#pull-requests-and-discussions).
+64 -14
@@ -62,11 +62,13 @@ def monologue_user_response(state: State) -> str:
AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {
'CodeActAgent': codeact_user_response,
'CodeActSWEAgent': codeact_user_response,
'MonologueAgent': monologue_user_response,
}
AGENT_CLS_TO_INST_SUFFIX = {
'CodeActAgent': 'When you think you have fixed the issue through code changes, please run the following command: <execute_bash> exit </execute_bash>.\n'
'CodeActAgent': 'When you think you have fixed the issue through code changes, please run the following command: <execute_bash> exit </execute_bash>.\n',
'CodeActSWEAgent': 'When you think you have fixed the issue through code changes, please run the following command: <execute_bash> exit </execute_bash>.\n',
}
@@ -243,19 +245,62 @@ def process_instance(
)
# Prepare instruction
instruction = (
f'Please fix the following issue for the repository in /workspace/{workspace_dir_name}.\n'
'Environment has been set up for you to start working. You may assume all necessary tools are installed.\n\n'
'# Problem Statement\n'
f'{instance.problem_statement}\n\n'
)
if USE_HINT_TEXT and instance.hints_text:
instruction += f'# Hints\n{instance.hints_text}\n\n'
instruction += (
'IMPORTANT: You should ONLY interact with the environment provided to you AND NEVER ASK FOR HUMAN HELP.\n'
'You should NOT modify any existing test case files. If needed, you can add new test cases in a NEW file to reproduce the issue.\n'
'You SHOULD INCLUDE PROPER INDENTATION in your edit commands.\n'
)
if agent_class == 'CodeActSWEAgent':
instruction = (
'We are currently solving the following issue within our repository. Here is the issue text:\n'
'--- BEGIN ISSUE ---\n'
f'{instance.problem_statement}\n'
'--- END ISSUE ---\n\n'
)
if USE_HINT_TEXT and instance.hints_text:
instruction += (
f'--- BEGIN HINTS ---\n{instance.hints_text}\n--- END HINTS ---\n'
)
instruction += f"""Now, you're going to solve this issue on your own. Your terminal session has started and you're in the repository's root directory. You can use any bash commands or the special interface to help you. Edit all the files you need to and run any checks or tests that you want.
Remember, YOU CAN ONLY ENTER ONE COMMAND AT A TIME. You should always wait for feedback after every command.
When you're satisfied with all of the changes you've made, you can run the following command: <execute_bash> exit </execute_bash>.
Note however that you cannot use any interactive session commands (e.g. vim) in this environment, but you can write scripts and run them. E.g. you can write a python script and then run it with `python <script_name>.py`.
NOTE ABOUT THE EDIT COMMAND: Indentation really matters! When editing a file, make sure to insert appropriate indentation before each line!
IMPORTANT TIPS:
1. Always start by trying to replicate the bug that the issues discusses.
If the issue includes code for reproducing the bug, we recommend that you re-implement that in your environment, and run it to make sure you can reproduce the bug.
Then start trying to fix it.
When you think you've fixed the bug, re-run the bug reproduction script to make sure that the bug has indeed been fixed.
If the bug reproduction script does not print anything when it successfully runs, we recommend adding a print("Script completed successfully, no errors.") command at the end of the file,
so that you can be sure that the script indeed ran fine all the way through.
2. If you run a command and it doesn't work, try running a different command. A command that did not work once will not work the second time unless you modify it!
3. If you open a file and need to get to an area around a specific line that is not in the first 100 lines, say line 583, don't just use the scroll_down command multiple times. Instead, use the goto 583 command. It's much quicker.
4. If the bug reproduction script requires inputting/reading a specific file, such as buggy-input.png, and you'd like to understand how to input that file, conduct a search in the existing repo code, to see whether someone else has already done that. Do this by running the command: find_file("buggy-input.png") If that doesn't work, use the linux 'find' command.
5. Always make sure to look at the currently open file and the current working directory (which appears right after the currently open file). The currently open file might be in a different directory than the working directory! Note that some commands, such as 'create', open files, so they might change the current open file.
6. When editing files, it is easy to accidentally specify a wrong line number or to write code with incorrect indentation. Always check the code after you issue an edit to make sure that it reflects what you wanted to accomplish. If it didn't, issue another command to fix it.
[Current directory: /workspace/{workspace_dir_name}]
"""
else:
# Testing general agents
instruction = (
f'Please fix the following issue for the repository in /workspace/{workspace_dir_name}.\n'
'Environment has been set up for you to start working. You may assume all necessary tools are installed.\n\n'
'# Problem Statement\n'
f'{instance.problem_statement}\n\n'
)
if USE_HINT_TEXT and instance.hints_text:
instruction += f'# Hints\n{instance.hints_text}\n\n'
instruction += (
'IMPORTANT: You should ONLY interact with the environment provided to you AND NEVER ASK FOR HUMAN HELP.\n'
'You should NOT modify any existing test case files. If needed, you can add new test cases in a NEW file to reproduce the issue.\n'
'You SHOULD INCLUDE PROPER INDENTATION in your edit commands.\n'
)
# NOTE: You can actually set slightly different instruction for different agents
instruction += AGENT_CLS_TO_INST_SUFFIX.get(agent_class, '')
@@ -370,6 +415,11 @@ if __name__ == '__main__':
.decode('utf-8')
.strip(),
}
_agent_cls = agenthub.Agent.get_cls(agent_class)
if hasattr(_agent_cls, 'system_message'):
metadata['system_message'] = _agent_cls.system_message
if hasattr(_agent_cls, 'in_context_example'):
metadata['in_context_example'] = _agent_cls.in_context_example
logger.info(f'Metadata: {metadata}')
with open(os.path.join(eval_output_dir, 'metadata.json'), 'w') as f:
json.dump(metadata, f)
+7 -1
@@ -2,12 +2,18 @@
MODEL_CONFIG=$1
AGENT=$2
EVAL_LIMIT=$3
MAX_ITER=$4
if [ -z "$AGENT" ]; then
echo "Agent not specified, use default CodeActAgent"
AGENT="CodeActAgent"
fi
if [ -z "$MAX_ITER" ]; then
echo "MAX_ITER not specified, use default 30"
MAX_ITER=30
fi
# IMPORTANT: Because Agent's prompt changes fairly often in the rapidly evolving codebase of OpenDevin
# We need to track the version of Agent in the evaluation to make sure results are comparable
AGENT_VERSION=v$(poetry run python -c "import agenthub; from opendevin.controller.agent import Agent; print(Agent.get_cls('$AGENT').VERSION)")
@@ -32,7 +38,7 @@ unset SANDBOX_ENV_GITHUB_TOKEN # prevent the agent from using the github token t
COMMAND="poetry run python evaluation/swe_bench/run_infer.py \
--agent-cls $AGENT \
--llm-config $MODEL_CONFIG \
--max-iterations 30 \
--max-iterations $MAX_ITER \
--max-chars 10000000 \
--eval-num-workers 8 \
--eval-note $EVAL_NOTE"
@@ -0,0 +1,39 @@
import json
import sys
def extract_test_results(json_file_path):
passed_tests = []
failed_tests = []
with open(json_file_path, 'r') as file:
for line in file:
data = json.loads(line.strip())
instance_id = data['instance_id']
resolved = False
if 'fine_grained_report' in data:
resolved = data['fine_grained_report']['resolved']
else:
resolved = data['test_result']['result']['resolved']
if resolved:
passed_tests.append(instance_id)
else:
failed_tests.append(instance_id)
return passed_tests, failed_tests
if __name__ == '__main__':
if len(sys.argv) != 2:
print(
'Usage: poetry run python summarise_results.py <path_to_output_merged_jsonl_file>'
)
sys.exit(1)
json_file_path = sys.argv[1]
passed_tests, failed_tests = extract_test_results(json_file_path)
succ_rate = len(passed_tests) / (len(passed_tests) + len(failed_tests))
print(
f'\nPassed {len(passed_tests)} tests, failed {len(failed_tests)} tests, resolve rate = {succ_rate}'
)
print('PASSED TESTS:')
print(passed_tests)
print('FAILED TESTS:')
print(failed_tests)
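The summary script divides by the total number of instances, so an empty input file would raise `ZeroDivisionError`. A minimal sketch of the same classification with a guard for that case is shown here; `summarise_lines` is a hypothetical name, and it takes an iterable of JSON lines instead of a file path so that it can be tried without a real `output.merged.jsonl`.

```python
import json


def summarise_lines(lines):
    """Split instance ids into passed/failed and compute the resolve rate."""
    passed, failed = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        data = json.loads(line)
        # Prefer the fine-grained report when present, as the script does.
        if 'fine_grained_report' in data:
            resolved = data['fine_grained_report']['resolved']
        else:
            resolved = data['test_result']['result']['resolved']
        (passed if resolved else failed).append(data['instance_id'])
    total = len(passed) + len(failed)
    rate = len(passed) / total if total else 0.0
    return passed, failed, rate
```

Reporting a rate of `0.0` for empty input is a design choice of this sketch; raising a descriptive error would be an equally reasonable alternative.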
+10 -1
@@ -25,12 +25,14 @@ class SWEBenchSSHBox(DockerSSHBox):
swe_instance: dict | None = None,
skip_workspace_mount: bool = True,
sandbox_plugins: list[PluginRequirement] = [], # noqa: B006
workspace_dir_name: str | None = None,
):
if swe_instance_id is None:
raise ValueError('swe_instance_id must be provided!')
self.swe_instance_id = swe_instance_id
self.swe_instance = swe_instance
self.skip_workspace_mount = skip_workspace_mount
self.workspace_dir_name = workspace_dir_name
assert (
container_image is not None
@@ -94,6 +96,7 @@ class SWEBenchSSHBox(DockerSSHBox):
swe_instance=instance,
skip_workspace_mount=skip_workspace_mount,
sandbox_plugins=sandbox_plugins,
workspace_dir_name=workspace_dir_name,
)
logger.info(f"SSH box started for instance {instance['instance_id']}.")
@@ -123,7 +126,13 @@ class SWEBenchSSHBox(DockerSSHBox):
def get_diff_patch(self):
# add everything to the index
exit_code, output = self.execute('git add --all')
exit_code, output = self.execute(f'cd /workspace/{self.workspace_dir_name}')
if exit_code != 0:
logger.error('Failed to cd to the repo')
return ''
# add everything to the index
exit_code, output = self.execute('git add -A')
if exit_code != 0:
logger.error('Failed to add everything to the index')
return ''
+1
@@ -44,6 +44,7 @@
}],
// For https://stackoverflow.com/questions/55844608/stuck-with-eslint-error-i-e-separately-loops-should-be-avoided-in-favor-of-arra
"no-restricted-syntax": "off",
"react/require-default-props": "off",
"import/prefer-default-export": "off",
"no-underscore-dangle": "off",
"jsx-a11y/no-static-element-interactions": "off",
+2602 -1441
File diff suppressed because it is too large
+12 -12
@@ -8,15 +8,15 @@
},
"dependencies": {
"@monaco-editor/react": "^4.6.0",
"@nextui-org/react": "^2.3.6",
"@nextui-org/react": "^2.4.1",
"@react-types/shared": "^3.23.1",
"@reduxjs/toolkit": "^2.2.5",
"@vitejs/plugin-react": "^4.2.1",
"@vitejs/plugin-react": "^4.3.0",
"@xterm/addon-fit": "^0.10.0",
"@xterm/xterm": "^5.4.0",
"clsx": "^2.1.1",
"eslint-config-airbnb-typescript": "^18.0.0",
"framer-motion": "^11.2.6",
"framer-motion": "^11.2.10",
"i18next": "^23.11.5",
"i18next-browser-languagedetector": "^8.0.0",
"i18next-http-backend": "^2.5.2",
@@ -33,7 +33,7 @@
"react-router-dom": "^6.23.1",
"react-syntax-highlighter": "^15.5.0",
"tailwind-merge": "^2.3.0",
"vite": "^5.2.11",
"vite": "^5.2.12",
"web-vitals": "^3.5.2"
},
"scripts": {
@@ -62,14 +62,14 @@
"@tailwindcss/typography": "^0.5.13",
"@testing-library/jest-dom": "^6.4.5",
"@testing-library/react": "^15.0.7",
"@testing-library/user-event": "^13.5.0",
"@types/node": "^18.0.0 ",
"@testing-library/user-event": "^14.5.2",
"@types/node": "^20.12.13",
"@types/react": "^18.3.3",
"@types/react-dom": "^18.3.0",
"@types/react-highlight": "^0.12.8",
"@types/react-syntax-highlighter": "^15.5.13",
"@typescript-eslint/eslint-plugin": "^7.10.0",
"@typescript-eslint/parser": "^7.10.0",
"@typescript-eslint/eslint-plugin": "^7.11.0",
"@typescript-eslint/parser": "^7.11.0",
"autoprefixer": "^10.4.19",
"eslint": "^8.57.0",
"eslint-config-airbnb": "^19.0.4",
@@ -78,15 +78,15 @@
"eslint-plugin-import": "^2.29.1",
"eslint-plugin-jsx-a11y": "^6.8.0",
"eslint-plugin-prettier": "^5.1.3",
"eslint-plugin-react": "^7.34.1",
"eslint-plugin-react": "^7.34.2",
"eslint-plugin-react-hooks": "^4.6.2",
"husky": "^9.0.11",
"jsdom": "^24.0.0",
"lint-staged": "^15.2.4",
"jsdom": "^24.1.0",
"lint-staged": "^15.2.5",
"postcss": "^8.4.38",
"prettier": "^3.2.5",
"tailwindcss": "^3.4.2",
"typescript": "^5.4.3",
"typescript": "^5.4.5",
"vite-tsconfig-paths": "^4.3.2",
"vitest": "^1.6.0"
},
+1 -5
@@ -41,7 +41,7 @@ function ActionButton({
action,
handleAction,
children,
large,
large = false,
}: React.PropsWithChildren<ButtonProps>): React.ReactNode {
return (
<Tooltip content={content} closeDelay={100}>
@@ -57,10 +57,6 @@ function ActionButton({
);
}
ActionButton.defaultProps = {
large: false,
};
function AgentControlBar() {
const { curAgentState } = useSelector((state: RootState) => state.agent);
const [desiredState, setDesiredState] = React.useState(AgentState.INIT);
+1 -5
@@ -12,7 +12,7 @@ function IconButton({
icon,
onClick,
ariaLabel,
testId,
testId = "",
}: IconButtonProps): React.ReactElement {
return (
<Button
@@ -28,8 +28,4 @@ function IconButton({
);
}
IconButton.defaultProps = {
testId: "",
};
export default IconButton;
+1 -5
@@ -10,7 +10,7 @@ interface ChatInputProps {
onSendMessage: (message: string) => void;
}
function ChatInput({ disabled, onSendMessage }: ChatInputProps) {
function ChatInput({ disabled = false, onSendMessage }: ChatInputProps) {
const { t } = useTranslation();
const [message, setMessage] = React.useState("");
@@ -70,8 +70,4 @@ function ChatInput({ disabled, onSendMessage }: ChatInputProps) {
);
}
ChatInput.defaultProps = {
disabled: false,
};
export default ChatInput;
@@ -16,8 +16,4 @@ function ExplorerTree({ files, defaultOpen = false }: ExplorerTreeProps) {
);
}
ExplorerTree.defaultProps = {
defaultOpen: false,
};
export default ExplorerTree;
@@ -94,8 +94,4 @@ function TreeNode({ path, defaultOpen = false }: TreeNodeProps) {
);
}
TreeNode.defaultProps = {
defaultOpen: false,
};
export default React.memo(TreeNode);
@@ -24,9 +24,9 @@ function BaseModal({
onOpenChange,
title,
isDismissable = true,
subtitle,
actions,
children,
subtitle = undefined,
actions = [],
children = null,
}: BaseModalProps) {
return (
<Modal
@@ -60,11 +60,4 @@ function BaseModal({
);
}
BaseModal.defaultProps = {
isDismissable: true,
subtitle: undefined,
actions: [],
children: null,
};
export default BaseModal;
@@ -5,7 +5,10 @@ interface HeaderContentProps {
subtitle?: string;
}
export function HeaderContent({ title, subtitle }: HeaderContentProps) {
export function HeaderContent({
title,
subtitle = undefined,
}: HeaderContentProps) {
return (
<>
<h3>{title}</h3>
@@ -15,7 +18,3 @@ export function HeaderContent({ title, subtitle }: HeaderContentProps) {
</>
);
}
HeaderContent.defaultProps = {
subtitle: undefined,
};
@@ -77,8 +77,3 @@ export function AutocompleteCombobox({
</Tooltip>
);
}
AutocompleteCombobox.defaultProps = {
allowCustomValue: false,
disabled: false,
};
@@ -23,12 +23,12 @@ vi.spyOn(Session, "isConnected").mockImplementation(() => true);
vi.mock("#/services/settings", async (importOriginal) => ({
...(await importOriginal<typeof import("#/services/settings")>()),
getSettings: vi.fn().mockReturnValue({
LLM_MODEL: "gpt-3.5-turbo",
LLM_MODEL: "gpt-4o",
AGENT: "MonologueAgent",
LANGUAGE: "en",
}),
getDefaultSettings: vi.fn().mockReturnValue({
LLM_MODEL: "gpt-3.5-turbo",
LLM_MODEL: "gpt-4o",
AGENT: "CodeActAgent",
LANGUAGE: "en",
LLM_API_KEY: "",
@@ -81,7 +81,7 @@ describe("SettingsModal", () => {
it("should disabled the save button if the settings contain a missing value", async () => {
const onOpenChangeMock = vi.fn();
(getSettings as Mock).mockReturnValueOnce({
LLM_MODEL: "gpt-3.5-turbo",
LLM_MODEL: "gpt-4o",
AGENT: "",
});
await act(async () =>
@@ -97,7 +97,7 @@ describe("SettingsModal", () => {
describe("onHandleSave", () => {
const initialSettings: Settings = {
LLM_MODEL: "gpt-3.5-turbo",
LLM_MODEL: "gpt-4o",
AGENT: "MonologueAgent",
LANGUAGE: "en",
LLM_API_KEY: "sk-...",
+3 -4
@@ -5,12 +5,10 @@ const WAIT_FOR_AUTH_DELAY_MS = 500;
export async function request(
url: string,
optionsIn: RequestInit = {},
options: RequestInit = {},
disableToast: boolean = false,
/* eslint-disable-next-line @typescript-eslint/no-explicit-any */
): Promise<any> {
const options = JSON.parse(JSON.stringify(optionsIn));
const onFail = (msg: string) => {
if (!disableToast) {
toast.error("api", msg);
@@ -23,11 +21,12 @@ export async function request(
if (!token && needsAuth) {
return new Promise((resolve) => {
setTimeout(() => {
resolve(request(url, optionsIn, disableToast));
resolve(request(url, options, disableToast));
}, WAIT_FOR_AUTH_DELAY_MS);
});
}
if (token) {
// eslint-disable-next-line no-param-reassign
options.headers = {
...(options.headers || {}),
Authorization: `Bearer ${token}`,
+3 -3
@@ -8,7 +8,7 @@ export type Settings = {
};
export const DEFAULT_SETTINGS: Settings = {
LLM_MODEL: "gpt-3.5-turbo",
LLM_MODEL: "gpt-4o",
AGENT: "CodeActAgent",
LANGUAGE: "en",
LLM_API_KEY: "",
@@ -79,8 +79,8 @@ export const saveSettings = (settings: Partial<Settings>) => {
* Useful for notifying the user of exact changes.
*
* @example
* // Assuming the current settings are: { LLM_MODEL: "gpt-3.5", AGENT: "MonologueAgent", LANGUAGE: "en" }
* const updatedSettings = getSettingsDifference({ LLM_MODEL: "gpt-3.5", AGENT: "OTHER_AGENT", LANGUAGE: "en" });
* // Assuming the current settings are: { LLM_MODEL: "gpt-4o", AGENT: "MonologueAgent", LANGUAGE: "en" }
* const updatedSettings = getSettingsDifference({ LLM_MODEL: "gpt-4o", AGENT: "OTHER_AGENT", LANGUAGE: "en" });
* // updatedSettings = { AGENT: "OTHER_AGENT" }
*
* @param settings - the settings to compare
+80 -22
@@ -47,6 +47,7 @@ class AgentController:
event_stream: EventStream
state: State
agent_task: Optional[asyncio.Task] = None
parent: 'AgentController | None' = None
delegate: 'AgentController | None' = None
_pending_action: Action | None = None
@@ -58,7 +59,8 @@ class AgentController:
max_iterations: int = MAX_ITERATIONS,
max_chars: int = MAX_CHARS,
max_budget_per_task: float | None = MAX_BUDGET_PER_TASK,
inputs: dict | None = None,
initial_state: State | None = None,
is_delegate: bool = False,
):
"""Initializes a new instance of the AgentController class.
@@ -69,25 +71,30 @@ class AgentController:
max_iterations: The maximum number of iterations the agent can run.
max_chars: The maximum number of characters the agent can output.
max_budget_per_task: The maximum budget (in USD) allowed per task, beyond which the agent will stop.
inputs: The initial inputs to the agent.
initial_state: The initial state of the controller.
is_delegate: Whether this controller is a delegate.
"""
self._step_lock = asyncio.Lock()
self.id = sid
self.agent = agent
self.state = State(inputs=inputs or {}, max_iterations=max_iterations)
self.max_chars = max_chars
if initial_state is None:
self.state = State(inputs={}, max_iterations=max_iterations)
else:
self.state = initial_state
self.event_stream = event_stream
self.event_stream.subscribe(
EventStreamSubscriber.AGENT_CONTROLLER, self.on_event
EventStreamSubscriber.AGENT_CONTROLLER, self.on_event, append=is_delegate
)
self.max_iterations = max_iterations
self.max_chars = max_chars
self.max_budget_per_task = max_budget_per_task
self.agent_task = asyncio.create_task(self._start_step_loop())
if not is_delegate:
self.agent_task = asyncio.create_task(self._start_step_loop())
async def close(self):
if self.agent_task is not None:
self.agent_task.cancel()
self.event_stream.unsubscribe(EventStreamSubscriber.AGENT_CONTROLLER)
await self.set_agent_state_to(AgentState.STOPPED)
self.event_stream.unsubscribe(EventStreamSubscriber.AGENT_CONTROLLER)
def update_state_before_step(self):
self.state.iteration += 1
@@ -117,6 +124,7 @@ class AgentController:
self.state.updated_info.append((action, observation))
async def _start_step_loop(self):
logger.info(f'[Agent Controller {self.id}] Starting step loop...')
while True:
try:
await self._step()
@@ -164,13 +172,16 @@ class AgentController:
elif isinstance(event, CmdOutputObservation):
await self.add_history(NullAction(), event)
logger.info(event, extra={'msg_type': 'OBSERVATION'})
elif isinstance(event, AgentDelegateObservation):
await self.add_history(NullAction(), event)
logger.info(event, extra={'msg_type': 'OBSERVATION'})
def reset_task(self):
self.agent.reset()
async def set_agent_state_to(self, new_state: AgentState):
logger.info(
f'Setting agent({type(self.agent).__name__}) state from {self.state.agent_state} to {new_state}'
f'[Agent Controller {self.id}] Setting agent({type(self.agent).__name__}) state from {self.state.agent_state} to {new_state}'
)
if new_state == self.state.agent_state:
@@ -195,45 +206,84 @@ class AgentController:
async def start_delegate(self, action: AgentDelegateAction):
AgentCls: Type[Agent] = Agent.get_cls(action.agent)
agent = AgentCls(llm=self.agent.llm)
state = State(
inputs=action.inputs or {},
iteration=0,
max_iterations=self.state.max_iterations,
num_of_chars=self.state.num_of_chars,
delegate_level=self.state.delegate_level + 1,
)
logger.info(f'[Agent Controller {self.id}]: start delegate')
self.delegate = AgentController(
sid=self.id + '-delegate',
agent=agent,
event_stream=self.event_stream,
max_iterations=self.max_iterations,
max_iterations=self.state.max_iterations,
max_chars=self.max_chars,
inputs=action.inputs,
initial_state=state,
is_delegate=True,
)
await self.delegate.set_agent_state_to(AgentState.RUNNING)
async def _step(self):
logger.debug(f'[Agent Controller {self.id}] Entering step method')
if self.get_agent_state() != AgentState.RUNNING:
logger.debug('waiting for agent to run...')
await asyncio.sleep(1)
return
if self._pending_action:
logger.debug('waiting for pending action: ' + str(self._pending_action))
logger.info(
f'[Agent Controller {self.id}] waiting for pending action: {self._pending_action}'
)
await asyncio.sleep(1)
return
logger.info(f'STEP {self.state.iteration}', extra={'msg_type': 'STEP'})
if self.state.iteration >= self.max_iterations:
await self.report_error('Agent reached maximum number of iterations')
await self.set_agent_state_to(AgentState.ERROR)
return
if self.delegate is not None:
delegate_done = await self.delegate._step()
logger.debug(f'[Agent Controller {self.id}] Delegate not none, awaiting...')
assert self.delegate != self
await self.delegate._step()
logger.debug(f'[Agent Controller {self.id}] Delegate step done')
assert self.delegate is not None
delegate_state = self.delegate.get_agent_state()
if delegate_state == AgentState.ERROR:
# close the delegate upon error
await self.delegate.close()
await self.report_error('Delegator agent encounters an error')
# propagate error state until an agent or user can handle it
await self.set_agent_state_to(AgentState.ERROR)
return
delegate_done = delegate_state == AgentState.FINISHED
if delegate_done:
logger.info(
f'[Agent Controller {self.id}] Delegate agent has finished execution'
)
# retrieve delegate result
outputs = self.delegate.state.outputs if self.delegate.state else {}
obs: Observation = AgentDelegateObservation(content='', outputs=outputs)
await self.event_stream.add_event(obs, EventSource.AGENT)
# close delegate controller: we must close the delegate controller before adding new events
await self.delegate.close()
# clean up delegate status
self.delegate = None
self.delegateAction = None
# update delegate result observation
obs: Observation = AgentDelegateObservation(outputs=outputs, content='')
await self.event_stream.add_event(obs, EventSource.AGENT)
return
if self.state.num_of_chars > self.max_chars:
raise MaxCharsExceedError(self.state.num_of_chars, self.max_chars)
logger.info(
f'{type(self.agent).__name__} LEVEL {self.state.delegate_level} STEP {self.state.iteration}',
extra={'msg_type': 'STEP'},
)
if self.state.iteration >= self.state.max_iterations:
await self.report_error('Agent reached maximum number of iterations')
await self.set_agent_state_to(AgentState.ERROR)
return
self.update_state_before_step()
action: Action = NullAction()
try:
@@ -335,6 +385,14 @@ class AgentController:
return False
def __repr__(self):
return (
f'AgentController(id={self.id}, agent={self.agent!r}, '
f'event_stream={self.event_stream!r}, '
f'state={self.state!r}, agent_task={self.agent_task!r}, '
f'delegate={self.delegate!r}, _pending_action={self._pending_action!r})'
)
def _eq_no_pid(self, obj1, obj2):
if isinstance(obj1, CmdOutputObservation) and isinstance(
obj2, CmdOutputObservation
+2
@@ -40,6 +40,8 @@ class State:
agent_state: AgentState = AgentState.LOADING
resume_state: AgentState | None = None
metrics: Metrics = Metrics()
# root agent has level 0, and every delegate increases the level by one
delegate_level: int = 0
def save_to_session(self, sid: str):
fs = get_file_store()
+4 -1
@@ -48,7 +48,7 @@ class LLMConfig(metaclass=Singleton):
output_cost_per_token: The cost per output token. This will be available in logs for the user to check.
"""
model: str = 'gpt-3.5-turbo'
model: str = 'gpt-4o'
api_key: str | None = None
base_url: str | None = None
api_version: str | None = None
@@ -179,6 +179,9 @@ class AppConfig(metaclass=Singleton):
disable_color: bool = False
sandbox_user_id: int = os.getuid() if hasattr(os, 'getuid') else 1000
sandbox_timeout: int = 120
persist_sandbox: bool = False
ssh_port: int = 63710
ssh_password: str | None = None
github_token: str | None = None
jwt_secret: str = uuid.uuid4().hex
debug: bool = False
+1
@@ -2,6 +2,7 @@ from enum import Enum
class ConfigType(str, Enum):
# For frontend
LLM_CUSTOM_LLM_PROVIDER = 'LLM_CUSTOM_LLM_PROVIDER'
LLM_MAX_INPUT_TOKENS = 'LLM_MAX_INPUT_TOKENS'
LLM_MAX_OUTPUT_TOKENS = 'LLM_MAX_OUTPUT_TOKENS'
+15 -7
@@ -21,7 +21,9 @@ class EventStreamSubscriber(str, Enum):
class EventStream:
sid: str
_subscribers: dict[str, Callable]
# For each subscriber ID, there is a stack of callback functions - useful
# when there are agent delegates
_subscribers: dict[str, list[Callable]]
_cur_id: int
_lock: asyncio.Lock
_file_store: FileStore
@@ -69,17 +71,22 @@ class EventStream:
data = json.loads(content)
return event_from_dict(data)
def subscribe(self, id: EventStreamSubscriber, callback: Callable):
def subscribe(self, id: EventStreamSubscriber, callback: Callable, append=False):
if id in self._subscribers:
raise ValueError('Subscriber already exists: ' + id)
if append:
self._subscribers[id].append(callback)
else:
raise ValueError('Subscriber already exists: ' + id)
else:
self._subscribers[id] = callback
self._subscribers[id] = [callback]
def unsubscribe(self, id: EventStreamSubscriber):
if id not in self._subscribers:
logger.warning('Subscriber not found during unsubscribe: ' + id)
else:
del self._subscribers[id]
self._subscribers[id].pop()
if len(self._subscribers[id]) == 0:
del self._subscribers[id]
# TODO: make this not async
async def add_event(self, event: Event, source: EventSource):
@@ -93,5 +100,6 @@ class EventStream:
self._file_store.write(
self._get_filename_for_id(event.id), json.dumps(data)
)
for key, fn in self._subscribers.items():
await fn(event)
for key, stack in self._subscribers.items():
callback = stack[-1]
await callback(event)
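The subscriber change above can be read as a per-ID stack: a delegate pushes its callback on top of its parent's, receives events while it is active, and pops itself on unsubscribe. A minimal sketch of that behaviour follows, assuming plain string events and a toy class (`MiniEventStream` is a hypothetical stand-in, not the project's `EventStream`; file storage and locking are omitted):

```python
import asyncio
from typing import Awaitable, Callable

Callback = Callable[[str], Awaitable[None]]


class MiniEventStream:
    """Toy stand-in: only the subscriber stack is modelled."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callback]] = {}

    def subscribe(self, id: str, callback: Callback, append: bool = False) -> None:
        if id in self._subscribers:
            if not append:
                raise ValueError('Subscriber already exists: ' + id)
            # a delegate stacks its callback on top of the parent's
            self._subscribers[id].append(callback)
        else:
            self._subscribers[id] = [callback]

    def unsubscribe(self, id: str) -> None:
        if id not in self._subscribers:
            return
        # pop the most recent callback; drop the key once the stack is empty
        self._subscribers[id].pop()
        if not self._subscribers[id]:
            del self._subscribers[id]

    async def add_event(self, event: str) -> None:
        # only the callback on top of each stack receives the event
        for stack in list(self._subscribers.values()):
            await stack[-1](event)


async def demo() -> list[str]:
    seen: list[str] = []

    async def parent(event: str) -> None:
        seen.append(f'parent:{event}')

    async def delegate(event: str) -> None:
        seen.append(f'delegate:{event}')

    stream = MiniEventStream()
    stream.subscribe('agent_controller', parent)
    stream.subscribe('agent_controller', delegate, append=True)
    await stream.add_event('e1')  # goes to the delegate only
    stream.unsubscribe('agent_controller')  # delegate closes
    await stream.add_event('e2')  # goes to the parent again
    return seen


result = asyncio.run(demo())
print(result)
```

In this sketch the parent sees nothing while a delegate is stacked above it, which matches the `stack[-1]` dispatch in the hunk.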
+1 -1
@@ -131,7 +131,7 @@ class LLM:
# litellm actually uses base Exception here for unknown model
self.model_info = None
try:
self.model_info = litellm.get_model_info(self.model_name)
self.model_info = litellm.get_model_info(self.model_name.split(':')[0])
# noinspection PyBroadException
except Exception:
logger.warning(f'Could not get model info for {self.model_name}')
+45 -27
@@ -216,38 +216,50 @@ class DockerSSHBox(Sandbox):
)
raise ex
self.instance_id = (
sid + str(uuid.uuid4()) if sid is not None else str(uuid.uuid4())
)
if config.persist_sandbox:
self.instance_id = 'persisted'
else:
self.instance_id = (sid or '') + str(uuid.uuid4())
self.timeout = timeout
self.container_image = (
config.sandbox_container_image
if container_image is None
else container_image
)
self.container_image = container_image or config.sandbox_container_image
self.container_name = self.container_name_prefix + self.instance_id
# set up random user password
self._ssh_password = str(uuid.uuid4())
self._ssh_port = find_available_tcp_port()
# always restart the container, because the initial one is regarded as a new session
n_tries = 5
while n_tries > 0:
try:
self.restart_docker_container()
break
except Exception as e:
logger.exception(
'Failed to start Docker container, retrying...', exc_info=False
if config.persist_sandbox:
if not config.ssh_password:
raise Exception(
'Please add ssh_password to your config.toml or add -e SSH_PASSWORD to your docker run command'
)
n_tries -= 1
if n_tries == 0:
raise e
time.sleep(5)
self.setup_user()
self._ssh_password = config.ssh_password
self._ssh_port = config.ssh_port
else:
self._ssh_password = str(uuid.uuid4())
self._ssh_port = find_available_tcp_port()
try:
docker.DockerClient().containers.get(self.container_name)
is_initial_session = False
except docker.errors.NotFound:
is_initial_session = True
logger.info('Creating new Docker container')
if not config.persist_sandbox or is_initial_session:
n_tries = 5
while n_tries > 0:
try:
self.restart_docker_container()
break
except Exception as e:
logger.exception(
'Failed to start Docker container, retrying...', exc_info=False
)
n_tries -= 1
if n_tries == 0:
raise e
time.sleep(5)
self.setup_user()
else:
self.container = self.docker_client.containers.get(self.container_name)
logger.info('Using existing Docker container')
try:
self.start_ssh_session()
except pxssh.ExceptionPxssh as e:
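The branching in this hunk reduces to three decisions: which instance ID to use, where the SSH password comes from, and whether the container must be (re)started. The sketch below restates that logic as a pure function so it can be checked without Docker. `SandboxPlan` and `plan_sandbox` are hypothetical names introduced here for illustration, and the `ValueError` stands in for the generic `Exception` raised in the hunk:

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class SandboxPlan:
    instance_id: str
    ssh_password: str
    restart_container: bool


def plan_sandbox(
    persist: bool,
    configured_password: Optional[str],
    container_exists: bool,
    sid: Optional[str] = None,
) -> SandboxPlan:
    if persist:
        # a persisted sandbox needs a stable password supplied by the user
        if not configured_password:
            raise ValueError('persist_sandbox requires ssh_password to be set')
        instance_id = 'persisted'
        password = configured_password
    else:
        # throwaway sandbox: random ID suffix and random password
        instance_id = (sid or '') + str(uuid.uuid4())
        password = str(uuid.uuid4())
    # restart unless we are persisting and the container is already there
    restart = (not persist) or (not container_exists)
    return SandboxPlan(instance_id, password, restart)


print(plan_sandbox(True, 'secret', container_exists=True))
```

Only the combination of `persist=True` with an existing container reuses the container. Every other case goes through the restart path.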
@@ -391,6 +403,9 @@ class DockerSSHBox(Sandbox):
# cd to workspace
self.ssh.sendline(f'cd {self.sandbox_workspace_dir}')
self.ssh.prompt()
# load bashrc
self.ssh.sendline('source ~/.bashrc')
self.ssh.prompt()
def get_exec_cmd(self, cmd: str) -> list[str]:
if self.run_as_devin:
@@ -704,7 +719,10 @@ class DockerSSHBox(Sandbox):
containers = self.docker_client.containers.list(all=True)
for container in containers:
try:
if container.name.startswith(self.container_name):
if (
container.name.startswith(self.container_name)
and not config.persist_sandbox
):
# only remove the container we created
# otherwise all other containers with the same prefix will be removed
# which will mess up with parallel evaluation
@@ -16,6 +16,7 @@ Functions:
"""
import base64
import functools
import os
import subprocess
from inspect import signature
@@ -46,6 +47,22 @@ OPENAI_PROXY = f'{OPENAI_BASE_URL}/chat/completions'
client = OpenAI(api_key=OPENAI_API_KEY, base_url=OPENAI_BASE_URL)
# Define the decorator using the functionality of UpdatePwd
def update_pwd_decorator(func):
@functools.wraps(func)
def wrapper(*args, **kwargs):
old_pwd = os.getcwd()
jupyter_pwd = os.environ.get('JUPYTER_PWD', None)
if jupyter_pwd:
os.chdir(jupyter_pwd)
try:
return func(*args, **kwargs)
finally:
os.chdir(old_pwd)
return wrapper
def _lint_file(file_path: str) -> Optional[str]:
"""
Lint the file at the given path.
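To see what `update_pwd_decorator` does, the sketch below copies the decorator from the hunk and applies it to a throwaway function. `where_am_i` and the temporary directory are illustrative assumptions, not part of the change:

```python
import functools
import os
import tempfile


def update_pwd_decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        old_pwd = os.getcwd()
        jupyter_pwd = os.environ.get('JUPYTER_PWD')
        if jupyter_pwd:
            os.chdir(jupyter_pwd)
        try:
            return func(*args, **kwargs)
        finally:
            # always restore the caller's working directory
            os.chdir(old_pwd)

    return wrapper


@update_pwd_decorator
def where_am_i() -> str:
    """Hypothetical helper: report the directory the call runs in."""
    return os.path.realpath(os.getcwd())


before = os.path.realpath(os.getcwd())
with tempfile.TemporaryDirectory() as tmp:
    os.environ['JUPYTER_PWD'] = tmp
    inside = where_am_i()
    expected_inside = os.path.realpath(tmp)
    del os.environ['JUPYTER_PWD']
after = os.path.realpath(os.getcwd())

print(inside == expected_inside, before == after)
```

The wrapped call runs inside `JUPYTER_PWD`, and the `finally` clause restores the previous directory even if the function raises.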
@@ -88,12 +105,21 @@ def _print_window(CURRENT_FILE, CURRENT_LINE, WINDOW, return_str=False):
start = max(0, CURRENT_LINE - WINDOW // 2)
end = min(len(lines), CURRENT_LINE + WINDOW // 2)
output = ''
# only display this when there are lines above
if start > 0:
n_above_lines = start
output += f'({n_above_lines} more lines above)\n'
for i in range(start, end):
_new_line = f'{i + 1}|{lines[i]}'
if not _new_line.endswith('\n'):
_new_line += '\n'
output += _new_line
if end < len(lines):
n_below_lines = len(lines) - end
output += f'({n_below_lines} more lines below)\n'
output = output.rstrip()
if return_str:
return output
else:
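The windowing change is easier to follow on in-memory lines. The sketch below mirrors the arithmetic in the hunk but takes a list of strings instead of a file path. `render_window` is a hypothetical name used only for this illustration:

```python
def render_window(lines: list[str], current_line: int, window: int) -> str:
    start = max(0, current_line - window // 2)
    end = min(len(lines), current_line + window // 2)
    output = ''
    # marker is shown only when lines are hidden above the window
    if start > 0:
        output += f'({start} more lines above)\n'
    for i in range(start, end):
        line = f'{i + 1}|{lines[i]}'
        if not line.endswith('\n'):
            line += '\n'
        output += line
    # marker is shown only when lines are hidden below the window
    if end < len(lines):
        output += f'({len(lines) - end} more lines below)\n'
    return output.rstrip()


sample = [f'line {n}\n' for n in range(1, 11)]
view = render_window(sample, current_line=5, window=4)
print(view)
```

For a ten-line input with `current_line=5` and `window=4`, this prints lines 4 through 7 with a "3 more lines" marker on each side.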
@@ -104,6 +130,7 @@ def _cur_file_header(CURRENT_FILE, total_lines):
return f'[File: {os.path.abspath(CURRENT_FILE)} ({total_lines} lines total)]\n'
@update_pwd_decorator
def open_file(path: str, line_number: Optional[int] = None) -> None:
"""
Opens the file at the given path in the editor. If line_number is provided, the window will be moved to include that line.
@@ -116,7 +143,7 @@ def open_file(path: str, line_number: Optional[int] = None) -> None:
if not os.path.isfile(path):
raise FileNotFoundError(f'File {path} not found')
CURRENT_FILE = path
CURRENT_FILE = os.path.abspath(path)
with open(CURRENT_FILE) as file:
total_lines = sum(1 for _ in file)
@@ -136,6 +163,7 @@ def open_file(path: str, line_number: Optional[int] = None) -> None:
print(output)
@update_pwd_decorator
def goto_line(line_number: int) -> None:
"""
Moves the window to show the specified line number.
@@ -158,6 +186,7 @@ def goto_line(line_number: int) -> None:
print(output)
@update_pwd_decorator
def scroll_down() -> None:
"""Moves the window down by 100 lines.
@@ -175,6 +204,7 @@ def scroll_down() -> None:
print(output)
@update_pwd_decorator
def scroll_up() -> None:
"""Moves the window up by 100 lines.
@@ -192,6 +222,7 @@ def scroll_up() -> None:
print(output)
@update_pwd_decorator
def create_file(filename: str) -> None:
"""Creates and opens a new file with the given name.
@@ -209,6 +240,7 @@ def create_file(filename: str) -> None:
print(f'[File {filename} created.]')
@update_pwd_decorator
def edit_file(start: int, end: int, content: str) -> None:
"""Edit a file.
@@ -227,21 +259,35 @@ def edit_file(start: int, end: int, content: str) -> None:
with open(CURRENT_FILE, 'r') as file:
lines = file.readlines()
ERROR_MSG = f'[Error editing opened file {CURRENT_FILE}. Please confirm the opened file is correct.]'
ERROR_MSG_SUFFIX = (
'Your changes have NOT been applied. Please fix your edit command and try again.\n'
'You either need to 1) Open the correct file and try again or 2) Specify the correct start/end line arguments.\n'
'DO NOT re-run the same failed edit command. Running it again will lead to the same error.'
)
# Check arguments
if not (1 <= start <= len(lines)):
raise ValueError(
f'Invalid start line number: {start}. Line numbers must be between 1 and {len(lines)} (inclusive).'
print(
f'{ERROR_MSG}\n'
f'Invalid start line number: {start}. Line numbers must be between 1 and {len(lines)} (inclusive).\n'
f'{ERROR_MSG_SUFFIX}'
)
return
if not (1 <= end <= len(lines)):
raise ValueError(
f'Invalid end line number: {end}. Line numbers must be between 1 and {len(lines)} (inclusive).'
print(
f'{ERROR_MSG}\n'
f'Invalid end line number: {end}. Line numbers must be between 1 and {len(lines)} (inclusive).\n'
f'{ERROR_MSG_SUFFIX}'
)
return
if start > end:
raise ValueError(
f'Invalid line range: {start}-{end}. Start must be less than or equal to end.'
print(
f'{ERROR_MSG}\n'
f'Invalid line range: {start}-{end}. Start must be less than or equal to end.\n'
f'{ERROR_MSG_SUFFIX}'
)
return
edited_content = content + '\n'
n_edited_lines = len(edited_content.split('\n'))
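The argument checks in this hunk now report a message and return instead of raising `ValueError`. One way to picture that behaviour is a validator that returns the message, or `None` when the range is valid. `check_edit_range` below is a hypothetical helper written for this illustration. The actual change prints inline inside `edit_file`:

```python
from typing import Optional


def check_edit_range(start: int, end: int, n_lines: int) -> Optional[str]:
    """Return an error message for an invalid range, or None if it is valid."""
    if not (1 <= start <= n_lines):
        return (
            f'Invalid start line number: {start}. '
            f'Line numbers must be between 1 and {n_lines} (inclusive).'
        )
    if not (1 <= end <= n_lines):
        return (
            f'Invalid end line number: {end}. '
            f'Line numbers must be between 1 and {n_lines} (inclusive).'
        )
    if start > end:
        return (
            f'Invalid line range: {start}-{end}. '
            'Start must be less than or equal to end.'
        )
    return None


for args in [(0, 3, 10), (2, 11, 10), (5, 4, 10), (2, 4, 10)]:
    print(args, '->', check_edit_range(*args))
```

Returning a message rather than raising keeps a traceback out of the agent's observation, which appears to be the intent of replacing `raise` with `print` and `return`.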
@@ -270,14 +316,20 @@ def edit_file(start: int, end: int, content: str) -> None:
print('[This is how your edit would have looked if applied]')
print('-------------------------------------------------')
cur_line = (n_edited_lines // 2) + start
_print_window(CURRENT_FILE, cur_line, WINDOW)
_print_window(CURRENT_FILE, cur_line, 10)
print('-------------------------------------------------\n')
print('[This is the original code before your edit]')
print('-------------------------------------------------')
_print_window(original_file_backup_path, CURRENT_LINE, WINDOW)
_print_window(original_file_backup_path, cur_line, 10)
print('-------------------------------------------------')
print(
'Your changes have NOT been applied. Please fix your edit command and try again.\n'
'You either need to 1) Specify the correct start/end line arguments or 2) Correct your edit code.\n'
'DO NOT re-run the same failed edit command. Running it again will lead to the same error.'
)
# recover the original file
with open(original_file_backup_path, 'r') as fin, open(
CURRENT_FILE, 'w'
@@ -301,6 +353,7 @@ def edit_file(start: int, end: int, content: str) -> None:
)
@update_pwd_decorator
def search_dir(search_term: str, dir_path: str = './') -> None:
"""Searches for search_term in all files in dir. If dir is not provided, searches in the current directory.
@@ -310,7 +363,6 @@ def search_dir(search_term: str, dir_path: str = './') -> None:
"""
if not os.path.isdir(dir_path):
raise FileNotFoundError(f'Directory {dir_path} not found')
matches = []
for root, _, files in os.walk(dir_path):
for file in files:
@@ -341,6 +393,7 @@ def search_dir(search_term: str, dir_path: str = './') -> None:
print(f'[End of matches for "{search_term}" in {dir_path}]')
@update_pwd_decorator
def search_file(search_term: str, file_path: Optional[str] = None) -> None:
"""Searches for search_term in file. If file is not provided, searches in the current open file.
@@ -373,6 +426,7 @@ def search_file(search_term: str, file_path: Optional[str] = None) -> None:
print(f'[No matches found for "{search_term}" in {file_path}]')
@update_pwd_decorator
def find_file(file_name: str, dir_path: str = './') -> None:
"""Finds all files with the given name in the specified directory.
@@ -398,6 +452,7 @@ def find_file(file_name: str, dir_path: str = './') -> None:
print(f'[No matches found for "{file_name}" in {dir_path}]')
@update_pwd_decorator
def parse_pdf(file_path: str) -> None:
"""Parses the content of a PDF file and prints it.
@@ -416,6 +471,7 @@ def parse_pdf(file_path: str) -> None:
print(text.strip())
@update_pwd_decorator
def parse_docx(file_path: str) -> None:
"""
Parses the content of a DOCX file and prints it.
@@ -431,6 +487,7 @@ def parse_docx(file_path: str) -> None:
print(text)
@update_pwd_decorator
def parse_latex(file_path: str) -> None:
"""
Parses the content of a LaTex file and prints it.
@@ -484,6 +541,7 @@ def _prepare_image_messages(task: str, base64_image: str):
]
@update_pwd_decorator
def parse_audio(file_path: str, model: str = 'whisper-1') -> None:
"""
Parses the content of an audio file and prints it.
@@ -503,6 +561,7 @@ def parse_audio(file_path: str, model: str = 'whisper-1') -> None:
print(f'Error transcribing audio file: {e}')
@update_pwd_decorator
def parse_image(
file_path: str, task: str = 'Describe this image as detail as possible.'
) -> None:
@@ -529,6 +588,7 @@ def parse_image(
print(f'Error with the request: {error}')
@update_pwd_decorator
def parse_video(
file_path: str,
task: str = 'Describe this image as detail as possible.',
@@ -577,6 +637,7 @@ def parse_video(
print(f'Error with the request: {error}')
@update_pwd_decorator
def parse_pptx(file_path: str) -> None:
"""
Parses the content of a pptx file and prints it.
@@ -7,20 +7,33 @@ import requests
# Read the Python code from STDIN
code = sys.stdin.read()
# Set the default kernel ID
kernel_id = 'default'
PORT = os.environ.get('JUPYTER_EXEC_SERVER_PORT')
POST_URL = f'http://localhost:{PORT}/execute'
def execute_code(code, print_output=True):
PORT = os.environ.get('JUPYTER_EXEC_SERVER_PORT')
POST_URL = f'http://localhost:{PORT}/execute'
for i in range(10):
try:
response = requests.post(POST_URL, json={'kernel_id': kernel_id, 'code': code})
if '500: Internal Server Error' not in response.text:
print(response.text)
break
except requests.exceptions.ConnectionError:
pass
time.sleep(2)
else:
print('Failed to connect to the Jupyter server')
# Set the default kernel ID
kernel_id = 'default'
for i in range(10):
try:
response = requests.post(
POST_URL, json={'kernel_id': kernel_id, 'code': code}
)
if '500: Internal Server Error' not in response.text:
if print_output:
print(response.text)
break
except requests.exceptions.ConnectionError:
pass
time.sleep(2)
else:
print('Failed to connect to the Jupyter server')
if jupyter_pwd := os.environ.get('JUPYTER_PWD'):
execute_code(
f'import os\nos.environ["JUPYTER_PWD"] = "{jupyter_pwd}"\n', print_output=False
)
execute_code(code)
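The retry loop in `execute_code` relies on Python's `for ... else`: the `else` branch runs only when the loop finishes without hitting `break`. The sketch below isolates that pattern with an injected `send` callable so it runs without a Jupyter server. `post_with_retry` and `flaky` are hypothetical names, and the built-in `ConnectionError` stands in for `requests.exceptions.ConnectionError`:

```python
import time
from typing import Callable, Optional


def post_with_retry(
    send: Callable[[], str], attempts: int = 10, delay: float = 0.0
) -> Optional[str]:
    result = None
    for _ in range(attempts):
        try:
            text = send()
            if '500: Internal Server Error' not in text:
                result = text
                break  # success: skips the else branch below
        except ConnectionError:
            pass  # server not reachable yet; try again
        time.sleep(delay)
    else:
        # reached only if every attempt failed
        print('Failed to connect to the Jupyter server')
    return result


calls = {'n': 0}


def flaky() -> str:
    """Fail twice with a connection error, once with a 500, then succeed."""
    calls['n'] += 1
    if calls['n'] <= 2:
        raise ConnectionError('not up yet')
    if calls['n'] == 3:
        return '500: Internal Server Error'
    return 'ok'


reply = post_with_retry(flaky)
never = post_with_retry(lambda: '500: Internal Server Error', attempts=3)
print(reply, never)
```

A response containing the 500 marker is treated like a connection failure, so both cases consume one attempt and wait before retrying.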
@@ -134,7 +134,7 @@ class JupyterKernel:
)
self.heartbeat_callback.start()
async def execute(self, code, timeout=60):
async def execute(self, code, timeout=120):
if not self.ws:
await self._connect()
+4 -1
@@ -55,7 +55,10 @@ class ServerRuntime(Runtime):
# run the code
obs = self._run_command(
('cat /tmp/opendevin_jupyter_temp.py | execute_cli'), background=False
(
'export JUPYTER_PWD=$(pwd) && cat /tmp/opendevin_jupyter_temp.py | execute_cli'
),
background=False,
)
output = obs.content
if 'pip install' in action.code and 'Successfully installed' in output:
+1 -1
@@ -24,7 +24,7 @@ websocat ws://127.0.0.1:3000/ws
```sh
LLM_API_KEY=sk-... # Your OpenAI API Key
LLM_MODEL=gpt-3.5-turbo # Default model for the agent to use
LLM_MODEL=gpt-4o # Default model for the agent to use
WORKSPACE_BASE=/path/to/your/workspace # Default path to model's workspace
```
Generated
+63 -63
@@ -416,17 +416,17 @@ files = [
[[package]]
name = "boto3"
version = "1.34.112"
version = "1.34.115"
description = "The AWS SDK for Python"
optional = false
python-versions = ">=3.8"
files = [
{file = "boto3-1.34.112-py3-none-any.whl", hash = "sha256:4cf28ce2c19a4e4963f1cb1f9b659a548f840f88af3e2da727b35ceb104f9223"},
{file = "boto3-1.34.112.tar.gz", hash = "sha256:1092ac6c68acdd33051ed0d2b7cb6f5a4527c5d1535a48cda53f7012accde206"},
{file = "boto3-1.34.115-py3-none-any.whl", hash = "sha256:0a580de3d25364da5db26ecc7dde9438ee1be1e529a7c04cc96972b6e2258378"},
{file = "boto3-1.34.115.tar.gz", hash = "sha256:67f5a6d6e6eff9c15711c265173b53eb4ad8d05b756b76ef33ac792cea7958f6"},
]
[package.dependencies]
botocore = ">=1.34.112,<1.35.0"
botocore = ">=1.34.115,<1.35.0"
jmespath = ">=0.7.1,<2.0.0"
s3transfer = ">=0.10.0,<0.11.0"
@@ -435,13 +435,13 @@ crt = ["botocore[crt] (>=1.21.0,<2.0a0)"]
[[package]]
name = "botocore"
version = "1.34.112"
version = "1.34.115"
description = "Low-level, data-driven core of boto 3."
optional = false
python-versions = ">=3.8"
files = [
{file = "botocore-1.34.112-py3-none-any.whl", hash = "sha256:637f568a6c3322fb7e5ee55e0c5367324a15a331e87a497783ac6209253dde30"},
{file = "botocore-1.34.112.tar.gz", hash = "sha256:053495953910bcf95d336ab1adb13efb70edc5462932eff180560737ad069319"},
{file = "botocore-1.34.115-py3-none-any.whl", hash = "sha256:15b8ad1ee0e9cd57884fb0bcaf3a9551d2552e44a02c2ffb55ec583eebdb888e"},
{file = "botocore-1.34.115.tar.gz", hash = "sha256:a5d5e28b9c847b17a1ecb7660b46b83d9512b125f671e03e93d14bf6f0b274c2"},
]
[package.dependencies]
@@ -454,31 +454,31 @@ crt = ["awscrt (==0.20.9)"]
[[package]]
name = "browsergym"
version = "0.3.2"
version = "0.3.4"
description = "BrowserGym: a gym environment for web task automation in the Chromium browser"
optional = false
python-versions = ">3.7"
files = [
{file = "browsergym-0.3.2-py3-none-any.whl", hash = "sha256:1e4380392804542c328bf990584ad7090f77d15c035c8160d6a15fc9dbba11d7"},
{file = "browsergym-0.3.2.tar.gz", hash = "sha256:8c11a6a5540af2ea8924fc00b5ee8ab18fca970aa7205568dffbccf6fffc74c5"},
{file = "browsergym-0.3.4-py3-none-any.whl", hash = "sha256:ecc06a42a6b7541f9025fa9cdc208d48eb4a745283358524715447257fc80adc"},
{file = "browsergym-0.3.4.tar.gz", hash = "sha256:853937f29c3855577a5fbc038a4371e82e50e393f4bdfc458df222590470807c"},
]
[package.dependencies]
browsergym-core = "0.3.2"
browsergym-experiments = "0.3.2"
browsergym-miniwob = "0.3.2"
browsergym-webarena = "0.3.2"
browsergym-core = "0.3.4"
browsergym-experiments = "0.3.4"
browsergym-miniwob = "0.3.4"
browsergym-webarena = "0.3.4"
browsergym-workarena = "*"
[[package]]
name = "browsergym-core"
version = "0.3.2"
version = "0.3.4"
description = "BrowserGym: a gym environment for web task automation in the Chromium browser"
optional = false
python-versions = ">3.7"
files = [
{file = "browsergym_core-0.3.2-py3-none-any.whl", hash = "sha256:b444d0297896ab9d1c5b04991286c6e52023673214302117cbd20ec3b4bb9279"},
{file = "browsergym_core-0.3.2.tar.gz", hash = "sha256:ff4750ffeb63ca96a6eb71fa30048175cf59cd5a27278238355118001b96730e"},
{file = "browsergym_core-0.3.4-py3-none-any.whl", hash = "sha256:1d7164b9afab613af6ae269fb811721738b09d5935df567cceba87dd1ecb4f23"},
{file = "browsergym_core-0.3.4.tar.gz", hash = "sha256:357d4cc61f2447983f9c5c0c262d5d6cca129e926ab576ec72f6b974bd1f7fd6"},
]
[package.dependencies]
@@ -492,46 +492,46 @@ pyparsing = ">=3"
[[package]]
name = "browsergym-experiments"
version = "0.3.2"
version = "0.3.4"
description = "Experimentation tools for BrowserGym"
optional = false
python-versions = ">3.7"
files = [
{file = "browsergym_experiments-0.3.2-py3-none-any.whl", hash = "sha256:d27775ea401fc297111ccbb922a27be0f877ae021a824c1a918438454989fe8f"},
{file = "browsergym_experiments-0.3.2.tar.gz", hash = "sha256:47dce382162faf62c859a37b853e38bdac83e85b28a7c9bed36cb32391d412a8"},
{file = "browsergym_experiments-0.3.4-py3-none-any.whl", hash = "sha256:d2e4a75b4a2e79f9300eb289c9b2432f07dee82622d384924972f4157069f3fe"},
{file = "browsergym_experiments-0.3.4.tar.gz", hash = "sha256:16309c6b2be59627ea90c7e36448eb897512bcef033cf481472879f4c5be317b"},
]
[package.dependencies]
browsergym-core = "0.3.2"
browsergym-core = "0.3.4"
tiktoken = ">=0.4"
[[package]]
name = "browsergym-miniwob"
version = "0.3.2"
version = "0.3.4"
description = "MiniWoB++ benchmark for BrowserGym"
optional = false
python-versions = ">3.7"
files = [
{file = "browsergym_miniwob-0.3.2-py3-none-any.whl", hash = "sha256:d63d4eee2426bbf0557a0f81b35fd712ac8a478faa18559b1e763d808c1d9062"},
{file = "browsergym_miniwob-0.3.2.tar.gz", hash = "sha256:fb74866423c1b3f957aca6ce65e318cf852ca51f21aa3d828c00bed79c824c67"},
{file = "browsergym_miniwob-0.3.4-py3-none-any.whl", hash = "sha256:4de41ee146d6f0bcb2e49b0fb8fd49f519439bf44808aef6146f5ae00064062b"},
{file = "browsergym_miniwob-0.3.4.tar.gz", hash = "sha256:938d58a9882c4118e46160d303a9a6d93ac1a08288e81e2c6d5c768719f012fe"},
]
[package.dependencies]
browsergym-core = "0.3.2"
browsergym-core = "0.3.4"
[[package]]
name = "browsergym-webarena"
version = "0.3.2"
version = "0.3.4"
description = "WebArena benchmark for BrowserGym"
optional = false
python-versions = ">3.7"
files = [
{file = "browsergym_webarena-0.3.2-py3-none-any.whl", hash = "sha256:bb706929d4c1e95f53592af58e4314d2775051b91800d0f2fb11f51a38b5b127"},
{file = "browsergym_webarena-0.3.2.tar.gz", hash = "sha256:a65013a98903bb14ad999dbedb0313ac35a21a3fb35984df2c76c8f7d423b95e"},
{file = "browsergym_webarena-0.3.4-py3-none-any.whl", hash = "sha256:fd9f9bb4cdf1e32d22e6cd525fd0c28adf9dda615e4dc614b677c25f675a9b73"},
{file = "browsergym_webarena-0.3.4.tar.gz", hash = "sha256:ba921a76223910d8842d0c9dd6d3393db14819f9a74c477289f0d2625bdd8feb"},
]
[package.dependencies]
browsergym-core = "0.3.2"
browsergym-core = "0.3.4"
libwebarena = "0.0.3"
[[package]]
@@ -2403,13 +2403,13 @@ files = [
[[package]]
name = "json-repair"
version = "0.19.2"
version = "0.21.0"
description = "A package to repair broken json strings"
optional = false
python-versions = ">=3.7"
files = [
{file = "json_repair-0.19.2-py3-none-any.whl", hash = "sha256:eeacf422c620d98499c6a7d6da78dc52857bd419f2276157d44ef2441eccca2e"},
{file = "json_repair-0.19.2.tar.gz", hash = "sha256:0bb1963a2a0958b18f403a4cc937fdb580f63ba7b86b9779c5a9be6d9bdc9e9d"},
{file = "json_repair-0.21.0-py3-none-any.whl", hash = "sha256:b432d5f4a09c75c279e7185381d6ac600154793def0367a5df56f267038d39b0"},
{file = "json_repair-0.21.0.tar.gz", hash = "sha256:6df5b381b08a0cc386aefd4ddeabdb071f22345101d64ca2b34cbb32dfdf2eec"},
]
[[package]]
@@ -2627,13 +2627,13 @@ types-tqdm = "*"
[[package]]
name = "litellm"
version = "1.38.10"
version = "1.39.3"
description = "Library to easily interface with LLM API providers"
optional = false
python-versions = "!=2.7.*,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,!=3.5.*,!=3.6.*,!=3.7.*,>=3.8"
files = [
{file = "litellm-1.38.10-py3-none-any.whl", hash = "sha256:4d33465eacde566832b9d7aa7677476e61aa7ba4ec26631fb1c8411c87219ed1"},
{file = "litellm-1.38.10.tar.gz", hash = "sha256:1a0b3088fe4b072f367343a7d7d25e4c5f9990975d9ee7dbf21f3b25ff046bb0"},
{file = "litellm-1.39.3-py3-none-any.whl", hash = "sha256:ac2769499b2d57091d49d0c9524d3368de9355075a3898f71448fa442b01c429"},
{file = "litellm-1.39.3.tar.gz", hash = "sha256:0c78d7bb03b077fa4e5a87fca85e7b2d448440da362f86c0b15fdde754d0468e"},
]
[package.dependencies]
@@ -2770,13 +2770,13 @@ llama-index-llms-azure-openai = ">=0.1.3,<0.2.0"
[[package]]
name = "llama-index-embeddings-huggingface"
version = "0.2.0"
version = "0.2.1"
description = "llama-index embeddings huggingface integration"
optional = false
python-versions = "<4.0,>=3.8.1"
files = [
{file = "llama_index_embeddings_huggingface-0.2.0-py3-none-any.whl", hash = "sha256:e8beb7cbdea36bcee26a0282809f8329b0c55b2b4949a590a8da0f348aac066e"},
{file = "llama_index_embeddings_huggingface-0.2.0.tar.gz", hash = "sha256:dcf0a99455f37c4e1a2fdd5cd65c9dd1a451bb868c3f80c335c4d0c9b69d0071"},
{file = "llama_index_embeddings_huggingface-0.2.1-py3-none-any.whl", hash = "sha256:326468966e269acc7fbc77cad4f65ec061133ea91b0063fe181e72d01a6a8511"},
{file = "llama_index_embeddings_huggingface-0.2.1.tar.gz", hash = "sha256:bac68a13ad5131a055da3ef174cca70e15230426eec7d471b372e81e8489d888"},
]
[package.dependencies]
@@ -4054,13 +4054,13 @@ sympy = "*"
[[package]]
name = "openai"
version = "1.30.1"
version = "1.30.5"
description = "The official Python library for the openai API"
optional = false
python-versions = ">=3.7.1"
files = [
{file = "openai-1.30.1-py3-none-any.whl", hash = "sha256:c9fb3c3545c118bbce8deb824397b9433a66d0d0ede6a96f7009c95b76de4a46"},
{file = "openai-1.30.1.tar.gz", hash = "sha256:4f85190e577cba0b066e1950b8eb9b11d25bc7ebcc43a86b326ce1bfa564ec74"},
{file = "openai-1.30.5-py3-none-any.whl", hash = "sha256:2ad95e926de0d2e09cde632a9204b0a6dca4a03c2cdcc84329b01f355784355a"},
{file = "openai-1.30.5.tar.gz", hash = "sha256:5366562eb2c5917e6116ae0391b7ae6e3acd62b0ae3f565ada32b35d8fcfa106"},
]
[package.dependencies]
@@ -5665,28 +5665,28 @@ pyasn1 = ">=0.1.3"
[[package]]
name = "ruff"
version = "0.4.5"
version = "0.4.6"
description = "An extremely fast Python linter and code formatter, written in Rust."
optional = false
python-versions = ">=3.7"
files = [
{file = "ruff-0.4.5-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:8f58e615dec58b1a6b291769b559e12fdffb53cc4187160a2fc83250eaf54e96"},
{file = "ruff-0.4.5-py3-none-macosx_11_0_arm64.whl", hash = "sha256:84dd157474e16e3a82745d2afa1016c17d27cb5d52b12e3d45d418bcc6d49264"},
{file = "ruff-0.4.5-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:25f483ad9d50b00e7fd577f6d0305aa18494c6af139bce7319c68a17180087f4"},
{file = "ruff-0.4.5-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:63fde3bf6f3ad4e990357af1d30e8ba2730860a954ea9282c95fc0846f5f64af"},
{file = "ruff-0.4.5-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:78e3ba4620dee27f76bbcad97067766026c918ba0f2d035c2fc25cbdd04d9c97"},
{file = "ruff-0.4.5-py3-none-manylinux_2_17_ppc64.manylinux2014_ppc64.whl", hash = "sha256:441dab55c568e38d02bbda68a926a3d0b54f5510095c9de7f95e47a39e0168aa"},
{file = "ruff-0.4.5-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:1169e47e9c4136c997f08f9857ae889d614c5035d87d38fda9b44b4338909cdf"},
{file = "ruff-0.4.5-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:755ac9ac2598a941512fc36a9070a13c88d72ff874a9781493eb237ab02d75df"},
{file = "ruff-0.4.5-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f4b02a65985be2b34b170025a8b92449088ce61e33e69956ce4d316c0fe7cce0"},
{file = "ruff-0.4.5-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:75a426506a183d9201e7e5664de3f6b414ad3850d7625764106f7b6d0486f0a1"},
{file = "ruff-0.4.5-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:6e1b139b45e2911419044237d90b60e472f57285950e1492c757dfc88259bb06"},
{file = "ruff-0.4.5-py3-none-musllinux_1_2_i686.whl", hash = "sha256:a6f29a8221d2e3d85ff0c7b4371c0e37b39c87732c969b4d90f3dad2e721c5b1"},
{file = "ruff-0.4.5-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:d6ef817124d72b54cc923f3444828ba24fa45c3164bc9e8f1813db2f3d3a8a11"},
{file = "ruff-0.4.5-py3-none-win32.whl", hash = "sha256:aed8166c18b1a169a5d3ec28a49b43340949e400665555b51ee06f22813ef062"},
{file = "ruff-0.4.5-py3-none-win_amd64.whl", hash = "sha256:b0b03c619d2b4350b4a27e34fd2ac64d0dabe1afbf43de57d0f9d8a05ecffa45"},
{file = "ruff-0.4.5-py3-none-win_arm64.whl", hash = "sha256:9d15de3425f53161b3f5a5658d4522e4eee5ea002bf2ac7aa380743dd9ad5fba"},
{file = "ruff-0.4.5.tar.gz", hash = "sha256:286eabd47e7d4d521d199cab84deca135557e6d1e0f0d01c29e757c3cb151b54"},
{file = "ruff-0.4.6-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:ef995583a038cd4a7edf1422c9e19118e2511b8ba0b015861b4abd26ec5367c5"},
{file = "ruff-0.4.6-py3-none-macosx_11_0_arm64.whl", hash = "sha256:602ebd7ad909eab6e7da65d3c091547781bb06f5f826974a53dbe563d357e53c"},
{file = "ruff-0.4.6-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:3f9ced5cbb7510fd7525448eeb204e0a22cabb6e99a3cb160272262817d49786"},
{file = "ruff-0.4.6-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:04a80acfc862e0e1630c8b738e70dcca03f350bad9e106968a8108379e12b31f"},
{file = "ruff-0.4.6-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:be47700ecb004dfa3fd4dcdddf7322d4e632de3c06cd05329d69c45c0280e618"},
{file = "ruff-0.4.6-py3-none-manylinux_2_17_ppc64.manylinux2014_ppc64.whl", hash = "sha256:1ff930d6e05f444090a0139e4e13e1e2e1f02bd51bb4547734823c760c621e79"},
{file = "ruff-0.4.6-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:f13410aabd3b5776f9c5699f42b37a3a348d65498c4310589bc6e5c548dc8a2f"},
{file = "ruff-0.4.6-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:0cf5cc02d3ae52dfb0c8a946eb7a1d6ffe4d91846ffc8ce388baa8f627e3bd50"},
{file = "ruff-0.4.6-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ea3424793c29906407e3cf417f28fc33f689dacbbadfb52b7e9a809dd535dcef"},
{file = "ruff-0.4.6-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:1fa8561489fadf483ffbb091ea94b9c39a00ed63efacd426aae2f197a45e67fc"},
{file = "ruff-0.4.6-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:4d5b914818d8047270308fe3e85d9d7f4a31ec86c6475c9f418fbd1624d198e0"},
{file = "ruff-0.4.6-py3-none-musllinux_1_2_i686.whl", hash = "sha256:4f02284335c766678778475e7698b7ab83abaf2f9ff0554a07b6f28df3b5c259"},
{file = "ruff-0.4.6-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:3a6a0a4f4b5f54fff7c860010ab3dd81425445e37d35701a965c0248819dde7a"},
{file = "ruff-0.4.6-py3-none-win32.whl", hash = "sha256:9018bf59b3aa8ad4fba2b1dc0299a6e4e60a4c3bc62bbeaea222679865453062"},
{file = "ruff-0.4.6-py3-none-win_amd64.whl", hash = "sha256:a769ae07ac74ff1a019d6bd529426427c3e30d75bdf1e08bb3d46ac8f417326a"},
{file = "ruff-0.4.6-py3-none-win_arm64.whl", hash = "sha256:735a16407a1a8f58e4c5b913ad6102722e80b562dd17acb88887685ff6f20cf6"},
{file = "ruff-0.4.6.tar.gz", hash = "sha256:a797a87da50603f71e6d0765282098245aca6e3b94b7c17473115167d8dfb0b7"},
]
[[package]]
@@ -6821,13 +6821,13 @@ zstd = ["zstandard (>=0.18.0)"]
[[package]]
name = "uvicorn"
version = "0.29.0"
version = "0.30.0"
description = "The lightning-fast ASGI server."
optional = false
python-versions = ">=3.8"
files = [
{file = "uvicorn-0.29.0-py3-none-any.whl", hash = "sha256:2c2aac7ff4f4365c206fd773a39bf4ebd1047c238f8b8268ad996829323473de"},
{file = "uvicorn-0.29.0.tar.gz", hash = "sha256:6a69214c0b6a087462412670b3ef21224fa48cae0e452b5883e8e8bdfdd11dd0"},
{file = "uvicorn-0.30.0-py3-none-any.whl", hash = "sha256:78fa0b5f56abb8562024a59041caeb555c86e48d0efdd23c3fe7de7a4075bdab"},
{file = "uvicorn-0.30.0.tar.gz", hash = "sha256:f678dec4fa3a39706bbf49b9ec5fc40049d42418716cea52b53f07828a60aa37"},
]
[package.dependencies]
@@ -7552,4 +7552,4 @@ testing = ["coverage (>=5.0.3)", "zope.event", "zope.testing"]
[metadata]
lock-version = "2.0"
python-versions = "^3.11"
content-hash = "05410bbac602e5b5a91986d9f58c06bab86f63a87ffa62f5e52de94b472a1910"
content-hash = "3f55a686a38bee8dc0cf22e301e40c8103698ff0b9e1f4217db55a1dbd993762"
+2 -2
@@ -22,7 +22,7 @@ uvicorn = "*"
types-toml = "*"
numpy = "*"
json-repair = "*"
browsergym = "0.3.2" # integrate browsergym as the browsing interface
browsergym = "0.3.4" # integrate browsergym as the browsing interface
html2text = "*"
e2b = "^0.17.0"
pexpect = "*"
@@ -44,7 +44,7 @@ llama-index-embeddings-azure-openai = "*"
llama-index-embeddings-ollama = "*"
[tool.poetry.group.dev.dependencies]
ruff = "0.4.5"
ruff = "0.4.6"
mypy = "1.10.0"
pre-commit = "3.7.1"
+2
@@ -42,6 +42,8 @@ where `conftest.py` defines the infrastructure needed to load real-world LLM pro
and responses for mocking purpose. Prompts and responses generated during real runs
of agents with real LLMs are stored under `mock/AgentName/TestName` folders.
**Note:** Set PERSIST_SANDBOX=false to use a clean sandbox for each test.
## Run Integration Tests
Take a look at `run-integration-tests.yml` to learn how integration tests are
+19 -7
@@ -2,6 +2,8 @@ import io
import os
import re
import sys
import tempfile
import subprocess
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler
from threading import Thread
@@ -81,14 +83,24 @@ def get_mock_response(test_name: str, messages: str, id: int) -> str:
# print the mismatched lines
print('Mismatched Prompt File path', prompt_file_path)
print('---' * 10)
print(messages)
# Create a temporary file to store messages
with tempfile.NamedTemporaryFile(delete=False, mode='w', encoding='utf-8') as tmp_file:
    tmp_file_path = tmp_file.name
    tmp_file.write(messages)
try:
    # Use diff command to compare files and capture the output
    result = subprocess.run(['diff', '-u', prompt_file_path, tmp_file_path], capture_output=True, text=True)
    if result.returncode != 0:
        print('Diff:')
        print(result.stdout)
    else:
        print('No differences found.')
finally:
    # Clean up the temporary file
    os.remove(tmp_file_path)
print('---' * 10)
for i, (c1, c2) in enumerate(zip(file_content, prompt)):
    if c1 != c2:
        print(
            f'Mismatch at index {i}: {c1[max(0,i-100):i+100]} vs {c2[max(0,i-100):i+100]}'
        )
        break
def mock_user_response(*args, test_name, **kwargs):
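Review note: the change above shells out to the external `diff` command, which may be missing on some platforms (for example a stock Windows runner). As a hedged alternative, not part of this change, the same unified diff could be produced with the standard-library `difflib` module, with no temporary file needed. The helper name `unified_prompt_diff` and its parameters are hypothetical and chosen only for illustration.

```python
import difflib


def unified_prompt_diff(
    expected: str,
    actual: str,
    expected_name: str = 'expected',
    actual_name: str = 'actual',
) -> str:
    """Return a unified diff of two prompt strings; empty string if identical.

    Hypothetical helper: a pure-Python stand-in for `diff -u`.
    """
    diff_lines = difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile=expected_name,
        tofile=actual_name,
    )
    return ''.join(diff_lines)


if __name__ == '__main__':
    # Example: compare a stored prompt with the prompt seen at test time.
    stored = 'This is a stupid typoo.\nReally?\n'
    seen = 'This is a stupid typo.\nReally?\n'
    print(unified_prompt_diff(stored, seen) or 'No differences found.')
```

Inside `get_mock_response`, such a helper would take the stored prompt file's content and `messages` directly, which would also remove the need for the `try`/`finally` cleanup.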
@@ -0,0 +1,86 @@
----------
# Task
You are a software architect. Your team has inherited an existing codebase, and
need to finish a project:
Fix typos in bad.txt. Do not ask me for confirmation at any point.
As an architect, you need to study the codebase to find all the information that
might be helpful for your software engineering team.
## Available Actions
* `run` - runs a command on the command line in a Linux shell. Arguments:
* `command` - the command to run
* `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true.
* `read` - reads the content of a file. Arguments:
* `path` - the path of the file to read
* `message` - make a plan, set a goal, record your thoughts, or ask for more input from the user. Arguments:
* `content` - the thought to record
* `wait_for_response` - set to `true` to wait for the user to respond before proceeding
* `finish` - if you're absolutely certain that you've completed your task, use the finish action to stop working. Arguments:
* `outputs` - a dictionary representing the outputs of your task, if any
You must ONLY `run` commands that have no side-effects, like `ls` and `grep`. You
MUST NOT modify or write to any file.
Do NOT finish until you have a complete understanding of which parts of the
codebase are relevant to the project, including particular files, functions, and classes.
When you're done, put your summary in `outputs.summary` in the `finish` action.
Remember, your task is to explore and study the current repository, not actually
implement the solution. If the codebase is empty, you shoud call the `finish` action.
## History
Here is a recent history of actions you've taken in service of this plan,
as well as observations you've made. This only includes the MOST RECENT
actions and observations--more may have happened before that.
They are time-ordered, with your most recent action at the bottom.
[]
## Format
Your response MUST be in JSON format. It must be an object, and it must contain two fields:
* `action`, which is one of the actions specified here
* `args`, which is a map of key-value pairs, specifying the arguments for that action
You MUST NOT include any other text besides the JSON response
## Examples
Here is an example of how you can interact with the environment for task solving:
--- START OF EXAMPLE ---
USER: Can you create a list of numbers from 1 to 10, and create a web page to display them at port 5000?
ASSISTANT:
{
"action": "run",
"args": {
"command": "ls",
"background": false
}
}
USER:
OBSERVATION:
[]
ASSISTANT:
{
"action": "finish",
"args": {
"outputs": {
"summary": "The codebase appears to be empty. Engineers should start everything from scratch."
}
}
}
--- END OF EXAMPLE ---
@@ -0,0 +1,86 @@
----------
# Task
You are a software architect. Your team has inherited an existing codebase, and
need to finish a project:
Fix typos in bad.txt. Do not ask me for confirmation at any point.
As an architect, you need to study the codebase to find all the information that
might be helpful for your software engineering team.
## Available Actions
* `run` - runs a command on the command line in a Linux shell. Arguments:
* `command` - the command to run
* `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true.
* `read` - reads the content of a file. Arguments:
* `path` - the path of the file to read
* `message` - make a plan, set a goal, record your thoughts, or ask for more input from the user. Arguments:
* `content` - the thought to record
* `wait_for_response` - set to `true` to wait for the user to respond before proceeding
* `finish` - if you're absolutely certain that you've completed your task, use the finish action to stop working. Arguments:
* `outputs` - a dictionary representing the outputs of your task, if any
You must ONLY `run` commands that have no side-effects, like `ls` and `grep`. You
MUST NOT modify or write to any file.
Do NOT finish until you have a complete understanding of which parts of the
codebase are relevant to the project, including particular files, functions, and classes.
When you're done, put your summary in `outputs.summary` in the `finish` action.
Remember, your task is to explore and study the current repository, not actually
implement the solution. If the codebase is empty, you shoud call the `finish` action.
## History
Here is a recent history of actions you've taken in service of this plan,
as well as observations you've made. This only includes the MOST RECENT
actions and observations--more may have happened before that.
They are time-ordered, with your most recent action at the bottom.
[[{"source": "agent", "action": "run", "args": {"command": "ls", "background": false, "thought": ""}}, {"source": "agent", "observation": "run", "content": "bad.txt", "extras": {"command_id": -1, "command": "ls", "exit_code": 0}}]]
## Format
Your response MUST be in JSON format. It must be an object, and it must contain two fields:
* `action`, which is one of the actions specified here
* `args`, which is a map of key-value pairs, specifying the arguments for that action
You MUST NOT include any other text besides the JSON response
## Examples
Here is an example of how you can interact with the environment for task solving:
--- START OF EXAMPLE ---
USER: Can you create a list of numbers from 1 to 10, and create a web page to display them at port 5000?
ASSISTANT:
{
"action": "run",
"args": {
"command": "ls",
"background": false
}
}
USER:
OBSERVATION:
[]
ASSISTANT:
{
"action": "finish",
"args": {
"outputs": {
"summary": "The codebase appears to be empty. Engineers should start everything from scratch."
}
}
}
--- END OF EXAMPLE ---
@@ -0,0 +1,86 @@
----------
# Task
You are a software architect. Your team has inherited an existing codebase, and
need to finish a project:
Fix typos in bad.txt. Do not ask me for confirmation at any point.
As an architect, you need to study the codebase to find all the information that
might be helpful for your software engineering team.
## Available Actions
* `run` - runs a command on the command line in a Linux shell. Arguments:
* `command` - the command to run
* `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true.
* `read` - reads the content of a file. Arguments:
* `path` - the path of the file to read
* `message` - make a plan, set a goal, record your thoughts, or ask for more input from the user. Arguments:
* `content` - the thought to record
* `wait_for_response` - set to `true` to wait for the user to respond before proceeding
* `finish` - if you're absolutely certain that you've completed your task, use the finish action to stop working. Arguments:
* `outputs` - a dictionary representing the outputs of your task, if any
You must ONLY `run` commands that have no side-effects, like `ls` and `grep`. You
MUST NOT modify or write to any file.
Do NOT finish until you have a complete understanding of which parts of the
codebase are relevant to the project, including particular files, functions, and classes.
When you're done, put your summary in `outputs.summary` in the `finish` action.
Remember, your task is to explore and study the current repository, not actually
implement the solution. If the codebase is empty, you shoud call the `finish` action.
## History
Here is a recent history of actions you've taken in service of this plan,
as well as observations you've made. This only includes the MOST RECENT
actions and observations--more may have happened before that.
They are time-ordered, with your most recent action at the bottom.
[[{"source": "agent", "action": "run", "args": {"command": "ls", "background": false, "thought": ""}}, {"source": "agent", "observation": "run", "content": "bad.txt", "extras": {"command_id": -1, "command": "ls", "exit_code": 0}}], [{"source": "agent", "action": "read", "args": {"path": "bad.txt", "start": 0, "end": -1, "thought": ""}}, {"source": "agent", "observation": "read", "content": "This is a stupid typoo.\nReally?\nNo mor typos!\nEnjoy!\n", "extras": {"path": "bad.txt"}}]]
## Format
Your response MUST be in JSON format. It must be an object, and it must contain two fields:
* `action`, which is one of the actions specified here
* `args`, which is a map of key-value pairs, specifying the arguments for that action
You MUST NOT include any other text besides the JSON response
## Examples
Here is an example of how you can interact with the environment for task solving:
--- START OF EXAMPLE ---
USER: Can you create a list of numbers from 1 to 10, and create a web page to display them at port 5000?
ASSISTANT:
{
"action": "run",
"args": {
"command": "ls",
"background": false
}
}
USER:
OBSERVATION:
[]
ASSISTANT:
{
"action": "finish",
"args": {
"outputs": {
"summary": "The codebase appears to be empty. Engineers should start everything from scratch."
}
}
}
--- END OF EXAMPLE ---
@@ -0,0 +1,59 @@
----------
# Task
You are a software engineer. You've inherited an existing codebase, which you
need to modify to complete this task:
Fix typos in bad.txt. Do not ask me for confirmation at any point.
Here's a summary of the codebase, as it relates to this task:
The codebase contains a single file named 'bad.txt' with some typos. The content of 'bad.txt' is:
This is a stupid typoo.
Really?
No mor typos!
Enjoy!
The engineering team needs to correct the typos in this file.
## Available Actions
* `run` - runs a command on the command line in a Linux shell. Arguments:
* `command` - the command to run
* `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true.
* `write` - writes the content to a file. Arguments:
* `path` - the path of the file to write
* `content` - the content to write to the file
* `read` - reads the content of a file. Arguments:
* `path` - the path of the file to read
* `message` - make a plan, set a goal, record your thoughts, or ask for more input from the user. Arguments:
* `content` - the thought to record
* `wait_for_response` - set to `true` to wait for the user to respond before proceeding
* `finish` - if you're absolutely certain that you've completed your task, use the finish action to stop working. Arguments:
* `outputs` - a dictionary representing the outputs of your task, if any
Do NOT finish until you have completed the tasks.
## History
Here is a recent history of actions you've taken in service of this plan,
as well as observations you've made. This only includes the MOST RECENT
actions and observations--more may have happened before that.
They are time-ordered, with your most recent action at the bottom.
[]
## Format
Your response MUST be in JSON format. It must be an object, and it must contain two fields:
* `action`, which is one of the actions specified here
* `args`, which is a map of key-value pairs, specifying the arguments for that action
You MUST NOT include any other text besides the JSON response
@@ -0,0 +1,59 @@
----------
# Task
You are a software engineer. You've inherited an existing codebase, which you
need to modify to complete this task:
Fix typos in bad.txt. Do not ask me for confirmation at any point.
Here's a summary of the codebase, as it relates to this task:
The codebase contains a single file named 'bad.txt' with some typos. The content of 'bad.txt' is:
This is a stupid typoo.
Really?
No mor typos!
Enjoy!
The engineering team needs to correct the typos in this file.
## Available Actions
* `run` - runs a command on the command line in a Linux shell. Arguments:
* `command` - the command to run
* `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true.
* `write` - writes the content to a file. Arguments:
* `path` - the path of the file to write
* `content` - the content to write to the file
* `read` - reads the content of a file. Arguments:
* `path` - the path of the file to read
* `message` - make a plan, set a goal, record your thoughts, or ask for more input from the user. Arguments:
* `content` - the thought to record
* `wait_for_response` - set to `true` to wait for the user to respond before proceeding
* `finish` - if you're absolutely certain that you've completed your task, use the finish action to stop working. Arguments:
* `outputs` - a dictionary representing the outputs of your task, if any
Do NOT finish until you have completed the tasks.
## History
Here is a recent history of actions you've taken in service of this plan,
as well as observations you've made. This only includes the MOST RECENT
actions and observations--more may have happened before that.
They are time-ordered, with your most recent action at the bottom.
[[{"source": "agent", "action": "read", "args": {"path": "bad.txt", "start": 0, "end": -1, "thought": ""}}, {"source": "agent", "observation": "read", "content": "This is a stupid typoo.\nReally?\nNo mor typos!\nEnjoy!\n", "extras": {"path": "bad.txt"}}]]
## Format
Your response MUST be in JSON format. It must be an object, and it must contain two fields:
* `action`, which is one of the actions specified here
* `args`, which is a map of key-value pairs, specifying the arguments for that action
You MUST NOT include any other text besides the JSON response
@@ -0,0 +1,59 @@
----------
# Task
You are a software engineer. You've inherited an existing codebase, which you
need to modify to complete this task:
Fix typos in bad.txt. Do not ask me for confirmation at any point.
Here's a summary of the codebase, as it relates to this task:
The codebase contains a single file named 'bad.txt' with some typos. The content of 'bad.txt' is:
This is a stupid typoo.
Really?
No mor typos!
Enjoy!
The engineering team needs to correct the typos in this file.
## Available Actions
* `run` - runs a command on the command line in a Linux shell. Arguments:
* `command` - the command to run
* `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true.
* `write` - writes the content to a file. Arguments:
* `path` - the path of the file to write
* `content` - the content to write to the file
* `read` - reads the content of a file. Arguments:
* `path` - the path of the file to read
* `message` - make a plan, set a goal, record your thoughts, or ask for more input from the user. Arguments:
* `content` - the thought to record
* `wait_for_response` - set to `true` to wait for the user to respond before proceeding
* `finish` - if you're absolutely certain that you've completed your task, use the finish action to stop working. Arguments:
* `outputs` - a dictionary representing the outputs of your task, if any
Do NOT finish until you have completed the tasks.
## History
Here is a recent history of actions you've taken in service of this plan,
as well as observations you've made. This only includes the MOST RECENT
actions and observations--more may have happened before that.
They are time-ordered, with your most recent action at the bottom.
[[{"source": "agent", "action": "read", "args": {"path": "bad.txt", "start": 0, "end": -1, "thought": ""}}, {"source": "agent", "observation": "read", "content": "This is a stupid typoo.\nReally?\nNo mor typos!\nEnjoy!\n", "extras": {"path": "bad.txt"}}], [{"source": "agent", "action": "write", "args": {"path": "bad.txt", "content": "This is a stupid typo.\nReally?\nNo more typos!\nEnjoy!\n", "start": 0, "end": -1, "thought": ""}}, {"source": "agent", "observation": "write", "content": "", "extras": {"path": "bad.txt"}}]]
## Format
Your response MUST be in JSON format. It must be an object, and it must contain two fields:
* `action`, which is one of the actions specified here
* `args`, which is a map of key-value pairs, specifying the arguments for that action
You MUST NOT include any other text besides the JSON response
@@ -0,0 +1,50 @@
----------
# Task
You are a quality assurance engineer. Another engineer has made changes to the
codebase which are supposed to solve this task:
Fix typos in bad.txt. Do not ask me for confirmation at any point.
Note the changes might have already been applied in-line. You should focus on
validating if the task is solved, nothing else.
## Available Actions
* `run` - runs a command on the command line in a Linux shell. Arguments:
* `command` - the command to run
* `background` - if true, run the command in the background, so that other commands can be run concurrently. Useful for e.g. starting a server. You won't be able to see the logs. You don't need to end the command with `&`, just set this to true.
* `read` - reads the content of a file. Arguments:
* `path` - the path of the file to read
* `message` - make a plan, set a goal, record your thoughts, or ask for more input from the user. Arguments:
* `content` - the thought to record
* `wait_for_response` - set to `true` to wait for the user to respond before proceeding
* `finish` - if you're absolutely certain that you've completed your task, use the finish action to stop working. Arguments:
* `outputs` - a dictionary representing the outputs of your task, if any
You must ONLY `run` commands that have no side-effects, like `ls`, `grep`, and test scripts.
Do NOT finish until you know whether the task is complete and correct.
When you're done, add a `completed` boolean to the `outputs` of the `finish` action.
If `completed` is `false`, you MUST also provide a `summary` in the `outputs` of the `finish` action
explaining what the problem is.
## History
Here is a recent history of actions you've taken in service of this plan,
as well as observations you've made. This only includes the MOST RECENT
actions and observations--more may have happened before that.
They are time-ordered, with your most recent action at the bottom.
[]
## Format
Your response MUST be in JSON format. It must be an object, and it must contain two fields:
* `action`, which is one of the actions specified here
* `args`, which is a map of key-value pairs, specifying the arguments for that action
You MUST NOT include any other text besides the JSON response

Some files were not shown because too many files have changed in this diff.