Graham Neubig
efd689293e
Bump docs to 0.6 ( #2193 )
...
* Bump docs to 0.6
* Update README.md
2024-06-02 06:34:40 -04:00
Ryan H. Tran
22e8fb39b1
add cost metrics to evaluation outputs for all benchmarks ( #2199 )
2024-06-02 08:28:00 +00:00
Yizhe Zhang
8d79c3edbc
modify the exiting logic and reward calculation, delete unused function ( #2198 )
2024-06-02 06:38:09 +00:00
tobitege
b0478d2880
fix: Fix husky install deprecated message (since v9 of husky) ( #2190 ) ( #2191 )
...
Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com >
2024-06-02 02:46:32 +00:00
RainRat
ed6dcc8381
fix typos ( #2187 )
...
* fix typos
no functional change
* fix typos
2024-06-01 20:40:30 +00:00
Leo
2c231c57c9
Add supported benchmarks to evaluation README (AgentBench, BIRD, LogicReasoning) ( #2183 )
...
Signed-off-by: ifuryst <ifuryst@gmail.com >
2024-06-01 11:33:01 -04:00
மனோஜ்குமார் பழனிச்சாமி
4ece6fb3cc
Auto started persistent container ( #2151 )
2024-06-01 14:46:41 +00:00
மனோஜ்குமார் பழனிச்சாமி
f9c7c3a520
Refactored logging ( #2159 )
2024-06-01 14:31:35 +00:00
மனோஜ்குமார் பழனிச்சாமி
aee3d506e6
Restricted persistent sandbox to opendevin user only ( #2177 )
2024-06-01 14:18:03 +00:00
Graham Neubig
3b8a649b3d
Update slack invite link to make it valid ( #2182 )
...
* Update README.md
* Update CustomFooter.tsx
* Update about.md
* Update faq.tsx
* Update intro.mdx
2024-06-01 21:55:27 +08:00
Binyuan Hui
46dcf4bb3e
Support BIRD benchmark ( #2117 )
...
* update: change timeout from 10 to 30
* update: readme for bird evaluation
* Update evaluation/bird/run_infer.py
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com >
* Update evaluation/bird/README.md
Co-authored-by: Shimada666 <649940882@qq.com >
* Update evaluation/bird/README.md
Co-authored-by: Shimada666 <649940882@qq.com >
* Update evaluation/bird/run_infer.py
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com >
---------
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com >
Co-authored-by: Shimada666 <649940882@qq.com >
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com >
2024-06-01 11:34:36 +00:00
Leo
78e003caf6
Fix: Avoid bash backtick eval in runtime commands. ( #2180 )
...
Signed-off-by: ifuryst <ifuryst@gmail.com >
2024-06-01 19:19:15 +08:00
Leo
be251b11de
Add AgentBench. ( #2012 )
...
* Add AgentBench.
* Load the datasets from HF.
Signed-off-by: ifuryst <ifuryst@gmail.com >
* Add helper functions.
* Add mock executor.
Signed-off-by: ifuryst <ifuryst@gmail.com >
* Add retrieve agent answer cmd.
* Adjust the dataset.
* Refine test results.
Signed-off-by: ifuryst <ifuryst@gmail.com >
* Consolidate all AgentBench datasets and scripts into a single CSV dataset.
* Refactor dataset source.
* Update helper functions.
Signed-off-by: ifuryst <ifuryst@gmail.com >
* Fix the CRLF problem.
Signed-off-by: ifuryst <ifuryst@gmail.com >
* Separate the instance's workspace.
Signed-off-by: ifuryst <ifuryst@gmail.com >
* Add cleanup logic and error handling for sandbox closure.
* Normalized dataset
Signed-off-by: ifuryst <ifuryst@gmail.com >
* Update README.
Signed-off-by: ifuryst <ifuryst@gmail.com >
* Update the prompt to capture the answer.
Signed-off-by: ifuryst <ifuryst@gmail.com >
* Refactor script execution paths to use absolute container workspace path.
Signed-off-by: ifuryst <ifuryst@gmail.com >
* Update AgentBench README.
Signed-off-by: ifuryst <ifuryst@gmail.com >
* Delete useless functions.
Signed-off-by: ifuryst <ifuryst@gmail.com >
* Update evaluation/agent_bench/README.md
* Add script to summarize test results from JSONL file in AgentBench
Signed-off-by: ifuryst <ifuryst@gmail.com >
* Delete useless script and codes.
Signed-off-by: ifuryst <ifuryst@gmail.com >
* Update evaluation/agent_bench/scripts/summarise_results.py
---------
Signed-off-by: ifuryst <ifuryst@gmail.com >
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk >
2024-06-01 07:58:14 +00:00
மனோஜ்குமார் பழனிச்சாமி
04d7354501
Detailed logs for ssh_box ( #2173 )
2024-06-01 11:40:22 +05:30
Boxuan Li
06e45afc75
Fix ssh box hung issue ( #2172 )
...
Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com >
2024-06-01 05:31:32 +00:00
மனோஜ்குமார் பழனிச்சாமி
3a4dc5c68c
Initialized plugins only once for persistent sandboxes ( #2162 )
2024-06-01 10:46:09 +05:30
Boxuan Li
feaae0b7ac
Fix persist_sandbox in Makefile ( #2171 )
2024-06-01 12:50:31 +08:00
Rahul Anand
6e76f9a02f
Fix: Codebase font fixed, and other fixes for #2138 PR ( #2154 )
...
* fix #2123
* Docs enhancement
* Update docs/src/components/CustomFooter.tsx
Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com >
* Update docs/src/components/CustomFooter.tsx
Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com >
* Update docs/src/pages/faq.tsx
Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com >
* update
* fix for #2138 pr
* Update docs/src/components/CustomFooter.tsx
Co-authored-by: Graham Neubig <neubig@gmail.com >
* Update docs/src/components/HomepageHeader/HomepageHeader.tsx
Co-authored-by: Graham Neubig <neubig@gmail.com >
* Update docs/src/components/Welcome/Welcome.tsx
Co-authored-by: Graham Neubig <neubig@gmail.com >
* Update docs/src/css/custom.css
Co-authored-by: Graham Neubig <neubig@gmail.com >
---------
Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com >
Co-authored-by: Graham Neubig <neubig@gmail.com >
2024-06-01 02:22:44 +00:00
மனோஜ்குமார் பழனிச்சாமி
bf24a0b5c0
Fixed makefile ( #2168 )
2024-06-01 03:35:43 +05:30
Aaron Xia
42c6b506b5
Lazy launching BrowseEnv / making BrowseEnv optional ( #2155 )
...
* feat: lazy launching browser; browser optional for different agents.
* style: lint
* fix: integration test fail due to browser not started.
* fix: run by cli and integration test failed.
* fix: lint
* fix: lint
---------
Co-authored-by: Graham Neubig <neubig@gmail.com >
2024-05-31 16:40:42 -04:00
மனோஜ்குமார் பழனிச்சாமி
8413f147c9
Added logs ( #2153 )
...
* Logged about config file
* Logged Browser env
* Update opendevin/core/config.py
Co-authored-by: Aleksandar <isavitaisa@gmail.com >
* Update opendevin/core/config.py
Co-authored-by: Aleksandar <isavitaisa@gmail.com >
---------
Co-authored-by: Aleksandar <isavitaisa@gmail.com >
2024-05-31 16:04:36 -04:00
Ryan H. Tran
01296ff79d
Add remaining subsets for MINT benchmark ( #2142 )
...
* add MMLU subset
* add theoremqa subset
* remove redundant packages from requirements.txt, adjust prompts, handle gpt3.5 proposing a wrong answer after a correct answer
* add MBPP subset
* add humaneval subset
* update README
* exit actively after the agent finishes the task
2024-05-31 20:04:13 +00:00
மனோஜ்குமார் பழனிச்சாமி
f3f5768b4f
Install chromium only once ( #2100 )
...
* install chromium only once
* Update Makefile
* Update Makefile
2024-05-31 15:39:10 -04:00
dependabot[bot]
9a441ea8f7
Bump boto3 from 1.34.115 to 1.34.116 ( #2164 )
...
Bumps [boto3](https://github.com/boto/boto3 ) from 1.34.115 to 1.34.116.
- [Release notes](https://github.com/boto/boto3/releases )
- [Changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst )
- [Commits](https://github.com/boto/boto3/compare/1.34.115...1.34.116 )
---
updated-dependencies:
- dependency-name: boto3
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-31 15:13:33 -04:00
Graham Neubig
6596d5c799
Fix: Feedback should be sent through the backend to avoid CORS issues ( #2046 )
...
* Fix: Feedback should be sent through the backend to avoid CORS issues
* Update
* Fix merge error
* Revert unnecessary change
* Lint
* Moved to services
* Fixed bugs
---------
Co-authored-by: OpenDevin <opendevin@opendevin.ai >
2024-05-31 15:00:09 -04:00
dependabot[bot]
6aec3d789e
Bump litellm from 1.39.3 to 1.39.5 ( #2163 )
...
Bumps [litellm](https://github.com/BerriAI/litellm ) from 1.39.3 to 1.39.5.
- [Release notes](https://github.com/BerriAI/litellm/releases )
- [Commits](https://github.com/BerriAI/litellm/compare/v1.39.3...v1.39.5 )
---
updated-dependencies:
- dependency-name: litellm
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-31 19:36:04 +02:00
Graham Neubig
7a2122ebc2
Default to gpt-4o ( #2158 )
...
* Default to gpt-4o
* Fix default
0.6.0
2024-05-31 14:44:07 +00:00
dependabot[bot]
a7b19a0048
Bump @nextui-org/react from 2.4.0 to 2.4.1 in /frontend ( #2161 )
...
Bumps [@nextui-org/react](https://github.com/nextui-org/nextui/tree/HEAD/packages/core/react ) from 2.4.0 to 2.4.1.
- [Release notes](https://github.com/nextui-org/nextui/releases )
- [Changelog](https://github.com/nextui-org/nextui/blob/canary/packages/core/react/CHANGELOG.md )
- [Commits](https://github.com/nextui-org/nextui/commits/@nextui-org/react@2.4.1/packages/core/react )
---
updated-dependencies:
- dependency-name: "@nextui-org/react"
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-31 14:32:21 +00:00
dependabot[bot]
e6c8e1c9d2
Bump framer-motion from 11.2.9 to 11.2.10 in /frontend ( #2160 )
...
Bumps [framer-motion](https://github.com/framer/motion ) from 11.2.9 to 11.2.10.
- [Changelog](https://github.com/framer/motion/blob/main/CHANGELOG.md )
- [Commits](https://github.com/framer/motion/compare/v11.2.9...v11.2.10 )
---
updated-dependencies:
- dependency-name: framer-motion
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-31 14:30:14 +00:00
Boxuan Li
4d14b44a9a
SWE-bench: Add summarise utility script to view passed/failed task IDs ( #2137 )
...
* SWE-bench: Add summarise utility script to view passed/failed task IDs
* Fix typos
* Move file
* Prettify
* Use merged jsonl file
2024-05-31 12:32:17 +08:00
Boxuan Li
f188abd7a3
Delete evaluation outputs files ( #2152 )
...
* Delete evaluation outputs files
* Fix README
2024-05-31 03:12:27 +00:00
மனோஜ்குமார் பழனிச்சாமி
961c96a2a1
Added ssh_password to config setup ( #2139 )
...
Co-authored-by: Aleksandar <isavitaisa@gmail.com >
2024-05-31 07:26:16 +05:30
dependabot[bot]
f4bc52461a
Bump openai from 1.30.4 to 1.30.5 ( #2144 )
...
Bumps [openai](https://github.com/openai/openai-python ) from 1.30.4 to 1.30.5.
- [Release notes](https://github.com/openai/openai-python/releases )
- [Changelog](https://github.com/openai/openai-python/blob/main/CHANGELOG.md )
- [Commits](https://github.com/openai/openai-python/compare/v1.30.4...v1.30.5 )
---
updated-dependencies:
- dependency-name: openai
dependency-type: direct:development
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-30 23:29:38 +08:00
dependabot[bot]
cd6f863a49
Bump litellm from 1.39.2 to 1.39.3 ( #2145 )
...
Bumps [litellm](https://github.com/BerriAI/litellm ) from 1.39.2 to 1.39.3.
- [Release notes](https://github.com/BerriAI/litellm/releases )
- [Commits](https://github.com/BerriAI/litellm/compare/v1.39.2...v1.39.3 )
---
updated-dependencies:
- dependency-name: litellm
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-30 23:29:11 +08:00
dependabot[bot]
486c5d983f
Bump json-repair from 0.20.1 to 0.21.0 ( #2146 )
...
Bumps [json-repair](https://github.com/mangiucugna/json_repair ) from 0.20.1 to 0.21.0.
- [Release notes](https://github.com/mangiucugna/json_repair/releases )
- [Commits](https://github.com/mangiucugna/json_repair/compare/0.20.1...0.21.0 )
---
updated-dependencies:
- dependency-name: json-repair
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-30 23:28:55 +08:00
dependabot[bot]
33d9882621
Bump @types/node from 18.19.30 to 20.12.13 in /frontend ( #2147 )
...
Bumps [@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node ) from 18.19.30 to 20.12.13.
- [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases )
- [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node )
---
updated-dependencies:
- dependency-name: "@types/node"
dependency-type: direct:development
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-30 23:28:31 +08:00
dependabot[bot]
2fcaa2328e
Bump boto3 from 1.34.113 to 1.34.115 ( #2143 )
...
Bumps [boto3](https://github.com/boto/boto3 ) from 1.34.113 to 1.34.115.
- [Release notes](https://github.com/boto/boto3/releases )
- [Changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst )
- [Commits](https://github.com/boto/boto3/compare/1.34.113...1.34.115 )
---
updated-dependencies:
- dependency-name: boto3
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-30 23:24:59 +08:00
Rahul Anand
a0373900be
Docs enhancement ( #2138 )
...
* fix #2123
* Docs enhancement
* Update docs/src/components/CustomFooter.tsx
Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com >
* Update docs/src/components/CustomFooter.tsx
Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com >
* Update docs/src/pages/faq.tsx
Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com >
---------
Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com >
2024-05-30 17:05:09 +03:00
Ren Ma
a9823491e6
Support Logic Reasoning Benchmark ( #1973 )
2024-05-30 16:35:15 +08:00
Xingyao Wang
01ef90205d
Add CodeActSWEAgent to remove browsing & github + improvements on agentskills ( #2105 )
...
* update swe_bench prompt;
use minimal prompt for codeact;
* upgrade agentskills and update testcases
* update infer prompt
* fix cwd
* add icl for swebench
* also log in_context_example to run infer
* remove extra print
* change prompt to abs path
* update error message to include current file info
* change cwd for jupyter if needed
* update edit error message
* update prompt
* improve git get patch
* update hint string
* default to 50 turns
* revert changes from codeact agent and create new CodeActSWEAgent
* revert changes to codeact
* revert instructions for run infer
* revert instructions for run infer
* update README
* update max iter
* add codeact swe agent
* fix issue for CodeActSWEAgent
* allow specifying max iter in cmdline script
* stop printing
* Update agenthub/codeact_swe_agent/README.md
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com >
* Fix prompt regression in jupyter plugin
---------
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com >
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk >
2024-05-29 21:19:00 -07:00
Aaron Xia
b1ec8e5dc2
style: Update agent_controller.py to clean log ( #2124 )
2024-05-29 18:56:11 -07:00
Rahul Anand
b3cce763a2
fix #2123 ( #2125 )
2024-05-29 17:56:45 -04:00
Robert Brennan
89ac732cb6
Adjust docs a bit ( #2135 )
...
* tweak docs a bit
* move warning
2024-05-29 17:56:28 -04:00
dependabot[bot]
eb1e0e9da8
Bump llama-index-embeddings-huggingface from 0.2.0 to 0.2.1 ( #2132 )
2024-05-29 20:48:14 +00:00
dependabot[bot]
ab454e122a
Bump browsergym from 0.3.3 to 0.3.4 ( #2127 )
...
Bumps [browsergym](https://github.com/ServiceNow/BrowserGym ) from 0.3.3 to 0.3.4.
- [Release notes](https://github.com/ServiceNow/BrowserGym/releases )
- [Commits](https://github.com/ServiceNow/BrowserGym/compare/v0.3.3...v0.3.4 )
---
updated-dependencies:
- dependency-name: browsergym
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-29 15:42:21 -04:00
dependabot[bot]
cf95f1aabe
Bump ruff from 0.4.5 to 0.4.6 ( #2130 )
...
Bumps [ruff](https://github.com/astral-sh/ruff ) from 0.4.5 to 0.4.6.
- [Release notes](https://github.com/astral-sh/ruff/releases )
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md )
- [Commits](https://github.com/astral-sh/ruff/compare/v0.4.5...v0.4.6 )
---
updated-dependencies:
- dependency-name: ruff
dependency-type: direct:development
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-30 00:53:53 +08:00
dependabot[bot]
b011190b40
Bump litellm from 1.38.11 to 1.39.2 ( #2133 )
...
Bumps [litellm](https://github.com/BerriAI/litellm ) from 1.38.11 to 1.39.2.
- [Release notes](https://github.com/BerriAI/litellm/releases )
- [Commits](https://github.com/BerriAI/litellm/compare/v1.38.11...v1.39.2 )
---
updated-dependencies:
- dependency-name: litellm
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-29 15:51:50 +00:00
dependabot[bot]
439e9c0e60
Bump openai from 1.30.3 to 1.30.4 ( #2131 )
2024-05-29 15:44:51 +00:00
dependabot[bot]
53b3309a5a
Bump @typescript-eslint/eslint-plugin from 7.10.0 to 7.11.0 in /frontend ( #2129 )
2024-05-29 15:29:54 +00:00
dependabot[bot]
c45123ddb2
Bump framer-motion from 11.2.6 to 11.2.9 in /frontend ( #2128 )
2024-05-29 15:29:40 +00:00