mirror of
https://github.com/All-Hands-AI/OpenHands.git
synced 2026-01-09 14:57:59 -05:00
Support Entity-Deduction-Arena (EDA) Benchmark (#1931)
* adding draft evaluation code for EDA, using chatgpt as the temporal agent for now
* Update README.md
* Delete frontend/package.json
* reverse the irrelevant changes
* reverse package.json
* use chatgpt as the codeactagent
* integrate with opendevin
* Update evaluation/EDA/README.md
* Update evaluation/EDA/README.md
* Use poetry to manage packages
* integrate with opendevin
* minor update
* minor update
* update poetry
* update README
* clean-up infer scripts
* add run_infer script and improve readme
* log final success and final message & ground truth

---------

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: yufansong <yufan@risingwave-labs.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
@@ -15,6 +15,7 @@ all the preprocessing/evaluation/analysis scripts.
- SWE-Bench: [`evaluation/swe_bench`](./swe_bench)
- HumanEvalFix: [`evaluation/humanevalfix`](./humanevalfix)
- GAIA: [`evaluation/gaia`](./gaia)
- Entity deduction Arena (EDA): [`evaluation/EDA`](./EDA)

### Result Visualization