Support Entity-Deduction-Arena (EDA) Benchmark (#1931)

* adding draft evaluation code for EDA, using chatgpt as the temporal agent for now
* Update README.md
* Delete frontend/package.json
* reverse the irrelevant changes
* reverse package.json
* use chatgpt as the codeactagent
* integrate with opendevin
* Update evaluation/EDA/README.md
* Update evaluation/EDA/README.md
* Use poetry to manage packages
* integrate with opendevin
* minor update
* minor update
* update poetry
* update README
* clean-up infer scripts
* add run_infer script and improve readme
* log final success and final message & ground truth

---------

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: yufansong <yufan@risingwave-labs.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2026-01-09 14:57:59 -05:00 · 2024-05-25 08:17:04 -07:00
parent 28ab00946b
commit 0c829cd067
7 changed files with 865 additions and 6 deletions
--- a/evaluation/README.md
+++ b/evaluation/README.md
@@ -15,6 +15,7 @@ all the preprocessing/evaluation/analysis scripts.
 - SWE-Bench: [`evaluation/swe_bench`](./swe_bench)
 - HumanEvalFix: [`evaluation/humanevalfix`](./humanevalfix)
 - GAIA: [`evaluation/gaia`](./gaia)
+- Entity deduction Arena (EDA): [`evaluation/EDA`](./EDA)

 ### Result Visualization