OpenHands/evaluation/mint/requirements.txt at 0.6.0 - OpenHands - AtHeartEngineering

mirror of https://github.com/All-Hands-AI/OpenHands.git synced 2026-04-29 03:00:45 -04:00

Ryan H. Tran 9434bcce48 Support MINT benchmark (MATH, GSM8K subset) (#1955)

* setup boilerplate and README

* setup test script and load dataset

* add temporary integration that works

* refactor code

* add solution evaluation through 'fake_user_response_fn'

* finish integrating MATH subset

* Update evaluation/mint/run_infer.py

* Update evaluation/mint/run_infer.sh

* Update opendevin/core/main.py

* remove redundant templates, add eval_note, update README

* use <execute_ipython> tag instead of <execute>

* hardcode AGENT option for run_infer.sh

* Update evaluation/mint/task.py

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>

* fix: bug where no message is returned when the task succeeds

* change message to make the agent exit

* import bash abstractmethod

* install all required packages inside sandbox before the agent runs, adjust prompt

* add subset eval folder separation and test for gsm8k

* fix bug in Reasoning task result check, add requirements.txt

* Fix syntax error in evaluation/mint/run_infer.py

* update README, add default values for `SUBSET` and `EVAL_LIMIT`

---------

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
Co-authored-by: yufansong <yufan@risingwave-labs.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

2024-05-28 07:42:52 +00:00

33 lines

330 B

Plaintext

pre-commit
openai
datasets
backoff
charset-normalizer==3.1.0
# Alfworld
pandas==1.4.4
opencv-python
networkx
tqdm
vocab
revtok
Click
ai2thor==2.1.0
transformers
tokenizers
scipy==1.10.1
ipython
matplotlib
cython
nltk
gym==0.15.4
pipreqs
pyyaml
pytz
visdom
sympy
pycocotools
seaborn
google-generativeai
python-dateutil
statsmodels