Ryan H. Tran
|
01296ff79d
|
Add remaining subsets for MINT benchmark (#2142)
* add MMLU subset
* add theoremqa subset
* remove redundant packages from requirements.txt, adjust prompts, handle the case where gpt3.5 proposes a wrong answer after a correct answer
* add MBPP subset
* add humaneval subset
* update README
* exit explicitly after the agent finishes the task
|
2024-05-31 20:04:13 +00:00 |
|