Files
Jiayi Pan 2d52298a1d Support GAIA benchmark (#1911)
* Add gaia test

* Improve gaia prompts

* Fix browser_env hang bug

* Fix gaia bugs

* add gaia to eval readme

* Fix gaia bugs

* minor fix

* add run_infer.sh and update readme

* set num eval worker to 1

* default to 2023 gaia level1 subset

* default to level 1

* add prompt to instruct model enclose answer within <solution> tag

* add missing break

---------

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Co-authored-by: yufansong <yufan@risingwave-labs.com>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
2024-05-24 11:22:28 +00:00
..
2024-05-24 11:22:28 +00:00