Jiayi Pan
|
2d52298a1d
|
Support GAIA benchmark (#1911)
* Add gaia test
* Improve gaia prompts
* Fix browser_env hang bug
* Fix gaia bugs
* add gaia to eval readme
* Fix gaia bugs
* minor fix
* add run_infer.sh and update readme
* set number of eval workers to 1
* default to 2023 gaia level 1 subset
* default to level 1
* add prompt to instruct model to enclose answer within <solution> tag
* add missing break
---------
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Co-authored-by: yufansong <yufan@risingwave-labs.com>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
|
2024-05-24 11:22:28 +00:00 |
|