Leo
|
9ada36e30b
|
fix: restore python linting. (#2228)
* fix: restore python linting.
Signed-off-by: ifuryst <ifuryst@gmail.com>
* update: extend the Python lint check to evaluation.
Signed-off-by: ifuryst <ifuryst@gmail.com>
* Update evaluation/logic_reasoning/instruction.txt
---------
Signed-off-by: ifuryst <ifuryst@gmail.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
|
2024-06-04 06:36:19 +00:00 |
|
finaltrip
|
05b84df9cb
|
chore: fix some comments (#2234)
Signed-off-by: finaltrip <finaltrip@qq.com>
|
2024-06-03 16:04:34 +00:00 |
|
Boxuan Li
|
538d1d85a2
|
evaluation: Reset configs in finally block (#2214)
|
2024-06-03 09:52:12 +08:00 |
|
Ryan H. Tran
|
22e8fb39b1
|
add cost metrics to evaluation outputs for all benchmarks (#2199)
|
2024-06-02 08:28:00 +00:00 |
|
RainRat
|
ed6dcc8381
|
fix typos (#2187)
* fix typos
no functional change
* fix typos
|
2024-06-01 20:40:30 +00:00 |
|
Xingyao Wang
|
28ab00946b
|
update README for GAIA (#2054)
* update README for GAIA
* Update evaluation/gaia/README.md
* Update evaluation/gaia/README.md
* Update evaluation/gaia/README.md
---------
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
|
2024-05-25 15:01:03 +00:00 |
|
Jiayi Pan
|
2d52298a1d
|
Support GAIA benchmark (#1911)
* Add gaia test
* Improve gaia prompts
* Fix browser_env hang bug
* Fix gaia bugs
* add gaia to eval readme
* Fix gaia bugs
* minor fix
* add run_infer.sh and update readme
* set num eval worker to 1
* default to 2023 gaia level1 subset
* default to level 1
* add prompt to instruct model enclose answer within <solution> tag
* add missing break
---------
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Co-authored-by: yufansong <yufan@risingwave-labs.com>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
|
2024-05-24 11:22:28 +00:00 |
|