Frank Xu
|
48151bdbb0
|
[feat] WebArena benchmark, MiniWoB++ benchmark and related arch changes (#2170)
* add webarena, and revamp messaging for webarena eval
* add changes for browsergym
* update infer script
* fix unit tests
* update
* add multiple runs for miniwob
* update instructions, remove personal path
* update
* add code for getting final reward, fix integration, add results
* add avg cost calculation
|
2024-06-06 09:01:20 +08:00 |
|