OpenHands/opendevin/core
Frank Xu 48151bdbb0 [feat] WebArena benchmark, MiniWoB++ benchmark and related arch changes (#2170)
* add webarena, and revamp messaging for webarena eval

* add changes for browsergym

* update infer script

* fix unit tests

* update

* add multiple runs for miniwob

* update instruction, remove personal path

* update

* add code for getting final reward, fix integration, add results

* add avg cost calculation
2024-06-06 09:01:20 +08:00