mirror of
https://github.com/All-Hands-AI/OpenHands.git
synced 2026-01-08 22:38:05 -05:00
Evaluation README: Add TheAgentCompany (#5777)
This commit is contained in:
@@ -42,7 +42,7 @@ temperature = 0.0
|
||||
|
||||
## Supported Benchmarks
|
||||
|
||||
The OpenHands evaluation harness supports a wide variety of benchmarks across [software engineering](#software-engineering), [web browsing](#web-browsing), and [miscellaneous assistance](#misc-assistance) tasks.
|
||||
The OpenHands evaluation harness supports a wide variety of benchmarks across [software engineering](#software-engineering), [web browsing](#web-browsing), [miscellaneous assistance](#misc-assistance), and [real-world](#real-world) tasks.
|
||||
|
||||
### Software Engineering
|
||||
|
||||
@@ -73,6 +73,10 @@ The OpenHands evaluation harness supports a wide variety of benchmarks across [s
|
||||
- ProofWriter: [`evaluation/benchmarks/logic_reasoning`](./benchmarks/logic_reasoning)
|
||||
- ScienceAgentBench: [`evaluation/benchmarks/scienceagentbench`](./benchmarks/scienceagentbench)
|
||||
|
||||
### Real World
|
||||
|
||||
- TheAgentCompany: [`evaluation/benchmarks/the_agent_company`](./benchmarks/the_agent_company)
|
||||
|
||||
## Result Visualization
|
||||
|
||||
Check [this huggingface space](https://huggingface.co/spaces/OpenHands/evaluation) for visualization of existing experimental results.
|
||||
|
||||
Reference in New Issue
Block a user